Member since 07-16-2015

177 Posts
28 Kudos Received
19 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 16703 | 11-14-2017 01:11 AM |
| | 62683 | 11-03-2017 06:53 AM |
| | 4815 | 11-03-2017 06:18 AM |
| | 14306 | 09-12-2017 05:51 AM |
| | 2412 | 09-08-2017 02:50 AM |
11-03-2017 06:43 AM
This issue just means that your shell action exited with an error code (different from 0). If you want to know the reason, you need to add logging inside the shell script to find out what happened.

Be aware that the script executes locally on a data-node, so any log the script writes will be on that particular data-node.
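A minimal sketch of such logging, assuming a POSIX shell; the function name and the log path are hypothetical choices, not anything Oozie mandates:

```shell
#!/bin/sh
# Hypothetical wrapper: record the command's output and its exit code in a
# local file, so the failure cause can be found on the data-node that
# actually ran the action. The LOGFILE path is an assumption.
LOGFILE="${LOGFILE:-/tmp/shell_action.log}"

run_logged() {
    echo "action started at $(date)" >>"$LOGFILE"
    "$@" >>"$LOGFILE" 2>&1
    rc=$?
    echo "action finished with exit code $rc" >>"$LOGFILE"
    return $rc   # propagate the code unchanged so Oozie still sees the failure
}
```

The exit code is returned unchanged, so the action still fails in Oozie; the log on the data-node just tells you why.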
11-03-2017 06:38 AM
Alternatively, you could look into "yarn queues" and resource allocation.

This will not "restrict" the number of mappers or reducers, but it will control how many can run concurrently, by giving them access to only a subset of the available resources.
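With the Fair Scheduler, for example, such a cap can be expressed in the allocation file; the queue name and the limits below are made-up values for illustration:

```xml
<!-- fair-scheduler.xml sketch: jobs submitted to this queue share at most
     8 GB of memory and 4 vcores, and at most 2 of them run at once.
     Queue name and limits are assumptions, not recommendations. -->
<allocations>
  <queue name="batch">
    <maxResources>8192 mb, 4 vcores</maxResources>
    <maxRunningApps>2</maxRunningApps>
  </queue>
</allocations>
```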
11-03-2017 06:31 AM
First: save the content of the namenode directory.

Second: can you launch the second namenode on its own? Does it start?

If yes, you should be able to start the data-nodes and regain access to the data.
11-03-2017 06:18 AM · 1 Kudo
Hi,

The concept of Hive partitions does not map to HBase tables. So if you want HBase as the storage, you will need to work around your use case.

You could use one HBase table with a row key constructed from the partition value. That way you should be able to query the HBase table by row key and avoid a full scan of the table.

Or you could have one HBase table per "partition" (which also means one Hive table per partition).

Or you may conclude that HBase does not answer your need, and stay with Hive.

Regards, Mathieu
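A sketch of the composite-row-key idea; the separator, partition value, table name, and record id are all made-up examples:

```shell
#!/bin/sh
# Build a row key prefixed with the "partition" value, so a prefix scan
# can replace a full table scan. All values here are illustrative.
partition="2017-11-03"
record_id="order-42"
rowkey="${partition}#${record_id}"
echo "$rowkey"

# In the hbase shell, reading one "partition" then becomes a prefix scan:
#   scan 'mytable', {ROWPREFIXFILTER => '2017-11-03#'}
```

One trade-off to keep in mind: rows sharing a prefix land in the same region, so a time-based prefix can concentrate writes on one region server.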
10-25-2017 02:57 AM
I think what you are looking for is a setting located in the "core-site.xml" file (in the HDFS configuration). Search for "proxyuser" in the Cloudera documentation.

Regards, Mathieu
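The entries in question look like this in core-site.xml; the "hue" user below is just an example of a service account that impersonates end users:

```xml
<!-- Allow the (example) "hue" user to impersonate users connecting from
     any host and belonging to any group. Tighten the wildcards in
     production to the actual hosts and groups you need. -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
```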
09-12-2017 05:51 AM
I am not sure this information is available.

You could go with the "yarn logs" command, or do it the basic way on the command line:
- use pdsh to distribute the same command to every data-node
- run a find on the container id

Regards, Mathieu
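Both approaches, sketched; the application id, container id, host range, and log directory are all assumptions to be replaced with your own values:

```shell
#!/bin/sh
# Illustrative ids only - take the real ones from the ResourceManager UI.
APP_ID="application_1500000000000_0001"
CONTAINER_ID="container_1500000000000_0001_01_000002"

# 1) If log aggregation is enabled, YARN can fetch the logs directly:
#    yarn logs -applicationId "$APP_ID"

# 2) The brute-force way: run the same find on every data-node with pdsh
#    (host range and container-log directory are assumptions):
#    pdsh -w dn[01-20] "find /yarn/container-logs -name '*$CONTAINER_ID*'"
echo "looking for $CONTAINER_ID of $APP_ID"
```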
09-08-2017 02:50 AM
I believe this 30s wait time is hard-coded into the Cloudera agent. I don't think it can be altered other than by a really dirty modification, which I wouldn't recommend.

Regards, Mathieu
08-11-2017 06:15 AM
As far as I understand how Impala works, that is the expected behaviour. It is indeed intended to speed up later queries that use the same sets of data.
07-25-2017 12:52 AM
Hi,

I personally don't know of that possibility. But as a workaround, you can reference a morphline on a network share accessible from all nodes (I guess you already know that).

Regards, Mathieu
06-12-2017 04:45 AM
From my understanding, when you use the Sentry HDFS synchronization plugin you only need to set the following ownership and permissions: hive:hive / 771.

https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_hiveserver2_security.html#concept_vxf_pgx_nm
https://www.cloudera.com/documentation/enterprise/latest/topics/sg_sentry_service_config.html#concept_z5b_42s_p4__section_lvc_4g4_rp

The plugin then manages the other permissions according to what has been granted in Sentry.

If you set the permissions yourself, there is no point in using the Sentry HDFS synchronization plugin.
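On the cluster that single setting is applied with the hdfs CLI; the warehouse path below is the common default and may differ on your installation. The runnable part of this sketch just demonstrates what mode 771 means, on a local directory:

```shell
#!/bin/sh
# On the cluster (not run here), the setting referred to above would be:
#   sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
#   sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
#
# 771 = rwx for the owner and group, execute-only for others; shown locally:
demo_dir=$(mktemp -d)
chmod 771 "$demo_dir"
mode=$(stat -c %a "$demo_dir" 2>/dev/null || stat -f %Lp "$demo_dir")
echo "mode is $mode"
```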