Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1543 | 07-09-2019 12:53 AM
 | 9292 | 06-23-2019 08:37 PM
 | 8050 | 06-18-2019 11:28 PM
 | 8676 | 05-23-2019 08:46 PM
 | 3473 | 05-20-2019 01:14 AM
07-20-2014
06:37 AM
1 Kudo
Since Oozie has no knowledge of where your HBase configs live, you will need to pass the client hbase-site.xml file (placed somewhere on HDFS, by copying it from /etc/hbase/conf/hbase-site.xml on any HBase gateway node) via the <job-xml>…</job-xml> option. Alternatively, try the below command instead (it will not be sufficient for secured clusters, which need further properties), replacing zk-host1,zk-host2,zk-host3 with your actual ZooKeeper hosts:

sqoop import -Dhbase.zookeeper.quorum=zk-host1,zk-host2,zk-host3 --connect jdbc:oracle:thin:@XXX:port/XXX --username XXX --password XXX --table XXX -m 1 --incremental lastmodified --last-value '2014-06-23' --check-column XXX --append --hbase-table XXX --column-family info --hbase-row-key XXX --hbase-bulkload
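If you go the <job-xml> route instead, a rough sketch of the Oozie Sqoop action is below; the HDFS path /user/oozie/conf/hbase-site.xml and the shortened import command are only placeholders for illustration:

<action name="sqoop-import">
  <sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- client HBase config copied to HDFS from an HBase gateway node -->
    <job-xml>/user/oozie/conf/hbase-site.xml</job-xml>
    <command>import --connect jdbc:oracle:thin:@XXX:port/XXX --table XXX --hbase-table XXX --column-family info --hbase-row-key XXX</command>
  </sqoop>
  <ok to="end"/>
  <error to="fail"/>
</action>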
07-20-2014
12:45 AM
Where exactly is the hive.metastore.sasl.enabled property applied? Are you certain it is applied to the running HiveMetaStore server? Does a regular Hive CLI configured with hive.metastore.uris (instead of direct DB properties) work properly (i.e., do 'show tables' and similar commands run fine)?
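For reference, a minimal sketch of the client-side hive-site.xml entries implied above (the thrift hostname below is a placeholder; 9083 is the usual metastore port):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host.example.com:9083</value>
</property>
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>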
07-20-2014
12:42 AM
1 Kudo
Please post your cluster's memory configuration, such as the resource MB offered by the NodeManagers, and the individual memory settings for the MapReduce AM, Map and Reduce tasks. It appears that the cluster is unable to schedule more than 1 or 2 containers at a time, causing the job to hang indefinitely, because Oozie runs two AMs (the launcher plus the actual job), which already grab two containers.
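Specifically, the standard YARN/MRv2 properties to report are the ones below (the values shown are only illustrative, not recommendations):

yarn.nodemanager.resource.memory-mb = 8192
yarn.scheduler.maximum-allocation-mb = 8192
yarn.app.mapreduce.am.resource.mb = 1536
mapreduce.map.memory.mb = 1024
mapreduce.reduce.memory.mb = 1024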
07-19-2014
11:19 PM
1 Kudo
Can you post more details on what you mean by 'multiple applications' (and how many, exactly), as well as your scheduler configuration? What behaviour do you observe exactly when you say they all 'stop'? Do their AppMasters run while the actual application containers (i.e. map or reduce tasks) do not, or do they all simply fail?
07-19-2014
11:16 PM
Do you perhaps have safety valve overrides in your Hue Configuration page in CM that set the default mapred_clusters to an MR1 location? If so, please remove them, since you have switched over to YARN now; doing that should resolve it. Also ensure that Hue's MapReduce Cluster setting is set to your YARN service and not the MR1 service.
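For illustration, such a leftover override in the Hue safety valve might look like the sketch below (the hostname and port are placeholders); removing the whole [[mapred_clusters]] block lets Hue use the YARN service instead:

[hadoop]
  [[mapred_clusters]]
    [[[default]]]
      jobtracker_host=jt-host.example.com
      jobtracker_port=8021
      submit_to=True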
07-19-2014
10:53 PM
2 Kudos
Yes, the reason for the 200k default is to warn you that you may be facing a small-files issue in your cluster, or that you may be close to needing to expand further horizontally. A larger number of blocks raises the heap requirement on the DataNodes, and the threshold warning also exists to notify you of this (i.e. that you may soon need to raise the DN heap size so it can continue serving blocks at the same performance). With CM5 we have revised the number to 600k, given memory optimisation improvements for DNs in CDH4.6+ and CDH5.0+. Feel free to raise the threshold via the CM -> HDFS -> Configuration -> Monitoring section fields, but do look into whether your users have begun creating too many tiny files, as that may hamper their job performance with the overheads of too many blocks (and thereby, too many mappers).
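As a rough way to gauge whether small files are piling up, you can compare the total size against the total files/blocks reported by fsck (running it as the 'hdfs' user against / here is just an example):

~> sudo -u hdfs hdfs fsck / | grep -E 'Total (size|files|blocks)'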
07-19-2014
10:23 PM
It is difficult to say whether you are hitting a bug without looking at the relevant Checkpointer entries in the StandbyNameNode (SBN) logs. There may be issues with transferring the checkpointed image file between the SBN and the NN, possibly because of timeouts or similar.
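For instance, grepping the SBN role log for checkpoint and image-transfer entries would be a reasonable starting point (the log path below is an assumption; adjust it to wherever your SBN role logs live):

~> grep -iE 'checkpoint|TransferFsImage' /var/log/hadoop-hdfs/*NAMENODE*.log*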
07-19-2014
10:17 PM
If CM does not present a UI field for an advanced tuning property, you can rely on the Configuration Snippet (Safety Valve) fields to set it manually. More on this at: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Managing-Clusters/cmmc_safety_valve.html
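For example, to set a property that has no UI field you would paste a plain XML snippet into the relevant service's or role's safety valve field; the property name and value below are purely placeholders:

<property>
  <name>some.advanced.tuning.property</name>
  <value>42</value>
</property>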
07-19-2014
10:06 PM
2 Kudos
You receive the error because the 'hbase' user does not have a login shell assigned to it. You can set a shell for the 'hbase' user on the machine, to allow direct 'su' based login to that user, by following http://www.cyberciti.biz/faq/howto-set-bash-as-your-default-shell/

However, if your goal is simply to use the 'hbase' user for running superuser-level commands, we instead recommend using 'sudo' style commands. For example:

~> sudo -u hbase hbase hbck
~> sudo -u hbase hbase shell

You can also invoke a shell as the 'hbase' user in certain cases, via:

~> sudo -u hbase /bin/bash
07-14-2014
11:46 AM
Your Java program needs to include the cluster client configs on its classpath for the Configuration class to be able to read them and discover the actual MR cluster automatically. Typically you can achieve this by adding the directory /etc/hadoop/conf to your classpath, if you are not launching your custom application via the 'hadoop jar' command (which auto-sets the desired classpath).
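As a sketch, launching directly with java (the jar name and main class below are placeholders) could look like:

~> java -cp /etc/hadoop/conf:myapp.jar:$(hadoop classpath) com.example.MyDriver

The 'hadoop classpath' command prints the Hadoop library classpath, and placing /etc/hadoop/conf ahead of it ensures the cluster's *-site.xml files are picked up by the Configuration class.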