Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1969 | 07-09-2019 12:53 AM |
| | 11878 | 06-23-2019 08:37 PM |
| | 9141 | 06-18-2019 11:28 PM |
| | 10127 | 05-23-2019 08:46 PM |
| | 4577 | 05-20-2019 01:14 AM |
07-15-2015
08:52 PM
I was expecting you might bring that up next, given a non-secure cluster 🙂 This is a far trickier problem to solve. You could, for instance, enable the LinuxContainerExecutor and set yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users to false in the NodeManager configs without turning on security, and this will cause your tasks to run as the submitting UID. However, for that to work completely, all your nodes will need that very same user to be locally available (for example via a regular unix account, or via LDAP, etc.). Would this be possible in your situation? If not, I'd recommend also setting the below at the top of your script, which works in non-secure clusters: export HADOOP_USER_NAME=myao
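As a minimal sketch of the non-secure-cluster workaround (the script name and the pig invocation below are only placeholders for illustration, not taken from your setup):

```bash
#!/bin/bash
# Non-secure clusters only: make the Hadoop clients submit work as "myao",
# regardless of which local OS account runs this script.
export HADOOP_USER_NAME=myao

# Placeholder for whatever the script actually runs, e.g. a Pig job:
pig -f my_script.pig
```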
07-15-2015
03:55 AM
Glad to know! Please consider marking your post as the solution, so that others with similar issues can find the resolution more quickly.
07-15-2015
02:28 AM
You are using the memory channel, which lives inside the Java heap of the Flume process. Your capacity is 100k items, so if the channel fills up, its overall memory usage (which depends on the source's event sizes) can exhaust the heap if there isn't sufficient headroom. What is your current Flume heap size? You may want to double it, or better, estimate the required size as the average source event size times the channel capacity, plus some breathing room for the Flume daemon's own work.
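As a rough, hypothetical sizing sketch (the 1 KB average event size below is purely an assumption for illustration; for a manually managed agent the heap can be set in conf/flume-env.sh, while a CM-managed agent has an equivalent Java heap size setting):

```bash
# Hypothetical estimate:
#   100,000 events (channel capacity) x ~1 KB average event size ≈ 100 MB of channel data,
#   plus headroom for Flume's own bookkeeping and transaction overhead.
# conf/flume-env.sh
export JAVA_OPTS="-Xms512m -Xmx1g"
```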
07-15-2015
01:16 AM
1 Kudo
Edit the "Kafka Broker Environment Advanced Configuration Snippet (Safety Valve)" field under CM -> Kafka -> Configuration with the below entry (For a 5g example): KAFKA_HEAP_OPTS="-Xmx5g -Xms5g" Save and restart.
07-14-2015
10:48 PM
It is unclear from your description, but given that CDH 5.4 includes Kite 1.0.0 in its default packaging, are you manually using an older version for some other purpose?
07-14-2015
10:12 PM
Your error indicates that the Hive CLI cannot find a proper local client configuration to run with. To avoid this issue, please add YARN and Hive Gateway roles to all hosts running NodeManagers in your cluster, and deploy the cluster-wide client configuration [1]. Then add the below lines to the top of your script (before the Hive command or other commands are invoked):

export HIVE_CONF_DIR=/etc/hive/conf
export HADOOP_CONF_DIR=/etc/hive/conf

For more on adding Gateways via CM, see http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_mc_client_config.html

[1] - https://www.youtube.com/watch?v=4S9H3wftM_0
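As a sketch of where those lines go, assuming a simple shell script (the hive query below is only a placeholder for whatever the script actually runs):

```bash
#!/bin/bash
# Point the Hive CLI and Hadoop clients at the deployed client configuration.
export HIVE_CONF_DIR=/etc/hive/conf
export HADOOP_CONF_DIR=/etc/hive/conf

# Placeholder command; substitute the real Hive invocation from your script.
hive -e "SHOW TABLES;"
```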
07-14-2015
10:08 PM
1 Kudo
You can use the new Java driver feature of the Oozie MR action to run your jar in an easier way. Follow this: http://archive.cloudera.com/cdh5/cdh/5/oozie/WorkflowFunctionalSpec.html#a3.2.2.2_Configuring_the_MapReduce_action_with_Java_code
07-14-2015
10:05 PM
1 Kudo
Your NodeManagers' offered memory resources may be too low for the amount of memory the applications/jobs are demanding. This is a common situation that leaves a job waiting in the ACCEPTED state, awaiting more resources to run. You can raise the CM -> YARN -> Configuration -> "Container Memory" field to a higher value to resolve this. This problem is typically seen only on small installations, such as 1-3 nodes.
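To see how much memory each NodeManager is actually offering versus using, one way to check from the command line (the node id below is only a placeholder) is the yarn CLI; CM's "Container Memory" field corresponds to the yarn.nodemanager.resource.memory-mb property:

```bash
# List NodeManagers and their node ids.
yarn node -list

# Show Memory-Capacity and Memory-Used for one node
# (replace the host:port with a real node id from the list above).
yarn node -status nm-host.example.com:8041
```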
07-13-2015
11:34 PM
From Hadoop: The Definitive Guide (Tom White):

"""
About LazyOutputFormat
----------------------
A typical MapReduce program can produce output files that are empty, depending on your implementation. If you want to suppress the creation of empty files, you need to leverage LazyOutputFormat. Two lines in your driver will do the trick:

import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;

LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
"""
07-08-2015
09:57 PM
Please check the log of the single map task of job_1436201602998_0002 for the actual reason behind the Pig script execution failure.
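If the job has already finished and YARN log aggregation is enabled, a quick way to pull those task logs from the command line (shown here only as a sketch) is:

```bash
# The application id is the job id with the "job_" prefix replaced by "application_".
yarn logs -applicationId application_1436201602998_0002
```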