Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1969 | 07-09-2019 12:53 AM |
| | 11878 | 06-23-2019 08:37 PM |
| | 9141 | 06-18-2019 11:28 PM |
| | 10127 | 05-23-2019 08:46 PM |
| | 4577 | 05-20-2019 01:14 AM |
07-06-2015
10:15 PM
2 Kudos
Thank you for the additional details!

> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

This indicates a problem at the driver end, or as you say, 'during the execution of the job controller'. The issue is that even if you add the jar to the MR distributed cache classpath, your driver/main class also references the same class, and adding a jar to the distributed tasks' classpath does not also add it to the local one. Here's how you can ensure both, if you use 'hadoop jar' to execute your job:

~> export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar
~> hadoop jar your-app.jar your.main.Class [arguments]

This adds the jar to your local JVM classpath as well, while your code continues to add it onto the remote execution classpaths.

> Optimally, I shouldn't have to stuff this one in the distributed cache since it sits in /opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/jars/hive-exec-0.13.1-cdh5.3.5.jar on all of my slave nodes, but I also can't figure out how to tell MapReduce to look there.

The MR remote execution classpath is governed by the classpath entries defined in mapred-site.xml and yarn-site.xml, plus the additional elements you add to the DistributedCache. It does not include the entire /opt/cloudera/parcels/CDH/jars/* path - this is deliberate, for isolation and flexibility, as that area may carry multiple versions of the same dependencies. Does this help?
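As a side note, if your main class parses generic options (for example via ToolRunner), the -libjars option is another way to ship the same jar onto the remote task classpaths at submit time. A rough sketch, assuming the parcel path from above and that your driver supports generic options:

~> export HADOOP_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar
~> hadoop jar your-app.jar your.main.Class -libjars /opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec.jar [arguments]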
07-01-2015
12:28 AM
What form of HDFS path are you configuring in your Flume agent configs? For HA, you must use the HA service name, such as hdfs://nameservice1/user/foo instead of hdfs://namenode-host:8020/user/foo. This will protect your agents from failures during HA failovers.
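For illustration, a minimal HDFS sink definition using the HA service name would look something like the below (the agent/sink/channel names and the path are placeholders - substitute your own). This also assumes the HDFS client configuration that describes nameservice1 is available on the Flume host, e.g. via an HDFS Gateway role:

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://nameservice1/user/foo/events
agent1.sinks.sink1.channel = channel1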
06-25-2015
01:31 PM
You can use the CM -> HBase -> Configuration -> RegionServer Safety Valve (for hbase-site.xml) to make the HFile v3 property change, since there's no direct UI field for it. CM keeps client configs separate from server configs, so that server-specific items can be isolated and configured independently. This is explained in more detail in the architecture docs at http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_intro_primer.html
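For reference, the snippet to paste into that safety valve would be along these lines (hfile.format.version is the upstream HBase property that controls the HFile write version - double-check the exact name and value against the HBase docs for your CDH release before applying):

<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>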
06-25-2015
01:24 PM
2 Kudos
Yes, that'd be a good idea. Glad to hear it worked! Feel free to also mark the discussion as solved so others looking at similar issues may find this thread faster.
06-25-2015
07:14 AM
Did you follow the guide at http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#_visibility_labels? What error specifically do you get when trying to use the feature? Also, if you changed the HFile version, make sure to run a major compaction on all tables so the existing data migrates to the new format.
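For example, a major compaction can be triggered per table from the HBase shell ('your_table' below is a placeholder):

~> echo "major_compact 'your_table'" | hbase shell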
06-24-2015
03:46 PM
Thanks for closing the loop! We do not activate v3 HFiles by default in CDH5.4, to avoid breaking compatibility or adding extra work for users upgrading from an earlier CDH5 release: https://github.com/cloudera/hbase/commit/c9eb03bbf2c54b8e502feef89a59484bad987ff8
06-24-2015
02:39 PM
What is the full stack trace? That'd be necessary to tell where the failure point lies. If it fails at the driver/client end, you will likely also need to add the jar to the HADOOP_CLASSPATH env-var before the command invocation. If it fails at the MR task end, you'll need to make sure your distributed-cache configuration works (by checking the submitted job's configuration XML to see whether your jar appears in it).
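As a quick illustration of both checks (all jar names and paths below are placeholders, and the .staging location may differ on your cluster - you can equally view the job's configuration page in the JobHistory UI):

~> export HADOOP_CLASSPATH=/path/to/your-dependency.jar
~> hadoop jar your-app.jar your.main.Class [arguments]
~> hdfs dfs -cat /user/$USER/.staging/<job_id>/job.xml | grep -i your-dependency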
06-24-2015
10:27 AM
Glad to hear - thanks for closing the loop!
06-24-2015
10:27 AM
CM currently lacks support for defining storage types. If you'd like to use this feature at the moment, place your XML override in the "DataNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" instead, which accepts <property/> tags.
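For example, an override tagging each DataNode directory with its storage type could look like the below (the mount points are placeholders - use your actual data directories):

<property>
  <name>dfs.datanode.data.dir</name>
  <value>[DISK]/data/1/dfs/dn,[SSD]/data/ssd1/dfs/dn</value>
</property>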
06-24-2015
10:25 AM
3 Kudos
You need to raise the client heap size. For a one-off change, you can do the below:

~> export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx5g"
~> hadoop fs -copyToLocal /user/docsearch/data/DiscardedAttachments /opt/

For a more permanent change, locate the Gateway Client Java Heap configs in the relevant service (HDFS, YARN or Hive) in CM, raise the value, and redeploy cluster-wide configs [1].

[1] - https://www.youtube.com/watch?v=4S9H3wftM_0