Member since: 06-02-2014
Posts: 17
Kudos Received: 1
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1807 | 04-28-2015 09:08 AM |
| | 1252 | 06-19-2014 12:59 PM |
07-06-2015 03:36 PM
This error was thrown during the execution of the job controller within the MapReduce job. Here's a similar one with the same root problem:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/io/orc/OrcNewOutputFormat
at com.who.bgt.logloader.schema.OrcFileLoader.run(OrcFileLoader.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.who.bgt.logloader.schema.OrcFileLoader.main(OrcFileLoader.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 8 more

The specific line it's complaining about is this one:

job.setOutputFormatClass(OrcNewOutputFormat.class);

The obvious problem is that it's failing to find the OrcNewOutputFormat class definition, which lives in hive-exec-0.13.1-cdh5.3.5.jar. I pushed the jar to hdfs://lib/hive-exec..., and within my main function I call the following before running the job:

DistributedCache.addFileToClassPath(new Path("/lib/hive-exec-0.13.1-cdh5.3.5.jar"), lConfig);

Can you be more explicit about how I make sure my distributed-cache configuration actually works? Optimally, I shouldn't have to put this jar in the distributed cache at all, since it sits at /opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/jars/hive-exec-0.13.1-cdh5.3.5.jar on all of my slave nodes, but I also can't figure out how to tell MapReduce to look there.
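For anyone comparing notes, here is a minimal driver sketch of the two classpath approaches as I understand them on Hadoop 2.x / CDH5; the class name and job name are invented, and this is an illustration, not a verified fix. One thing worth noting from the trace itself: the failure happens inside OrcFileLoader.run in the client JVM, before any task launches, so the jar also has to be visible to the driver's own classpath (for example via HADOOP_CLASSPATH), which the distributed cache doesn't affect.

```java
// A minimal driver sketch (not the actual OrcFileLoader) showing two ways to
// get hive-exec onto the task classpath on Hadoop 2.x / CDH5. The paths come
// from the post above; treat this as an illustration, not a verified fix.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class OrcJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "orc-file-loader");
        Configuration jobConf = job.getConfiguration();

        // Option 1: ship the jar from HDFS via the Job API, the
        // non-deprecated equivalent of DistributedCache.addFileToClassPath.
        job.addFileToClassPath(new Path("/lib/hive-exec-0.13.1-cdh5.3.5.jar"));

        // Option 2: point tasks at the parcel directory that already exists
        // on every node, assuming the path really is identical everywhere.
        String extra = "/opt/cloudera/parcels/CDH-5.3.5-1.cdh5.3.5.p0.4/jars/*";
        String cp = jobConf.get("mapreduce.application.classpath");
        jobConf.set("mapreduce.application.classpath",
                cp == null ? extra : cp + "," + extra);

        // ... mapper, reducer, and output format setup would go here,
        // followed by job.waitForCompletion(true).
    }
}
```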
05-28-2015 04:41 PM
I'm using a Java MapReduce job to write data to a directory that will be interpreted as a Hive table in RCFile format. To do this, I need the org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable class, which lives in hive-serde-0.13.1-cdh5.3.3.jar. So far, so good. I've included the jar on my command line like this:

/usr/bin/hadoop jar /path/lib/awesome-mapred-0.9.6.jar com.awesome.HiveLoadController -libjars /path/lib/postgresql-8.4-702.jdbc4.jar,/path/lib/hive-serde-0.13.1-cdh5.3.3.jar

I know for certain that it is loading the postgres library, because it prints correctly retrieved information before it throws the error. I know that it is grabbing and transferring the hive-serde jar, because it throws a fit if I move it out of the /path/lib directory. And I know that the class exists in the jar, because I've unpacked it and looked. Is there something in the rest of the lib path that might be interfering with it finding that class in the jar?
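One background detail that often matters here: -libjars is only honored when the driver hands its arguments to GenericOptionsParser, which is what ToolRunner does. A minimal sketch of that pattern follows; the class name is borrowed from the command line above, but the body is an invented illustration, not the real HiveLoadController.

```java
// Minimal ToolRunner skeleton. -libjars is only honored when the driver's
// arguments pass through GenericOptionsParser, which ToolRunner handles.
// The class name matches the command line above; the body is an invented
// illustration, not the actual HiveLoadController.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HiveLoadController extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries the -libjars entries, so a job built
        // from it ships those jars to every task.
        Job job = Job.getInstance(getConf(), "hive-load-controller");
        job.setJarByClass(HiveLoadController.class);
        // ... mapper/reducer/input/output setup goes here ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new HiveLoadController(), args));
    }
}
```

Note that -libjars ships the jars to the tasks; it does not add them to the client JVM's own classpath, so a class needed during job setup must also be on HADOOP_CLASSPATH.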
Labels:
- Apache Hadoop
- Apache Hive
- MapReduce
05-26-2015 06:43 PM
Thank you, mfox. My problem was that the basic install had set all of HDFS's groups to "superuser" instead of "hadoop". Changing it to "hadoop" allowed MapReduce to write its history logs to the correct location.
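For reference, a hedged sketch of checking what the loaded HDFS configuration thinks the superuser group is, assuming the property involved was dfs.permissions.superusergroup (that property name is an assumption; the post doesn't name it):

```java
// Hypothetical sanity check, assuming the misconfigured property was
// dfs.permissions.superusergroup (an assumption, not stated above).
// Prints the group the loaded HDFS configuration treats as the superuser
// group; the hard-coded fallback "supergroup" is the stock Hadoop default.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class SupergroupCheck {
    public static void main(String[] args) {
        Configuration conf = new HdfsConfiguration(); // also loads hdfs-site.xml
        System.out.println(conf.get("dfs.permissions.superusergroup", "supergroup"));
    }
}
```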
05-18-2015 12:59 PM
I think the "drilling down into the individual map/reduce tasks" step is where this falls apart for me. When I click on the application (e.g. application_1431658373269_0170), it shows me a list of application masters. From there I can click on the node id (e.g. hadoopslave0011p1mdw1.sendgrid.net:8042). This takes me to a page where I can see all of the containers currently running and the logs for the node itself, which isn't what I need.

I can also click on "logs" for the application master. That takes me to a page that says:

Error getting logs for container_e23_1431658373269_0170_01_000001

which tells me that the cluster is misconfigured in some way and isn't even producing them. Can you recommend a next step?
05-15-2015 03:52 PM
This may be a complete noob question, but we're shifting from CDH4 MR1 to CDH5 MR2. In the previous version I had no problem navigating the menus to find the stdout and stderr output from individual mappers and reducers, but now I can't find them anywhere: not through the interface, not on the YARN nodes' disks, and not on HDFS. Could someone point me in the right direction?
Labels:
- Apache YARN
- HDFS
05-11-2015 10:26 AM
I'm on Cloudera 5.3.3. Here's my command line and output:

[hadoop]$ hdfs dfs -du /
2298676940886   6896030822658   /output
21297905593     63893716779     /tmp
6072184915396   18216555409976  /user
05-08-2015 08:27 AM
1 Kudo
I just switched from Cloudera 4 to Cloudera 5, and the output format of hdfs dfs -du has changed: it now has two numeric columns instead of just one. I'm guessing that the first is the actual content size and the second is the block storage consumption, but I can't find any documentation about this. Can anyone clarify and/or point me in the right direction?
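As a cross-check on that guess, the same two quantities appear to be exposed through the Java filesystem API via ContentSummary; a minimal sketch, with the path as a placeholder:

```java
// Cross-check sketch: ContentSummary exposes what appear to be the same two
// quantities as the new -du output: getLength() for raw content size and
// getSpaceConsumed() for storage used across all replicas. Path is a placeholder.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DuColumns {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        ContentSummary cs = fs.getContentSummary(new Path("/output"));
        System.out.println("content size:   " + cs.getLength());
        System.out.println("space consumed: " + cs.getSpaceConsumed());
    }
}
```

Consistent with that reading, the /output line in the follow-up post above shows the second column at exactly three times the first (6896030822658 = 3 × 2298676940886), which is what a 3x replication factor would produce.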
Labels:
- HDFS
05-01-2015 09:24 AM
I'm transferring files using distcp on Cloudera 5.3.x, and I can't get it to distribute the transfer using MR2. I don't have MR1 installed at all, and I'd rather not install it because it would hide issues. My command line looks like this, and it runs just fine; it just copies every file in series:

mapred distcp s3n://key:secret@logs.space.com/source/2015/04/28/ hdfs://nameservice/target/dir/2015/04/28

Is there a configuration item that I missed?
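For what it's worth, here is a sketch of driving DistCp through its Hadoop 2.x Java API, mainly to show where the parallelism knob lives (the same thing the -m command-line flag controls); the paths are placeholders, and this is not presented as the fix for the serial-copy behavior:

```java
// A sketch of DistCp via its Hadoop 2.x Java API, shown mainly to point at
// the parallelism knob (DistCpOptions#setMaxMaps, the equivalent of the -m
// flag). Paths are placeholders; this is an illustration, not a confirmed
// fix for the serial-copy behavior described above.
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.DistCp;
import org.apache.hadoop.tools.DistCpOptions;

public class DistCpSketch {
    public static void main(String[] args) throws Exception {
        DistCpOptions options = new DistCpOptions(
                Collections.singletonList(new Path("s3n://bucket/source/2015/04/28/")),
                new Path("hdfs://nameservice/target/dir/2015/04/28"));
        options.setMaxMaps(20); // upper bound on simultaneous copy maps
        new DistCp(new Configuration(), options).execute();
    }
}
```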
Labels:
- HDFS
04-28-2015 09:08 AM
This turned out to be a configuration setting issue. "dfs.namenode.shared.edits.dir" had a directory value in it, and needed to be cleared.