Implementing WordCount with Cascading on HDP 2.1 Sandbox

Rising Star

Hi,

Could you help me resolve this error?

[root@sandbox ~]# yarn jar /tmp/MRJar/WordCount.jar com.denmark.danskeBank.vo.WordCount /tmp/data/hamlet.txt /tmp/output
Not a valid JAR: /tmp/MRJar/WordCount.jar

This is what I have done:

1. Created the WordCount.jar file in Eclipse with Hadoop 1.x jars.
2. Uploaded it to the HDFS directory /tmp/MRJar.
3. Ran the command above and got this error.

Then I tried:

[root@sandbox ~]# hadoop fs -copyToLocal /tmp/MRJar/WordCount.jar /MapReduce

16/02/21 05:46:07 WARN hdfs.DFSClient: DFSInputStream has been closed already

I also tried the steps given to run it through Gradle. While executing ~/gradle-1.9/bin/gradle clean jar, I got an error:

[cascade@sandbox part2]$ ~/gradle-1.9/bin/gradle clean jar
ERROR: JAVA_HOME is set to an invalid directory: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x8

I wanted to create my own MapReduce job and run it for practice. I am not a Java developer.

Could you guide me?

Thanks.

1 ACCEPTED SOLUTION

Master Mentor

@Revathy Mourouguessane You can ignore this warning: "WARN hdfs.DFSClient: DFSInputStream has been closed already". There is already a JIRA open to address it.

I assume you are following this tutorial: http://hortonworks.com/hadoop-tutorial/cascading-hortonworks-data-platform-2-1/

Check your JAVA_HOME setting.

What is the output of echo $JAVA_HOME?
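
As a rough sketch of what "check your JAVA_HOME" can mean in practice: Gradle's launcher complains about an invalid directory when it cannot find an executable at $JAVA_HOME/bin/java, so the same test can be run by hand. The function names java_home_ok and check_java_home are made up for this example and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given directory exists
# and contains an executable bin/java.
java_home_ok() {
    [ -n "$1" ] && [ -d "$1" ] && [ -x "$1/bin/java" ]
}

# Hypothetical helper: reports on the current JAVA_HOME value.
check_java_home() {
    if java_home_ok "${JAVA_HOME:-}"; then
        echo "JAVA_HOME looks usable: $JAVA_HOME"
    else
        echo "JAVA_HOME is unset or has no bin/java: '${JAVA_HOME:-}'" >&2
        return 1
    fi
}
```

After sourcing these functions, running check_java_home before the Gradle build shows whether the variable needs to be re-exported to a directory that really exists on the sandbox.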

7 REPLIES

@Revathy Mourouguessane, see this video. It provides guidance for running a "word count" program on the HDP Sandbox.

https://www.youtube.com/watch?v=5MYv8usiMnE

Hope this helps you solve the error.

Master Mentor

Your jar needs to be on the local filesystem, and preferably not in /tmp. If you are logged in as root, leave the jar in /root and just run it like so:

yarn jar jarname
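
To make the local-filesystem point concrete, here is a minimal sketch that refuses to call yarn jar unless the jar is a regular file on the local disk. The function names local_jar_ok and run_local_jar are invented for illustration, and the assumption is that "Not a valid JAR" in the question came from pointing yarn jar at a path that existed only in HDFS.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the path is a readable regular file
# on the LOCAL filesystem (HDFS paths are not visible to this test).
local_jar_ok() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Hypothetical wrapper: run the job only when the jar is present locally.
# Usage: run_local_jar <jar> <main-class> [args...]
run_local_jar() {
    jar="$1"
    shift
    if local_jar_ok "$jar"; then
        yarn jar "$jar" "$@"
    else
        echo "Jar not found on the local filesystem: $jar" >&2
        echo "If it is in HDFS, copy it down first with hadoop fs -copyToLocal" >&2
        return 1
    fi
}
```

For example, with the jar copied to /root: run_local_jar /root/WordCount.jar com.denmark.danskeBank.vo.WordCount /tmp/data/hamlet.txt /tmp/output. The input and output arguments are still HDFS paths; only the jar itself has to be local.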

Master Mentor

For the JAVA_HOME error, run one of my JDK scripts in the administration folder: https://github.com/dbist/scripts

Master Mentor

[guest@sandbox dataprocessing]$ find / -name java

[guest@sandbox dataprocessing]$ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.95.x86_64/jre

[guest@sandbox dataprocessing]$ ~/gradle-1.9/bin/gradle clean jar

Build file '/home/guest/examples/dataprocessing/build.gradle': line 9

The RepositoryHandler.mavenRepo() method has been deprecated and is scheduled to be removed in Gradle 2.0. Please use the maven() method instead.

:clean UP-TO-DATE

:compileJava

Master Mentor

BUILD SUCCESSFUL

Rising Star

I was able to run the MapReduce job successfully. The call job.setJarByClass(WordCount.class) was missing, so Hadoop was unable to find the Mapper class. Thanks!