Implementing WordCount with Cascading on HDP 2.1 Sandbox

Rising Star

Hi,

Could you help me resolve this error?

[root@sandbox ~]# yarn jar /tmp/MRJar/WordCount.jar com.denmark.danskeBank.vo.WordCount /tmp/data/hamlet.txt /tmp/output                             
Not a valid JAR: /tmp/MRJar/WordCount.jar

This is what I have done:

1. Created the WordCount.jar file in Eclipse with Hadoop 1.x jars.
2. Uploaded it to the HDFS directory /tmp/MRJar.
3. Ran the command above and got this error.

Then I tried:

[root@sandbox ~]# hadoop fs -copyToLocal /tmp/MRJar/WordCount.jar /MapReduce                                                                         

16/02/21 05:46:07 WARN hdfs.DFSClient: DFSInputStream has been closed already

I also tried the steps given for running it through Gradle.

1. While executing ~/gradle-1.9/bin/gradle clean jar, I got an error:

[cascade@sandbox part2]$ ~/gradle-1.9/bin/gradle clean jar
ERROR: JAVA_HOME is set to an invalid directory: /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x8

I wanted to create my own MapReduce job and run it for practice; I am not a Java developer.

Could you guide me?

Thanks.

1 ACCEPTED SOLUTION

Master Mentor

@Revathy Mourouguessane You can ignore the warning "WARN hdfs.DFSClient: DFSInputStream has been closed already". There is already a JIRA open to address it.

I assume you are following this tutorial: http://hortonworks.com/hadoop-tutorial/cascading-hortonworks-data-platform-2-1/

Check your JAVA_HOME setting.

What is the output of echo $JAVA_HOME?
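As a minimal sketch of that check: the helper below, `check_java_home`, is a hypothetical function written for illustration (it is not part of the sandbox or of Gradle). It assumes that a usable JAVA_HOME is a directory containing an executable bin/java, which is most likely what the Gradle launcher was testing when it printed "JAVA_HOME is set to an invalid directory".

```shell
# Hypothetical helper (an assumption for illustration, not a sandbox tool):
# succeeds when the given directory contains an executable bin/java.
check_java_home() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is empty"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir"
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable bin/java under: $dir"
        return 1
    fi
    echo "looks usable: $dir"
}

# Typical use in an interactive shell:
#   check_java_home "$JAVA_HOME"
```

If the function reports that the path is not a directory, the configured JDK directory may have been renamed or removed, for example by a package update.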


7 REPLIES


@Revathy Mourouguessane, see this video. It provides guidance for running a "word count" program on the HDP Sandbox.

https://www.youtube.com/watch?v=5MYv8usiMnE

Hope this helps you solve the error.

Master Mentor

Your jar needs to be on the local filesystem, and preferably not in /tmp. If you are logged in as root, leave the jar in /root and just run it like so:

yarn jar jarname
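To make the local-versus-HDFS distinction concrete, here is a rough sketch. The function names `jar_is_local` and `run_local_jar` are hypothetical and exist only for this example; the class name and paths in the usage comment are taken from the original command, and /root/WordCount.jar is an assumed local location.

```shell
# Hypothetical helper: true only if the path is a readable regular file
# on the LOCAL filesystem (a path that exists only in HDFS will fail here).
jar_is_local() {
    [ -f "$1" ] && [ -r "$1" ]
}

# Hypothetical wrapper: run `yarn jar` only when the jar is present locally,
# otherwise print a hint about copying it out of HDFS first.
run_local_jar() {
    jar="$1"
    shift
    if jar_is_local "$jar"; then
        yarn jar "$jar" "$@"
    else
        echo "No local file at $jar; copy it from HDFS first, e.g.:" >&2
        echo "  hadoop fs -copyToLocal /tmp/MRJar/WordCount.jar $jar" >&2
        return 1
    fi
}

# Usage, assuming the jar was copied to /root:
#   run_local_jar /root/WordCount.jar com.denmark.danskeBank.vo.WordCount \
#       /tmp/data/hamlet.txt /tmp/output
```

The input and output arguments are still passed through unchanged; only the jar itself is checked against the local filesystem.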

Master Mentor

For the JAVA_HOME error, run one of my JDK scripts in the administration folder: https://github.com/dbist/scripts

Master Mentor

[guest@sandbox dataprocessing]$ find / -name java
[guest@sandbox dataprocessing]$ export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.95.x86_64/jre
[guest@sandbox dataprocessing]$ ~/gradle-1.9/bin/gradle clean jar
Build file '/home/guest/examples/dataprocessing/build.gradle': line 9
The RepositoryHandler.mavenRepo() method has been deprecated and is scheduled to be removed in Gradle 2.0. Please use the maven() method instead.
:clean UP-TO-DATE
:compileJava
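The session above goes from a `find / -name java` result to a JAVA_HOME value by hand. A small sketch of that step follows; `java_home_from_binary` is a hypothetical name, and the function simply assumes that JAVA_HOME is the directory two levels above a .../bin/java binary.

```shell
# Hypothetical helper: turn a path to a java binary (for example one printed
# by `find / -name java`) into a JAVA_HOME candidate by dropping /bin/java.
java_home_from_binary() {
    case "$1" in
        */bin/java)
            dirname "$(dirname "$1")"
            ;;
        *)
            echo "not a bin/java path: $1" >&2
            return 1
            ;;
    esac
}

# Usage with the JRE path from the session above:
#   export JAVA_HOME="$(java_home_from_binary \
#       /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.95.x86_64/jre/bin/java)"
```

The function does not resolve symlinks, so a path such as /usr/bin/java would yield /usr rather than the real JDK directory; pick the entry under /usr/lib/jvm from the find output instead.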

Master Mentor

BUILD SUCCESSFUL

Rising Star

I was successful in executing a MapReduce job. Since the call to Job.setJarByClass(WordCount.class) was missing, Hadoop was unable to find the Mapper class. Thanks!!!