KiteSDK HDP 2.5

Contributor

I recently switched sandboxes from HDP 2.4 to HDP 2.5 and I'm running into all sorts of issues with KiteSDK. I created the directory /hdp/apps/2.5.0.0-1245/mapreduce/ and copied in mapreduce.tar.gz, which got me a little further, but now I'm hitting the following error that I can't seem to overcome:

org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/413a41a2-8813-4056-9433-3c5e073d80... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-283520469/p1/REDUCE

Has anyone successfully gotten KiteSDK to work on HDP 2.5? I can't figure out what I'm doing wrong here. I'd be happy to go back to 2.4, but I can't seem to find a download for it.

7 REPLIES

Master Mentor

@Daniel Rolls I'd be happy to debug your issues if you could provide the ResourceManager and NodeManager logs for the tasks that fail. As a side note, Sandbox archives are available from the same page you used to download the 2.5 release. Click the EXPAND button next to Hortonworks Sandbox Archive and you'll find every previous release, going back to Sandbox 1.3.

8656-sandbox.png

Contributor
logs.zip

I think I zipped up the requested logs. Thanks for your help. I'm somewhat new to Hortonworks and trying to flesh out a POC. Here's the full text of my error message:

[hdfs@sandbox bin]$ ./kite-dataset csv-import /home/hdfs/bin/ingest/Payor_1_Claims.txt Payor_1_Claims --delimiter '|'
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/b138551a-23e0-49ee-a51e-d9dd0773f1... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-380116631/p1/REDUCE
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:766)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:600)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:490)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)
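
One detail in this trace may be worth checking: the job is submitted through org.apache.hadoop.mapred.LocalJobRunner, Hadoop's in-process runner, rather than to YARN. Whether that is related to the missing /tmp/crunch-380116631/p1/REDUCE path is not established anywhere in this thread. The sketch below only detects the runner from captured output; the function name `detect_runner` is hypothetical and not part of Kite or Hadoop.

```shell
# Hypothetical helper: reads a stack trace on stdin and reports whether
# Hadoop's in-process LocalJobRunner appears in it.
detect_runner() {
  if grep -q 'org\.apache\.hadoop\.mapred\.LocalJobRunner'; then
    echo "local"   # job ran inside the client JVM
  else
    echo "other"   # no LocalJobRunner frame found
  fi
}
```

It could be used by piping the failing command's output into it, for example `./kite-dataset csv-import ... 2>&1 | detect_runner`.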

Master Guru

Looks like a permissions issue.

Make sure that /tmp exists in HDFS and that the current user has access to it:

hdfs dfs -mkdir /tmp

hdfs dfs -chmod -R 777 /tmp
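
Before changing permissions recursively, the current state can be inspected first. This is a minimal sketch, assuming the `hdfs` client is on the PATH; the helper names `is_world_writable` and `check_hdfs_dir` are hypothetical, and the check only looks at the permission string in the first column of `hdfs dfs -ls -d`.

```shell
# Hypothetical helper: succeeds when a permission string such as
# "drwxrwxrwx" grants write access to "other" users.
is_world_writable() {
  case "$1" in
    ????????w*) return 0 ;;   # 9th character is the "other write" bit
    *)          return 1 ;;
  esac
}

# Hypothetical helper: report on an HDFS directory (default /tmp).
check_hdfs_dir() {
  dir="${1:-/tmp}"
  # -test -d exits 0 only when the path exists and is a directory.
  if ! hdfs dfs -test -d "$dir"; then
    echo "$dir is missing in HDFS"
    return 1
  fi
  # -ls -d lists the directory entry itself; column 1 is the permission string.
  perms="$(hdfs dfs -ls -d "$dir" | awk -v d="$dir" '$NF == d {print $1}')"
  if is_world_writable "$perms"; then
    echo "$dir is world-writable ($perms)"
  else
    echo "$dir is not world-writable ($perms)"
  fi
}
```

Running `check_hdfs_dir /tmp` as the same user that runs kite-dataset shows whether the mkdir and chmod steps are needed at all.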

Are you logged in as root?

http://kitesdk.org/docs/1.0.0/Using-the-Kite-CLI-to-Create-a-Dataset.html

It could be a Kite error:

http://mail-archives.apache.org/mod_mbox/crunch-dev/201303.mbox/%3CBLU0-SMTP1468E6756741B738043FB98A...

NiFi, Hive, or Pig may be a better option:

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/

It looks like the problem may be in Kite itself, so you may need to upgrade it:

https://issues.cloudera.org/browse/KITE-874

Contributor

I'm not sure whether it's the Docker implementation of HDP 2.5 on the sandbox or something else. I've got the most recent version of KiteSDK installed. Permissions look good:

[root@sandbox ~]# hdfs dfs -ls /

Found 12 items

drwxrwxrwx - yarn hadoop 0 2016-10-17 19:51 /app-logs
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:01 /apps
drwxr-xr-x - yarn hadoop 0 2016-09-13 10:56 /ats
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:08 /demo
drwxr-xr-x - hdfs hdfs 0 2016-09-13 10:56 /hdp
drwxr-xr-x - mapred hdfs 0 2016-09-13 10:56 /mapred
drwxrwxrwx - mapred hadoop 0 2016-09-13 10:56 /mr-history
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:05 /ranger
drwxrwxrwx - spark hadoop 0 2016-10-19 19:46 /spark-history
drwxrwxrwx - spark hadoop 0 2016-09-13 11:20 /spark2-history
drwxrwxrwx - hdfs hdfs 0 2016-10-17 17:23 /tmp
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:18 /user

[root@sandbox ~]# hdfs dfs -ls /tmp

Found 13 items

-rwxrwxrwx 3 raj_ops hdfs 6676440 2016-10-17 17:23 /tmp/Payor_1_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 2803 2016-10-17 17:23 /tmp/Payor_1_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 21015 2016-10-17 17:23 /tmp/Payor_1_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 2317192 2016-10-17 17:22 /tmp/Payor_2_Additional_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 7866129 2016-10-17 17:23 /tmp/Payor_2_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 8626 2016-10-17 17:23 /tmp/Payor_2_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 22969 2016-10-17 17:23 /tmp/Payor_2_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 8474653 2016-10-17 17:23 /tmp/Payor_3_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 995712 2016-10-17 17:23 /tmp/Payor_3_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 88106 2016-10-17 17:23 /tmp/Payor_3_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 23125 2016-10-17 17:23 /tmp/Payor_3_Glucose_Results.txt
drwxrwxrwx - hdfs hdfs 0 2016-09-13 10:56 /tmp/entity-file-history
drwxrwxrwx - ambari-qa hdfs 0 2016-10-17 19:52 /tmp/hive

Contributor

Fixed the error by using an earlier version of KiteSDK (0.17.0):

curl http://central.maven.org/maven2/org/kitesdk/kite-tools/0.17.0/kite-tools-0.17.0-binary.jar -o kite-dataset
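
The download step above can be parameterized by version so that other releases are easy to try. This is a sketch rather than an official installer: `kite_tools_url`, `install_kite`, and the `KITE_BASE` variable are hypothetical names, the URL layout is copied from the command above, and the `chmod +x` step assumes the binary jar is meant to be run directly as `./kite-dataset`, as it is earlier in this thread. If the central.maven.org host is unreachable, pointing `KITE_BASE` at another Maven Central mirror with the same path layout is an assumption to verify.

```shell
# Base URL taken from the post above; override KITE_BASE to use a mirror.
KITE_BASE="${KITE_BASE:-http://central.maven.org/maven2}"

# Hypothetical helper: print the download URL for a kite-tools release.
kite_tools_url() {
  version="$1"
  echo "${KITE_BASE}/org/kitesdk/kite-tools/${version}/kite-tools-${version}-binary.jar"
}

# Hypothetical helper: download the given version into ./kite-dataset
# and mark it executable.
install_kite() {
  # -f fails on HTTP errors instead of saving an error page; -L follows redirects.
  curl -fL "$(kite_tools_url "$1")" -o kite-dataset && chmod +x kite-dataset
}
```

For example, `install_kite 0.17.0` fetches the version that resolved the error in this thread.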

Master Mentor

@Daniel Rolls thank you for the solution. I urge you to post an article on HCC describing your use case. I'm going to escalate this issue to the Sandbox team so that inconsistencies between KiteSDK versions are addressed going forward.
