
KiteSDK HDP 2.5


I recently switched sandboxes from HDP 2.4 to HDP 2.5, and I'm running into all sorts of issues with the KiteSDK. I created the directory /hdp/apps/2.5.0.0-1245/mapreduce/ and copied mapreduce.tar.gz into it, which got me a little further, but now I'm hitting this error and can't get past it:

org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/413a41a2-8813-4056-9433-3c5e073d80... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-283520469/p1/REDUCE

Has anyone successfully gotten the KiteSDK to work on HDP 2.5? I can't figure out what I'm doing wrong here. I'd be happy to go back to 2.4, but I can't seem to find a download for it.
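The staging step described in the question could be sketched roughly as follows. This assumes the target directory lives in HDFS and that the tarball is taken from a local path under /usr/hdp; the post states neither, so `LOCAL_TARBALL` and the `run` and `stage_tarball` helpers are illustrative names only. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: stage the MapReduce framework tarball, as described in the question.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

HDP_VERSION="2.5.0.0-1245"                     # version string from the post
HDFS_DIR="/hdp/apps/${HDP_VERSION}/mapreduce"  # directory named in the post
# ASSUMPTION: local source of the tarball; the post does not say where it came from.
LOCAL_TARBALL="${LOCAL_TARBALL:-/usr/hdp/${HDP_VERSION}/hadoop/mapreduce.tar.gz}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

stage_tarball() {
  run hdfs dfs -mkdir -p "$HDFS_DIR"                 # create the directory if missing
  run hdfs dfs -put -f "$LOCAL_TARBALL" "$HDFS_DIR/" # copy (overwrite) the tarball
  run hdfs dfs -ls "$HDFS_DIR"                       # confirm it landed
}

stage_tarball
```

Run it once in dry-run mode to review the commands, then again with `DRY_RUN=0` as a user that can write to /hdp/apps.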


Re: KiteSDK HDP 2.5

Mentor

@Daniel Rolls I'd be happy to debug your issue if you can provide the ResourceManager and NodeManager logs for the tasks that fail. As a side note, Sandbox archives are available from the same page you used to download the 2.5 release. Click the EXPAND button next to Hortonworks Sandbox Archive, where you'll find every previous release back to Sandbox 1.3.

8656-sandbox.png


Re: KiteSDK HDP 2.5

logs.zip

I think I've zipped up the requested logs. Thanks for your help; I'm somewhat new to Hortonworks and trying to flesh out a POC. Here's the full text of my error message:

[hdfs@sandbox bin]$ ./kite-dataset csv-import /home/hdfs/bin/ingest/Payor_1_Claims.txt Payor_1_Claims --delimiter '|'
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/b138551a-23e0-49ee-a51e-d9dd0773f1... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-380116631/p1/REDUCE
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:766)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:600)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:490)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)
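One way to follow up on a FileNotFoundException like this is to check whether the reported path, or at least its parent directory, is actually present in HDFS. The sketch below pulls the URI out of the error line and prints the checks it would run; `missing_uri` is a hypothetical helper, and the checks only execute when `DRY_RUN=0` on a machine with the `hdfs` client. It is a diagnostic aid, not the fix from this thread.

```shell
#!/usr/bin/env bash
# Sketch: extract the HDFS URI from the error line and test whether it exists.
DRY_RUN="${DRY_RUN:-1}"

# Error line copied from the trace above.
ERR='java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-380116631/p1/REDUCE'

# Hypothetical helper: print the URI that follows "File does not exist: ".
missing_uri() {
  printf '%s\n' "$1" | sed -n 's/.*File does not exist: \([^ ]*\).*/\1/p'
}

uri="$(missing_uri "$ERR")"
parent="$(dirname "$uri")"
echo "missing: $uri"

if [ "$DRY_RUN" = "1" ]; then
  # Only show what would be run.
  echo "+ hdfs dfs -test -e $uri"
  echo "+ hdfs dfs -ls $parent"
else
  # -test -e returns 0 when the path exists.
  if hdfs dfs -test -e "$uri"; then echo "exists"; else echo "not found"; fi
  hdfs dfs -ls "$parent"
fi
```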


Re: KiteSDK HDP 2.5

Super Guru

Looks like a permissions issue.

Make sure that /tmp exists in HDFS and that the current user has access to it:

hdfs dfs -mkdir /tmp

hdfs dfs -chmod -R 777 /tmp
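A slightly more defensive variant of the two commands above might look like this. It is only a sketch: `run` and `ensure_tmp` are illustrative helper names, and the script prints the commands unless `DRY_RUN=0` is set. Using `-mkdir -p` avoids an error when /tmp already exists, and `-ls -d` shows the mode of the directory itself afterwards.

```shell
#!/usr/bin/env bash
# Sketch: make sure /tmp exists in HDFS and is world-writable, then verify.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

ensure_tmp() {
  run hdfs dfs -mkdir -p /tmp      # -p: no error if /tmp is already there
  run hdfs dfs -chmod -R 777 /tmp  # same permissions as suggested above
  run hdfs dfs -ls -d /tmp         # -d: list the directory entry, not its contents
}

ensure_tmp
```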

Are you logged in as root?

http://kitesdk.org/docs/1.0.0/Using-the-Kite-CLI-to-Create-a-Dataset.html

It could also be a Kite error.

http://mail-archives.apache.org/mod_mbox/crunch-dev/201303.mbox/%3CBLU0-SMTP1468E6756741B738043FB98A...

NiFi, Hive, or Pig may be a better option.

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/

It looks like the problem may be in Kite itself, so you may need to upgrade it.

https://issues.cloudera.org/browse/KITE-874


Re: KiteSDK HDP 2.5

I'm not sure whether it's the Docker implementation of HDP 2.5 on the sandbox or something else. I've got the most recent version of the Kite API installed. Permissions look good:

[root@sandbox ~]# hdfs dfs -ls /

Found 12 items

drwxrwxrwx - yarn hadoop 0 2016-10-17 19:51 /app-logs
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:01 /apps
drwxr-xr-x - yarn hadoop 0 2016-09-13 10:56 /ats
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:08 /demo
drwxr-xr-x - hdfs hdfs 0 2016-09-13 10:56 /hdp
drwxr-xr-x - mapred hdfs 0 2016-09-13 10:56 /mapred
drwxrwxrwx - mapred hadoop 0 2016-09-13 10:56 /mr-history
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:05 /ranger
drwxrwxrwx - spark hadoop 0 2016-10-19 19:46 /spark-history
drwxrwxrwx - spark hadoop 0 2016-09-13 11:20 /spark2-history
drwxrwxrwx - hdfs hdfs 0 2016-10-17 17:23 /tmp
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:18 /user

[root@sandbox ~]# hdfs dfs -ls /tmp

Found 13 items

-rwxrwxrwx 3 raj_ops hdfs 6676440 2016-10-17 17:23 /tmp/Payor_1_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 2803 2016-10-17 17:23 /tmp/Payor_1_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 21015 2016-10-17 17:23 /tmp/Payor_1_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 2317192 2016-10-17 17:22 /tmp/Payor_2_Additional_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 7866129 2016-10-17 17:23 /tmp/Payor_2_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 8626 2016-10-17 17:23 /tmp/Payor_2_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 22969 2016-10-17 17:23 /tmp/Payor_2_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 8474653 2016-10-17 17:23 /tmp/Payor_3_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 995712 2016-10-17 17:23 /tmp/Payor_3_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 88106 2016-10-17 17:23 /tmp/Payor_3_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 23125 2016-10-17 17:23 /tmp/Payor_3_Glucose_Results.txt
drwxrwxrwx - hdfs hdfs 0 2016-09-13 10:56 /tmp/entity-file-history
drwxrwxrwx - ambari-qa hdfs 0 2016-10-17 19:52 /tmp/hive


Re: KiteSDK HDP 2.5

Mentor

@Daniel Rolls thank you for the solution. I urge you to post an article on HCC describing your use case. I'm going to escalate this issue to the Sandbox team so that version inconsistencies with the KiteSDK are addressed going forward.
