Created 10-17-2016 09:10 PM
I recently switched sandboxes from HDP 2.4 to HDP 2.5 and I'm running into all sorts of issues with the Kite SDK. I created the directory /hdp/apps/2.5.0.0-1245/mapreduce/ and copied in mapreduce.tar.gz, which got me a little further, but now I'm hitting an error that I can't seem to overcome: "org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/413a41a2-8813-4056-9433-3c5e073d80... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-283520469/p1/REDUCE". Has anyone successfully gotten the Kite SDK to work on HDP 2.5? I can't figure out what I'm doing wrong here. I'd be happy to go back to 2.4, but I can't seem to find a download for it.
Created 10-21-2016 04:29 PM
Fixed the error by using an earlier version of the Kite tools (0.17.0):
curl http://central.maven.org/maven2/org/kitesdk/kite-tools/0.17.0/kite-tools-0.17.0-binary.jar -o kite-dataset
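For anyone repeating this fix, here is a minimal shell sketch of the same download wrapped in a function. The helper names `kite_url` and `install_kite` are made up for illustration; only the URL pattern comes from the command above. Marking the downloaded file executable is an assumption based on how the thread later runs it as ./kite-dataset.

```shell
#!/bin/sh
# Hypothetical helpers; only the URL pattern is taken from the post above.

# Print the download URL for a given kite-tools version.
kite_url() {
    printf 'http://central.maven.org/maven2/org/kitesdk/kite-tools/%s/kite-tools-%s-binary.jar\n' "$1" "$1"
}

# Download the binary jar as ./kite-dataset and mark it executable.
# curl -f makes the function fail on an HTTP error instead of saving an error page.
install_kite() {
    curl -f "$(kite_url "$1")" -o kite-dataset && chmod +x kite-dataset
}
```

Calling `install_kite 0.17.0` should leave an executable ./kite-dataset in the current directory, ready for the csv-import command used in this thread.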
Created on 10-18-2016 03:57 PM - edited 08-19-2019 01:51 AM
@Daniel Rolls I'd be happy to debug your issues if you could provide the ResourceManager (RM) and NodeManager logs for the tasks that fail. As a side note, Sandbox archives are available from the same page you used to download the 2.5 release. Click the EXPAND button next to Hortonworks Sandbox Archive and you'll find every previous release back to Sandbox 1.3.
Created 10-18-2016 04:40 PM
I think I zipped up the requested logs. Thanks for your help. I'm somewhat new to Hortonworks and trying to flesh out a POC. Here's the full text of my error message:
[hdfs@sandbox bin]$ ./kite-dataset csv-import /home/hdfs/bin/ingest/Payor_1_Claims.txt Payor_1_Claims --delimiter '|'
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/b138551a-23e0-49ee-a51e-d9dd0773f1... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-380116631/p1/REDUCE
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:766)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:600)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:490)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)
Created 10-18-2016 07:51 PM
This looks like a permissions issue.
Make sure that /tmp exists in HDFS and that the current user has access to it:
hdfs dfs -mkdir /tmp
hdfs dfs -chmod -R 777 /tmp
Are you logged in as root?
http://kitesdk.org/docs/1.0.0/Using-the-Kite-CLI-to-Create-a-Dataset.html
It could also be a Kite error, in which case you may need to upgrade Kite.
NiFi, Hive or Pig may be a better option:
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/
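The permissions check suggested in this reply can be scripted. The following is only a sketch: `is_world_writable` and `check_hdfs_dir` are hypothetical helper names, and the check assumes the mode string is the first column of `hdfs dfs -ls -d` output, as in the listings posted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for checking access to an HDFS directory.

# Succeeds when a mode string such as "drwxrwxrwx" has the write bit
# set for "other" (the 9th character of the string).
is_world_writable() {
    case "$1" in
        ????????w*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when the HDFS path is an existing directory that is world-writable.
check_hdfs_dir() {
    hdfs dfs -test -d "$1" || return 1
    # Take the mode column from the first line that looks like an entry.
    mode=$(hdfs dfs -ls -d "$1" | awk '/^[d-]/ {print $1; exit}')
    is_world_writable "$mode"
}
```

For example, `check_hdfs_dir /tmp || echo "/tmp is missing or not world-writable"` reports a problem without changing anything on the cluster.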
Created 10-19-2016 07:49 PM
I'm not sure if it's the Docker implementation of HDP 2.5 on the sandbox or what the story is. I've got the most recent version of the Kite SDK installed. Permissions look good:
[root@sandbox ~]# hdfs dfs -ls /
Found 12 items
drwxrwxrwx - yarn hadoop 0 2016-10-17 19:51 /app-logs
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:01 /apps
drwxr-xr-x - yarn hadoop 0 2016-09-13 10:56 /ats
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:08 /demo
drwxr-xr-x - hdfs hdfs 0 2016-09-13 10:56 /hdp
drwxr-xr-x - mapred hdfs 0 2016-09-13 10:56 /mapred
drwxrwxrwx - mapred hadoop 0 2016-09-13 10:56 /mr-history
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:05 /ranger
drwxrwxrwx - spark hadoop 0 2016-10-19 19:46 /spark-history
drwxrwxrwx - spark hadoop 0 2016-09-13 11:20 /spark2-history
drwxrwxrwx - hdfs hdfs 0 2016-10-17 17:23 /tmp
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:18 /user
[root@sandbox ~]# hdfs dfs -ls /tmp
Found 13 items
-rwxrwxrwx 3 raj_ops hdfs 6676440 2016-10-17 17:23 /tmp/Payor_1_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 2803 2016-10-17 17:23 /tmp/Payor_1_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 21015 2016-10-17 17:23 /tmp/Payor_1_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 2317192 2016-10-17 17:22 /tmp/Payor_2_Additional_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 7866129 2016-10-17 17:23 /tmp/Payor_2_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 8626 2016-10-17 17:23 /tmp/Payor_2_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 22969 2016-10-17 17:23 /tmp/Payor_2_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 8474653 2016-10-17 17:23 /tmp/Payor_3_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 995712 2016-10-17 17:23 /tmp/Payor_3_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 88106 2016-10-17 17:23 /tmp/Payor_3_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 23125 2016-10-17 17:23 /tmp/Payor_3_Glucose_Results.txt
drwxrwxrwx - hdfs hdfs 0 2016-09-13 10:56 /tmp/entity-file-history
drwxrwxrwx - ambari-qa hdfs 0 2016-10-17 19:52 /tmp/hive
Created 10-21-2016 04:36 PM
@Daniel Rolls thank you for the solution. I urge you to post an article on HCC describing your use case. I'm going to escalate this issue to the Sandbox team so that inconsistencies between versions of the Kite SDK are addressed going forward.
Created 10-21-2016 04:37 PM
@rmolina @Rafael Coss FYI