KiteSDK HDP 2.5
Labels: Hortonworks Data Platform (HDP)
Created 10-17-2016 09:10 PM
I recently switched sandboxes from HDP 2.4 to HDP 2.5 and I'm running into all sorts of issues with the KiteSDK. I created the directory /hdp/apps/2.5.0.0-1245/mapreduce/ and copied in mapreduce.tar.gz, which got me a little further, but now I'm hitting the following error, which I can't seem to overcome:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/413a41a2-8813-4056-9433-3c5e073d80... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-283520469/p1/REDUCE
Has anyone successfully gotten KiteAPI to work on HDP 2.5? I can't figure out what I'm doing wrong here. I'd be happy to go back to 2.4, but I can't seem to find a download for it.
Created 10-21-2016 04:29 PM
Fixed the error by using an earlier version of KiteAPI:
curl http://central.maven.org/maven2/org/kitesdk/kite-tools/0.17.0/kite-tools-0.17.0-binary.jar -o kite-dataset
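For anyone repeating this step, here is a minimal sketch of the download wrapped in two small shell functions, so the version can be changed in one place. The names `KITE_VERSION`, `MAVEN_BASE`, `kite_url`, and `fetch_kite` are illustrative and not part of Kite; the URL layout is copied from the command above, and the `chmod +x` step is an assumption based on the binary jar being shipped as a self-executing file. If the `central.maven.org` host does not resolve in your environment, point `MAVEN_BASE` at whichever Maven Central mirror you use.

```shell
# Sketch only: variable and function names here are hypothetical helpers.
KITE_VERSION="${KITE_VERSION:-0.17.0}"
MAVEN_BASE="${MAVEN_BASE:-http://central.maven.org/maven2}"

kite_url() {
  # Print the URL of the kite-tools binary jar for the version given as $1.
  local version="$1"
  printf '%s/org/kitesdk/kite-tools/%s/kite-tools-%s-binary.jar\n' \
    "$MAVEN_BASE" "$version" "$version"
}

fetch_kite() {
  # Download the jar as ./kite-dataset and mark it executable.
  # -f makes curl fail on an HTTP error instead of saving the error page;
  # -L follows redirects.
  curl -fL "$(kite_url "$KITE_VERSION")" -o kite-dataset && chmod +x kite-dataset
}
```

After sourcing these functions, running `fetch_kite` in the directory where `kite-dataset` should live and then retrying the failing `csv-import` command is one way to confirm whether the older version behaves differently.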
Created on 10-18-2016 03:57 PM - edited 08-19-2019 01:51 AM
@Daniel Rolls I'd be happy to debug your issues if you could provide the ResourceManager (RM) and NodeManager logs for the tasks that fail. As a side note, Sandbox archives are available from the same page you used to download the 2.5 release. Click the EXPAND button next to Hortonworks Sandbox Archive and you'll find every previous release back to Sandbox 1.3.
Created 10-18-2016 04:40 PM
I think I zipped up the requested logs. Thanks for your help; I'm somewhat new to Hortonworks and trying to flesh out a POC. Here's the full text of my error message:
[hdfs@sandbox bin]$ ./kite-dataset csv-import /home/hdfs/bin/ingest/Payor_1_Claims.txt Payor_1_Claims --delimiter '|'
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/b138551a-23e0-49ee-a51e-d9dd0773f1... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/crunch-380116631/p1/REDUCE
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1427)
    at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1419)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:766)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:600)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:490)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)
Created 10-18-2016 07:51 PM
This looks like a permissions issue.
Make sure that /tmp exists in HDFS and that the current user has access to it:
hdfs dfs -mkdir /tmp
hdfs dfs -chmod -R 777 /tmp
Are you logged in as root?
http://kitesdk.org/docs/1.0.0/Using-the-Kite-CLI-to-Create-a-Dataset.html
It could also be a Kite error.
NiFi, Hive, or Pig may be a better option.
http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-hive/
It looks like the problem may be in Kite itself, so you may need to upgrade it.
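The /tmp check suggested in this reply can be sketched as a small shell function that only creates the directory when it is missing (a plain `hdfs dfs -mkdir /tmp` reports an error if the directory already exists). The function name `ensure_hdfs_dir` is hypothetical; the `hdfs dfs` subcommands and flags are the standard FileSystem shell ones. The 777 mode mirrors the command above and is only reasonable on a sandbox.

```shell
# Sketch only: ensure_hdfs_dir is an illustrative helper, not an HDP tool.
ensure_hdfs_dir() {
  local dir="$1"
  # -test -d exits 0 only when the path exists and is a directory.
  if ! hdfs dfs -test -d "$dir"; then
    # -p creates missing parents and does not fail if the path appears meanwhile.
    hdfs dfs -mkdir -p "$dir"
  fi
  # World-writable, as suggested in the reply above (sandbox use only).
  hdfs dfs -chmod -R 777 "$dir"
  # -d lists the directory entry itself so owner and mode can be verified.
  hdfs dfs -ls -d "$dir"
}
```

Running `ensure_hdfs_dir /tmp` as the `hdfs` user should end by printing a single line for /tmp with mode `drwxrwxrwx` if the change took effect.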
Created 10-19-2016 07:49 PM
I'm not sure whether the Docker-based implementation of HDP 2.5 on the sandbox is the cause or what the story is. I've got the most recent version of the Kite API installed. Permissions look good:
[root@sandbox ~]# hdfs dfs -ls /
Found 12 items
drwxrwxrwx - yarn hadoop 0 2016-10-17 19:51 /app-logs
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:01 /apps
drwxr-xr-x - yarn hadoop 0 2016-09-13 10:56 /ats
drwxr-xr-x - hdfs hdfs 0 2016-09-13 11:08 /demo
drwxr-xr-x - hdfs hdfs 0 2016-09-13 10:56 /hdp
drwxr-xr-x - mapred hdfs 0 2016-09-13 10:56 /mapred
drwxrwxrwx - mapred hadoop 0 2016-09-13 10:56 /mr-history
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:05 /ranger
drwxrwxrwx - spark hadoop 0 2016-10-19 19:46 /spark-history
drwxrwxrwx - spark hadoop 0 2016-09-13 11:20 /spark2-history
drwxrwxrwx - hdfs hdfs 0 2016-10-17 17:23 /tmp
drwxr-xr-x - hdfs hdfs 0 2016-10-12 15:18 /user
[root@sandbox ~]# hdfs dfs -ls /tmp
Found 13 items
-rwxrwxrwx 3 raj_ops hdfs 6676440 2016-10-17 17:23 /tmp/Payor_1_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 2803 2016-10-17 17:23 /tmp/Payor_1_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 21015 2016-10-17 17:23 /tmp/Payor_1_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 2317192 2016-10-17 17:22 /tmp/Payor_2_Additional_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 7866129 2016-10-17 17:23 /tmp/Payor_2_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 8626 2016-10-17 17:23 /tmp/Payor_2_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 22969 2016-10-17 17:23 /tmp/Payor_2_Glucose_Results.txt
-rwxrwxrwx 3 raj_ops hdfs 8474653 2016-10-17 17:23 /tmp/Payor_3_Claims.txt
-rwxrwxrwx 3 raj_ops hdfs 995712 2016-10-17 17:23 /tmp/Payor_3_Dx_Codes.txt
-rwxrwxrwx 3 raj_ops hdfs 88106 2016-10-17 17:23 /tmp/Payor_3_Eligibility.txt
-rwxrwxrwx 3 raj_ops hdfs 23125 2016-10-17 17:23 /tmp/Payor_3_Glucose_Results.txt
drwxrwxrwx - hdfs hdfs 0 2016-09-13 10:56 /tmp/entity-file-history
drwxrwxrwx - ambari-qa hdfs 0 2016-10-17 19:52 /tmp/hive
Created 10-21-2016 04:36 PM
@Daniel Rolls Thank you for the solution. I urge you to post an article on HCC describing your use case. I'm going to escalate this issue to the Sandbox team so that version inconsistencies with KiteSDK are addressed going forward.
Created 10-21-2016 04:37 PM
@rmolina @Rafael Coss FYI
