
Issue when using parquet org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

Contributor

I am getting this issue when using Sqoop with Parquet.

Accepted Solution

Re: Issue when using parquet org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

New Contributor

I then tried changing the kite-sdk dependency version from 1.0.0 to 1.1.0, and the issue went away. It worked! Issue resolved.
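To see which Kite version a Sqoop installation currently picks up, one option is to list the Kite jars in its lib directory. This is only a sketch: the function name `list_kite_jars`, the `SQOOP_LIB_DIR` variable, and the default path are assumptions, not something stated in this thread, and the jar location may differ on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the Kite SDK jars found directly in a lib directory.
# The jar file names carry the version (for example ...-1.0.0.jar or ...-1.1.0.jar).
list_kite_jars() {
    lib_dir="$1"
    if [ ! -d "$lib_dir" ]; then
        echo "no such directory: $lib_dir"
        return 1
    fi
    find "$lib_dir" -maxdepth 1 -name 'kite-*.jar' | sort
}

# Assumed default location; override with SQOOP_LIB_DIR if your layout differs.
list_kite_jars "${SQOOP_LIB_DIR:-/usr/hdp/current/sqoop-client/lib}" || true
```

If the listed jars show 1.0.0, that would match the version the accepted answer moved away from.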

7 Replies

Re: Issue when using parquet org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

@bandhu gupta

Can you please share the complete error along with sqoop command being used?

The issue can occur when HIVE_HOME/HCAT_HOME is not set: Sqoop uses HIVE_HOME/HCAT_HOME to locate the Hive libraries, which are needed for a Hive import in Parquet format.
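A quick way to check this before running the import is sketched below. The helper name `check_home` is made up for illustration; it only reports whether each variable is set and points at an existing directory, and it does not verify that the Hive libraries are actually present there.

```shell
#!/bin/sh
# Hypothetical helper: report whether the named environment variable is set
# and refers to an existing directory.
check_home() {
    name="$1"
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value"
}

check_home HIVE_HOME || true
check_home HCAT_HOME || true
```

Run it in the same shell session (and as the same user) that launches the sqoop command, since the variables may be set differently elsewhere.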

Thanks and Regards,

Sindhu

Re: Issue when using parquet org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

Contributor

sqoop import --connect jdbc:oracle:thin:@XXX:XXXX/YYYY --username YYYYY --password YYYYY --table A.BBBB --hive-import --hive-database default --hive-table test15 --as-parquetfile -m 1

Job job_1465371735536_0055 failed with state FAILED due to: Job commit failed: java.lang.IllegalArgumentException: Wrong FS: ____file:/tmp/default/.temp/job_1465371735536_0055/mr/job_1465371735536_0055/b402d4ba-1a16-46bc-92c6-91fe141070d2.parquet, expected: hdfs://lxapp5524.dc.corp.telstra.com:8020 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)

Re: Issue when using parquet org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

Contributor

We get the above error only when using Parquet; otherwise the table is imported into Hive without problems. Please keep in mind that we installed HDP without internet access, so it is quite possible that we missed something.

Re: Issue when using parquet org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

Contributor

We are facing the above error when running the command below:

sqoop import --connect jdbc:oracle:thin:@XX:1521/DATABASENAME --username USER --password PWD --table SCHEMANAME.TABLENAME --hive-import --hive-table TABLENAME --hive-overwrite --num-mappers 1 --as-parquetfile

The issue occurs only when we use Parquet and ingest into Hive; if we ingest into HDFS with Parquet, the import completes.
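For comparison, the HDFS-only variant that completes might look roughly like the sketch below. The exact command was not posted, so this is an assumption: it reuses the placeholders from the command above, drops the Hive options, and adds `--target-dir` with a made-up path. The script only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Placeholders carried over from the post; TARGET_DIR is a hypothetical HDFS path.
JDBC_URL="jdbc:oracle:thin:@XX:1521/DATABASENAME"
TARGET_DIR="/user/USER/TABLENAME_parquet"

# Print the import command on one line: same source table, Parquet output,
# written to an HDFS directory instead of a Hive table.
build_import_cmd() {
    printf '%s ' sqoop import \
        --connect "$JDBC_URL" \
        --username USER --password PWD \
        --table SCHEMANAME.TABLENAME \
        --target-dir "$TARGET_DIR" \
        --as-parquetfile \
        --num-mappers 1
    printf '\n'
}

build_import_cmd
```

The only differences from the failing command are the missing `--hive-import`, `--hive-table`, and `--hive-overwrite` options and the added `--target-dir`.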

Re: Issue when using parquet org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

New Contributor

I am also getting the same error on HDP 2.4 when doing a Sqoop hive-import with Parquet. Without Parquet it works fine.

16/06/09 21:12:11 INFO mapreduce.Job: Job job_1465467652802_0011 failed with state FAILED due to: Job commit failed: java.lang.IllegalArgumentException: Wrong FS: _______file:/tmp/default/.temp/job_1465467652802_0011/mr/job_1465467652802_0011/dc944213-b925-4e5b-ac2c-736e5fa8610f.parquet, expected: hdfs://lxapp5524.dc.corp.hdp.com:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
    at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:636)
    at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:327)
    at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:56)
    at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:370)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
    at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

