
Issue when using Parquet: org.kitesdk.data.DatasetNotFoundException: Descriptor location does not exist

Rising Star

I am getting this issue when using Sqoop with Parquet.

1 ACCEPTED SOLUTION

New Contributor

I then tried changing the kite-sdk dependency version from 1.0.0 to 1.1.0, and the issue was gone. It worked! Issue resolved.


7 REPLIES

@bandhu gupta

Can you please share the complete error along with the Sqoop command being used?

The issue might occur when HIVE_HOME/HCAT_HOME is not set, as Sqoop uses HIVE_HOME/HCAT_HOME to find the Hive libraries, which are needed for a Hive import as a Parquet file.
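For reference, a minimal sketch of setting these variables before the import; the paths below are assumptions based on HDP's conventional client locations, so adjust them to your install:

export HIVE_HOME=/usr/hdp/current/hive-client
export HCAT_HOME=/usr/hdp/current/hive-webhcat
# then re-run the sqoop import so it can pick up the Hive/HCatalog libraries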

Thanks and Regards,

Sindhu

Rising Star

sqoop import --connect jdbc:oracle:thin:@XXX:XXXX/YYYY --username YYYYY --password YYYYY --table A.BBBB --hive-import --hive-database default --hive-table test15 --as-parquetfile -m 1

Job job_1465371735536_0055 failed with state FAILED due to: Job commit failed: java.lang.IllegalArgumentException: Wrong FS: file:/tmp/default/.temp/job_1465371735536_0055/mr/job_1465371735536_0055/b402d4ba-1a16-46bc-92c6-91fe141070d2.parquet, expected: hdfs://lxapp5524.dc.corp.telstra.com:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)
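The "Wrong FS: file:... expected: hdfs://..." part suggests the job committer is resolving the Kite temp dataset against the local filesystem instead of HDFS. Two standard Hadoop CLI checks that may help narrow this down (the temp path below is taken from the error above, and may already have been cleaned up):

hdfs getconf -confKey fs.defaultFS
hadoop fs -ls /tmp/default/.temp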

Rising Star

We only get the above error when using Parquet; otherwise the table gets pulled into Hive without trouble. Please keep in mind that we installed HDP without internet access, so it's quite possible we missed something.
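Since the install was done offline, one thing worth checking is which Kite SDK jars actually shipped with the Sqoop client; a quick look, assuming the usual HDP layout (adjust the path if yours differs):

ls /usr/hdp/current/sqoop-client/lib | grep -i kite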

Rising Star

We are facing the above error while using the following command:

sqoop import --connect jdbc:oracle:thin:@XX:1521/DATABASENAME --username USER --password PWD --table SCHEMANAME.TABLENAME --hive-import --hive-table TABLENAME --hive-overwrite --num-mappers 1 --as-parquetfile

It's an issue only when we use Parquet and ingest the data into Hive, because if we do the ingestion into HDFS with Parquet, the job completes.
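For comparison, the HDFS-only variant that completes would look something like the following; the --target-dir value here is only an illustrative placeholder:

sqoop import --connect jdbc:oracle:thin:@XX:1521/DATABASENAME --username USER --password PWD --table SCHEMANAME.TABLENAME --target-dir /user/USER/TABLENAME_parquet --as-parquetfile --num-mappers 1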

New Contributor

I am also getting the same error on HDP 2.4 while doing a Sqoop hive-import with Parquet. Without Parquet it works fine.

16/06/09 21:12:11 INFO mapreduce.Job: Job job_1465467652802_0011 failed with state FAILED due to: Job commit failed: java.lang.IllegalArgumentException: Wrong FS: file:/tmp/default/.temp/job_1465467652802_0011/mr/job_1465467652802_0011/dc944213-b925-4e5b-ac2c-736e5fa8610f.parquet, expected: hdfs://lxapp5524.dc.corp.hdp.com:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:636)
at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:327)
at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:56)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:370)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

New Contributor

I then tried changing the kite-sdk dependency version from 1.0.0 to 1.1.0, and the issue was gone. It worked! Issue resolved.
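For anyone else on an offline HDP install: one way to apply this fix, sketched under the assumption of the standard HDP Sqoop client layout and the stock Kite artifact names, is to swap the 1.1.0 jars into Sqoop's lib directory. The paths and jar list are illustrative, and the 1.1.0 jars must be fetched separately (e.g. from Maven Central):

cd /usr/hdp/current/sqoop-client/lib
# back up the bundled 1.0.0 jars, then drop in the 1.1.0 ones
for j in kite-data-core kite-data-mapreduce kite-data-hive; do
  mv ${j}-1.0.0.jar ${j}-1.0.0.jar.bak
  cp /path/to/${j}-1.1.0.jar .
done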
