
Importing data from oracle using sqoop into a partitioned hive table as parquet


Hi 

 

I'm trying to import data from Oracle into Hive as a Parquet file using Sqoop. It works fine when the Hive table is not partitioned, but the same import fails when I use the --hive-partition-key option.

 

Works fine

==========

sqoop import --connect jdbc:oracle:thin:@//xxxx --username xxx --password xxxx --table xxxx --columns "col1","col2",..."colx" -m 1 --hive-import --hive-database sandbox --hive-table parq_test --as-parquetfile --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims --target-dir /tmp/sqp_xxxx --verbose

 

 

Fails

====

sqoop import --connect jdbc:oracle:thin:@//xxxxx --username xxxxx --password xxxxx --table xxxx --columns "col1","col2",..."coln" -m 1 --hive-import --hive-database xxx --hive-table parq_test_partitions --hive-partition-key run_id --hive-partition-value "111" --as-parquetfile --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims --target-dir /tmp/sqp_xxx --verbose

 

Error message

 

Error: java.lang.IllegalArgumentException: Cannot construct key, missing provided value: run_id
at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
at org.kitesdk.data.spi.EntityAccessor.partitionValue(EntityAccessor.java:128)
at org.kitesdk.data.spi.EntityAccessor.keyFor(EntityAccessor.java:111)
at org.kitesdk.data.spi.filesystem.PartitionedDatasetWriter.write(PartitionedDatasetWriter.java:158)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:325)
at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$DatasetRecordWriter.write(DatasetKeyOutputFormat.java:304)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:70)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

 

 

Can someone help me with this?

 

Regards

Suresh

6 REPLIES

Re: Importing data from oracle using sqoop into a partitioned hive table as parquet

New Contributor

I have encountered the same issue. It seems that --hive-partition-key doesn't coexist with --as-parquetfile. Does anyone know why, and how to fix it?

Re: Importing data from oracle using sqoop into a partitioned hive table as parquet

New Contributor

I am facing the same problem and cannot fix it.

Re: Importing data from oracle using sqoop into a partitioned hive table as parquet

Explorer

Hi, I am having the same problem. @suresh.sethu, @mkquant, @JackHe, did you manage to solve the issue?

 

Thanks!

Re: Importing data from oracle using sqoop into a partitioned hive table as parquet

New Contributor

Hi

 

Is there any update on this issue? I am still facing the same problem. Please let me know if there are any updates.

Re: Importing data from oracle using sqoop into a partitioned hive table as parquet

Explorer

Unfortunately, the only workaround I found was to manually create the table first, using PARTITIONED BY and STORED AS PARQUET.
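
For reference, a minimal HiveQL sketch of that workaround. The table name is taken from the thread, but the column names and types are hypothetical placeholders; match them to your Oracle source table:

CREATE TABLE parq_test_partitions (
  col1 STRING,   -- placeholder columns; use the real columns and types from the Oracle table
  col2 STRING
)
PARTITIONED BY (run_id STRING)
STORED AS PARQUET;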

 

 


Re: Importing data from oracle using sqoop into a partitioned hive table as parquet

New Contributor

 Hi,

 

Importing into a partitioned Hive table with Sqoop works if you leave the Parquet options out of the sqoop import command.

 

Keep in mind that the destination Hive table definition must not be STORED AS PARQUET either. The Sqoop import itself will succeed, but the Hive table will then throw an error on SELECT.

 

sqoop import --connect jdbc:oracle:thin:@//xxxxx --username xxxxx --password xxxxx --table xxxx --columns "col1","col2",..."coln" -m 1 --hive-import --hive-database xxx --hive-table parq_test_partitions --hive-partition-key run_id --hive-partition-value "111" --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims --target-dir /tmp/sqp_xxx --verbose
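
If you still want the final data in Parquet, one pattern consistent with the workaround above (my own sketch, not something confirmed in this thread) is to treat the Sqoop-loaded table as a text staging table and copy the partition into a separately created Parquet table from the Hive side:

-- parq_test_parquet is a hypothetical Parquet table created with
-- PARTITIONED BY (run_id STRING) STORED AS PARQUET, as in the earlier reply
INSERT OVERWRITE TABLE parq_test_parquet PARTITION (run_id = '111')
SELECT col1, col2   -- the imported columns; run_id is set by the PARTITION clause
FROM parq_test_partitions
WHERE run_id = '111';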