Sqoop Fails to Import from Postgres to S3

New Contributor

As part of my daily operation I import data from PostgreSQL to HDFS and then from HDFS to S3 (sqoop import [Postgres to HDFS] & distcp [HDFS to S3]).

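For reference, a rough sketch of that two-step flow is below; the JDBC URL, table name, and paths are placeholders rather than my real values.

```
# Step 1: Sqoop import from Postgres into a staging directory on HDFS
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username $user_name$ \
  --password $password$ \
  --table table1 \
  --target-dir /staging/table1

# Step 2: copy the staged files from HDFS to S3 with DistCp
hadoop distcp /staging/table1 s3a://path/to/destination/table1
```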

I wanted to remove the intermediate step (HDFS) and import data directly into the S3 bucket using Sqoop.

However, the same Sqoop command fails at the end of the import operation:

```
sqoop import \
  -Dmapreduce.map.memory.mb="8192" \
  -Dmapreduce.map.java.opts="-Xmx7200m" \
  -Dmapreduce.task.timeout=0 \
  -Dmapreduce.task.io.sort.mb="2400" \
  --connect $conn_string$ \
  --fetch-size=20000 \
  --username $user_name$ \
  --password $password$ \
  --num-mappers 20 \
  --query "SELECT * FROM table1 WHERE table1.id > 10000000 and table1.id < 20000000 and \$CONDITIONS" \
  --hcatalog-database $schema_name$ \
  --hcatalog-table $table_name$ \
  --hcatalog-storage-stanza "STORED AS PARQUET LOCATION 's3a://path/to/destination'" \
  --split-by table1.id
```

I also tried

```
--target-dir s3a://path/to/destination
```

instead of

```
....... LOCATION s3a://path/to/destination
```

After "mapping: %100 completed" it throws error message below:

 

 

 

Error: java.io.IOException: Could not clean up TaskAttemptID:attempt_1571557098082_15536_m_000004_0@s3a://path/to/destination_DYN0.6894861001907555/ingest_day=__HIVE_DEFAULT_PARTITION__
        at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:83)
        at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:145)
        at org.apache.hadoop.mapred.Task.commit(Task.java:1200)
        at org.apache.hadoop.mapred.Task.done(Task.java:1062)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Could not rename 
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/_temporary/attempt_1571557098082_15536_m_000004_0 
to 
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/task_1571557098082_15536_m_000004
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:579)
        at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
        at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:343)
        at org.apache.hive.hcatalog.mapreduce.DynamicPartitionFileRecordWriterContainer$1.commitTask(DynamicPartitionFileRecordWriterContainer.java:125)
        at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:80)
        ... 9 more
```

I know renaming is not possible in S3, but is there a workaround to import data directly into S3?

 

1 ACCEPTED SOLUTION

Super Guru
Hi @ckk,

I believe we started supporting Sqoop import into S3A in CDH 6.1:

https://docs.cloudera.com/documentation/enterprise/6/6.1/topics/admin_sqoop_s3_import.html

and you need to use "--target-dir". Can you share the full Sqoop command you ran with "--target-dir"? Can you also attach the "--verbose" output for review?

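For reference, a direct import to S3A would look roughly like the sketch below. This is only a sketch: the bucket path is a placeholder, and the fs.s3a.access.key / fs.s3a.secret.key properties are only needed if S3 credentials are not already configured on the cluster.

```
sqoop import \
  -Dfs.s3a.access.key=$ACCESS_KEY \
  -Dfs.s3a.secret.key=$SECRET_KEY \
  --connect $conn_string$ \
  --username $user_name$ \
  --password $password$ \
  --query "SELECT * FROM table1 WHERE table1.id > 10000000 and table1.id < 20000000 and \$CONDITIONS" \
  --split-by table1.id \
  --num-mappers 20 \
  --target-dir s3a://path/to/destination
```

The "Could not rename" in your trace is thrown during the commit step of the HCatalog dynamic-partition writer, which renames task output on commit; the documented S3 import path uses "--target-dir" instead of the HCatalog options.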

Thanks
Eric
