
Sqoop Fails to Import from Postgres to S3

New Contributor

In my daily operation I import data from PostgreSQL to HDFS, and then from HDFS to S3 (sqoop import for Postgres to HDFS, then distcp from HDFS to S3).

I wanted to remove the intermediate HDFS step and import the data directly into the S3 bucket with Sqoop.

However, the same Sqoop command fails at the end of the import operation:

```
sqoop import \
  -Dmapreduce.map.memory.mb="8192" \
  -Dmapreduce.map.java.opts="-Xmx7200m" \
  -Dmapreduce.task.timeout=0 \
  -Dmapreduce.task.io.sort.mb="2400" \
  --connect $conn_string$ \
  --fetch-size=20000 \
  --username $user_name$ \
  --password $password$ \
  --num-mappers 20 \
  --query "SELECT * FROM table1 WHERE table1.id > 10000000 AND table1.id < 20000000 AND \$CONDITIONS" \
  --hcatalog-database $schema_name$ \
  --hcatalog-table $table_name$ \
  --hcatalog-storage-stanza "STORED AS PARQUET LOCATION 's3a://path/to/destination'" \
  --split-by table1.id
```

 

 

 

I also tried

```
--target-dir s3a://path/to/destination
```

instead of

```
....... LOCATION s3a://path/to/destination
```

After "map: 100% completed" it throws the error message below:

```
Error: java.io.IOException: Could not clean up TaskAttemptID:attempt_1571557098082_15536_m_000004_0@s3a://path/to/destination_DYN0.6894861001907555/ingest_day=__HIVE_DEFAULT_PARTITION__
        at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:83)
        at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:145)
        at org.apache.hadoop.mapred.Task.commit(Task.java:1200)
        at org.apache.hadoop.mapred.Task.done(Task.java:1062)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Could not rename
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/_temporary/attempt_1571557098082_15536_m_000004_0
to
s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/task_1571557098082_15536_m_000004
        at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:579)
        at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172)
        at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:343)
        at org.apache.hive.hcatalog.mapreduce.DynamicPartitionFileRecordWriterContainer$1.commitTask(DynamicPartitionFileRecordWriterContainer.java:125)
        at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:80)
        ... 9 more
```

 

 

 

I know renaming in S3 is not possible, but is there a workaround to import data directly to S3?
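(One avenue I am considering, not confirmed anywhere in this thread: on Hadoop 3.1+ the S3A committers are designed to avoid exactly this rename-based task commit. The property names below come from the Hadoop S3A committer documentation; whether HCatalog's dynamic-partition writer honors them is an assumption that would need testing on a small table first.)

```shell
# Sketch only: requires Hadoop 3.1+ with the S3A committers available.
# These -D properties switch the job from the rename-based
# FileOutputCommitter to the S3A "directory" staging committer.
# Interaction with --hcatalog-* options is unverified.
sqoop import \
  -Dfs.s3a.committer.name=directory \
  -Dmapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory \
  --connect "$conn_string" \
  --username "$user_name" \
  --password "$password" \
  --query "SELECT * FROM table1 WHERE \$CONDITIONS" \
  --split-by table1.id \
  --target-dir s3a://path/to/destination
```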

 

1 ACCEPTED SOLUTION


Re: Sqoop Fails to Import from Postgres to S3

Guru
Hi @ckk,

I believe Sqoop import into S3A has been supported since CDH 6.1:

https://docs.cloudera.com/documentation/enterprise/6/6.1/topics/admin_sqoop_s3_import.html

and you need to use "--target-dir". Can you share the full Sqoop command you used with "--target-dir"? Can you also attach the "--verbose" output file for review?

Thanks,
Eric
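A minimal sketch of what such a command might look like, based on the CDH 6.1 docs linked above. The connection string, credentials, and bucket path are placeholders from the original post; `--as-parquetfile` stands in for the HCatalog storage stanza, since the Sqoop user guide lists `--target-dir` as unsupported in HCatalog jobs:

```shell
# Hypothetical direct-to-S3 import using --target-dir.
# $conn_string, $user_name, $password, and the s3a:// path are
# placeholders. The --hcatalog-* options are dropped because Sqoop
# does not support combining them with --target-dir.
sqoop import \
  -Dmapreduce.task.timeout=0 \
  --connect "$conn_string" \
  --username "$user_name" \
  --password "$password" \
  --fetch-size=20000 \
  --num-mappers 20 \
  --query "SELECT * FROM table1 WHERE table1.id > 10000000 AND table1.id < 20000000 AND \$CONDITIONS" \
  --split-by table1.id \
  --as-parquetfile \
  --target-dir s3a://path/to/destination \
  --verbose
```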