Sqoop Fails to Import from Postgres to S3

ckk — Mon, 11 Nov 2019 13:28:03 GMT

I import data from Postgresql to hdfs and hdfs to S3 in my daily operation. (sqoop import [postgres to hdfs] & distcp [from hdfs to s3])

I wanted to remove intermediate step (hdfs) and directly import data to S3 bucket by using sqoop.

However, same sqoop string fails in the end of the import operation.

sqoop import -Dmapreduce.map.memory.mb="8192" -Dmapreduce.map.java.opts="-Xmx7200m" -Dmapreduce.task.timeout=0 -Dmapreduce.task.io.sort.mb="2400" --connect $conn_string$ --fetch-size=20000 --username $user_name$ --p $password$ --num-mappers 20 --query "SELECT * FROM table1 WHERE table1.id > 10000000 and table1.id < 20000000 and \$CONDITIONS" --hcatalog-database $schema_name$ --hcatalog-table $table_name$ --hcatalog-storage-stanza "STORED AS PARQUET LOCATION s3a://path/to/destination" --split-by table1.id

I also tried

--target-dir s3a://path/to/destination

instead of

....... LOCATION s3a://path/to/destination

After "mapping: %100 completed" it throws error message below:

Error: java.io.IOException: Could not clean up TaskAttemptID:attempt_1571557098082_15536_m_000004_0@s3a://path/to/destination_DYN0.6894861001907555/ingest_day=__HIVE_DEFAULT_PARTITION__ at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:83) at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitTask(FileOutputCommitterContainer.java:145) at org.apache.hadoop.mapred.Task.commit(Task.java:1200) at org.apache.hadoop.mapred.Task.done(Task.java:1062) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) Caused by: java.io.IOException: Could not rename s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/_temporary/attempt_1571557098082_15536_m_000004_0 to s3a://path/to/destination/_DYN0.6894861001907555/ingest_day=20180522/_temporary/1/task_1571557098082_15536_m_000004 at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:579) at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:172) at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:343) at org.apache.hive.hcatalog.mapreduce.DynamicPartitionFileRecordWriterContainer$1.commitTask(DynamicPartitionFileRecordWriterContainer.java:125) at org.apache.hive.hcatalog.mapreduce.TaskCommitContextRegistry.commitTask(TaskCommitContextRegistry.java:80) ... 9 more```

I know renaming in S3 is not possible but is there a walkaround to import data directly to the S3?

Re: Sqoop Fails to Import from Postgres to S3

EricL — Wed, 13 Nov 2019 06:48:36 GMT

Hi @ckk,

I believe that we started supporting Sqoop import into S3A since CDH6.1

https://docs.cloudera.com/documentation/enterprise/6/6.1/topics/admin_sqoop_s3_import.html

and you need to use "--target-dir". Can you share the full Sqoop command you used when using "--target-dir"? Can you also attach "--verbose" output file for review?

Thanks
Eric

question Re: Sqoop Fails to Import from Postgres to S3 in Support Questions

Sqoop Fails to Import from Postgres to S3

Re: Sqoop Fails to Import from Postgres to S3