Is it possible to import data from AWS S3 to Hadoop using Sqoop?


I have a requirement to load data from AWS S3 into HDFS incrementally on the CDH 5.8 platform. The link below says it is possible. Could you guide me on how to do this?


I have never used Sqoop for this, but if there is a connector, it should work.

If the data is not large, I would use the hdfs command instead, as it is less complex:

hdfs dfs -cp s3a://bucket/path/to/object hdfs:///hdfs/path
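On its own, hdfs dfs -cp copies whatever you point it at, so it is not incremental. A minimal sketch of one way to make it incremental, assuming you can obtain a listing of S3 keys with their last-modified times and that you persist a watermark (the time of the newest object copied) between runs. The S3Object shape, the function name, the bucket, and the paths are all hypothetical. It only plans the commands; executing them is left to you.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class S3Object:
    """One entry from an S3 listing (hypothetical shape: key + last-modified time)."""
    key: str
    last_modified: datetime


def plan_incremental_copy(
    objects: List[S3Object], watermark: datetime, bucket: str, hdfs_dir: str
) -> Tuple[List[List[str]], datetime]:
    """Build 'hdfs dfs -cp -f' commands for objects modified after `watermark`.

    Returns the commands (oldest object first) and the watermark to persist
    for the next run. Objects at or before the watermark are assumed copied.
    """
    fresh = sorted(
        (o for o in objects if o.last_modified > watermark),
        key=lambda o: o.last_modified,
    )
    # Trailing slash: each file lands inside the target directory under its own name.
    dest = f"hdfs://{hdfs_dir.rstrip('/')}/"
    # -f overwrites an existing destination file (e.g. an object modified again).
    commands = [
        ["hdfs", "dfs", "-cp", "-f", f"s3a://{bucket}/{o.key}", dest] for o in fresh
    ]
    # Advance the watermark only if something new was found.
    new_watermark = fresh[-1].last_modified if fresh else watermark
    return commands, new_watermark


if __name__ == "__main__":
    # Hypothetical listing; in practice this would come from an S3 list call.
    listing = [
        S3Object("incoming/day1.csv", datetime(2016, 8, 1, tzinfo=timezone.utc)),
        S3Object("incoming/day2.csv", datetime(2016, 8, 2, tzinfo=timezone.utc)),
    ]
    last_run = datetime(2016, 8, 1, 12, 0, tzinfo=timezone.utc)
    cmds, next_watermark = plan_incremental_copy(listing, last_run, "bucket", "/hdfs/path")
    for cmd in cmds:
        print(" ".join(cmd))  # or subprocess.run(cmd, check=True) to execute
    print("next watermark:", next_watermark.isoformat())
```

Because the comparison is strictly "newer than the watermark", an object that shows up later with exactly the watermark's timestamp would be skipped; a real job would also need a tie-break, such as remembering which keys were already copied. The target directory is assumed to exist already.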

What is your specific question about the link you provided? Have you tried it, or is there a part of it you don't understand?

I have a customer request to use Sqoop to pull data from S3 into HDFS incrementally. As far as I know, Sqoop is only used for importing and exporting data between an RDBMS and Hadoop.
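For comparison, Sqoop 1's incremental import is defined in terms of a column in a source database table, which is why it maps naturally onto RDBMS sources rather than onto files in S3. A minimal sketch that assembles such a command; --incremental append, --check-column and --last-value are documented sqoop import options, while the JDBC URL, table, column, paths, and the helper name are hypothetical, and credential options are omitted.

```python
import shlex
from typing import List


def sqoop_incremental_import(
    jdbc_url: str, table: str, target_dir: str, check_column: str, last_value: str
) -> List[str]:
    """Build a Sqoop 1 'import' command that fetches only rows whose
    check_column value is greater than last_value (append mode)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,          # JDBC URL of the source database
        "--table", table,
        "--target-dir", target_dir,     # HDFS directory that receives the files
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", last_value,     # watermark from the previous run
    ]


if __name__ == "__main__":
    # Hypothetical database, table, column, and target path.
    cmd = sqoop_incremental_import(
        "jdbc:mysql://db.example.com/sales", "orders", "/hdfs/path/orders", "order_id", "1000"
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Running the import as a saved Sqoop job (sqoop job --create) lets Sqoop retain the last value between runs instead of you tracking it by hand.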

However, I found in the Sqoop documentation that importing data from S3 to HDFS is possible, so I need your guidance. As you mentioned in your response, which connector do I need here?

When I try to create a link for the HDFS connector (create link -c hdfs-connector) from the Sqoop2 CLI, as described in the Sqoop2 documentation, it always gives me a "connection refused" error. My Sqoop2 server is up and running fine, though: when I run "show connector", it lists the available connectors (hdfs-connector and generic-jdbc-connector). I don't know why I am getting the connection refused error.
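"Connection refused" means a TCP connection to some host and port was actively rejected, usually because nothing is listening there or the client is pointed at the wrong host. One way to narrow it down is to check, from the machine running the Sqoop2 shell, which of the endpoints involved actually accept connections. A minimal sketch; the endpoints listed are assumptions (12000 is the Sqoop2 server's default port, 8020 the usual CDH NameNode RPC port), so substitute your Sqoop2 server host and the HDFS URI you entered for the link.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Refused connections, timeouts, and DNS failures are all OSError
    subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical endpoints: replace with your Sqoop2 server and NameNode addresses.
    endpoints = [("localhost", 12000), ("localhost", 8020)]
    for host, port in endpoints:
        status = "open" if port_is_open(host, port) else "refused/unreachable"
        print(f"{host}:{port} -> {status}")
```

If the Sqoop2 port answers but the address in the link's HDFS URI does not, the URI (host or port) is the first thing to check.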