
Merge-key in Sqoop to import RDBMS data to S3 not working

To import data from PostgreSQL into Amazon S3 using Sqoop on EMR, we followed this video: https://www.youtube.com/watch?v=3YJwDJOyDE0

We used the following command for the import:

./sqoop import \
  -D org.apache.sqoop.splitter.allow_text_splitter=true \
  -D mapreduce.output.basename=$name`date +%Y%m%d%H%M%S%Z` \
  --connect $jdbcPath \
  --username $username \
  --password $password \
  --map-column-java content=String \
  --query "select * from tablename $CONDITIONS" \
  --fields-terminated-by '|' \
  --split-by updated \
  --incremental lastmodified \
  --check-column updated \
  --target-dir s3://path \
  -m $mapper \
  --input-null-string '\\N' \
  --input-null-non-string '\\N' \
  --merge-key keyname

If we run the command without the --merge-key argument, the import works fine, but when we run it a second time it fails, saying the file already exists. When we include --merge-key in the command, we get the error below.

2018-03-29 08:35:37,717 ERROR org.apache.sqoop.tool.BaseSqoopTool (main): Error parsing arguments for import:
2018-03-29 08:35:37,717 ERROR org.apache.sqoop.tool.BaseSqoopTool (main): Unrecognized argument: --merge-key
2018-03-29 08:35:37,717 ERROR org.apache.sqoop.tool.BaseSqoopTool (main): Unrecognized argument: updated

Later we upgraded Sqoop to a newer version. The merge import now runs, but the data is not stored at the specified path. The logs are as follows:

2018-04-03 11:56:59,907 INFO org.apache.sqoop.mapreduce.ImportJobBase (main): Retrieved 15 records.
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main): Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main):  --incremental lastmodified
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main):   --check-column updated 
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main):   --last-value 2018-04-03 11:56:36.823885 

Could you please help us resolve this issue?

2 REPLIES

Re: Merge-key in Sqoop to import RDBMS data to S3 not working

@VIJAYA SEETHARAMAN

--merge-key is an argument of the sqoop merge tool, so to use it you need to run sqoop merge. It is not a valid argument for sqoop import. Please refer to the Sqoop User Guide.
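
For reference, a standalone merge along the lines suggested above might look roughly like the sketch below. The flags are the ones the sqoop merge tool documents, but every path, the jar file, and the class name are hypothetical placeholders that do not come from this thread. The script only prints the command unless RUN_SQOOP=1 is set.

```shell
# Dry-run sketch of a standalone `sqoop merge`. All paths and the jar/class
# names below are hypothetical placeholders, not values from the thread.
NEW_DATA="/data/tablename/delta"    # assumed: output of the latest incremental import
OLD_DATA="/data/tablename/base"     # assumed: previously imported dataset
MERGED="/data/tablename/merged"     # assumed: directory for the merged result
JAR_FILE="tablename.jar"            # assumed: record class jar generated by Sqoop
CLASS_NAME="tablename"              # assumed: record class inside that jar

# Collect the arguments as positional parameters so each value stays one word.
set -- merge \
  --new-data "$NEW_DATA" \
  --onto "$OLD_DATA" \
  --target-dir "$MERGED" \
  --jar-file "$JAR_FILE" \
  --class-name "$CLASS_NAME" \
  --merge-key keyname

MERGE_CMD="sqoop $*"

if [ "${RUN_SQOOP:-0}" = "1" ]; then
  sqoop "$@"            # actually run the merge
else
  echo "$MERGE_CMD"     # default: only print the command for review
fi
```

Rows in the new dataset replace rows in the old one that share the same keyname value. Whether the target directory can be an S3 path depends on your setup, so test that separately.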

Re: Merge-key in Sqoop to import RDBMS data to S3 not working

We can use --merge-key with sqoop import. We don't have to run sqoop merge separately.
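
A minimal sketch of that single-step approach, assuming a Sqoop version whose import tool accepts --merge-key (as the upgraded version in the question appears to). The connection defaults are hypothetical placeholders, the WHERE keyword in the query is an assumption since the original query omits it, and the --last-value is the one printed in the log above. As far as I know, a lastmodified import into an existing directory needs either --append or --merge-key, which would explain the "file already exists" failure on the second run. The script only prints the command unless RUN_SQOOP=1 is set.

```shell
# Dry-run sketch: incremental import that merges on a key in one step.
# Connection values fall back to hypothetical placeholders when unset.
: "${jdbcPath:=jdbc:postgresql://db.example.com/exampledb}"   # hypothetical
: "${username:=example_user}"                                  # hypothetical
: "${password:=example_password}"                              # hypothetical
: "${mapper:=1}"                                               # hypothetical

# Single quotes keep $CONDITIONS literal so Sqoop, not the shell, replaces it.
# The WHERE keyword is an assumption; the query in the question omits it.
QUERY='select * from tablename where $CONDITIONS'

set -- import \
  --connect "$jdbcPath" \
  --username "$username" \
  --password "$password" \
  --query "$QUERY" \
  --split-by updated \
  --incremental lastmodified \
  --check-column updated \
  --last-value "2018-04-03 11:56:36.823885" \
  --merge-key keyname \
  --target-dir s3://path \
  -m "$mapper"

IMPORT_CMD="sqoop $*"

if [ "${RUN_SQOOP:-0}" = "1" ]; then
  sqoop "$@"             # actually run the import
else
  echo "$IMPORT_CMD"     # default: only print the command for review
fi
```

The printed line is for inspection only: it loses the quoting around the query and the timestamp, so run the script with RUN_SQOOP=1 instead of pasting the output back into a shell.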