Member since: 04-04-2018
Posts: 3
Kudos Received: 0
Solutions: 0
07-13-2018
10:02 AM
We can use --merge-key directly with sqoop import; we don't have to run Sqoop merge separately.
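For example, an incremental import with a merge key can look roughly like this (a sketch only; the connection string, table, key column, and target path are placeholders):

sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username dbuser --password dbpass \
  --table mytable \
  --incremental lastmodified \
  --check-column updated \
  --merge-key id \
  --target-dir s3://mybucket/mytable \
  -m 1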
07-02-2018
11:56 AM
My dataset named dbData changes every day, and I want to update Postgres with that data regularly:

dbData.map(new Write())
      .output(JDBCOutputFormat.buildJDBCOutputFormat()
          .setDrivername(Utils.properties_fetch("drivername"))
          .setDBUrl(Utils.properties_fetch("dbURL"))
          .setUsername(Utils.properties_fetch("username"))
          .setPassword(Utils.properties_fetch("password"))
          .setQuery(Write.updatequery)
          .finish());

My Write class looks like the following:

public class Write implements MapFunction<Tuple7<String,String,String,String,String,String,String>, Row> {

    static String updatequery;
    private static final long serialVersionUID = 1L;

    public Row map(Tuple7<String,String,String,String,String,String,String> value) throws Exception {
        Row obj = new Row(7);
        obj.setField(0, value.f0);
        obj.setField(1, value.f1);
        obj.setField(2, value.f2);
        obj.setField(3, value.f3);
        obj.setField(4, value.f4);
        obj.setField(5, value.f5);
        obj.setField(6, value.f6);
        Write.updatequery = putdatainDb(obj);
        return obj;
    }

    public String putdatainDb(Row obj) {
        String updateQuery = "UPDATE dashboard SET metric_result = '" + obj.getField(2) + "', "
                + "metric_executed_on = '" + obj.getField(5) + "'::date "
                + "WHERE metric_orgid = '" + obj.getField(6) + "' "
                + "AND date(metric_from) = '" + obj.getField(3) + "'::date "
                + "AND date(metric_to) = '" + obj.getField(4) + "'::date "
                + "AND metric_topic = '" + obj.getField(1) + "';";
        return updateQuery;
    }
}

In setQuery I want the query to change with every new row, so that I can keep the database updated. Please suggest some ways to achieve this.
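For reference, this is roughly the kind of approach we are considering (a sketch only; it assumes Flink's JDBCOutputFormat binds the Row fields to the '?' placeholders in order, and the column-to-field mapping is copied from putdatainDb above, so the exact ordering and date casting may need adjusting):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
import org.apache.flink.api.java.tuple.Tuple7;
import org.apache.flink.types.Row;

// One parameterized UPDATE: the statement never changes per record,
// only the bound parameter values do.
String update =
    "UPDATE dashboard SET metric_result = ?, metric_executed_on = ?::date "
    + "WHERE metric_orgid = ? AND date(metric_from) = ?::date "
    + "AND date(metric_to) = ?::date AND metric_topic = ?";

dbData
    .map(new MapFunction<Tuple7<String,String,String,String,String,String,String>, Row>() {
        @Override
        public Row map(Tuple7<String,String,String,String,String,String,String> v) {
            // Field order must match the '?' placeholders in the UPDATE above.
            Row row = new Row(6);
            row.setField(0, v.f2); // metric_result
            row.setField(1, v.f5); // metric_executed_on
            row.setField(2, v.f6); // metric_orgid
            row.setField(3, v.f3); // metric_from
            row.setField(4, v.f4); // metric_to
            row.setField(5, v.f1); // metric_topic
            return row;
        }
    })
    .output(JDBCOutputFormat.buildJDBCOutputFormat()
        .setDrivername(Utils.properties_fetch("drivername"))
        .setDBUrl(Utils.properties_fetch("dbURL"))
        .setUsername(Utils.properties_fetch("username"))
        .setPassword(Utils.properties_fetch("password"))
        .setQuery(update)
        .finish());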
04-04-2018
11:46 AM
For importing DB data from PostgreSQL to Amazon S3 using Sqoop on EMR, we followed the video below:
https://www.youtube.com/watch?v=3YJwDJOyDE0
and used the command below to do the import:

./sqoop import \
  -D org.apache.sqoop.splitter.allow_text_splitter=true \
  -D mapreduce.output.basename=$name`date +%Y%m%d%H%M%S%Z` \
  --connect $jdbcPath --username $username --password $password \
  --map-column-java content=String \
  --query "select * from tablename $CONDITIONS" \
  --fields-terminated-by '|' \
  --split-by updated \
  --incremental lastmodified \
  --check-column updated \
  --target-dir s3://path \
  -m $mapper \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  --merge-key keyname
If we run the command without the --merge-key argument, the import works fine, but when we run it a second time it fails saying the output file already exists.
When we include --merge-key in the command, we get the following error:
2018-03-29 08:35:37,717 ERROR org.apache.sqoop.tool.BaseSqoopTool (main): Error parsing arguments for import:
2018-03-29 08:35:37,717 ERROR org.apache.sqoop.tool.BaseSqoopTool (main): Unrecognized argument: --merge-key
2018-03-29 08:35:37,717 ERROR org.apache.sqoop.tool.BaseSqoopTool (main): Unrecognized argument: updated
Later we upgraded Sqoop to a newer version. Now the merge import runs, but the data is not getting stored to the specified path. The logs are as follows:
2018-04-03 11:56:59,907 INFO org.apache.sqoop.mapreduce.ImportJobBase (main): Retrieved 15 records.
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main): Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main): --incremental lastmodified
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main): --check-column updated
2018-04-03 11:57:06,027 INFO org.apache.sqoop.tool.ImportTool (main): --last-value 2018-04-03 11:56:36.823885
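For reference, our understanding is that the next incremental run would simply re-supply those arguments, roughly like this (a sketch using --table for brevity; our real run keeps the same --query and column options as above, and the last-value shown is just the one from these logs):

./sqoop import \
  --connect $jdbcPath --username $username --password $password \
  --table tablename \
  --split-by updated \
  --incremental lastmodified \
  --check-column updated \
  --last-value "2018-04-03 11:56:36.823885" \
  --merge-key keyname \
  --target-dir s3://path \
  -m $mapper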
Could you please help us with the above issue?