
How to store time stamp for lastmodified column in sqoop metastore in GMT format

Expert Contributor

I need to store timestamps in GMT for certain Sqoop jobs. My typical Sqoop job looks like this:

sqoop job -Duser.timezone=GMT  --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username root --password 'PASSWORD' --table TABLE_NAME --incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir PATH_TO_DIRECTORY --hive-database DB_NAME --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001'

I have tried using

-Doracle.sessionTimeZone=GMT
 -Duser.timezone=GMT
-Dmapred.child.java.opts="-Duser.timezone=GMT"

but none of these works; the import continues to use the local timestamp. How do I fix this?

I have all the dates in the database in GMT, and having a local timestamp in the Sqoop metastore creates obvious problems.


Re: How to store time stamp for lastmodified column in sqoop metastore in GMT format

Rising Star
@Simran Kaur

A workaround from this bug is to use

-Duser.timezone=GMT

https://issues.apache.org/jira/browse/SQOOP-423

A workaround you can try is passing the user.timezone property as a mapreduce child JVM option. We were seeing a similar error with Oracle Date mappings in Avro and we "resolved" it by specifying the system timezone that was being used to create the java.sql.Timestamp values.

The HADOOP_OPTS setting only makes the change in the job submission process, but as the actual code is being run on the cluster, you need to modify the settings there.

  
-D mapred.child.java.opts="-Duser.timezone=GMT"
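When passed on the Sqoop command line rather than via HADOOP_OPTS, generic Hadoop options like this must come immediately after the tool name, before any tool-specific arguments, or they are not picked up. A minimal sketch of the combined invocation, reusing the placeholders from the question (IP, DB, TABLE_NAME, PATH_TO_DIRECTORY are not real values) and not runnable outside a cluster:

```shell
# Generic -D options go right after the tool name, before tool arguments.
# -Duser.timezone=GMT targets the client side; mapred.child.java.opts
# targets the task JVMs on the cluster. All names below are placeholders.
sqoop import \
  -Duser.timezone=GMT \
  -Dmapred.child.java.opts="-Duser.timezone=GMT" \
  --driver com.mysql.jdbc.Driver \
  --connect "jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull" \
  --username root --password 'PASSWORD' \
  --table TABLE_NAME \
  --incremental lastmodified --check-column updated_at --last-value 0 \
  --merge-key entity_id --split-by entity_id \
  --target-dir PATH_TO_DIRECTORY
```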

Re: How to store time stamp for lastmodified column in sqoop metastore in GMT format

Expert Contributor

@Dileep Kumar Chiguruvada: Hey, I have added my Sqoop job to the question. I added exactly that in my Sqoop job, but it did not work. Am I adding it in the wrong place?

Re: How to store time stamp for lastmodified column in sqoop metastore in GMT format

Rising Star

@Simran Kaur You can tweak this by appending to HADOOP_OPTS at the end of the hadoop.distro file. Path: /usr/hdp/<version>/hadoop/bin/hadoop.distro

Append a line like the one below (the child-JVM option has no spaces, so it needs no quotes of its own):

HADOOP_OPTS="$HADOOP_OPTS -D mapred.child.java.opts=-Duser.timezone=GMT"
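Quoting is the usual pitfall with this setting: a stray inner double quote ends the string early, so since -Duser.timezone=GMT contains no spaces it is safest to drop the inner quotes entirely. A quick sanity check in a plain shell (a sketch only; hadoop.distro itself is not touched here):

```shell
# Simulate the hadoop.distro append; with no inner quotes the
# child-JVM option survives intact inside HADOOP_OPTS.
HADOOP_OPTS="$HADOOP_OPTS -D mapred.child.java.opts=-Duser.timezone=GMT"
echo "$HADOOP_OPTS"
```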

Re: How to store time stamp for lastmodified column in sqoop metastore in GMT format

Expert Contributor

@Dileep Kumar Chiguruvada: But in that case it would take GMT for all the Sqoop jobs, and I need GMT only for some jobs. Why would my Sqoop job not take GMT when I have specified it? I see that oracle.sessionTimeZone is mentioned in the official documentation as well.

sqoop import -D user.timezone=GMT --driver com.mysql.jdbc.Driver --connect jdbc:mysql://10.10.11.2/stalkbuy_magento_new?zeroDateTimeBehavior=convertToNull --username root --password 'perconaMYSQL' --table sales_flat_order --incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir /user/hue/Magento/salesFlatOrder

I tried it like the above as well, but why does it not work?

Re: How to store time stamp for lastmodified column in sqoop metastore in GMT format

Expert Contributor

I also tried the direct import using:

sqoop import -Duser.timezone=GMT --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/db?zeroDateTimeBehavior=convertToNull --username root --password 'password' --table TABLENAME --incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir DIR --hive-database DB --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001'

but even then, I see it importing data using the local timestamp:

16/06/30 16:32:04 INFO tool.ImportTool: Lower bound value: '0'
16/06/30 16:32:04 INFO tool.ImportTool: Upper bound value: '2016-06-30 16:32:05.0'
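The "Upper bound value" in this log comes from tool.ImportTool during job submission, so it appears to reflect the client JVM's default timezone rather than anything on the cluster. The zone dependence itself is easy to demonstrate with plain shell tools, without Sqoop at all (a sketch assuming GNU date):

```shell
# One zone-independent instant (epoch seconds), rendered two ways.
# If the local zone is not GMT, the two strings differ by the UTC offset,
# which is exactly the skew seen in the upper-bound value above.
now=$(date -u +%s)
local_ts=$(date -d "@$now" '+%Y-%m-%d %H:%M:%S')
gmt_ts=$(TZ=GMT date -d "@$now" '+%Y-%m-%d %H:%M:%S')
echo "local: $local_ts"
echo "gmt:   $gmt_ts"
```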