Created on 07-04-2016 06:15 AM - edited 09-16-2022 03:28 AM
The database has all its dates in GMT while sqoop automatically takes local (Asia/kolkata) for incremental updates.
It probably picks that up from JVM but I need it to use GMT for certain jobs and local for others. How do I get about this?
The link https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-s-metastore-timezone/td-p/16306
discusses the same issue. Is there an actual workaround for this? The solution given in the thread did not really work for me .
Here's what I have for a sqoop job:
sqoop job -D oracle.sessionTimeZone=GMT -D mapred.child.java.opts=" -Duser.timezone=GMT" --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username root --password 'PASSWORD' --table TABLE_NAME--incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir LOCATION_SPECIFIED --hive-database Magento --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001'
logs:
5459 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. 5497 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0 5817 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead. 5832 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. 5859 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time. 5874 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000 5874 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation 6306 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 6330 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 6434 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-mapreduce 9911 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/51c9a7f9e76b0547825eb7a852721bf9/sales_flat_order.jar 9928 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Incremental import based on column updated_at 9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Lower bound value: '0' 9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Upper bound value: '2016-06-30 11:40:36.0' 9943 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of sales_flat_order 9962 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 10007 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies. 10672 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation 10674 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - BoundingValsQuery: SELECT MIN(entity_id), MAX(entity_id) FROM sales_flat_order WHERE ( updated_at >= '0' AND updated_at < '2016-06-30 11:40:36.0' ) 11667 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.IntegerSplitter - Split size: 86592; Num splits: 4 from: 1 to: 346372 Heart beat 42986 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 300.3027 MB in 32.9683 seconds (9.1088 MB/sec) 42995 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 339510 records. 43008 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Saving incremental import state to the metastore 43224 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Updated data for job: sales_flat_order
Created 07-19-2016 06:34 PM
Sorry for the delay in response!
Based on below 2 parameters:
--driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToN
It looks like that you are importing data from MySQL instead of Oracle. As a result, the workaround mentioned in another thread (-D oracle.sessionTimeZone=GMT) won't work, because it is actaully an Oracle database parameter. Sqoop just passes it to Oracle JDBC driver.
I am not very familiar with MySQL JDBC driver, but according to its document[1], maybe we can achieve your goal with JDBC parameter "useTimezone".
To use GMT time, please try below command:
sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=false \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'
To use your local time, please try below command:
sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=true&serverTimezone=GMT \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'
Notice:
1) Please use the latest MySQL jdbc driver: 5.1.34
2) Please make sure that the timezone used in database is GMT
And please contact your MySQL team to seek some suggestions from them. This is more a MySQL problem than a Sqoop problem.
Created 07-19-2016 06:34 PM
Sorry for the delay in response!
Based on below 2 parameters:
--driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToN
It looks like that you are importing data from MySQL instead of Oracle. As a result, the workaround mentioned in another thread (-D oracle.sessionTimeZone=GMT) won't work, because it is actaully an Oracle database parameter. Sqoop just passes it to Oracle JDBC driver.
I am not very familiar with MySQL JDBC driver, but according to its document[1], maybe we can achieve your goal with JDBC parameter "useTimezone".
To use GMT time, please try below command:
sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=false \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'
To use your local time, please try below command:
sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=true&serverTimezone=GMT \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'
Notice:
1) Please use the latest MySQL jdbc driver: 5.1.34
2) Please make sure that the timezone used in database is GMT
And please contact your MySQL team to seek some suggestions from them. This is more a MySQL problem than a Sqoop problem.
Created 07-24-2016 09:44 PM
@yshi : Hi, Thanks for your response but somehow using & multiple times in the connection string always give me trouble in sqoop commands.
It isn't able to parse the connection string that way and ends up giving socket connection error while creating the job only.
Created 07-24-2016 09:57 PM
Created 07-24-2016 10:16 PM
@Harsh J: I thought so too but that still gives the error exception creating sql connection
Created on 07-24-2016 10:30 PM - edited 07-24-2016 10:31 PM
Awesome 🙂 Thank worked.
Just let me know if I can help you with a review may be? I would love to do that. Thanks much.
Created 07-25-2016 05:09 AM
Created 07-27-2016 03:13 AM
Oh, That sounds cool. I am available post 7:00 p.m IST. Thanks
Created 07-27-2016 09:05 PM
Sorry, I cannot make that time. Could you please paste the information here?