Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

change sqoop metastore timezone to GMT

avatar
Expert Contributor

The database has all its dates in GMT while sqoop automatically takes local (Asia/kolkata) for incremental updates.

It probably picks that up from JVM but I need it to use GMT for certain jobs and local for others. How do I get about this?

The link https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-s-metastore-timezone/td-p/16306

discusses the same issue. Is there an actual workaround for this? The solution given in the thread did not really work for me .

Here's what I have for a sqoop job:

sqoop job -D oracle.sessionTimeZone=GMT -D mapred.child.java.opts=" -Duser.timezone=GMT" --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username root --password 'PASSWORD' --table TABLE_NAME--incremental lastmodified --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id --target-dir LOCATION_SPECIFIED --hive-database Magento --hive-drop-import-delims --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001'

logs:

5459 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5497 [uber-SubtaskRunner] INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.6-cdh5.7.0
5817 [uber-SubtaskRunner] WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
5832 [uber-SubtaskRunner] WARN  org.apache.sqoop.ConnFactory  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5859 [uber-SubtaskRunner] WARN  org.apache.sqoop.ConnFactory  - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
5874 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Using default fetchSize of 1000
5874 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.CodeGenTool  - Beginning code generation
6306 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
6330 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
6434 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-mapreduce
9911 [uber-SubtaskRunner] INFO  org.apache.sqoop.orm.CompilationManager  - Writing jar file: /tmp/sqoop-yarn/compile/51c9a7f9e76b0547825eb7a852721bf9/sales_flat_order.jar
9928 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
9941 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Incremental import based on column updated_at
9941 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Lower bound value: '0'
9941 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Upper bound value: '2016-06-30 11:40:36.0'
9943 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Beginning import of sales_flat_order
9962 [uber-SubtaskRunner] INFO  org.apache.sqoop.manager.SqlManager  - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
10007 [uber-SubtaskRunner] WARN  org.apache.sqoop.mapreduce.JobBase  - SQOOP_HOME is unset. May not be able to find all job dependencies.
10672 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DBInputFormat  - Using read commited transaction isolation
10674 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat  - BoundingValsQuery: SELECT MIN(entity_id), MAX(entity_id) FROM sales_flat_order WHERE ( updated_at >= '0' AND updated_at < '2016-06-30 11:40:36.0' )
11667 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.db.IntegerSplitter  - Split size: 86592; Num splits: 4 from: 1 to: 346372
Heart beat
42986 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Transferred 300.3027 MB in 32.9683 seconds (9.1088 MB/sec)
42995 [uber-SubtaskRunner] INFO  org.apache.sqoop.mapreduce.ImportJobBase  - Retrieved 339510 records.
43008 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Saving incremental import state to the metastore
43224 [uber-SubtaskRunner] INFO  org.apache.sqoop.tool.ImportTool  - Updated data for job: sales_flat_order
1 ACCEPTED SOLUTION

avatar
Rising Star

@sim6

Sorry for the delay in response!

 

Based on below 2 parameters:

--driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToN

It looks like that you are importing data from MySQL instead of Oracle. As a result, the workaround mentioned in another thread (-D oracle.sessionTimeZone=GMT) won't work, because it is actaully an Oracle database parameter. Sqoop just passes it to Oracle JDBC driver.

 

I am not very familiar with MySQL JDBC driver, but according to its document[1], maybe we can achieve your goal with JDBC parameter "useTimezone".

 

To use GMT time, please try below command:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=false \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'

To use your local time, please try below command:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=true&serverTimezone=GMT \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'

 

Notice:

1) Please use the latest MySQL jdbc driver: 5.1.34

2) Please make sure that the timezone used in database is GMT

 

And please contact your MySQL team to seek some suggestions from them. This is more a MySQL problem than a Sqoop problem.

 

 

View solution in original post

8 REPLIES 8

avatar
Rising Star

@sim6

Sorry for the delay in response!

 

Based on below 2 parameters:

--driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToN

It looks like that you are importing data from MySQL instead of Oracle. As a result, the workaround mentioned in another thread (-D oracle.sessionTimeZone=GMT) won't work, because it is actaully an Oracle database parameter. Sqoop just passes it to Oracle JDBC driver.

 

I am not very familiar with MySQL JDBC driver, but according to its document[1], maybe we can achieve your goal with JDBC parameter "useTimezone".

 

To use GMT time, please try below command:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=false \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'

To use your local time, please try below command:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
--create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
--connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToull&useTimezone=true&serverTimezone=GMT \
--username root --password 'PASSWORD' --table TABLE_NAME \
--incremental lastmodified --check-column updated_at --last-value 0 \
--merge-key entity_id --split-by entity_id \
--target-dir LOCATION_SPECIFIED --hive-database Magento \
--hive-drop-import-delims --null-string '\\N' \
--null-non-string '\\N' --fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--input-null-non-string '\\N' --input-fields-terminated-by '\001'

 

Notice:

1) Please use the latest MySQL jdbc driver: 5.1.34

2) Please make sure that the timezone used in database is GMT

 

And please contact your MySQL team to seek some suggestions from them. This is more a MySQL problem than a Sqoop problem.

 

 

avatar
Expert Contributor

@yshi : Hi, Thanks for your response but somehow using & multiple times in the connection string always give me trouble in sqoop commands. 

 

It isn't able to parse the connection string that way and ends up giving socket connection error while creating the job only.

avatar
Mentor
Using a single quote around the value will help it get evaluated properly
in shell. The & is otherwise taken as a token to fork the process. An ex.
of quoting in shell:


--connect 'jdbc:mysql://IP/DB?zeroDateTimeBehavior=
convertToull&useTimezone=true&serverTimezone=GMT' \



avatar
Expert Contributor

@Harsh J: I thought so too but that still gives the error exception creating sql connection

avatar
Expert Contributor

Awesome 🙂 Thank worked. 

 

Just let me know if I can help you with a review may be? I would love to do that. Thanks much.

 

@yshi

avatar
Rising Star

@Harsh J

Thanks! I made a mistake with the command -- forgot the single quote  

 

@sim6

Thanks for your offer! I would like to hear a review from you! Would you please let me know your availability?

avatar
Expert Contributor

Oh, That sounds cool. I am available post 7:00 p.m IST. Thanks

avatar
Rising Star

@sim6

Sorry, I cannot make that time. Could you please paste the information here?