Support Questions
Find answers, ask questions, and share your expertise

Sqoop's metastore timezone


Sqoop's metastore timezone


I'm importing my data with sqoop's job with a workflow.
Mi timezone is Europe/Rome so UTC+2.
My source is a oracle's DB with timezone UTC.
I don't know what happens in sqoop's metastore, but i understand that my last-value is different, and i can't bring the spread.

From the log i have this configuration:

org.apache.sqoop.manager.OracleManager - Time zone has been set to GMT
Lower bound value: TO_TIMESTAMP('2014-07-31 17:50:48.0', 'YYYY-MM-DD HH24:MI:SS.FF')
Upper bound value: TO_TIMESTAMP('2014-07-31 17:53:04.0', 'YYYY-MM-DD HH24:MI:SS.FF')


These are my tries:
1)settings in properties:
name: oracle.sessionTimeZone
value: Europe/Rome

2)settings in string-connession:

And other tries, but now i want understand how sqoop's work, and don't try to guess.


Re: Sqoop's metastore timezone


By guide of sqoop:


Oracle also includes the additional date/time types TIMESTAMP WITH TIMEZONE and TIMESTAMP WITH LOCAL TIMEZONE. To support these types, the user’s session timezone must be specified. By default, Sqoop will specify the timezone "GMT" to Oracle. You can override this setting by specifying a Hadoop property oracle.sessionTimeZone on the command-line when running a Sqoop job. For example:

$ sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \

 Ii suppose when i put this code into my string:

-D oracle.sessionTimeZone=Europe/Rome


Automatically in the metastore sqoop saved the last-value data in UTC time or another settings with timezone (in this example utc+2)
But i dont fix this problem, is right my opinion?

Re: Sqoop's metastore timezone



I am also working on the same problem. I am using CDH 5.0.2, with Sqoop 1.4.4-cdh5.0.2, I have to export an Oracle table in incremental way on a timestamp column.


Looking at the source code of Sqoop, it seems that to calculate the upper bound (lower bound at next iteration) for the query, the program gets the current time in millisecond (getCurrentDbTimestamp() in org.apache.sqoop.manager.ConnManager) and creates a java.sql.Timestamp object from it.Then, org.apache.sqoop.tool.ImportTool calls datetimeToQueryString(String datetime, int columnType) of org.apache.sqoop.manager.OracleManager passing the timestamp as a string, and get back the sql filter to put in the final query. 


The problem seems to be that the toString() method of a java.sql.Timestamp object consider the user.timezone of the JVM (Europe/Rome in my case), so I get a String representing a date that is in the future if considered as GMT. Once the query is evaluated from the database I then get no row back (I look for updates every 15 minutes, Europe/Rome is GMT+2 now).


A possible solution is to change the user.timezone to GMT for the JVM executing the Sqoop command. However, I have not found a way to do it (something like -Duser.timezone=GMT should work). Anyone has some hint on how to do it? I tried to set the parameter on the Sqoop command and also in Cloudera Manager but without results.


Also, in the future I will launch the Sqoop import from Oozie, so I also need to understand how to pass the same parameter when using Oozie.





Re: Sqoop's metastore timezone

Expert Contributor

Did you find a solution? I am facing the exact same issue. It should be a fairly simple thing to do but I could not find a solution to this. Setting -Duser.timezone to GMT did not work either,