
Sqoop split-by date wants to compare a timestamp with milliseconds to an Oracle DATE

Explorer

I'm trying to use Sqoop to pull data from Oracle into HDFS using multiple mappers.

The problem I'm running into is that when Sqoop creates the split boundaries for the mappers, it generates them with milliseconds and then tries to compare those values to a DATE column in Oracle, which fails and throws errors.

 

The Sqoop command from a shell script:

 

sqoop import --connect jdbc:oracle:thin:{CONNECTION ADDRESS} \
--username  {USERNAME} \
--password  {PASSWORD} \
--verbose \
--table {ORACLE TABLE} \
--where "orig_dt >= to_date('2015-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and orig_dt < to_date('2015-12-01 01:00:00', 'YYYY-MM-DD HH24:MI:SS')" \
-z \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--as-parquetfile \
--target-dir {HDFS LOCATION} \
--split-by ORIG_DT \
-m 2

 

The WHERE clause that Sqoop generates for the first mapper:

 

WHERE ( orig_dt >= to_date('2015-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and orig_dt < to_date('2015-12-01 01:00:00', 'YYYY-MM-DD HH24:MI:SS') ) AND ( ORIG_DT >= '2015-12-01 00:00:00.0' ) AND ( ORIG_DT < '2015-12-01 00:29:59.5' )

The error I'm receiving from this is:

ORA-01861: literal does not match format string

 

After doing some research I found this Sqoop issue:

https://issues.apache.org/jira/browse/SQOOP-1946

It seems to match the problem I'm having.

 

I've implemented the workaround recommended there, but it now generates a different error:

ORA-01830: date format picture ends before converting entire input string

 

This new error seems to be caused by the milliseconds that Sqoop keeps generating for the splits: I've tried running the WHERE clause that Sqoop generated directly against the Oracle server, and it throws the ORA-01830 error with the milliseconds present, but runs properly when I take them off.
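
To illustrate, the direct test looked roughly like the following (same placeholders as above; whether the millisecond-free literal converts implicitly depends on the session NLS_DATE_FORMAT):

sqlplus {USERNAME}/{PASSWORD}@{CONNECTION ADDRESS} <<'EOF'
-- throws ORA-01830: the trailing '.0' has no counterpart in the date format
SELECT COUNT(*) FROM {ORACLE TABLE} WHERE orig_dt >= '2015-12-01 00:00:00.0';
-- runs fine once the milliseconds are removed
SELECT COUNT(*) FROM {ORACLE TABLE} WHERE orig_dt >= '2015-12-01 00:00:00';
EOF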

(Side note: before adding the workaround from the SQOOP-1946 issue, running the WHERE clause Sqoop generated directly against the Oracle server, without milliseconds, would still throw the ORA-01861 error.)

 

So the main question is this: is there any way to prevent Sqoop from using milliseconds when generating splits for a date column while moving data from Oracle to HDFS?

Or is there some other way to solve this problem?


It may be worth noting that this import worked fine when I was on CDH 5.4.8 and Sqoop 1.4.5, but after upgrading to CDH 5.5.1 and Sqoop 1.4.6 these errors are thrown.

Also, the import does work with a single mapper, but using multiple mappers throws these errors.

6 REPLIES

New Contributor

Same issue on CDH 5.5.1.

New Contributor

We had a similar problem where we needed to split by a modified date, and we got it working as below:

 

--split-by "to_number(to_char(modified_date, 'YYYYMMDDHHMISS'))"

  

Explorer

@SathisKumar I tried the same idea but am getting:

 

ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLException: Invalid SQL type: sqlKind = UNINITIALIZED

 

Kindly suggest how you managed to solve this.

New Contributor

Can you provide the sqoop command and full error message?

Explorer

Hi guys, I am trying to use the command below and I get a "column not found" error on the --split-by expression.

 

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://host/jslice \
--username=*** --password *** --table orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" --escaped-by '\' \
--optionally-enclosed-by '\"' --map-column-java dwh_last_modified=String --hive-drop-import-delims \
--as-parquetfile -m 16 --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/ --split-by 'cast(order_number as UNSIGNED)' 

 

Note: I have tried the same with double quotes around the expression:

cast(order_number as UNSIGNED)

but that didn't work either.

 

When I use '`cast(order_number as UNSIGNED)`' with backticks I get a SQL syntax error; otherwise, with the above command, I get:


INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
18/07/06 20:50:25 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1530637372013_0116
18/07/06 20:50:25 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:174)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)


Caused by: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
        at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:177)
        at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:110)
        at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:228)
        at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:334)
        at org.mariadb.jdbc.MariaDbStatement.executeQuery(MariaDbStatement.java:505)
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:149)
        ... 23 more
Caused by: java.sql.SQLException: Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
Query is: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
        at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:119)

Any help is appreciated @SathisKumar

 

New Contributor

I have found the cause. This issue occurs when using --query: the ConnManager and InputFormat of the ImportJobContext are not set to the Oracle-specific ones. I have submitted a patch to SQOOP-1946 (https://issues.apache.org/jira/browse/SQOOP-1946) and hope Cloudera will include it in the next version of CDH.