Created on 01-19-2016 10:02 AM - edited 09-16-2022 02:58 AM
I'm trying to use SQOOP to pull data from ORACLE into HDFS while using multiple mappers.
The problem I'm running into is that when SQOOP creates the split boundaries for the mappers, it generates them with milliseconds and then tries to compare those values to a DATE field in ORACLE, which doesn't work and throws errors.
The query from a shell file:
sqoop import --connect jdbc:oracle:thin:{CONNECTION ADDRESS} \
--username {USERNAME} \
--password {PASSWORD} \
--verbose \
--table {ORACLE TABLE} \
--where "orig_dt >= to_date('2015-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and orig_dt < to_date('2015-12-01 01:00:00', 'YYYY-MM-DD HH24:MI:SS')" \
-z \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--as-parquetfile \
--target-dir {HDFS LOCATION} \
--split-by ORIG_DT \
-m 2
The where clause that SQOOP generates for the first mapper:
WHERE ( orig_dt >= to_date('2015-12-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') and orig_dt < to_date('2015-12-01 01:00:00', 'YYYY-MM-DD HH24:MI:SS') ) AND ( ORIG_DT >= '2015-12-01 00:00:00.0' ) AND ( ORIG_DT < '2015-12-01 00:29:59.5' )
The error I'm receiving from this is:
ORA-01861: literal does not match format string
After doing some research I found this SQOOP issue:
https://issues.apache.org/jira/browse/SQOOP-1946
Which seems to match the problem I'm having.
I've implemented the workaround they recommend but it now generates a different error:
ORA-01830: date format picture ends before converting entire input string
This new error seems to be caused by the milliseconds that SQOOP keeps generating in the splits: when I run the WHERE clause SQOOP generated directly against the ORACLE server, it throws the ORA-01830 error while the milliseconds are present, but runs properly once I take them off.
(Side note: before adding the workaround from the SQOOP-1946 issue, running the WHERE clause SQOOP generated directly against the ORACLE server, even without milliseconds, would still throw the ORA-01861 error.)
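To illustrate, the same behaviour can be reproduced in a plain SQL session; this is a hedged sketch, where SOME_TABLE is a placeholder and the session's NLS_DATE_FORMAT is assumed to be aligned with the literals first:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD HH24:MI:SS';
-- works: the literal matches the date format mask exactly
SELECT COUNT(*) FROM SOME_TABLE WHERE ORIG_DT >= '2015-12-01 00:00:00';
-- fails with ORA-01830: DATE has no fractional seconds, so the trailing '.5' is left over after the mask is consumed
SELECT COUNT(*) FROM SOME_TABLE WHERE ORIG_DT < '2015-12-01 00:29:59.5';
Without the ALTER SESSION, the default NLS_DATE_FORMAT usually doesn't match 'YYYY-MM-DD HH24:MI:SS' at all, which would be consistent with the ORA-01861 seen before the workaround.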
So the main question is this: Is there any way to prevent SQOOP from using milliseconds when generating splits for a date column when moving data from ORACLE to HDFS?
OR is there some other way to solve this problem?
It may be worth noting that this query worked fine when I was on CDH 5.4.8 and SQOOP 1.4.5, but after upgrading to CDH 5.5.1 and SQOOP 1.4.6 these errors are being thrown.
Also, this query does work with a single mapper; using multiple mappers throws these errors.
Created 02-01-2016 02:20 PM
Same issue on CDH 5.5.1.
Created 04-22-2016 05:17 PM
We had a similar problem where we needed to split by a modified-date column, and we got it working as below:
--split-by "to_number(to_char(modified_date, 'YYYYMMDDHHMISS'))"
Created 05-24-2017 10:00 PM
@SathisKumar I tried the same idea but am getting:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLException: Invalid SQL type: sqlKind = UNINITIALIZED
Kindly suggest how you managed to solve this.
Created on 05-24-2017 10:25 PM - edited 05-24-2017 10:26 PM
Can you provide the sqoop command and full error message?
Created 07-06-2018 02:01 PM
Hi guys, I am trying to use the below command and I get a column-not-found error on the --split-by:
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://host/jslice \
--username=*** --password *** \
--table orders \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--null-non-string "\\\\N" --null-string "\\\\N" \
--escaped-by '\' \
--optionally-enclosed-by '\"' \
--map-column-java dwh_last_modified=String \
--hive-drop-import-delims \
--as-parquetfile -m 16 \
--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/ \
--split-by 'cast(order_number as UNSIGNED)'
Note: I have also tried the same expression with double quotes:
"cast(order_number as UNSIGNED)"
but that didn't work either.
When I use '`cast(order_number as UNSIGNED)`' with backticks I get an SQL syntax error; otherwise, with the above command, I get:
INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
18/07/06 20:50:25 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1530637372013_0116
18/07/06 20:50:25 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
    at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:174)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
Caused by: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
    at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:177)
    at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:110)
    at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:228)
    at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:334)
    at org.mariadb.jdbc.MariaDbStatement.executeQuery(MariaDbStatement.java:505)
    at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:149)
    ... 23 more
Caused by: java.sql.SQLException: Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
Query is: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
    at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:119)
Any help is appreciated @SathisKumar
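One possible alternative, sketched here untested: as the BoundingValsQuery in the log shows, with --table Sqoop backtick-quotes whatever is passed to --split-by when it builds the MIN/MAX bounding query, so the expression is treated as a column name. Moving the import to a free-form --query and supplying an explicit --boundary-query should sidestep that quoting (keep $CONDITIONS single-quoted so the shell does not expand it, and note that --query cannot be combined with --table):
sqoop import \
--connect jdbc:mysql://host/jslice \
--username *** --password *** \
--query 'SELECT o.* FROM orders o WHERE $CONDITIONS' \
--split-by 'CAST(o.order_number AS UNSIGNED)' \
--boundary-query 'SELECT MIN(CAST(order_number AS UNSIGNED)), MAX(CAST(order_number AS UNSIGNED)) FROM orders' \
--as-parquetfile -m 16 \
--compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
--delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/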
Created 01-31-2019 05:48 AM
I have found the cause. This issue occurred when using --query: the ConnManager and InputFormat of the ImportJobContext were not set to the Oracle-specific ones. I have submitted a patch to SQOOP-1946 (https://issues.apache.org/jira/browse/SQOOP-1946) and hope Cloudera will include it in the next version of CDH.