07-11-2018
04:06 PM
Good idea. The cast in --split-by is still not working, so the above won't work. And it isn't just one table: I will eventually have to implement the solution on 250+ tables. So a one-time import with a single mapper, followed by incremental loads, would be a better solution. Anyway, doing a full dump every day is not scalable.
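For reference, a one-time baseline import plus a saved incremental job could look roughly like this (a sketch only, not tested against a live cluster; the connect string and the `dwh_last_modified` check column are guesses based on the commands earlier in the thread):

```shell
# One-time baseline load with a single mapper, so no --split-by is needed
sqoop import \
  --connect jdbc:mysql://hostname:3306/jslice_orders \
  --username '***' -P \
  --table archive_orders \
  --target-dir /hive/warehouse/jslice_orders/archive_orders \
  -m 1

# Saved job for the daily incremental load; Sqoop remembers the last value
# of the check column between runs of "sqoop job --exec archive_orders_incr"
sqoop job --create archive_orders_incr -- import \
  --connect jdbc:mysql://hostname:3306/jslice_orders \
  --username '***' -P \
  --table archive_orders \
  --target-dir /hive/warehouse/jslice_orders/archive_orders \
  --incremental lastmodified \
  --check-column dwh_last_modified \
  -m 1
```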
07-11-2018
02:09 PM
It's the PK that Sqoop is duplicating, and this only happens on tables with millions of rows, basically large tables with more than 45 GB of data.
07-09-2018
06:22 PM
No, the boundary query does work. But I figured out that even though we split by using a cast on a varchar column, once the splits are identified Sqoop internally sorts the split-by column, which MySQL still treats as a varchar, and that brings duplicates into the target system. I found that unless you use an integer column, or do an incremental load (after loading to staging), the duplicates on the target caused by a varchar split-by column will not go away.
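The underlying problem can be illustrated with plain `sort`: numeric strings compared lexicographically order differently from their numeric values, so split boundaries computed over strings can overlap or skip rows. A minimal illustration (not Sqoop itself, just the ordering difference):

```shell
# Order numbers stored as varchar, compared as strings
printf '9\n10\n100\n2\n' | LC_ALL=C sort
# -> 10 100 2 9  (lexicographic: "10" < "2" < "9")

# The same values compared numerically
printf '9\n10\n100\n2\n' | sort -n
# -> 2 9 10 100
```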
07-09-2018
03:23 AM
Thanks for the reply. As I mentioned above, all my columns are varchar on the source MySQL tables, and if I just put the column name in --split-by it brings in duplicate records. This is a new setup of Sqoop jobs, so it never worked earlier either. The SELECT statement on the DB side works fine.
07-09-2018
01:29 AM
For a huge number of rows, the above options will cause duplicates in the result set.
07-08-2018
04:04 AM
I want to use the third option on --split-by, but I am not seeing an example of how to pass in that value. I have tried with single quotes, 'cast(id as unsigned)', and Sqoop throws an error saying the column was not found.
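One workaround (a sketch, not verified on this dataset) is to switch from --table to a free-form --query import. In free-form mode Sqoop builds the bounding query from the subquery and does not backtick-quote the --split-by value, so a cast expression is accepted:

```shell
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:mysql://hostname:3306/jslice_orders \
  --username '***' -P \
  --query 'SELECT * FROM archive_orders WHERE $CONDITIONS' \
  --split-by 'CAST(order_number AS UNSIGNED)' \
  --target-dir /hive/warehouse/jslice_orders/archive_orders \
  -m 16
```

Note the single quotes around the query: they keep the shell from expanding `$CONDITIONS`, which Sqoop replaces with each mapper's split range.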
07-07-2018
03:12 PM
My sqoop command above does have that: sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
07-06-2018
10:41 PM
We already have a post on this, but the option given there is not working and throws an error for me, and that previous post is more than a year old: https://community.hortonworks.com/questions/146261/sql-function-in-split-by.html

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://hostname:3306/jslice_orders \
--username=** -P --table archive_orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" --escaped-by '\' \
--optionally-enclosed-by '\"' --map-column-java dwh_last_modified=String --hive-drop-import-delims \
--as-parquetfile -m 16 --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice_orders/archive_orders/text3/ --split-by 'cast(order_number as UNSIGNED)'
This fails with an error saying the column was not found while executing the sqoop command:

INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029)
Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
How can I get this working, given that the PKs on my source tables are all varchar and the tables are around 50 GB in size?
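The log shows why it fails: with --table, Sqoop wraps whatever is passed to --split-by in backticks when it builds the bounding query, so MySQL treats the whole expression as a literal (nonexistent) column name. A hypothetical mysql-client session illustrating the difference (assuming access to the same schema):

```shell
# Backticks make MySQL look for a column literally named
# "cast(order_number as UNSIGNED)" -> Unknown column error, as in the log
mysql jslice_orders -e "SELECT MIN(\`cast(order_number as UNSIGNED)\`) FROM archive_orders"

# Without backticks the same text is parsed as an expression and works
mysql jslice_orders -e "SELECT MIN(CAST(order_number AS UNSIGNED)) FROM archive_orders"
```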
Labels:
- Apache Hadoop
- Apache Sqoop
07-06-2018
02:01 PM
Hi guys, I am trying to use the command below and I get a column-not-found error on the --split-by:

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://host/jslice \
--username=*** --password *** --table orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" --escaped-by '\' \
--optionally-enclosed-by '\"' --map-column-java dwh_last_modified=String --hive-drop-import-delims \
--as-parquetfile -m 16 --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/ --split-by 'cast(order_number as UNSIGNED)'

Note: I have tried the same with double quotes, "cast(order_number as UNSIGNED)", and that didn't work either. When I use '`cast(order_number as UNSIGNED)`' with backticks I get a SQL syntax error; otherwise, with the command above, I get:

INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
18/07/06 20:50:25 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1530637372013_0116
18/07/06 20:50:25 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:174)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
Caused by: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:177)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:110)
at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:228)
at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:334)
at org.mariadb.jdbc.MariaDbStatement.executeQuery(MariaDbStatement.java:505)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:149)
... 23 more
Caused by: java.sql.SQLException: Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
Query is: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:119)

Any help is appreciated @SathisKumar