07-11-2018
04:06 PM
Good idea. The cast in --split-by is still not working, so the above won't work. And it isn't just one table: I will eventually have to implement the solution on 250+ tables. So a one-time import with a single mapper, followed by incremental loads, would be a better solution. Anyway, doing a full dump every day is not scalable.
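For reference, a one-time baseline import plus a saved incremental job could look roughly like this (a sketch only, not tested against a live cluster; the connect string and the `dwh_last_modified` check column are guesses based on the commands earlier in the thread):

```shell
# One-time baseline load with a single mapper, so no --split-by is needed
sqoop import \
  --connect jdbc:mysql://hostname:3306/jslice_orders \
  --username '***' -P \
  --table archive_orders \
  --target-dir /hive/warehouse/jslice_orders/archive_orders \
  -m 1

# Saved job for the daily incremental load; Sqoop remembers the last value
# of the check column between runs of "sqoop job --exec archive_orders_incr"
sqoop job --create archive_orders_incr -- import \
  --connect jdbc:mysql://hostname:3306/jslice_orders \
  --username '***' -P \
  --table archive_orders \
  --target-dir /hive/warehouse/jslice_orders/archive_orders \
  --incremental lastmodified \
  --check-column dwh_last_modified \
  -m 1
```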
07-11-2018
02:09 PM
It's the PK that Sqoop is duplicating, and this only happens on tables with millions of rows, basically large tables with more than 45 GB of data.
07-09-2018
06:22 PM
No, the boundary query does work. But I figured out that even though we split by using a cast on a varchar column, once the splits are identified Sqoop internally sorts the split-by column, which MySQL still treats as a varchar, and that brings duplicates into the target system. I found that unless you use an integer column, or do an incremental load (after loading to staging), the duplicates on the target caused by a varchar split-by column will not go away.
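The underlying problem can be illustrated with plain `sort`: numeric strings compared lexicographically order differently from their numeric values, so split boundaries computed over strings can overlap or skip rows. A minimal illustration (not Sqoop itself, just the ordering difference):

```shell
# Order numbers stored as varchar, compared as strings
printf '9\n10\n100\n2\n' | LC_ALL=C sort
# -> 10 100 2 9  (lexicographic: "10" < "2" < "9")

# The same values compared numerically
printf '9\n10\n100\n2\n' | sort -n
# -> 2 9 10 100
```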
07-09-2018
03:23 AM
Thanks for the reply. As I mentioned above, all my columns are varchar on the source MySQL tables, and if I just put the column name in --split-by it brings in duplicate records. This is a new setup of Sqoop jobs, so it never worked earlier either. The SELECT statement on the DB side works fine.
07-09-2018
01:29 AM
For a huge number of rows, the above options will cause duplicates in the result set.
07-08-2018
04:04 AM
I want to use the third option on --split-by, but I am not seeing an example of how to pass in that value. I have tried with single quotes, 'cast(id as unsigned)', and Sqoop throws an error saying the column was not found.
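One workaround (a sketch, not verified on this dataset) is to switch from --table to a free-form --query import. In free-form mode Sqoop builds the bounding query from the subquery and does not backtick-quote the --split-by value, so a cast expression is accepted:

```shell
sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect jdbc:mysql://hostname:3306/jslice_orders \
  --username '***' -P \
  --query 'SELECT * FROM archive_orders WHERE $CONDITIONS' \
  --split-by 'CAST(order_number AS UNSIGNED)' \
  --target-dir /hive/warehouse/jslice_orders/archive_orders \
  -m 16
```

Note the single quotes around the query: they keep the shell from expanding `$CONDITIONS`, which Sqoop replaces with each mapper's split range.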
07-07-2018
03:12 PM
My sqoop command above does have that: sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
07-06-2018
10:41 PM
We already have a post on this, but the option given there is not working and throws an error for me, and that previous post is more than a year old: https://community.hortonworks.com/questions/146261/sql-function-in-split-by.html

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://hostname:3306/jslice_orders \
--username=** -P --table archive_orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" --escaped-by '\' \
--optionally-enclosed-by '\"' --map-column-java dwh_last_modified=String --hive-drop-import-delims \
--as-parquetfile -m 16 --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice_orders/archive_orders/text3/ --split-by 'cast(order_number as UNSIGNED)'
This fails with an error saying the column was not found while executing the sqoop command:

INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029)
Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
How can I get this working, given that the PKs on my source tables are all varchar and the tables are around 50 GB in size?
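The log shows why it fails: with --table, Sqoop wraps whatever is passed to --split-by in backticks when it builds the bounding query, so MySQL treats the whole expression as a literal (nonexistent) column name. A hypothetical mysql-client session illustrating the difference (assuming access to the same schema):

```shell
# Backticks make MySQL look for a column literally named
# "cast(order_number as UNSIGNED)" -> Unknown column error, as in the log
mysql jslice_orders -e "SELECT MIN(\`cast(order_number as UNSIGNED)\`) FROM archive_orders"

# Without backticks the same text is parsed as an expression and works
mysql jslice_orders -e "SELECT MIN(CAST(order_number AS UNSIGNED)) FROM archive_orders"
```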
Labels:
- Apache Hadoop
- Apache Sqoop
07-06-2018
02:01 PM
Hi guys, I am trying to use the command below and I get a column-not-found error on the --split-by:

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://host/jslice \
--username=*** --password *** --table orders --fields-terminated-by '|' \
--lines-terminated-by '\n' --null-non-string "\\\\N" --null-string "\\\\N" --escaped-by '\' \
--optionally-enclosed-by '\"' --map-column-java dwh_last_modified=String --hive-drop-import-delims \
--as-parquetfile -m 16 --compress --compression-codec org.apache.hadoop.io.compress.SnappyCodec --delete-target-dir \
--target-dir hdfs:///hive/warehouse/jslice/orders/text3/ --split-by 'cast(order_number as UNSIGNED)'

Note: I have tried the same with double quotes, "cast(order_number as UNSIGNED)", and that didn't work either. When I use '`cast(order_number as UNSIGNED)`' with backticks I get a SQL syntax error; otherwise, with the command above, I get:

INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
18/07/06 20:50:25 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1530637372013_0116
18/07/06 20:50:25 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:174)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
Caused by: java.sql.SQLSyntaxErrorException: (conn=472029) Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:177)
at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:110)
at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:228)
at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:334)
at org.mariadb.jdbc.MariaDbStatement.executeQuery(MariaDbStatement.java:505)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:149)
... 23 more
Caused by: java.sql.SQLException: Unknown column 'cast(order_number as UNSIGNED)' in 'field list'
Query is: SELECT MIN(`cast(order_number as UNSIGNED)`), MAX(`cast(order_number as UNSIGNED)`) FROM `archive_orders`
at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:119)

Any help is appreciated @SathisKumar