Created on 04-10-2017 08:05 PM - edited 09-16-2022 04:26 AM
Hi All,
I am using Sqoop 1.4.6-cdh5.10.0 to transfer data between Hive and MySQL. The sqoop-import job succeeded, but the sqoop-export job failed with the error "Can't parse input data".
I think the main reason is that there are some special characters (e.g. ^@) in the HDFS plain-text file that was imported by sqoop-import, as shown below.
The sqoop import command I used is:
sqoop import \
  --connect jdbc:mysql://******:3306/smart_voice_data \
  --table sqoop_import_test \
  --username ****** --password ****** \
  --verbose \
  --hive-database hive_smart_voice_data \
  --hive-table hive_sqoop_import_test \
  --lines-terminated-by "\n" \
  --hive-drop-import-delims \
  --split-by id \
  --null-non-string NULL --null-string NULL \
  --fields-terminated-by "|" \
  --delete-target-dir \
  --hive-overwrite
The sqoop export command I used is:
sqoop export \
  --connect "jdbc:mysql://******:3306/smart_voice_data?useUnicode=true&characterEncoding=gbk" \
  --username ****** --password ****** \
  --table sqoop_hi_filog_goi_vidaa4_normalized \
  --export-dir hdfs://cdh5node1:8020/user/hive/warehouse/hive_smart_voice_data.db/hive_sqoop_import_test/ \
  --input-null-string "\\\\N" --input-null-non-string "\\\\N" \
  --fields-terminated-by '|' \
  --input-lines-terminated-by '\n'
Questions:
1. Is there any way to escape the special character ^@ (\000) when using Sqoop to import data into HDFS?
2. Is there any way to skip the special character ^@ (\000) when using Sqoop to export data to MySQL, or is there any solution that lets sqoop-export parse the special character ^@?
3. Is there any way to skip the error and continue with the rest of the job when doing a sqoop-export?
Any help is really appreciated, thanks a lot!
Pengcheng
Created 04-10-2017 11:05 PM
Hi
You can do one of two things.
1. You can perform the import with only certain columns by using --columns, skipping the columns that contain the special characters:
--columns <col,col,col…>    Columns to import from table
2. Would you consider running a shell script to remove the special characters before running sqoop export? (See the sketch below.)
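Something along these lines, untested; the column list, the staging path, and the connection details below are just placeholders for your own:

# (1) import only the columns that do not contain control characters
#     (the column list "id,col_a,col_b" is hypothetical)
sqoop import \
  --connect jdbc:mysql://******:3306/smart_voice_data \
  --table sqoop_import_test \
  --username ****** --password ****** \
  --columns "id,col_a,col_b" \
  --fields-terminated-by "|" --split-by id --delete-target-dir

# (2) strip every control character except tab and newline from the imported
#     files, then point sqoop export at the cleaned copy (paths are placeholders)
hadoop fs -cat /user/hive/warehouse/hive_smart_voice_data.db/hive_sqoop_import_test/* \
  | tr -d '\000-\010\013-\037' \
  | hadoop fs -put - /user/staging/hive_sqoop_import_test_clean/part-00000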
Created 04-11-2017 07:09 PM
@csguna, Thanks for your reply.
1. The column with the special character is needed in the production environment, and only some of the rows in that column contain it, so the whole column cannot be skipped.
2. I also thought about that, but the tables are imported into HDFS frequently, so every time a table gets updated I would have to run the shell script to remove the special characters.
Also, the 'special character' cannot always be predicted: this time it is ^@, next time it may be ^A or ^H or something else, so it is hard to write a shell script that removes every possible special character.
Created 04-11-2017 07:02 AM
I would recommend using
--query 'select col1, string_function(col2), etc from sqoop_hi_filog_goi_vidaa4_normalized where $CONDITIONS'
instead of
--table sqoop_hi_filog_goi_vidaa4_normalized
Note:
1. string_function is not an actual function name; use whichever function is applicable to remove the unwanted characters (see the sketch below).
2. Your query must end with "WHERE $CONDITIONS"; this is required Sqoop syntax.
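For example (just a sketch, not tested: it applies the idea on the import side, where Sqoop's free-form --query is supported; REPLACE and CHAR are standard MySQL functions, and the column name "payload" is a placeholder):

sqoop import \
  --connect jdbc:mysql://******:3306/smart_voice_data \
  --username ****** --password ****** \
  --query 'SELECT id, REPLACE(payload, CHAR(0), "") AS payload FROM sqoop_import_test WHERE $CONDITIONS' \
  --split-by id \
  --target-dir /user/hive/warehouse/hive_smart_voice_data.db/hive_sqoop_import_test \
  --fields-terminated-by "|"

Here CHAR(0) is the ^@ (NUL) byte, so REPLACE() drops it before the data ever lands in HDFS.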
Created 04-11-2017 06:32 PM
@saranvisa, thanks for your reply. It's really a good idea to use a string function to filter out the special characters.
But from my understanding, the applicable string functions you mentioned are MySQL string functions, right? Is it possible to write a custom function for a specific case?
Created 04-11-2017 07:17 PM
In fact I was referring to Hive functions. In the meantime, I understand your point that either the source or the target will be MySQL, but I believe the priority goes to the environment where the command gets executed (I need to double-check).
Yes, you can write custom functions (UDFs).
But I would say ignore the complex part for now and just start with the basic functions; the string functions are mostly common across databases, and you will get the idea over time.
https://www.tutorialspoint.com/hive/hive_built_in_functions.htm
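As an illustration (not from the thread), one applicable built-in is Hive's regexp_replace. A rough sketch that rewrites the data into a clean staging table before the export; the staging table hive_sqoop_import_test_clean and the column name payload are hypothetical, and the staging table is assumed to already exist with the same schema:

# \p{Cntrl} matches every control character, so ^@, ^A, ^H, etc. are all removed in one pass
hive -e '
  INSERT OVERWRITE TABLE hive_sqoop_import_test_clean
  SELECT id, regexp_replace(payload, "\\p{Cntrl}", "") AS payload
  FROM hive_sqoop_import_test;
'

You would then run sqoop export against the staging table's directory instead of the original one.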
Created 04-11-2017 08:32 PM
@saranvisa, yes, both the source and the target are MySQL.
I will try some basic functions. It's really kind of you!
But during my investigation I found an extra argument called --ctrl-chars; it seems to be applicable only to Netezza. Do you have any idea whether this extra argument can also be applied to MySQL?
http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_extra_arguments_3
Thanks again for your help.
Created 04-12-2017 11:00 AM
I have never used the --ctrl-chars option before, but it has the following limitations:
1. As you mentioned, it is currently supported only by the Netezza connector.
2. You need Sqoop version 1.4.6.
3. It works with direct mode, and you have to be very careful when using --direct, as it may bring down the source DB depending on your usage.
Created 01-22-2018 12:37 AM
I am having the same issue: Can't parse input data: ' 460V'.
It is because of the space in the value; the actual value is 'ABCD30LYZ02 460V'.
Created 04-16-2018 10:04 PM
Hi,
Did you get around the issue? We are also facing a similar issue.