Reply
Explorer
Posts: 14
Registered: ‎02-14-2017

Sqoop - Can't parse input data

[ Edited ]

Hi All, 

 

I am using sqoop 1.4.6-chd5.10.0 to transfer data between hive and MySQL. The sqoop-import job was succeed, but the sqoop-export job was failed with the error "Can't parse input data". 

error.PNG

 

The main reason I think is that there are some special characters(i.e. ^@) in the HDFS plain file which was imported by sqoop-import as shown in below.

part-00000.PNG

 

The sqoop import command I used is : 

sqoop import --connect jdbc:mysql://******:3306/smart_voice_data --table sqoop_import_test --username ****** --password ****** --verbose --hive-database hive_smart_voice_data --hive-table hive_sqoop_import_test --lines-terminated-by "\n" --hive-drop-import-delims --split-by id  --null-non-string NULL --null-string NULL --fields-terminated-by "|"  --delete-target-dir --hive-overwrite

The sqoop export command I used is : 

sqoop export --connect "jdbc:mysql://******:3306/smart_voice_data?useUnicode=true&characterEncoding=gbk" --username ****** --password ****** --table sqoop_hi_filog_goi_vidaa4_normalized --export-dir hdfs://cdh5node1:8020/user/hive/warehouse/hive_smart_voice_data.db/hive_sqoop_import_test/  --input-null-string "\\\\N" --input-null-non-string "\\\\N" --fields-terminated-by '|' --input-lines-terminated-by '\n'

Questions: 

1 、Is there any way to escape the special character ^@ (\000) when using sqoop to import data into HDFS?

2、 Is there any way to skip the special character ^@(\000) when using sqoop to export data to mysql , or is there any solution to make sqoop-export can parse the special character ^@ ?

3、Is there any way to skip the error and continue the remaining job when doing sqoop-export ?

 

Any help is really appreciated , thanks a lot ! 

 

Pengcheng

Champion
Posts: 543
Registered: ‎05-16-2016

Re: Sqoop - Can't parse input data

Hi 

 

You can do two things . 

 

1  . You can perform import using certain column using --column skipping those coloumn with special character 

--columns <col,col,col…>	Columns to import from table

 2.  Would you consider runining a shell script to remove the special characters before runining sqoop export . 

 

 

Posts: 362
Topics: 11
Kudos: 56
Solutions: 30
Registered: ‎09-02-2016

Re: Sqoop - Can't parse input data

@PengchengFang

 

I would recommend you to use


--query 'select col1, string_function(col2), etc from sqoop_hi_filog_goi_vidaa4_normalized where $CONDITIONS'

 

instead of


--table sqoop_hi_filog_goi_vidaa4_normalized

Note:
1. String_function is not an actual function name. You can use the applicable function to remove unwanted characters
2. Your query will end with "where $CONDITIONS" by default and this is syntax

Explorer
Posts: 14
Registered: ‎02-14-2017

Re: Sqoop - Can't parse input data

@saranvisa, Thanks for your reply, and it's really a good idea to use the 'string_function' to filter the special character. 

but from my understanding, the applicable string_functions you mentioned are MySQL string functions, right?  Is it possible that we write a customize function for specific case?

Explorer
Posts: 14
Registered: ‎02-14-2017

Re: Sqoop - Can't parse input data

@csguna, Thanks for your reply. 

1. the column with special character is needed in the production environment, and only some of rows in the column contain the special character. So the whole cloumn cannot be skipped. 

 

2. I also thought about that, but the tables are imported into HDFS frequently, so every time when the tables get updated, I need to run the shell script to remove the special character.  

And the 'special character' may not be able to be predicted, for example, this time it is ^@ , next time, it may become ^A or ^H or something else, as a result, it's hard to use the shell script to remove all special character

Posts: 362
Topics: 11
Kudos: 56
Solutions: 30
Registered: ‎09-02-2016

Re: Sqoop - Can't parse input data

@PengchengFang

 

In fact I was referring to Hive funtions . In the mean time I understand your point that either source or target will be MySQL, but I believe the priority goes to the environment where the command get executed (need to double check)

 

Yes you can customize the functions (UDF). 

 

But I would say ignore the complex part now, just start with basic funtions and mostly the string functions are common accorss the DBs.. you will get idea over the period.

 

https://www.tutorialspoint.com/hive/hive_built_in_functions.htm

Explorer
Posts: 14
Registered: ‎02-14-2017

Re: Sqoop - Can't parse input data

@saranvisa, Yes, either source and target are MySQL. 

 

I will try some basic functions , It's really kind of you ! 

 

But during my investigation, I found there is an extra argument called --ctrl-chars , but it seems it is only applicable for Netezza , do you have any ideas if this extra argument can also be applied to MySQL ? 

 

http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_extra_arguments_3

 

thanks again for your help.

Highlighted
Posts: 362
Topics: 11
Kudos: 56
Solutions: 30
Registered: ‎09-02-2016

Re: Sqoop - Can't parse input data

@PengchengFang

 

I never used --ctrl-chars option before, but it has the following limitations

1. As you mentioned, Currently supported by Netezza Connector 

2. You should have sqoop version 1.4.6

3. It will work with direct mode and you have to be very careful when using direct as it may bring down source db depends upon your usage

Announcements