Created 01-04-2017 12:28 PM
While trying to import data from MySQL into HDFS, the delimiter '||' is not supported; only a single '|' character works. Is there any way to accomplish this? Our column data contains the '|' character, so we want to use '||' as the delimiter.
Created 01-05-2017 05:42 PM
Sqoop doesn't support multi-character delimiters. You have to use a single character.
I would suggest using a character that is native to the Hive text file format: ^A (Ctrl-A, byte \x01).
#!/bin/bash
# ...
delim_char=$( printf "\x01" )
sqoop import ... --input-fields-terminated-by ${delim_char} ...
# ...
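As a quick sanity check that the command substitution really captures the Ctrl-A byte (no Sqoop needed for this), you can hex-dump the variable:

```shell
#!/bin/bash
# Sanity check: verify that printf "\x01" emits the single byte 0x01,
# the same character Hive's text format uses as its default field delimiter.
delim_char=$( printf "\x01" )
printf '%s' "$delim_char" | od -An -tx1 | tr -d ' '   # prints 01
```

If the output is anything other than 01, the shell did not interpret the escape and the delimiter passed to Sqoop would be wrong.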
Created 01-06-2017 05:08 AM
@Ed Berezitsky Thank you. We are currently using '\001' as the delimiter in place of '||'.
Created 01-05-2017 06:12 PM
If HDFS is just an intermediate destination before loading into Hive, you can skip that step and load directly into Hive using the hcatalog-table option in Sqoop. This provides better fidelity of data, removes one step, and also supports all Hive data types.
Please see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_sqoop_hcatalog_integration
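For illustration, a minimal HCatalog-based import might look like the sketch below. The JDBC URL, credentials, and table names are hypothetical placeholders; the command is assembled into an array and printed rather than executed, as a dry run:

```shell
#!/bin/bash
# Hypothetical sketch of an HCatalog import: data lands directly in a
# Hive table, so no text-file field delimiter needs to be chosen at all.
# Connection details and table names below are placeholders.
sqoop_cmd=(
  sqoop import
  --connect "jdbc:mysql://dbhost/sales"   # placeholder JDBC URL
  --username etl_user -P
  --table orders
  --hcatalog-database default
  --hcatalog-table orders
  --create-hcatalog-table                 # create the Hive table if absent
)
# Print the assembled command instead of running it (dry run).
printf '%s ' "${sqoop_cmd[@]}"; echo
```

Removing the final printf and invoking the array directly would run the actual import on a cluster where Sqoop and HCatalog are configured.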
Created 01-05-2017 07:56 PM
Small correction: if you use HCatalog but your table is still in textfile format with a "|" field delimiter, you'll still have the same issue. You probably mean to use an HCatalog import with an ORC-formatted table; that will definitely work.
Created 01-06-2017 05:07 AM
We are not looking at HDFS as intermediate storage, as we will be processing the files using Spark SQL.
Created 01-06-2017 07:17 AM
>> Small correction: if you use hcatalog, but your table is still textfile format with "|" field delimiter, you'll still have the same issue
Output file field delimiters are only needed for HDFS imports. In the case of HCatalog imports, you specify the text file format properties as part of the storage stanza, and otherwise the Hive defaults are used. Essentially, the default storage format should be able to handle this. BTW, HCatalog import works with most storage formats, not just ORC.
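A hedged sketch of specifying the storage format explicitly via --hcatalog-storage-stanza follows (the ORC choice and all connection details are illustrative placeholders; the stanza text is appended to the generated CREATE TABLE statement):

```shell
#!/bin/bash
# Hypothetical sketch: an HCatalog import with an explicit storage stanza,
# so the created Hive table is ORC and field delimiters never matter.
# Connection details and table names are placeholders.
sqoop_cmd=(
  sqoop import
  --connect "jdbc:mysql://dbhost/sales"          # placeholder JDBC URL
  --username etl_user -P
  --table orders
  --hcatalog-database default
  --hcatalog-table orders
  --create-hcatalog-table
  --hcatalog-storage-stanza "stored as orcfile"  # appended to CREATE TABLE
)
# Print the assembled command instead of running it (dry run).
printf '%s ' "${sqoop_cmd[@]}"; echo
```

With a binary format like ORC, any delimiter character appearing inside column values (such as '|') is preserved without escaping.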
@Krishna Srinivas
You should be able to query a Hive table using Spark SQL as well, but perhaps you have other requirements. Glad to see that @Ed Berezitsky's solution worked for you.