WARN sqoop.SqoopOptions: Character argument || has multiple characters; only the first will be used.
Labels: Apache Sqoop
Created ‎01-04-2017 12:28 PM
While trying to import data from MySQL into HDFS, the delimiter '||' is not supported; only a single '|' character works. Is there any way to accomplish this? Our column data contains the '|' character, so we want to use '||' as the delimiter.
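To illustrate why a single '|' cannot work here, a small sketch with made-up data (the record and column layout are hypothetical, not from this thread):

```shell
#!/bin/bash
# Made-up record with four intended columns (id, vendor, description, price),
# where the description field itself contains a literal '|'.
row='1|acme|red|blue widget|9.99'
# With a single-character '|' delimiter the field count comes out wrong:
nfields=$(printf '%s\n' "$row" | awk -F'|' '{print NF}')
echo "$nfields"   # 5 fields where only 4 columns were intended
```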
Created ‎01-05-2017 05:42 PM
Sqoop doesn't support multi-character delimiters; you have to use a single character.
I would suggest using a character that is native to the Hive text file format: ^A (Ctrl-A, \x01).

#!/bin/bash
# ...
# Generate the single Ctrl-A byte and pass it as the output field delimiter:
delim_char=$(printf '\x01')
sqoop import ... --fields-terminated-by "${delim_char}" ...
# ...
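A quick way to convince yourself the Ctrl-A delimiter is safe even when a field contains a literal '|' (sample record is made up):

```shell
#!/bin/bash
# Build a \x01-delimited record whose third field contains a literal '|',
# then count fields by splitting on \x01.
delim=$(printf '\x01')
record="1${delim}acme${delim}red|blue widget${delim}9.99"
nfields=$(printf '%s\n' "$record" | awk -F"$delim" '{print NF}')
echo "$nfields"   # 4 - the embedded '|' no longer breaks the split
```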
Created ‎01-06-2017 05:08 AM
@Ed Berezitsky Thank you. We are currently using '\001' as the delimiter in place of '||'.
Created ‎01-05-2017 06:12 PM
If HDFS is just an intermediate destination before loading into Hive, you can skip that step and load directly into Hive using the --hcatalog-table option in Sqoop, which provides better data fidelity, removes one step, and also supports all Hive data types.
Please see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_sqoop_hcatalog_integration
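For reference, an HCatalog-based import might look like the sketch below. The JDBC URL, credentials, and table names are placeholders, not taken from this thread; the --hcatalog-storage-stanza lets you choose a binary format such as ORC, in which case field delimiters stop mattering entirely:

```shell
# Hedged sketch - connection details and table names are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table orders \
  --hcatalog-database default \
  --hcatalog-table orders \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orcfile'
```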
Created ‎01-05-2017 07:56 PM
Small correction: if you use HCatalog but your table is still in textfile format with a "|" field delimiter, you'll still have the same issue. You probably mean an HCatalog import into an ORC-formatted table - that will definitely work.
Created ‎01-06-2017 05:07 AM
We are not looking at HDFS as intermediate storage, as we will be processing the files using Spark SQL.
Created ‎01-06-2017 07:17 AM
>> Small correction: if you use hcatalog, but your table is still textfile format with "|" field delimiter, you'll still have the same issue
The output file field delimiters are only needed for HDFS imports. In the case of HCatalog imports, you specify the text file format properties as part of the storage stanza, and the Hive defaults are used otherwise. Essentially, the default storage format should be able to handle this. BTW, HCatalog import works with most storage formats, not just ORC.
@Krishna Srinivas
You should be able to query the Hive table from Spark SQL as well - but maybe you have other requirements. Glad to see that @Ed Berezitsky's solution worked for you.
