
WARN sqoop.SqoopOptions: Character argument || has multiple characters; only the first will be used.

Super Collaborator

While trying to import data from MySQL into HDFS, the delimiter '||' is not supported; only a single '|' character works. Is there any way to perform this? We have data in the columns that contains the '|' character, so we want to use '||' as the delimiter.

1 ACCEPTED SOLUTION

Super Collaborator

@Krishna Srinivas

Sqoop doesn't support multi-character delimiters. You have to use a single character.

I would suggest using a character that is native to the Hive text file format: ^A (Ctrl-A, \001).

#!/bin/bash

# ...
# Ctrl-A (\x01), the default field delimiter of Hive text tables
delim_char=$( printf "\x01" )

# --fields-terminated-by sets the field delimiter of the files the import writes to HDFS
sqoop import ...  --fields-terminated-by "${delim_char}"  ...
# ...


6 REPLIES


Super Collaborator

@Ed Berezitsky Thank you. We are currently using '\001' as the delimiter in place of '||'.
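For reference, since ^A / \001 is the default field terminator for Hive text tables, a table created without any delimiter clause can read such files directly. A minimal sketch, with hypothetical database, table, column names and HDFS path:

# Hypothetical external table over the Sqoop output directory;
# Hive's default text SerDe already expects \001 as the field delimiter.
hive -e "
CREATE EXTERNAL TABLE staging.orders_raw (
  order_id    INT,
  description STRING
)
STORED AS TEXTFILE
LOCATION '/user/sqoop/orders';
"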

Expert Contributor

If HDFS is just an intermediate destination before loading into Hive, you can skip that step and load directly into Hive using the --hcatalog-table option in Sqoop, which provides better fidelity of data, removes one step, and also supports all Hive data types.

Please see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_sqoop_hcatalog_integration
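A minimal sketch of such an import, assuming Sqoop 1.4.6 with HCatalog support; the JDBC URL, credentials, and database/table names are hypothetical, and --create-hcatalog-table lets Sqoop create the Hive table if it doesn't already exist:

# Hypothetical import straight into a Hive table via HCatalog
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username sqoop_user -P \
  --table orders \
  --hcatalog-database staging \
  --hcatalog-table orders \
  --create-hcatalog-table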

Super Collaborator

@Venkat Ranganathan,

Small correction: if you use HCatalog but your table is still in text file format with a "|" field delimiter, you'll still have the same issue. You probably mean an HCatalog import into an ORC-formatted table; that will definitely work.
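For illustration, a hedged variant of such an import that has Sqoop create an ORC-backed table via the --hcatalog-storage-stanza option (connection details and names are again hypothetical):

# Hypothetical HCatalog import that creates an ORC table instead of a delimited text table
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username sqoop_user -P \
  --table orders \
  --hcatalog-database staging \
  --hcatalog-table orders_orc \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as orc"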

Super Collaborator

We are not looking at HDFS as intermediate storage, as we will be processing the files using Spark SQL.

@Venkat Ranganathan
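If the files stay in HDFS as \001-delimited text, a rough sketch of reading them from Spark SQL could look like this, assuming a Spark version with the built-in CSV reader (2.x or later); the HDFS path and view name are hypothetical:

# Hypothetical spark-shell session over the Ctrl-A delimited Sqoop output
spark-shell <<'EOF'
val orders = spark.read
  .option("sep", "\u0001")            // Ctrl-A field delimiter
  .csv("hdfs:///user/sqoop/orders")
orders.createOrReplaceTempView("orders")
spark.sql("SELECT COUNT(*) FROM orders").show()
EOF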

Expert Contributor

@Ed Berezitsky

>> Small correction: if you use hcatalog, but your table is still textfile format with "|" field delimiter, you'll still have the same issue

The output file field delimiters are only needed for HDFS imports. In the case of HCatalog imports, you specify the text file format properties as part of the storage stanza, and the Hive defaults are used. Essentially, the default storage format should be able to handle this. BTW, HCatalog import works with most storage formats, not just ORC.
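To illustrate the storage stanza, a hedged variant that creates a plain text table and relies on Hive's defaults for the field delimiter (names remain hypothetical):

# Hypothetical HCatalog import into a text-format table;
# no delimiter is specified, so Hive's default (\001) is used.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username sqoop_user -P \
  --table orders \
  --hcatalog-database staging \
  --hcatalog-table orders_txt \
  --create-hcatalog-table \
  --hcatalog-storage-stanza "stored as textfile"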

@Krishna Srinivas

You should be able to use a Hive table from Spark SQL as well, but maybe you have other requirements. Glad to see that @Ed Berezitsky's solution worked for you.
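For completeness, a minimal sketch of querying such a Hive table from the Spark SQL CLI, assuming Spark is configured against the same Hive metastore (database and table names are hypothetical):

# Hypothetical query against the Hive table populated by the HCatalog import
spark-sql -e "SELECT COUNT(*) FROM staging.orders"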