WARN sqoop.SqoopOptions: Character argument || has multiple characters; only the first will be used.
Labels: Apache Sqoop
Created ‎01-04-2017 12:28 PM
While trying to import data from MySQL into HDFS, the delimiter '||' is not supported; only a single '|' character works. Is there any way to accomplish this? Our column data contains the '|' character, so we want to use '||' as the delimiter.
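To illustrate why a single '|' cannot work here, a small sketch with made-up data (the record and column layout are hypothetical, not from this thread):

```shell
#!/bin/bash
# Made-up record with four intended columns (id, vendor, description, price),
# where the description field itself contains a literal '|'.
row='1|acme|red|blue widget|9.99'
# With a single-character '|' delimiter the field count comes out wrong:
nfields=$(printf '%s\n' "$row" | awk -F'|' '{print NF}')
echo "$nfields"   # 5 fields where only 4 columns were intended
```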
Created ‎01-05-2017 05:42 PM
Sqoop doesn't support multi-character delimiters; you have to use a single character.
I would suggest using a character that is native to the Hive text file format: ^A (Ctrl-A, \x01).

#!/bin/bash
# ...
# Generate the single Ctrl-A byte and pass it as the output field delimiter:
delim_char=$(printf '\x01')
sqoop import ... --fields-terminated-by "${delim_char}" ...
# ...
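A quick way to convince yourself the Ctrl-A delimiter is safe even when a field contains a literal '|' (sample record is made up):

```shell
#!/bin/bash
# Build a \x01-delimited record whose third field contains a literal '|',
# then count fields by splitting on \x01.
delim=$(printf '\x01')
record="1${delim}acme${delim}red|blue widget${delim}9.99"
nfields=$(printf '%s\n' "$record" | awk -F"$delim" '{print NF}')
echo "$nfields"   # 4 - the embedded '|' no longer breaks the split
```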
Created ‎01-06-2017 05:08 AM
@Ed Berezitsky Thank you. We are currently using '\001' as the delimiter in place of '||'.
Created ‎01-05-2017 06:12 PM
If HDFS is just an intermediate destination before loading into Hive, you can skip that step and load directly into Hive using the --hcatalog-table option in Sqoop, which provides better data fidelity, removes one step, and also supports all Hive data types.
Please see https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_sqoop_hcatalog_integration
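For reference, an HCatalog-based import might look like the sketch below. The JDBC URL, credentials, and table names are placeholders, not taken from this thread; the --hcatalog-storage-stanza lets you choose a binary format such as ORC, in which case field delimiters stop mattering entirely:

```shell
# Hedged sketch - connection details and table names are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table orders \
  --hcatalog-database default \
  --hcatalog-table orders \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orcfile'
```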
Created ‎01-05-2017 07:56 PM
Small correction: if you use HCatalog but your table is still in textfile format with a "|" field delimiter, you'll still have the same issue. You probably mean an HCatalog import into an ORC-formatted table - that will definitely work.
Created ‎01-06-2017 05:07 AM
We are not looking at HDFS as intermediate storage, as we will be processing the files using Spark SQL.
Created ‎01-06-2017 07:17 AM
>> Small correction: if you use hcatalog, but your table is still textfile format with "|" field delimiter, you'll still have the same issue
The output file field delimiters are only needed for HDFS imports. In the case of HCatalog imports, you specify the text file format properties as part of the storage stanza, and the Hive defaults are used otherwise. Essentially, the default storage format should be able to handle this. BTW, HCatalog import works with most storage formats, not just ORC.
@Krishna Srinivas
You should be able to query the Hive table from Spark SQL as well - but maybe you have other requirements. Glad to see that @Ed Berezitsky's solution worked for you.
