- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to handle delimiters, if they are part of the data, while exporting the data from the hive table to a flat file.
- Labels:
-
Apache Hive
Created on ‎02-08-2017 06:35 AM - edited ‎09-16-2022 04:02 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In hive, One of my column data contains Pipe as the part of the data('|'), however, while exporting data from this table, we need to export the data as the pipe('|') as the delimiter between each fields, How to handle if the delimiters as part of the data while creating the flat file from the hive table.
Created ‎02-08-2017 07:40 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can you enclose your data fields in double quotes, "", when exporting to HDFS?
For example:
"Field1|Test"|"Field2|Test"|"Field3|Test"
Notice how I have pipe (|) as the delimiter and it is also part of the column data, but my columns are enclosed in double quotes.
Then use the OpenCSV Serde when creating the DDL for your Hive table.
For example, if I wanted to create a Hive table for the sample record above, and the file is already in a directory named 'test' in HDFS, then my DDL statement would be:
create external table my_table (
field1 string,
field2 string,
field3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (
"separatorChar" = "|",
"quoteChar" = "\""
)
LOCATION '/test/';
Now you can query the table and Hive will correctly display the data.
Created ‎02-09-2017 03:03 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks and Yah .. Open Csv serde will do it.However, i am looking if there any other alternatives.
Created ‎02-11-2017 12:21 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Could you let me know if the data is not in Quto's, How to Handle it and below is the example
column 1| column 2
first|second|last
In the above example the first|second are actually one column.
Could you let me know how to handle if the data is not in quoto's and if the delimiter is part of the data.
Any suggestion or help is appreciated.
Created ‎03-20-2017 07:14 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Reddy.. Choose a delimiter which will not used easily in a data. Choose unicode as delimiter it will solve your issue. 90% of the data will not contain unicode. (row format delimited Fields terminated by '/u0001') . In your case export the the data with '/u0001' as delimiter and then insert into a hive table which has delimiter as '|'
