Created on 02-08-2017 06:35 AM - edited 09-16-2022 04:02 AM
In hive, One of my column data contains Pipe as the part of the data('|'), however, while exporting data from this table, we need to export the data as the pipe('|') as the delimiter between each fields, How to handle if the delimiters as part of the data while creating the flat file from the hive table.
Created 02-08-2017 07:40 AM
Can you enclose your data fields in double quotes, "", when exporting to HDFS?
For example:
"Field1|Test"|"Field2|Test"|"Field3|Test"
Notice how I have pipe (|) as the delimiter and it is also part of the column data, but my columns are enclosed in double quotes.
Then use the OpenCSV Serde when creating the DDL for your Hive table.
For example, if I wanted to create a Hive table for the sample record above, and the file is already in a directory named 'test' in HDFS, then my DDL statement would be:
create external table my_table (
field1 string,
field2 string,
field3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (
"separatorChar" = "|",
"quoteChar" = "\""
)
LOCATION '/test/';
Now you can query the table and Hive will correctly display the data.
Created 02-09-2017 03:03 AM
Thanks and Yah .. Open Csv serde will do it.However, i am looking if there any other alternatives.
Created 02-11-2017 12:21 AM
Could you let me know if the data is not in Quto's, How to Handle it and below is the example
column 1| column 2
first|second|last
In the above example the first|second are actually one column.
Could you let me know how to handle if the data is not in quoto's and if the delimiter is part of the data.
Any suggestion or help is appreciated.
Created 03-20-2017 07:14 PM
Hi Reddy.. Choose a delimiter which will not used easily in a data. Choose unicode as delimiter it will solve your issue. 90% of the data will not contain unicode. (row format delimited Fields terminated by '/u0001') . In your case export the the data with '/u0001' as delimiter and then insert into a hive table which has delimiter as '|'