Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to handle delimiters, if they are part of the data, while exporting the data from the hive table to a flat file.

avatar
Expert Contributor

In hive, One of my column data contains Pipe as the part of the data('|'), however, while exporting data from this table, we need to export the data as the pipe('|') as the delimiter between each fields, How to handle if the delimiters as part of the data while creating the flat file from the hive table.

4 REPLIES 4

avatar

Can you enclose your data fields in double quotes, "", when exporting to HDFS?

For example:

"Field1|Test"|"Field2|Test"|"Field3|Test"

Notice how I have pipe (|) as the delimiter and it is also part of the column data, but my columns are enclosed in double quotes.

Then use the OpenCSV Serde when creating the DDL for your Hive table.

For example, if I wanted to create a Hive table for the sample record above, and the file is already in a directory named 'test' in HDFS, then my DDL statement would be:

create external table my_table (

field1 string,

field2 string,

field3 string)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES (

"separatorChar" = "|",

"quoteChar" = "\""

)

LOCATION '/test/';

Now you can query the table and Hive will correctly display the data.

avatar
Expert Contributor

Thanks and Yah .. Open Csv serde will do it.However, i am looking if there any other alternatives.

avatar
Expert Contributor

Could you let me know if the data is not in Quto's, How to Handle it and below is the example

column 1| column 2

first|second|last

In the above example the first|second are actually one column.

Could you let me know how to handle if the data is not in quoto's and if the delimiter is part of the data.

Any suggestion or help is appreciated.

avatar

Hi Reddy.. Choose a delimiter which will not used easily in a data. Choose unicode as delimiter it will solve your issue. 90% of the data will not contain unicode. (row format delimited Fields terminated by '/u0001') . In your case export the the data with '/u0001' as delimiter and then insert into a hive table which has delimiter as '|'