Write Hive/Beeline output with Unicode delimiter into file.

Hi,

I am trying to use the Unicode character \u001c as the delimiter in a Hive/Beeline output file, but the data is not written to the file with that special character as the delimiter.

In the output file, \ is used as the delimiter; that is, only the first character of '\u001c' is taken as the delimiter.

The command I am using is below.

What exactly is the issue here? Is there a workaround to achieve this?

Command:

beeline -u "jdbc:hive2://master:10000/;principal=hive/master@DOMAIN.NET" --silent=true --showHeader=false --outputformat=dsv --delimiterForDSV='\u001c' -e "select * from emp;" | hadoop fs -appendToFile - /<HDFS_Path>/data.dat

Note: If I give a single character as the delimiter (e.g. ~), it works fine, but with an escape sequence that spans multiple characters it does not work as expected.
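A minimal sketch of what is probably happening, assuming a POSIX shell: single quotes pass the six literal characters \u001c to Beeline unchanged, which matches the report that only the leading backslash is used. The commented-out Beeline call is untested and assumes Beeline accepts a raw control byte as its delimiter; JDBC_URL is a hypothetical variable standing in for the connection string.

```shell
# What the shell hands to Beeline when the escape is single-quoted:
literal='\u001c'
echo "literal length: ${#literal}"   # six characters, the first being a backslash

# Let the shell build the real control character (octal 034 is U+001C).
delim=$(printf '\034')
echo "delim length: ${#delim}"

# Hypothetical usage, assuming Beeline takes the raw byte as the delimiter:
# beeline -u "$JDBC_URL" --silent=true --showHeader=false --outputformat=dsv \
#   --delimiterForDSV="$delim" -e "select * from emp;" \
#   | hadoop fs -appendToFile - /<HDFS_Path>/data.dat
```

If Beeline rejects the control character, the single-character delimiter that already works can be kept and converted afterwards.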

@sasidhar Kari

Support for non-ASCII / Unicode characters as field delimiters is limited: characters outside the basic ASCII character set are not well supported as field delimiters. You will need to reformat your data so that it uses one of the first 128 characters in the Unicode list.

Characters from \0 to \177 (octal) should work well; see the second-to-last column (Oct) of the table at http://asecuritysite.com/coding/asc2.

Alternatively, you could use the custom SerDe MultiDelimitSerDe when creating the table.
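Another possible workaround, sketched under the assumption that the chosen placeholder character never occurs in the data: keep a single-character delimiter that Beeline already handles (such as ~) and translate it to octal 034 with tr before the stream reaches HDFS. The sample rows below are made up and only stand in for Beeline's dsv output.

```shell
# Hypothetical sample standing in for Beeline output produced with '~' as delimiter.
sample='1~alice~sales
2~bob~hr'

# tr interprets octal escapes, so '\034' becomes the single byte 0x1C.
converted=$(printf '%s\n' "$sample" | tr '~' '\034')

# Dump the bytes to confirm which delimiter ended up in the stream.
printf '%s\n' "$converted" | od -c
```

In the real pipeline the tr step would sit between the beeline command and hadoop fs -appendToFile.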

Hi,

I need to write the result of a Hive table query to a file in HDFS, using Beeline, with "\u001c" or "\034" as the delimiter.

I can achieve this using the Hive CLI.

I am trying to execute the query below from Beeline, but the output file contains only junk characters. Is there another way to achieve this correctly?

From Beeline (not working):

beeline -u "jdbc:hive2://master:10000/;principal=hive/master@DOMAIN.NET" -e "INSERT OVERWRITE DIRECTORY '<HDFS_Path>/data.dat' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\034' select * from emp ;"

From Hive CLI (working fine):

hive> INSERT OVERWRITE DIRECTORY '<HDFS_Path>/data.dat' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\034' select * from emp;