I am trying to use the Unicode value \u001c as the delimiter for a Hive/Beeline output file, but the data is not written with that character as the delimiter.
In the output file, the backslash is used as the delimiter instead (i.e. only the first character of the literal string '\u001c' is taken as the delimiter).
Below are the commands I am using.
What exactly is the issue here, and is there a workaround to achieve this?
beeline -u "jdbc:hive2://master:10000/;principal=hive/master@DOMAIN.NET" \
  --silent=true --showHeader=false --outputformat=dsv --delimiterForDSV='\u001c' \
  -e "select * from emp;" | hadoop fs -appendToFile - /<HDFS_Path>/data.dat
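One detail worth checking first (my reading of the symptom, not from the thread): inside single quotes the shell does not interpret \u001c, so Beeline receives six literal characters, and taking only the first of them would produce exactly the backslash delimiter described above. This can be seen locally:

```shell
# Single quotes suppress shell interpretation, so the option value is the
# six literal bytes \ u 0 0 1 c, not one control character.
printf '%s' '\u001c' | od -An -tx1
```

The first byte shown is 5c, the backslash, which matches the delimiter observed in the output file.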
Note: a single-character delimiter (e.g. ~) works fine; it is only a multi-character value such as '\u001c' that does not behave as expected.
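Since a single character works, one experiment (an assumption on my part, not something confirmed in this thread) is to let the shell expand the escape itself, so Beeline receives the real one-byte control character rather than the multi-character string:

```shell
# ANSI-C quoting ($'...') makes bash expand the escape before the command
# runs, so FS holds the single FS control byte, not the literal characters.
# \034 (octal) works in all bash versions; $'\u001c' needs bash 4.2+.
FS=$'\034'
printf '%s' "$FS" | od -An -tx1   # one byte: 1c
```

One could then try `--delimiterForDSV="$FS"` in the Beeline command above; I have not verified that Beeline accepts a raw control byte for this option, so treat it as an experiment.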
I looked into the support for non-ASCII / Unicode characters as field delimiters and confirmed that characters outside the basic ASCII character set are not well supported as field delimiters. You will need to reformat your output so that it uses one of the first 128 characters of the Unicode range.
Characters from \0 to \177 should work well (see http://asecuritysite.com/coding/asc2, second-to-last column, Oct).
Alternatively, you could use the custom SerDe MultiDelimitSerDe when creating the table.
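A sketch of such a table definition (the column layout of emp is my assumption, and on older Hive releases the class lives in hive-contrib, so the JAR may need to be on the classpath):

```sql
-- Hypothetical emp layout; MultiDelimitSerDe reads "field.delim" from
-- the SerDe properties and supports multi-character values.
CREATE TABLE emp_fs (
  id   INT,
  name STRING,
  dept STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim" = "\034")
STORED AS TEXTFILE;
```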
Here I need to write a Hive table query result into a file on HDFS using Beeline, with "\u001c" (i.e. "\034") as the delimiter.
I can achieve this using the Hive CLI.
When I execute the query below from Beeline, the output file is full of junk characters. Is there another way to achieve this correctly?
From Beeline: not working
beeline -u "jdbc:hive2://master:10000/;principal=hive/master@DOMAIN.NET" -e "INSERT OVERWRITE DIRECTORY '<HDFS_Path>/data.dat' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\034' select * from emp ;"
From Hive CLI: working fine
hive> INSERT OVERWRITE DIRECTORY '<HDFS_Path>/data.dat' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\034' select * from emp;
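Before concluding the Beeline output is wrong, it may be worth dumping the raw bytes: terminals render the invisible 0x1c control character as garbage, so correct output can look like junk. On the cluster you would pipe `hadoop fs -cat '<HDFS_Path>/data.dat/*'` into od; a local stand-in (with made-up sample fields) shows what correct FS-delimited output looks like:

```shell
# Correct output has the single byte 034 between fields; a literal
# four-character "\034" in the dump would mean the escape was never
# interpreted.
printf '1\034John\034HR\n' | od -An -c
```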