I am trying to store multiline character fields in hive table. I tried using csv serde but the data is been shows as multiple records.
CREATE EXTERNAL TABLE `serde_test1`(
`num` string COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer')
ROW FORMAT SERDE
WITH SERDEPROPERTIES (
STORED AS INPUTFORMAT
LOCATION 'HDFS location';
Data that is stored in hive:
"300","test newline add testings"
If I select from hive table it shows as NULLs.
serde_test1.num serde_test1.name 300 (null) (null) (null)
Anybody has any idea on how to store multi line fields in hive table?
Thanks Ravi and Divakar for your response.
Right now my current field delimiter is Control A(\001) and I cannot use comma or tab because my data may contain these standard delimiters.
Is there any other delimiter I can use it for lines termination ?
You can use any special character that is not part of your data. like '|' for a delimiter. But make sure, you get your raw data in that format. (like fields terminated by some special character (Control A which is default) and lines terminated by another special character) when you generate this raw data.
Another option is if you are using sqoop to import this data, you can explicitly drops delimiters that are part of the data. (--hive-drop-import-delims and --hive-delims-replacement)
I have tried creating table with Lines terminated by as Control A to see how it works but it got failed with below error message.
create table multiline_test ( num int, desc string ) row format delimited fields terminated by ',' lines terminated by '\001' location '/test/multiline_test';
Error while compiling statement: FAILED: SemanticException 8:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\001''
Seems like hive only supports '\n' as Lines terminated though it says configurable in the DDLs. I did see an open HIVE JIRA issue for the same.