Created 04-25-2016 03:35 PM
Hi,
I am trying to store multiline character fields in hive table. I tried using csv serde but the data is been shows as multiple records.
Table description
CREATE EXTERNAL TABLE `serde_test1`(
`num` string COMMENT 'from deserializer',
`name` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES (
'quoteChar'='\"',
'separatorChar'=',')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'HDFS location';
Data that is stored in hive:
"300","test newline add testings"
If I select from hive table it shows as NULLs.
serde_test1.num serde_test1.name 300 (null) (null) (null)
Anybody has any idea on how to store multi line fields in hive table?
Created 04-25-2016 04:30 PM
You can have a different character and 'lines terminated by'. You can put a special character for that instead of \n which is the default.
Created 04-26-2016 02:16 PM
Thanks Ravi and Divakar for your response.
Right now my current field delimiter is Control A(\001) and I cannot use comma or tab because my data may contain these standard delimiters.
Is there any other delimiter I can use it for lines termination ?
Created 04-26-2016 02:32 PM
You can use any special character that is not part of your data. like '|' for a delimiter. But make sure, you get your raw data in that format. (like fields terminated by some special character (Control A which is default) and lines terminated by another special character) when you generate this raw data.
Another option is if you are using sqoop to import this data, you can explicitly drops delimiters that are part of the data. (--hive-drop-import-delims and --hive-delims-replacement)
Created 04-25-2016 04:37 PM
I hope this will help for you. ---row format delimited fields terminated by '\001'
Created 04-26-2016 02:32 PM
I have tried creating table with Lines terminated by as Control A to see how it works but it got failed with below error message.
Create stmt:
create table multiline_test ( num int, desc string ) row format delimited fields terminated by ',' lines terminated by '\001' location '/test/multiline_test';
Error :
Error while compiling statement: FAILED: SemanticException 8:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\001''
Seems like hive only supports '\n' as Lines terminated though it says configurable in the DDLs. I did see an open HIVE JIRA issue for the same.
Created 09-06-2017 08:27 PM
@Aruna dadi How did you resolve this issue ?