Support Questions

Find answers, ask questions, and share your expertise

Storing Multiline character fields in hive table​

avatar
Contributor

Hi,

I am trying to store multiline character fields in hive table. I tried using csv serde but the data is been shows as multiple records.

Table description

CREATE EXTERNAL TABLE `serde_test1`(

`num` string COMMENT 'from deserializer',

`name` string COMMENT 'from deserializer')

ROW FORMAT SERDE

'com.bizo.hive.serde.csv.CSVSerde'

WITH SERDEPROPERTIES (

'quoteChar'='\"',

'separatorChar'=',')

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION 'HDFS location';

Data that is stored in hive:

"300","test newline add testings"

If I select from hive table it shows as NULLs.

serde_test1.num serde_test1.name 300 (null) (null) (null)

Anybody has any idea on how to store multi line fields in hive table?

6 REPLIES 6

avatar
Guru

You can have a different character and 'lines terminated by'. You can put a special character for that instead of \n which is the default.

avatar
Contributor

Thanks Ravi and Divakar for your response.

Right now my current field delimiter is Control A(\001) and I cannot use comma or tab because my data may contain these standard delimiters.

Is there any other delimiter I can use it for lines termination ?

avatar
Guru

You can use any special character that is not part of your data. like '|' for a delimiter. But make sure, you get your raw data in that format. (like fields terminated by some special character (Control A which is default) and lines terminated by another special character) when you generate this raw data.

Another option is if you are using sqoop to import this data, you can explicitly drops delimiters that are part of the data. (--hive-drop-import-delims and --hive-delims-replacement)

avatar
@Aruna dadi

I hope this will help for you. ---row format delimited fields terminated by '\001'

avatar
Contributor

I have tried creating table with Lines terminated by as Control A to see how it works but it got failed with below error message.

Create stmt:

create table multiline_test ( num int, desc string ) row format delimited fields terminated by ',' lines terminated by '\001' location '/test/multiline_test';

Error :

Error while compiling statement: FAILED: SemanticException 8:20 LINES TERMINATED BY only supports newline '\n' right now. Error encountered near token ''\001''

Seems like hive only supports '\n' as Lines terminated though it says configurable in the DDLs. I did see an open HIVE JIRA issue for the same.

https://issues.apache.org/jira/browse/HIVE-11996

avatar

@Aruna dadi How did you resolve this issue ?