Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

How to extract Hive query output where we have new line character in a column?

avatar
Contributor
 
1 REPLY 1

avatar
Expert Contributor

@Nilesh Shrimant Try to create table in parquet format , and set this config set hive.fetch.task.conversion=more;

https://issues.apache.org/jira/browse/HIVE-11785

hive> create table repo (lvalue int, charstring string) stored as parquet;
OK
Time taken: 0.34 seconds
hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
Loading data to table default.repo
chgrp: changing ownership of 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not belong to hive
Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, rawDataSize=0]
OK
Time taken: 0.732 seconds
hive> set hive.fetch.task.conversion=more;
hive> select * from repo;

Option 2:

There is some info here: http://stackoverflow.com/questions/26339564/handling-newline-character-in-hive

Records in Hive are hard-coded to be terminated by the newline character (even though there is a LINES TERMINATED BY clause, it is not implemented).

  1. Write a custom InputFormat that uses a RecordReader that understands non-newline delimited records. Look at the code for LineReader/LineRecordReader and TextInputFormat.
  2. Use a format other than text/ASCII, like Parquet. I would recommend this regardless, as text is probably the worst format you can store data in anyway.