Support Questions

Find answers, ask questions, and share your expertise

How to extract Hive query output where we have new line character in a column?

avatar
Contributor
 
1 REPLY 1

avatar
Expert Contributor

@Nilesh Shrimant Try to create table in parquet format , and set this config set hive.fetch.task.conversion=more;

https://issues.apache.org/jira/browse/HIVE-11785

hive> create table repo (lvalue int, charstring string) stored as parquet;
OK
Time taken: 0.34 seconds
hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
Loading data to table default.repo
chgrp: changing ownership of 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not belong to hive
Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, rawDataSize=0]
OK
Time taken: 0.732 seconds
hive> set hive.fetch.task.conversion=more;
hive> select * from repo;

Option 2:

There is some info here: http://stackoverflow.com/questions/26339564/handling-newline-character-in-hive

Records in Hive are hard-coded to be terminated by the newline character (even though there is a LINES TERMINATED BY clause, it is not implemented).

  1. Write a custom InputFormat that uses a RecordReader that understands non-newline delimited records. Look at the code for LineReader/LineRecordReader and TextInputFormat.
  2. Use a format other than text/ASCII, like Parquet. I would recommend this regardless, as text is probably the worst format you can store data in anyway.