Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

How to extract Hive query output where we have new line character in a column?

Highlighted

How to extract Hive query output where we have new line character in a column?

 
1 REPLY 1

Re: How to extract Hive query output where we have new line character in a column?

Rising Star

@Nilesh Shrimant Try to create table in parquet format , and set this config set hive.fetch.task.conversion=more;

https://issues.apache.org/jira/browse/HIVE-11785

hive> create table repo (lvalue int, charstring string) stored as parquet;
OK
Time taken: 0.34 seconds
hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
Loading data to table default.repo
chgrp: changing ownership of 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not belong to hive
Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, rawDataSize=0]
OK
Time taken: 0.732 seconds
hive> set hive.fetch.task.conversion=more;
hive> select * from repo;

Option 2:

There is some info here: http://stackoverflow.com/questions/26339564/handling-newline-character-in-hive

Records in Hive are hard-coded to be terminated by the newline character (even though there is a LINES TERMINATED BY clause, it is not implemented).

  1. Write a custom InputFormat that uses a RecordReader that understands non-newline delimited records. Look at the code for LineReader/LineRecordReader and TextInputFormat.
  2. Use a format other than text/ASCII, like Parquet. I would recommend this regardless, as text is probably the worst format you can store data in anyway.
Don't have an account?
Coming from Hortonworks? Activate your account here