How to extract Hive query output where we have new line character in a column?
Labels: Apache Hive
Contributor
Created ‎02-01-2017 07:37 AM
1 REPLY
Expert Contributor
Created ‎02-01-2017 10:11 AM
@Nilesh Shrimant Option 1: create the table in Parquet format and run set hive.fetch.task.conversion=more; before querying it. For background, see:
https://issues.apache.org/jira/browse/HIVE-11785
hive> create table repo (lvalue int, charstring string) stored as parquet;
OK
Time taken: 0.34 seconds
hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo;
Loading data to table default.repo
chgrp: changing ownership of 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not belong to hive
Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, rawDataSize=0]
OK
Time taken: 0.732 seconds
hive> set hive.fetch.task.conversion=more;
hive> select * from repo;
Option 2:
There is some info here: http://stackoverflow.com/questions/26339564/handling-newline-character-in-hive
Records in Hive text tables are hard-coded to be terminated by the newline character (there is a LINES TERMINATED BY clause, but it does not accept anything other than '\n'). That leaves two approaches:
- Write a custom InputFormat that uses a RecordReader that understands non-newline delimited records. Look at the code for LineReader/LineRecordReader and TextInputFormat.
- Use a format other than text/ASCII, like Parquet. I would recommend this regardless, as text is probably the worst format you can store data in anyway.
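
To make the first approach more concrete, here is a minimal Python sketch of the logic such a RecordReader would need. It is not Hadoop or Hive code: a real implementation would be a Java class written against Hadoop's InputFormat/RecordReader interfaces. The record separator (\x1e), the read_records helper, and the sample rows are assumptions made up for this illustration; \x01 is Hive's default field separator for text tables.

```python
import io

# Delimiters used only for this sketch.
FIELD_SEP = "\x01"    # Hive's default text field separator (Ctrl-A)
RECORD_SEP = "\x1e"   # assumed record separator that never occurs in the data


def read_records(stream, record_sep=RECORD_SEP, chunk_size=4096):
    """Yield records split on record_sep, reading the stream in chunks."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last separator is a complete record;
        # the remainder stays in the buffer until more data arrives.
        *complete, buffer = buffer.split(record_sep)
        yield from complete
    if buffer:
        # Trailing record with no closing separator.
        yield buffer


# Hypothetical rows; the first one has a newline inside its string column.
rows = [("1", "first line\nsecond line"), ("2", "plain value")]
data = RECORD_SEP.join(FIELD_SEP.join(r) for r in rows) + RECORD_SEP

# Newline-based splitting cuts through the middle of row 1.
naive = data.split("\n")

# Separator-aware reading recovers both rows intact.
parsed = [
    tuple(rec.split(FIELD_SEP))
    for rec in read_records(io.StringIO(data), chunk_size=8)
]
```

The small chunk_size is there only to exercise records that span several reads, which is the same boundary problem LineRecordReader has to handle at split edges.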
