Support Questions

Find answers, ask questions, and share your expertise

Hive - Line Termination in Quotes

avatar
New Contributor

My data set is as below.

"IM43163","SOUTH,OFC","10-Jan-23"

"IM41763","John:

comment added","12-Jan-23"

CREATE EXTERNAL TABLE database.table1(
  `col_1` string, 
  `col_2` string,
  `col_3` string)ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( "separatorChar"="," , "quoteChar"="\"") 

Since the record is splitted amont two rows therefore it is not loading properly and gives null values. The output I am getting is mentioned below.

IM43163SOUTH,OFC10-Jan-23
IM41763??
???
3 REPLIES 3

avatar
Super Collaborator

Hi @Abdul_ ,

It looks like the issue is that the data contains a newline character (\n) within a field value, which causes the record to be split into two rows, causing the problem.

Can you modify the data to remove "\n" from the sample data? In that case, the create statement that you are using is correct.

However, if data modification is impossible, you may use "LazySimpleSerDe". However, it may not be as performant as the OpenCSVSerde for large datasets.

 

Hope this helps,

Tarun

Was your question answered? Make sure to mark the answer as the accepted solution.

If you find a reply useful, say thanks by clicking on the thumbs-up button.

 

avatar
New Contributor

Hi @tj2007, It is not possible to modify the data. I have tried the "LazySimpleSerDe" but it didn't give the correct output (Mentioned below).

ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
    "separatorChar" = ",",
    "quoteChar"     = '\""')      STORED AS TEXTFILE
"IM43163","SOUTH,OFC","10-Jan-23"??
"IM41763","John:??
comment added","12-Jan-23"??

 

However, I need output like this.

IM43163SOUTH,OFC10-Jan-23
IM41763John:comment added12-Jan-23

Please also note that the input file is a CSV file which I am successfully able to open it in Excel. Your support will be highly appreciated.

avatar
Master Collaborator

@Abdul_ 

As of now hive won't support row delimiter other new line character . Attaching the corresponding Jira for reference HIVE-11996 

As a workaround, Recommend to update the input file using external libraries like awk,...etc and upload the input file in the corresponding FileSystem location to read.  

Eg - 

Through AWK 

[root@c2757-node2 ~]# awk -F "\",\"" 'NF < 3 {getline nextline; $0 = $0 nextline} 1' sample_case.txt
"IM43163","SOUTH,OFC","10-Jan-23"
"IM41763","John:comment added","12-Jan-23"
[root@c2757-node2 ~]# awk -F "\",\"" 'NF < 3 {getline nextline; $0 = $0 nextline} 1' sample_case.txt  > sample_text.csv


Reading from Hive Table

0: jdbc:hive2://c2757-node2.coelab.cloudera.c> select * from table1;
.
.
.
INFO  : Executing command(queryId=hive_20230616064136_333ff98d-636b-43b1-898d-fca66031fe7f): select * from table1
INFO  : Completed executing command(queryId=hive_20230616064136_333ff98d-636b-43b1-898d-fca66031fe7f); Time taken: 0.023 seconds
INFO  : OK
+---------------+---------------------+---------------+
| table1.col_1  |    table1.col_2     | table1.col_3  |
+---------------+---------------------+---------------+
| IM43163       | SOUTH,OFC           | 10-Jan-23     |
| IM41763       | John:comment added  | 12-Jan-23     |
+---------------+---------------------+---------------+
2 rows selected (1.864 seconds)