Support Questions

Abdul_ · ‎04-17-2023

My data set is as below.

"IM43163","SOUTH,OFC","10-Jan-23"

"IM41763","John:

comment added","12-Jan-23"

CREATE EXTERNAL TABLE database.table1(
  `col_1` string, 
  `col_2` string,
  `col_3` string)ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( "separatorChar"="," , "quoteChar"="\"")

Since the record is splitted amont two rows therefore it is not loading properly and gives null values. The output I am getting is mentioned below.

IM43163	SOUTH,OFC	10-Jan-23
IM41763	?	?
?	?	?

tj2007 · ‎04-17-2023

Hi @Abdul_ ,

It looks like the issue is that the data contains a newline character (\n) within a field value, which causes the record to be split into two rows, causing the problem.

Can you modify the data to remove "\n" from the sample data? In that case, the create statement that you are using is correct.

However, if data modification is impossible, you may use "LazySimpleSerDe". However, it may not be as performant as the OpenCSVSerde for large datasets.

Hope this helps,

Tarun

Was your question answered? Make sure to mark the answer as the accepted solution.

If you find a reply useful, say thanks by clicking on the thumbs-up button.

Abdul_ · ‎04-17-2023

Hi @tj2007, It is not possible to modify the data. I have tried the "LazySimpleSerDe" but it didn't give the correct output (Mentioned below).

ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
    "separatorChar" = ",",
    "quoteChar"     = '\""')      STORED AS TEXTFILE

"IM43163","SOUTH,OFC","10-Jan-23"	?	?
"IM41763","John:	?	?
comment added","12-Jan-23"	?	?

However, I need output like this.

IM43163	SOUTH,OFC	10-Jan-23
IM41763	John:comment added	12-Jan-23

Please also note that the input file is a CSV file which I am successfully able to open it in Excel. Your support will be highly appreciated.

ggangadharan · ‎06-15-2023

@Abdul_

As of now hive won't support row delimiter other new line character . Attaching the corresponding Jira for reference HIVE-11996

As a workaround, Recommend to update the input file using external libraries like awk,...etc and upload the input file in the corresponding FileSystem location to read.

Eg -

Through AWK

[root@c2757-node2 ~]# awk -F "\",\"" 'NF < 3 {getline nextline; $0 = $0 nextline} 1' sample_case.txt
"IM43163","SOUTH,OFC","10-Jan-23"
"IM41763","John:comment added","12-Jan-23"
[root@c2757-node2 ~]# awk -F "\",\"" 'NF < 3 {getline nextline; $0 = $0 nextline} 1' sample_case.txt  > sample_text.csv

Reading from Hive Table

0: jdbc:hive2://c2757-node2.coelab.cloudera.c> select * from table1;
.
.
.
INFO  : Executing command(queryId=hive_20230616064136_333ff98d-636b-43b1-898d-fca66031fe7f): select * from table1
INFO  : Completed executing command(queryId=hive_20230616064136_333ff98d-636b-43b1-898d-fca66031fe7f); Time taken: 0.023 seconds
INFO  : OK
+---------------+---------------------+---------------+
| table1.col_1  |    table1.col_2     | table1.col_3  |
+---------------+---------------------+---------------+
| IM43163       | SOUTH,OFC           | 10-Jan-23     |
| IM41763       | John:comment added  | 12-Jan-23     |
+---------------+---------------------+---------------+
2 rows selected (1.864 seconds)

Cloudera Community

Support Questions

Hive - Line Termination in Quotes