Support Questions

munnyrahul · 07-13-2017

I have a fixed-width file that I am trying to load into Hive. The problem is that one of the lines in the file contains a '\n' character, which splits the record and causes the regex to fail. I am creating the table in Hive using the approach below:

create external table test.abc1_ext (
  a STRING, b STRING, c STRING, d STRING, e STRING, f STRING,
  g STRING, h STRING, i STRING, j STRING, k STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(.{12})(.{1})(.{50})(.{30})(.{5})(.{30})(.{4})(.{26})(.{10})(.{10})(.{8})"
)
LOCATION '/abc/';
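For reference, here is a rough Python equivalent of what input.regex does to each line. This is a sketch only: Python's re module stands in for the Java regex engine Hive actually uses, and parse_fixed_width is a hypothetical helper. Each capture group becomes one column, and the widths add up to 186 characters per record. As far as I know, RegexSerDe requires the whole line to match, which fullmatch mirrors here:

```python
import re

# Column widths copied from the input.regex in the table definition.
WIDTHS = [12, 1, 50, 30, 5, 30, 4, 26, 10, 10, 8]
COLUMNS = list("abcdefghijk")

# Rebuild the same pattern: one "(.{n})" capture group per column.
PATTERN = re.compile("".join(f"(.{{{w}}})" for w in WIDTHS))


def parse_fixed_width(line):
    """Return {column: value} for a line, or None if the line does not match.

    fullmatch requires the entire line to match, so any line that is
    shorter or longer than 186 characters is rejected.
    """
    m = PATTERN.fullmatch(line)
    if m is None:
        return None
    return dict(zip(COLUMNS, m.groups()))
```

For example, parse_fixed_width("x" * 186) returns 11 columns with d holding 30 characters, while a 185-character line returns None.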

Column d contains a '\n' character, which causes the record to split.
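To illustrate why this breaks, here is a small Python sketch with made-up filler data (build_record is a hypothetical helper). For a text table, Hive's default input format hands the SerDe one line at a time, so a record whose column d contains '\n' arrives as two shorter lines, and neither one matches the 186-character pattern:

```python
import re

# Same widths as the table's input.regex (186 characters per record).
WIDTHS = [12, 1, 50, 30, 5, 30, 4, 26, 10, 10, 8]
PATTERN = re.compile("".join(f"(.{{{w}}})" for w in WIDTHS))


def build_record(d_value):
    """Build one record, filling every column except d with 'x'.

    The column-d value is padded (or cut) to its 30-character width.
    """
    fields = ["x" * w for w in WIDTHS]
    fields[3] = d_value.ljust(30)[:30]
    return "".join(fields)


# A column-d value with an embedded newline, as in the question.
record = build_record("first part\nsecond part")
file_text = record + "\n"

# Splitting on newlines (as the text input format does) yields two
# fragments of 73 and 112 characters instead of one 186-character line.
lines = file_text.splitlines()
matches = [PATTERN.fullmatch(line) is not None for line in lines]
```

Here matches ends up as [False, False]: both fragments are rejected, which is the regex failure described above.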

Is there a way to handle this in Hive?

Regards

Rahul

ssubhas · 07-26-2017

@rahul gulati

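I don't think \n characters can be handled with RegexSerDe, because by default Hive treats every '\n' as a line delimiter. For a text table, the input format splits the file into records at each newline before the SerDe sees the data, so the regex is applied to each fragment separately. You might need to handle the newlines with the Omniture Data SerDe; refer to the link for details.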
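If switching SerDes isn't an option, another approach (not from this answer, just an alternative) is to repair the file before Hive reads it. Because each record should be exactly 186 characters wide, broken fragments can be joined back together until that width is reached. This sketch assumes the stray '\n' occupies one character position inside column d, so it is replaced with a space. If the newline is an extra character instead, the join and the width check would need adjusting. repair_records is a hypothetical helper name:

```python
RECORD_WIDTH = 186  # sum of the column widths in the table's input.regex


def repair_records(lines, width=RECORD_WIDTH):
    """Yield fixed-width records, re-joining ones split by an embedded '\n'.

    Assumes the embedded newline takes the place of one character, so a
    single space is put back where it was to keep the record `width` wide.
    Raises ValueError if the pieces don't add up to exactly `width`.
    """
    buffer = None
    for raw in lines:
        fragment = raw.rstrip("\n")
        # Start a new record, or glue this fragment onto the pending one.
        buffer = fragment if buffer is None else buffer + " " + fragment
        if len(buffer) == width:
            yield buffer
            buffer = None
        elif len(buffer) > width:
            raise ValueError(f"record grew to {len(buffer)} chars; expected {width}")
    if buffer is not None:
        raise ValueError(f"input ended inside a record ({len(buffer)} chars)")
```

To use it, you could iterate over the local file, write each repaired record followed by '\n' to a new file, and copy that into /abc/ (for example with hdfs dfs -put). The original table definition should then work unchanged.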

Cloudera Community


Handle New line character in fixed width files in Hive
