Support Questions

munnyrahul · ‎07-13-2017

I have a fixed-width file that I am trying to load into Hive. One of the lines in the file contains a '\n' character, which splits the record in two and causes the regex to fail. I am creating the table in Hive using the approach below.

CREATE EXTERNAL TABLE test.abc1_ext (a STRING, b STRING, c STRING, d STRING, e STRING, f STRING, g STRING, h STRING, i STRING, j STRING, k STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{12})(.{1})(.{50})(.{30})(.{5})(.{30})(.{4})(.{26})(.{10})(.{10})(.{8})")
LOCATION '/abc/';

Column d contains a '\n' character, which causes the record to split.

Is there a way to handle that in Hive?

Regards

Rahul

ssubhas · ‎07-26-2017

@rahul gulati

I don't think we can handle \n characters with the RegexSerDe, because by default Hive treats every '\n' as a line delimiter. You might need to handle the newline using the Omniture Data SerDe; refer to the link for details.
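
If changing the SerDe is not an option, another possible workaround is to repair the file before Hive reads it. The sketch below is only an illustration and not something tested against your data: it assumes every logical record is exactly 186 characters wide (the sum of the widths in your input.regex), that the embedded '\n' counts as one of the 30 characters of column d, and that lines end in a plain '\n'. The function name repair_records and the choice of a space as the replacement character are hypothetical.

```python
# Field widths copied from the input.regex in the question.
FIELD_WIDTHS = [12, 1, 50, 30, 5, 30, 4, 26, 10, 10, 8]
RECORD_WIDTH = sum(FIELD_WIDTHS)  # 186 characters per logical record


def repair_records(text, width=RECORD_WIDTH, placeholder=" "):
    """Rejoin physical lines until each logical record reaches `width`.

    Assumption: an embedded newline occupies one character of the record,
    so it is swapped for `placeholder` to keep the fixed width intact.
    """
    records = []
    buffer = None  # None means no record is currently being assembled
    for line in text.split("\n"):
        if buffer is None:
            if line == "":
                continue  # skip empty lines between records (e.g. trailing newline)
            buffer = line
        else:
            # The newline that split this record becomes the placeholder.
            buffer += placeholder + line
        if len(buffer) >= width:
            records.append(buffer)
            buffer = None
    if buffer is not None:
        # Keep a short trailing fragment rather than dropping it silently.
        records.append(buffer)
    return records


if __name__ == "__main__":
    good = "A" * 186
    # Newline at offset 70 falls inside column d (offsets 63 to 92).
    broken = "B" * 70 + "\n" + "C" * 115
    for record in repair_records(good + "\n" + broken + "\n"):
        print(len(record))
```

The repaired lines can then be written back to the table's LOCATION directory, and the original input.regex should match them because '.' no longer has to cross a line break. Records that come out longer or shorter than 186 characters would still fail the regex, so it is worth counting them before loading.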

Cloudera Community


Handle New line character in fixed width files in Hive