Handle New line character in fixed width files in Hive


I have a fixed-width file that I am trying to load into Hive. One of the lines in the file contains a '\n' character, which splits the record in two and causes the regex to fail. I am creating the table in Hive as shown below.

CREATE EXTERNAL TABLE test.abc1_ext (a STRING, b STRING, c STRING, d STRING, e STRING, f STRING, g STRING, h STRING, i STRING, j STRING, k STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{12})(.{1})(.{50})(.{30})(.{5})(.{30})(.{4})(.{26})(.{10})(.{10})(.{8})")
LOCATION '/abc/';

Column d contains the '\n', which causes the record to split.

Is there a way to handle that in Hive?




@rahul gulati

I don't think RegexSerDe can handle embedded \n characters, because by default Hive treats every '\n' as a line delimiter, so the record is already split before the regex is applied. You might need to handle the newlines with the Omniture Data SerDe; refer to the link for details.
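
If switching SerDes is not an option, another possibility is to clean the file before it lands in the table location. The following is only a minimal sketch of that idea, not something from the thread: it assumes every logical record is exactly 186 characters wide (the sum of the widths in the input.regex above), that a physical line shorter than that was broken by an embedded newline, and that replacing that newline with a space is acceptable. The function name rejoin_records is hypothetical.

```python
import re

# Column widths taken from the input.regex in the question.
WIDTHS = [12, 1, 50, 30, 5, 30, 4, 26, 10, 10, 8]
RECORD_LEN = sum(WIDTHS)  # 186 characters per logical record

# Same pattern as the table definition, rebuilt from the widths.
PATTERN = re.compile("".join("(.{%d})" % w for w in WIDTHS))


def rejoin_records(text, record_len=RECORD_LEN, replacement=" "):
    """Rebuild logical records from text whose fields may contain newlines.

    Assumption: a physical line shorter than record_len was cut by an
    embedded newline, so it is glued to the following line(s) and the
    newline is replaced by `replacement` to keep the column positions.
    """
    records = []
    buffer = ""
    for line in text.split("\n"):
        if buffer:
            # The newline that ended the previous fragment was data,
            # so keep its slot filled with the placeholder character.
            buffer += replacement
        buffer += line
        if len(buffer) >= record_len:
            records.append(buffer)
            buffer = ""
    if buffer:
        # Trailing fragment that never reached full width; kept as-is
        # so that it can be inspected rather than silently dropped.
        records.append(buffer)
    return records
```

Each rebuilt record can be checked with PATTERN.fullmatch(record) before the cleaned lines are written back out and uploaded to the table location. Files with Windows line endings or records of varying width would need extra handling that this sketch does not attempt.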