- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Handle New line character in fixed width files in Hive
- Labels:
-
Apache Hive
Created ‎07-13-2017 11:57 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a fixed width file which i am trying to load in hive. But the thing is that i am getting a '\n' character in one of the line in file which is causing record to split and thereby causing regex to fail. I am creating table using below mentioned approach in hive.
create external table test.abc1_ext(a STRING,b STRING, c STRING, d STRING, e STRING, f STRING, g STRING, h STRING, i STRING, j STRING, k STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "(.{12})(.{1})(.{50})(.{30})(.{5})(.{30})(.{4})(.{26})(.{10})(.{10})(.{8})") LOCATION '/abc/';
Column d contains '\n' causing record to split.
is there a way to handle that in hive?
Regards
Rahul
Created ‎07-26-2017 05:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I don't think we can handle \n characters with serde RegedSerDe, as by default all '\n' are retreated as line delimiters by Hive. You might need to handle new line using Omniture Data SerDe, refer link for details.
