- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Can we create external hive table on top of Fixed width file?
- Labels:
-
Apache Hadoop
-
Apache Hive
Created 01-04-2017 02:34 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can we create external hive table on top of Fixed width file? If yes then how it can be done.
Created 01-04-2017 07:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Use Regex Serde to create the external table like below
CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.2})” )
LOCATION ‘path/to/data’;
Reference: http://www.confusedcoders.com/bigdata/hive/use-hive-serde-for-fixed-length-index-based-strings
Created 01-04-2017 06:33 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Bala Vignesh N V what do you mean by 'Fixed width file' ? Can you give a more detailed example?
Created 01-04-2017 06:45 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Sergey SoldatovFixed width files where there will not be any delimiters available. Each column data will be avilable in the specific length but with no delimiters.
Created 01-04-2017 07:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Use Regex Serde to create the external table like below
CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.2})” )
LOCATION ‘path/to/data’;
Reference: http://www.confusedcoders.com/bigdata/hive/use-hive-serde-for-fixed-length-index-based-strings
Created 01-04-2017 07:23 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Lester Martin has an excellent Pig script to do this type of work. It is not an external table solution but a good way to do this type of work... https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205
Created 02-03-2017 05:18 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
****** SERDE for MULTI LINE delimiter: ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="\\s+") *************************************************************************** ====================================================================================================================================== CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT) COMMENT 'weather table' ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="\\s+") STORED AS TEXTFILE LOCATION '/data/Weather';
Created 11-03-2017 10:32 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@bpreachukCan we not load the data into a single column of a stage table. And then use split() function to divide the string into multiple columns. Does this approach have any performance issues that would make us choose Pig / Serde approach instead ?Please share if you see any issues with this approach.
