Can we create external hive table on top of Fixed width file?


Can we create an external Hive table on top of a fixed-width file? If yes, how can it be done?

1 ACCEPTED SOLUTION

Expert Contributor

Use the RegexSerDe to create the external table, like below:

CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{10})(.{10})(.{10})(.{2})")
LOCATION 'path/to/data';

Reference: http://www.confusedcoders.com/bigdata/hive/use-hive-serde-for-fixed-length-index-based-strings
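Note that each capture group keeps any padding spaces from its fixed-width field (for example a hypothetical value 'jsmith    ' rather than 'jsmith'), so trim() is handy when querying. A minimal sketch, assuming the customers table defined above:

SELECT trim(userid) AS userid, trim(fb_id) AS fb_id, trim(twitter_id) AS twitter_id, trim(status) AS status
FROM customers
LIMIT 10;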


6 REPLIES

Super Collaborator

@Bala Vignesh N V what do you mean by 'fixed-width file'? Can you give a more detailed example?


@Sergey Soldatov Fixed-width files are files with no delimiters; each column's value occupies a fixed number of characters in the line.
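For illustration (the layout and values below are hypothetical), a fixed-width file with three 10-character columns and one 2-character column might look like this, with shorter values padded with spaces to the full column width:

jsmith    fb10001   tw20001   OK
mjones    fb10002   tw20002   NO

Every row has the same length, and a column is identified by its character positions rather than by a separator.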



Hi @Bala Vignesh N V.

Lester Martin has an excellent Pig script for this type of work. It is not an external table solution, but it is a good way to handle fixed-width data: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205

New Contributor

SerDe for a multi-character delimiter:

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="\\s+")

For example:

CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT)
COMMENT 'weather table'
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="\\s+")
STORED AS TEXTFILE
LOCATION '/data/Weather';

Contributor

@bpreachuk Can we not load the data into a single column of a staging table and then use the split() function to divide the string into multiple columns? Does this approach have any performance issues that would make the Pig / SerDe approach a better choice? Please share if you see any issues with this approach.
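As a minimal sketch of that staging-table idea (the table and column names below are hypothetical, and since a fixed-width file has no delimiter to split on, substr() does the slicing rather than split()):

CREATE EXTERNAL TABLE customers_stage (raw_line STRING) STORED AS TEXTFILE LOCATION 'path/to/data';

SELECT substr(raw_line, 1, 10) AS userid,
       substr(raw_line, 11, 10) AS fb_id,
       substr(raw_line, 21, 10) AS twitter_id,
       substr(raw_line, 31, 2) AS status
FROM customers_stage;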