Support Questions

balavignesh_nag · ‎01-04-2017

Can we create external hive table on top of Fixed width file? If yes then how it can be done.

nbalaji-elangov · ‎01-04-2017

Use Regex Serde to create the external table like below

CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.2})” )

LOCATION ‘path/to/data’;

Reference: http://www.confusedcoders.com/bigdata/hive/use-hive-serde-for-fixed-length-index-based-strings

View solution in original post

ssoldatov · ‎01-04-2017

@Bala Vignesh N V what do you mean by 'Fixed width file' ? Can you give a more detailed example?

balavignesh_nag · ‎01-04-2017

@Sergey SoldatovFixed width files where there will not be any delimiters available. Each column data will be avilable in the specific length but with no delimiters.

nbalaji-elangov · ‎01-04-2017

Use Regex Serde to create the external table like below

CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.2})” )

LOCATION ‘path/to/data’;

Reference: http://www.confusedcoders.com/bigdata/hive/use-hive-serde-for-fixed-length-index-based-strings

bpreachuk · ‎01-04-2017

Hi @Bala Vignesh N V.

Lester Martin has an excellent Pig script to do this type of work. It is not an external table solution but a good way to do this type of work... https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205

ghemanthreddy_h · ‎02-03-2017

****** SERDE for MULTI LINE delimiter: ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'

WITH SERDEPROPERTIES ("field.delim"="\\s+") *************************************************************************** ====================================================================================================================================== CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT) COMMENT 'weather table' ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="\\s+") STORED AS TEXTFILE LOCATION '/data/Weather';

tanmoy_official · ‎11-03-2017

@bpreachukCan we not load the data into a single column of a stage table. And then use split() function to divide the string into multiple columns. Does this approach have any performance issues that would make us choose Pig / Serde approach instead ?Please share if you see any issues with this approach.