Created 01-04-2017 02:34 PM
Can we create external hive table on top of Fixed width file? If yes then how it can be done.
Created 01-04-2017 07:03 PM
Use Regex Serde to create the external table like below
CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.2})” )
LOCATION ‘path/to/data’;
Reference: http://www.confusedcoders.com/bigdata/hive/use-hive-serde-for-fixed-length-index-based-strings
Created 01-04-2017 06:33 PM
@Bala Vignesh N V what do you mean by 'Fixed width file' ? Can you give a more detailed example?
Created 01-04-2017 06:45 PM
@Sergey SoldatovFixed width files where there will not be any delimiters available. Each column data will be avilable in the specific length but with no delimiters.
Created 01-04-2017 07:03 PM
Use Regex Serde to create the external table like below
CREATE EXTERNAL TABLE customers (userid STRING, fb_id STRING, twitter_id STRING, status STRING) ROW FORMAT SERDE ‘org.apache.hadoop.hive.contrib.serde2.RegexSerDe’ WITH SERDEPROPERTIES (“input.regex” = “(.{10})(.{10})(.{10})(.2})” )
LOCATION ‘path/to/data’;
Reference: http://www.confusedcoders.com/bigdata/hive/use-hive-serde-for-fixed-length-index-based-strings
Created 01-04-2017 07:23 PM
Lester Martin has an excellent Pig script to do this type of work. It is not an external table solution but a good way to do this type of work... https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205
Created 02-03-2017 05:18 AM
****** SERDE for MULTI LINE delimiter: ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="\\s+") *************************************************************************** ====================================================================================================================================== CREATE EXTERNAL TABLE weather (USAF INT, WBAN INT, `Date` STRING, DIR STRING, SPD INT, GUS INT, CLG INT, SKC STRING, L STRING, M STRING, H STRING, VSB DECIMAL, MW1 STRING, MW2 STRING, MW3 STRING, MW4 STRING, AW1 STRING, AW2 STRING, AW3 STRING, AW4 STRING, W STRING, TEMP INT, DEWP INT, SLP DECIMAL, ALT DECIMAL, STP DECIMAL, MAX INT, MIN INT, PCP01 DECIMAL, PCP06 DECIMAL, PCP24 DECIMAL, PCPXX DECIMAL, SD INT) COMMENT 'weather table' ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ("field.delim"="\\s+") STORED AS TEXTFILE LOCATION '/data/Weather';
Created 11-03-2017 10:32 AM
@bpreachukCan we not load the data into a single column of a stage table. And then use split() function to divide the string into multiple columns. Does this approach have any performance issues that would make us choose Pig / Serde approach instead ?Please share if you see any issues with this approach.