Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Single Mapper using SERDE

Single Mapper using SERDE

Explorer

I Have a large file i have to process every month, when i import it, it only uses one mapper to import the data causing it to be unnecessarily slow. the file is comma delimited with double-quote text qualifiers...


any advice to allow the cluster to parrallize the prcessing would be helpful. i basically create the external table, managed table, then i query the data from the external table into the managed table. see below.


(using HDP 2.6.5)


Thanks!


CREATE EXTERNAL TABLE `ext_incomingdata`(

`record_note` string COMMENT 'from deserializer',

`firstdate` string COMMENT 'from deserializer',

`lastdate` string COMMENT 'from deserializer',

`fn` string COMMENT 'from deserializer',

`ln` string COMMENT 'from deserializer'

)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.OpenCSVSerde'

WITH SERDEPROPERTIES (

'quoteChar'='\"',

'separatorChar'=',')

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://mynode.mydomain.com:8020/etl/mydir/201905/import.csv'

TBLPROPERTIES (

'last_modified_by'='hdfs',

'last_modified_time'='1556747118',

'numFiles'='1',

'numRows'='0',

'rawDataSize'='0',

'skip.header.line.count'='1',

'totalSize'='37567255367',

'transient_lastDdlTime'='1556747118')


CREATE TABLE `incomingdata`(

`record_note` string,

`firstdate` string,

`lastdate` string,

`fn` string,

`ln` string

)

stored as orc;



insert into incomingdata select * from ext_incomingdata;

1 REPLY 1
Highlighted

Re: Single Mapper using SERDE

Explorer

does anyone have any feedback on this at all?

Don't have an account?
Coming from Hortonworks? Activate your account here