Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Any option to SKIP header line(s) when using the Phoenix CsvBulkLoadTool?

Running the following code:

hadoop jar /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool 
-z <Zookeeper nodes>:2181:/hbase-unsecure  
-d $'\t' 
--g 
--table <DB>.<TBL> 
--input /data/product/inbound/<FNAME>.TXT

is there any way to skip the first line of the input file - is there a parameter on the CsvBulkLoadTool that would allow a skip row? Specifically like what Hive gives you with 'tblproperties ("skip.header.line.count"="1")'.

Thanks!

1 ACCEPTED SOLUTION

Hi @bpreachuk, according to this page the bulk load tool doesn't have such a feature, but for smaller files, up to "tens of megabytes" you can use a single threaded psql.py tool which can interpret the first line as a list of columns by using the "-h in-line" option. Thinking about the bulk MR tool it's indeed hard to implement this because every mapper gets a chunk of the file, and we'd like only 1 mapper to remove the very first line, so it will have to be marked in a special way. More details about commands here.

View solution in original post

2 REPLIES 2

Hi @bpreachuk, according to this page the bulk load tool doesn't have such a feature, but for smaller files, up to "tens of megabytes" you can use a single threaded psql.py tool which can interpret the first line as a list of columns by using the "-h in-line" option. Thinking about the bulk MR tool it's indeed hard to implement this because every mapper gets a chunk of the file, and we'd like only 1 mapper to remove the very first line, so it will have to be marked in a special way. More details about commands here.

Thanks Predrag, that's what I thought. psql.py is an option for our smaller files...

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.