Pig ParquetStorer is not working

New Contributor

Hi There,

We are getting the following error when using ParquetStorer in Pig:

ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Cannot instantiate class org.apache.pig.builtin.ParquetStorer (parquet.pig.ParquetStorer) 

We are using version HDP-2.3.4.0-3485.

We would appreciate any pointers on this.

Thank you,

Ibrahim

Accepted Solutions

Re: Pig ParquetStorer is not working

Rising Star

-- Register the jars
REGISTER lib/parquet-pig-1.3.1.jar;
REGISTER lib/parquet-column-1.3.1.jar;
REGISTER lib/parquet-common-1.3.1.jar;
REGISTER lib/parquet-format-2.0.0.jar;
REGISTER lib/parquet-hadoop-1.3.1.jar;
REGISTER lib/parquet-encoding-1.3.1.jar;

-- Store in Parquet format
SET parquet.compression gzip; -- or snappy
STORE table INTO '/path/to/table' USING parquet.pig.ParquetStorer;

-- Options you might want to fiddle with
SET parquet.page.size 1048576; -- default (1 MiB); this is your minimum read/write unit
SET parquet.block.size 134217728; -- default (128 MiB); your memory budget for buffering data
SET parquet.compression lzo; -- or uncompressed, gzip, snappy
STORE mydata INTO '/some/path' USING parquet.pig.ParquetStorer;

-- Reading
mydata = LOAD '/some/path' USING parquet.pig.ParquetLoader AS (x: int, y: int);

Re: Pig ParquetStorer is not working

Mentor

You need to download the Parquet jar, upload it to the cluster, and register it in your Pig script; HDP doesn't ship with Parquet out of the box. @Ibrahim Jarrar

Here's an example.

Re: Pig ParquetStorer is not working

Mentor

@Ibrahim Jarrar has this been resolved? Can you provide your solution or accept the best answer?

Re: Pig ParquetStorer is not working

Mentor

Here's a much cleaner working example, tested with HDP 2.6:

wget http://central.maven.org/maven2/org/apache/parquet/parquet-pig-bundle/1.8.1/parquet-pig-bundle-1.8.1...
hdfs dfs -put parquet-pig-bundle-1.8.1.jar .
pig -x tez
REGISTER hdfs://dlm3ha/user/centos/parquet-pig-bundle-1.8.1.jar;
-- words is a CSV file with five fields
data = load 'words' using PigStorage(',') as (f1:chararray,f2:chararray,f3:chararray,f4:chararray,f5:chararray);
store data into 'hdfs://dlm3ha/user/centos/output' using org.apache.parquet.pig.ParquetStorer;
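The accepted solution uses parquet.pig.ParquetStorer (with the 1.3.1 jars), while this example uses org.apache.parquet.pig.ParquetStorer (with the 1.8.1 bundle). Starting with release 1.7.0, Parquet's Java classes moved from the parquet.* packages to org.apache.parquet.*, so the class named in the USING clause has to match the jar you registered. Below is a minimal sketch for checking which name a given jar provides. It assumes a POSIX shell, and the function name parquet_storer_class is hypothetical. It reads the jar's entry list (for example, the output of `jar tf`) on stdin:

```shell
#!/bin/sh
# parquet_storer_class (hypothetical helper): reads a jar's entry list on
# stdin (one path per line, as printed by `jar tf`) and prints the
# ParquetStorer class name to use in a STORE ... USING clause.
parquet_storer_class() {
  entries=$(cat)
  # Parquet 1.7.0 and later: org.apache.parquet.* packages
  if printf '%s\n' "$entries" | grep -qxF 'org/apache/parquet/pig/ParquetStorer.class'; then
    echo 'org.apache.parquet.pig.ParquetStorer'
  # Earlier releases (such as the 1.3.1 jars): parquet.* packages
  elif printf '%s\n' "$entries" | grep -qxF 'parquet/pig/ParquetStorer.class'; then
    echo 'parquet.pig.ParquetStorer'
  else
    echo 'no ParquetStorer class found in jar listing' >&2
    return 1
  fi
}
```

Usage: `jar tf parquet-pig-bundle-1.8.1.jar | parquet_storer_class`. A non-zero exit status means the jar does not contain a ParquetStorer class at all, and Pig would fail to instantiate it.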

Re: Pig ParquetStorer is not working

New Contributor

I used:

register parquet-pig-1.10.1.jar;
register parquet-encoding-1.8.2.jar;
register parquet-column-1.8.2.jar;
register parquet-common-1.8.2.jar;
register parquet-hadoop-1.8.2.jar;
register parquet-format-2.3.1.jar;

base = LOAD '/XXX/yyy/archivo.parquet' USING org.apache.parquet.pig.ParquetLoader AS (
    xxx:chararray,
    yyyy:chararray,
    ...
);

and it worked.
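The jars above mix parquet-pig 1.10.1 with 1.8.2 builds of the other modules. The parquet-mr modules (pig, column, common, encoding, hadoop) are released together under one version number, while parquet-format has its own versioning. Mixing parquet-mr releases may therefore cause runtime errors, even though it worked in this case. Below is a minimal sketch for spotting mixed versions. It assumes a POSIX shell and jar files named artifact-version.jar, as in this thread, and parquet_mr_versions is a hypothetical helper name:

```shell
#!/bin/sh
# parquet_mr_versions (hypothetical helper): reads jar file names, one per
# line, on stdin and prints each distinct parquet-mr module version found.
# parquet-format is skipped because it has its own version numbering.
parquet_mr_versions() {
  grep -E '^parquet-[a-z-]+-[0-9][0-9.]*\.jar$' |
    grep -v '^parquet-format-' |
    sed -E 's/^parquet-[a-z-]+-([0-9][0-9.]*)\.jar$/\1/' |
    sort -u
}
```

Usage: `ls lib | parquet_mr_versions`. More than one line of output means the jars come from different releases. For the six jars registered above, it prints two versions: 1.10.1 and 1.8.2.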
