Created 01-14-2016 05:41 PM
Hi there,
We are getting the following error when using ParquetStorer in Pig:
ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Cannot instantiate class org.apache.pig.builtin.ParquetStorer (parquet.pig.ParquetStorer)
We are using HDP version 2.3.4.0-3485.
I'd appreciate it if anyone has any pointers on this.
Thank you,
Ibrahim
Created 01-14-2016 06:13 PM
You need to download the Parquet jars, upload them to the cluster, and register them; HDP doesn't ship with Parquet out of the box. @Ibrahim Jarrar
Here's an example.
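For the download/upload part, from the Grunt shell it could look roughly like this; a rough sketch only, where the bundle jar choice and the /user/myuser/lib/ HDFS path are just placeholders:
-- sketch: fetch a Parquet jar, push it to HDFS, then register it from there
sh wget https://repo1.maven.org/maven2/org/apache/parquet/parquet-pig-bundle/1.8.1/parquet-pig-bundle-1.8.1.jar
fs -put parquet-pig-bundle-1.8.1.jar /user/myuser/lib/
REGISTER hdfs:///user/myuser/lib/parquet-pig-bundle-1.8.1.jar;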
Created 01-14-2016 06:45 PM
--Register the jars
REGISTER lib/parquet-pig-1.3.1.jar;
REGISTER lib/parquet-column-1.3.1.jar;
REGISTER lib/parquet-common-1.3.1.jar;
REGISTER lib/parquet-format-2.0.0.jar;
REGISTER lib/parquet-hadoop-1.3.1.jar;
REGISTER lib/parquet-encoding-1.3.1.jar;
--store in parquet format
SET parquet.compression gzip; -- or snappy
STORE table INTO '/path/to/table' USING parquet.pig.ParquetStorer;
-- options you might want to fiddle with
SET parquet.page.size 1048576; -- default; this is your minimum read/write unit
SET parquet.block.size 134217728; -- default; your memory budget for buffering data
SET parquet.compression lzo; -- or none, gzip, snappy
STORE mydata INTO '/some/path' USING parquet.pig.ParquetStorer;
-- Reading it back:
mydata = LOAD '/some/path' USING parquet.pig.ParquetLoader AS (x: int, y: int);
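Side note: if the files under '/some/path' were written by Pig, parquet.pig.ParquetLoader can usually pick the schema up from the Parquet file footers, so the AS clause is optional; a minimal sketch:
-- load the stored Parquet data back and let the loader read the schema from the footer
mydata_back = LOAD '/some/path' USING parquet.pig.ParquetLoader();
DESCRIBE mydata_back; -- should print the schema that was stored with the data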
Created 02-03-2016 01:50 AM
@Ibrahim Jarrar has this been resolved? Can you provide your solution or accept the best answer?
Created 08-25-2017 05:07 PM
Here's a much cleaner working example, tested with HDP 2.6:
wget http://central.maven.org/maven2/org/apache/parquet/parquet-pig-bundle/1.8.1/parquet-pig-bundle-1.8.1.jar
hdfs dfs -put parquet-pig-bundle-1.8.1.jar .
pig -x tez
REGISTER hdfs://dlm3ha/user/centos/parquet-pig-bundle-1.8.1.jar;
-- words is a CSV file with five fields
data = load 'words' using PigStorage(',') as (f1:chararray,f2:chararray,f3:chararray,f4:chararray,f5:chararray);
store data into 'hdfs://dlm3ha/user/centos/output' using org.apache.parquet.pig.ParquetStorer;
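To sanity-check the result, the same bundle jar also provides a loader, so a quick read-back (reusing the output path above) could look like this sketch:
-- read the Parquet output back with the loader from the same bundle jar
check = load 'hdfs://dlm3ha/user/centos/output' using org.apache.parquet.pig.ParquetLoader();
describe check; -- schema is taken from the Parquet footers
dump check;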
Created 07-30-2019 12:32 AM
I used:
register parquet-pig-1.10.1.jar;
register parquet-encoding-1.8.2.jar;
register parquet-column-1.8.2.jar;
register parquet-common-1.8.2.jar;
register parquet-hadoop-1.8.2.jar;
register parquet-format-2.3.1.jar;
base = LOAD '/XXX/yyy/archivo.parquet' USING org.apache.parquet.pig.ParquetLoader AS (
xxx:chararray,
yyyy:chararray,
...
)
;
and it worked.
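For writing with that same set of jars, the matching storer call would be along these lines (the output path is just a placeholder):
-- sketch: write the loaded relation back out as Parquet
STORE base INTO '/XXX/yyy/out_parquet' USING org.apache.parquet.pig.ParquetStorer;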