Created on 12-23-2016 06:18 AM - edited 09-16-2022 01:37 AM
We often need to read a Parquet file, its metadata, or its footer. parquet-tools, which ships with the parquet-hadoop library, can do all of this from the command line. The steps below build parquet-tools and demonstrate its basic commands.
Prerequisites: Maven 3, Git, JDK 7 or 8
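Before building, it can help to confirm that the prerequisites are on the PATH. The following is a minimal sketch, not part of parquet-tools: the function name `missing_tools` is made up for illustration, and it only checks that the commands exist, not that their versions match.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three prerequisites named above (git, Maven, a JDK).
missing=$(missing_tools git mvn java)
if [ -n "$missing" ]; then
    echo "missing prerequisites:" $missing
else
    echo "all prerequisites found"
fi
```

If anything is reported missing, install it before running the build step.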
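// Build parquet-tools
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-tools/
mvn clean package -Plocal
// Maven writes the built jar to the target/ directory; run the commands below from there, or pass the full path to the jar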
// Show the schema of the Parquet file
java -jar parquet-tools-1.6.0.jar schema sample.parquet
// Print the full contents of the Parquet file
java -jar parquet-tools-1.6.0.jar cat sample.parquet
// Print the first 5 records of the Parquet file
java -jar parquet-tools-1.6.0.jar head -n5 sample.parquet
// Show the metadata of the Parquet file
java -jar parquet-tools-1.6.0.jar meta sample.parquet
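The commands above can be wrapped in a small shell helper so that one call shows the schema, the metadata, and the first few records of a file. This is only a sketch: the function names `pq` and `pq_summary` and the `PARQUET_TOOLS_JAR` variable are assumptions introduced here, not part of parquet-tools, and the default jar name is the one used in this post.

```shell
#!/bin/sh
# Hypothetical wrapper: run parquet-tools with the jar named in
# PARQUET_TOOLS_JAR, falling back to the jar name used in this post.
pq() {
    java -jar "${PARQUET_TOOLS_JAR:-parquet-tools-1.6.0.jar}" "$@"
}

# Hypothetical helper: show schema, metadata, and the first 5 records
# of one Parquet file, using only the subcommands shown above.
pq_summary() {
    pq schema "$1"
    pq meta "$1"
    pq head -n5 "$1"
}
```

For example, `pq_summary sample.parquet` runs the three commands in sequence against sample.parquet.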