Contributor
Posts: 56
Registered: 02-03-2016

Merging many Parquet files

Has anyone tried using command-line tools to merge many Parquet files into one? I tried the merge command in parquet-tools, but I cannot get any of the versions that include it (1.8.2, 1.9.0, 1.9.1) to build. Is there a better way, or can someone help with this?

Thanks,

Ben

New Contributor
Posts: 2
Registered: 10-20-2015

Re: Merging many Parquet files

parquet-tools version 1.8.2 supports the merge command. For the build, set the Apache Parquet version to 1.8.2-SNAPSHOT in the pom.xml of both parquet-mr and parquet-tools. For example, the root parquet-mr pom.xml:

[user@host parquet-mr]$ cat pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache</groupId>
    <artifactId>apache</artifactId>
    <version>16</version>
  </parent>

  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet</artifactId>
  <version>1.8.2-SNAPSHOT</version>
  <packaging>pom</packaging>
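
The matching change on the parquet-tools side goes in the `<parent>` section of parquet-tools/pom.xml, so that it refers to the same snapshot version as the root pom. A minimal sketch of that section, assuming parquet-tools is a module of the parquet-mr build whose parent is the `org.apache.parquet:parquet` artifact shown above (the surrounding elements in your checkout may differ):

```xml
<!-- parquet-tools/pom.xml (excerpt). Assumes parquet-tools inherits from the root parquet-mr pom. -->
<parent>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet</artifactId>
  <!-- Must match the <version> in the root pom.xml -->
  <version>1.8.2-SNAPSHOT</version>
</parent>

<modelVersion>4.0.0</modelVersion>
<artifactId>parquet-tools</artifactId>
```

Keeping the two versions identical presumably matters because Maven looks for the parent in the local checkout first (its default relative path is ../pom.xml) and only uses it if the groupId, artifactId, and version all match; otherwise it tries to fetch a 1.8.2-SNAPSHOT parent from a remote repository.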
