Merging many Parquet files

Rising Star

Has anyone used command-line tools to merge many Parquet files into one? I tried parquet-tools' merge command, but I cannot get any of the versions that include it (1.8.2, 1.9.0, 1.9.1) to build. Is there a better way, or can someone help me get the build working?

Thanks,

Ben

1 REPLY

Re: Merging many Parquet files

New Contributor

parquet-tools 1.8.2 supports the merge command. Before building, change the Apache Parquet version to 1.8.2-SNAPSHOT in pom.xml for both parquet-mr and parquet-tools:

[user@host parquet-mr]$ cat pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.apache</groupId>
    <artifactId>apache</artifactId>
    <version>16</version>
  </parent>

  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet</artifactId>
  <version>1.8.2-SNAPSHOT</version>
  <packaging>pom</packaging>
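
If you would rather script the version change than edit each pom.xml by hand, something like the following might work. It is only a minimal sketch: the function name set_project_version and the sample POM text (including its starting version 1.8.2) are made up for illustration, and it assumes the layout shown above, where the project's own <version> is the first one after the <parent> block. A POM that takes its version from <parent> instead would need different handling.

```python
import re

# Matches a <version>...</version> element and captures its text content.
VERSION_RE = re.compile(r"(<version>)([^<]*)(</version>)")


def set_project_version(pom_text: str, new_version: str) -> str:
    """Return pom_text with the project's own <version> replaced.

    Assumption: the project version is the first <version> element that
    appears after the closing </parent> tag (or the first one overall if
    the POM has no <parent> block).
    """
    # Skip the <parent> block so the parent POM version is left untouched.
    parent_end = pom_text.find("</parent>")
    start = parent_end + len("</parent>") if parent_end != -1 else 0

    match = VERSION_RE.search(pom_text, start)
    if match is None:
        raise ValueError("no <version> element found after the parent block")

    # Replace only the captured version text, keeping everything else as-is.
    return pom_text[:match.start(2)] + new_version + pom_text[match.end(2):]


if __name__ == "__main__":
    # Hypothetical POM fragment, shaped like the one in this post.
    sample = """<project>
  <parent>
    <groupId>org.apache</groupId>
    <artifactId>apache</artifactId>
    <version>16</version>
  </parent>
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet</artifactId>
  <version>1.8.2</version>
  <packaging>pom</packaging>
</project>"""
    print(set_project_version(sample, "1.8.2-SNAPSHOT"))
```

To apply it to a real file you would read pom.xml into a string, pass it through the function, and write the result back. Working on the text rather than parsing the XML keeps the comments and formatting of the original file intact.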