Convert to parquet format

Hi All,

I use Sqoop to import data into HDFS on a daily basis. To improve performance, I plan to switch to the Parquet file format.

My requirement is to import data from an RDBMS and store it in HDFS in Parquet format. How do I do the conversion, and are there any best practices for it?

1 ACCEPTED SOLUTION

HDP 2.3+ packages Sqoop 1.4.6, which can import directly to HDFS as Parquet files by using:

--as-parquetfile

See: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
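For illustration, a full import command using this flag might look like the sketch below. The JDBC URL, user, password file, table name, and target directory are all hypothetical placeholders, not values from this thread. The script only prints the command unless RUN_SQOOP=1 is set, so it can be checked before it touches the cluster.

```shell
#!/bin/sh
# Sketch of a Sqoop 1.4.6 import that writes Parquet files to HDFS.
# Every value below is a hypothetical placeholder; substitute your own.
JDBC_URL="jdbc:mysql://dbhost.example.com:3306/sales"   # hypothetical source database
DB_USER="etl_user"                                      # hypothetical account
PASSWORD_FILE="/user/etl/.db_password"                  # hypothetical file holding the password
TABLE="orders"                                          # hypothetical source table
TARGET_DIR="/data/raw/orders_parquet"                   # hypothetical HDFS output directory

# Build the argument list once so it can be inspected before running.
set -- import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" \
  --password-file "$PASSWORD_FILE" \
  --table "$TABLE" \
  --target-dir "$TARGET_DIR" \
  --as-parquetfile \
  --num-mappers 4

# Dry run by default: print the command. Set RUN_SQOOP=1 to execute it.
if [ "${RUN_SQOOP:-0}" = "1" ]; then
  sqoop "$@"
else
  echo "sqoop $*"
fi
```

For a daily job, keep in mind that by default Sqoop fails if the target directory already exists, so each run would likely need its own directory (for example, one per date) or another way of handling the existing output.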

If you import directly into a Hive table (rather than to HDFS), you may need to do this as a two-step process (https://community.hortonworks.com/questions/56847/parquet-files-sqoop-import.html)
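One plausible reading of the two-step approach, and this is an assumption rather than something spelled out here, is to first import to HDFS as Parquet and then define an external Hive table over that directory. A minimal sketch of the second step follows; the table name, columns, and location are hypothetical, and the column types would have to match the schema Sqoop actually wrote. It prints the DDL unless RUN_HIVE=1 is set.

```shell
#!/bin/sh
# Step 2 sketch: expose Parquet files already imported to HDFS as a Hive table.
# Table name, columns, and location are hypothetical placeholders.
PARQUET_DIR="/data/raw/orders_parquet"   # hypothetical directory written by the Sqoop import

DDL="CREATE EXTERNAL TABLE IF NOT EXISTS orders_parquet (
  order_id BIGINT,
  customer_id BIGINT,
  amount DOUBLE
)
STORED AS PARQUET
LOCATION '${PARQUET_DIR}';"

# Dry run by default: print the DDL. Set RUN_HIVE=1 to execute it.
if [ "${RUN_HIVE:-0}" = "1" ]; then
  hive -e "$DDL"
else
  echo "$DDL"
fi
```

Because the table is external, dropping it in Hive leaves the imported Parquet files in place.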
