Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Convert to parquet format

Rising Star

Hi All,

Using Sqoop, I import data to HDFS on a daily basis. To improve performance, I plan to use the Parquet file format.

So my requirement is to import data from an RDBMS and store it in HDFS in Parquet format. How do I convert it, and are there any best practices for doing so?

1 ACCEPTED SOLUTION

Guru

HDP 2.3+ packages Sqoop 1.4.6, which can import directly to HDFS as Parquet files by using:

--as-parquetfile

See: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
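For illustration, a full import command using that flag might look like the sketch below. The JDBC URL, user, password file, table name, and target directory are hypothetical placeholders, not values from this thread; only --as-parquetfile comes from the answer above. The script prints the command by default and runs it only when RUN_IMPORT=1 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholders: replace with your own connection details.
JDBC_URL="${JDBC_URL:-jdbc:mysql://dbhost.example.com:3306/sales}"
DB_USER="${DB_USER:-etl_user}"
PASSWORD_FILE="${PASSWORD_FILE:-/user/etl/.db_password}"  # file holding the DB password
TABLE="${TABLE:-orders}"
TARGET_DIR="${TARGET_DIR:-/data/raw/orders}"

# Store the command in the positional parameters so each argument stays intact.
set -- sqoop import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" \
  --password-file "$PASSWORD_FILE" \
  --table "$TABLE" \
  --target-dir "$TARGET_DIR" \
  --as-parquetfile \
  --num-mappers 4

# Print the command by default; set RUN_IMPORT=1 to actually execute it.
if [ "${RUN_IMPORT:-0}" = "1" ]; then
  "$@"
else
  printf '%s\n' "$*"
fi
```

For a daily job, the target directory would typically need to differ per run (for example, a date-based subdirectory), since Sqoop will not write into an existing target directory unless told to overwrite or append.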

If you import directly into a Hive table (rather than to HDFS), you may need to do this as a two-step process (see https://community.hortonworks.com/questions/56847/parquet-files-sqoop-import.html).
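One possible reading of a two-step approach, which is an assumption and not necessarily what the linked thread describes, is to import to HDFS as Parquet first and then define an external Hive table over that directory. The sketch below covers only the second step. The table name, column list, and path are hypothetical and must match what was actually imported. It prints the DDL by default and runs it through the Hive CLI only when RUN_HIVE=1 is set.

```shell
#!/bin/sh
set -eu

# Hypothetical names: adjust to your own table and HDFS path.
HIVE_TABLE="${HIVE_TABLE:-orders_parquet}"
PARQUET_DIR="${PARQUET_DIR:-/data/raw/orders}"

# External table pointing at the directory the Sqoop import wrote to.
# The column list is illustrative and must match the imported schema.
DDL="CREATE EXTERNAL TABLE IF NOT EXISTS ${HIVE_TABLE} (
  order_id BIGINT,
  customer_id BIGINT,
  amount DOUBLE
)
STORED AS PARQUET
LOCATION '${PARQUET_DIR}';"

# Print the statement by default; set RUN_HIVE=1 to execute it.
if [ "${RUN_HIVE:-0}" = "1" ]; then
  hive -e "$DDL"
else
  printf '%s\n' "$DDL"
fi
```

Because the table is external, dropping it in Hive leaves the Parquet files in HDFS untouched.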
