Convert to parquet format
Labels: Apache Sqoop, HDFS
Created on ‎11-08-2016 06:26 AM - edited ‎09-16-2022 03:47 AM
Hi All,
Using Sqoop, I import data into HDFS on a daily basis. To improve performance, I am going to switch to the Parquet file format.
So my requirement is to import data from an RDBMS and store it in HDFS in Parquet format. I would like to know how to do the conversion, and whether there are any best practices for it.
Created ‎11-08-2016 01:23 PM
HDP 2.3+ packages Sqoop 1.4.6, which can import directly to HDFS as Parquet files when you pass this option:
--as-parquetfile
See: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
If you import directly into a Hive table (rather than to HDFS), you may need to do this as a two-step process (https://community.hortonworks.com/questions/56847/parquet-files-sqoop-import.html)
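For the direct-to-HDFS case, a daily import might look roughly like the sketch below. The JDBC URL, user name, password file, table name, and target directory are placeholders I made up for illustration and are not from this thread; only the Sqoop options themselves (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--as-parquetfile`, `--num-mappers`) are taken from the Sqoop 1.4.6 user guide. By default the script only prints the command, so you can check it before running anything.

```shell
#!/usr/bin/env bash
# Sketch: daily Sqoop import written to HDFS as Parquet.
# Every connection detail below is a hypothetical placeholder; override via env vars.
set -euo pipefail

JDBC_URL="${JDBC_URL:-jdbc:mysql://db.example.com:3306/sales}"  # placeholder RDBMS
DB_USER="${DB_USER:-etl_user}"                                  # placeholder user
PASSWORD_FILE="${PASSWORD_FILE:-/user/etl/.db-password}"        # placeholder path
TABLE="${TABLE:-orders}"                                        # placeholder table
# One directory per day, e.g. /data/raw/orders/2016-11-08 (placeholder layout)
TARGET_DIR="${TARGET_DIR:-/data/raw/${TABLE}/$(date +%F)}"

sqoop_cmd=(
  sqoop import
  --connect "$JDBC_URL"
  --username "$DB_USER"
  --password-file "$PASSWORD_FILE"
  --table "$TABLE"
  --target-dir "$TARGET_DIR"
  --as-parquetfile      # write Parquet instead of the default text files
  --num-mappers 4       # Sqoop's default; parallel import needs a primary key or --split-by
)

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Dry run: print the command instead of executing it
  printf '%s ' "${sqoop_cmd[@]}"
  echo
else
  "${sqoop_cmd[@]}"
fi
```

Set `DRY_RUN=0` to actually run the import. Writing each day to its own directory is just one possible layout; adjust it to however your downstream jobs expect to find the data.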
