Store data from a CSV file in a Parquet file and export it to HDFS

Hi experts, I have a .csv file stored in HDFS and I need to do three steps:

a) Create a table with the Parquet file format
b) Load the data from the .csv file into the Parquet file
c) Store the Parquet file in a new HDFS directory

I completed the first step using Apache Hive:

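-- `Date` is backquoted because DATE is a reserved keyword in recent Hive releases.
-- On Hive 0.13 and later, STORED AS PARQUET can replace the three parquet.hive class names below.
create external table parquet_file (ID BIGINT, `Date` TimeStamp, Size Int)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
  LOCATION '.../filedirectory';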

How can I complete steps b) and c)? Many thanks!
Accepted solution


B) Create a staging Hive table with one column for each field in your hive2parquet.csv file. The example below assumes three columns (col1, col2, col3) and that the CSV file is in the /tmp directory in HDFS.

1- Log into Hive and, at the Hive command prompt, run steps 2-, 3- and C) below:

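-- 2- Create the staging Hive table (plain text, comma-delimited)
create table temp_txt (col1 string, col2 string, col3 string) row format delimited fields terminated by ',';

-- 3- Load the staging table with hive2parquet.csv (LOAD DATA INPATH moves the HDFS file into the table's directory)
load data inpath '/tmp/hive2parquet.csv' into table temp_txt;

-- C) Insert from table temp_txt into the Parquet table created in the question. Because parquet_file is an
-- external table, Hive writes the Parquet files into its LOCATION directory, which covers step c).
insert into table parquet_file select * from temp_txt;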
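The staging table in the steps above declares every column as STRING and relies on Hive converting the values during the INSERT. A minimal sketch that maps the CSV onto the question's own schema (ID BIGINT, Date TIMESTAMP, Size INT) with explicit casts is shown here. It assumes the CSV has exactly those three fields in that order, no header row, and timestamps in Hive's default yyyy-MM-dd HH:mm:ss format; the staging table csv_staging, its columns, and the directory /tmp/csv_staging are hypothetical names. An external staging table is used instead of LOAD DATA INPATH so the original CSV stays in /tmp.

```sql
-- Hypothetical staging directory that holds only the CSV to convert.
dfs -mkdir -p /tmp/csv_staging;
dfs -cp /tmp/hive2parquet.csv /tmp/csv_staging/;

-- External text table over that directory; dropping it later does not delete the CSV.
create external table csv_staging (id string, dt string, sz string)
  row format delimited fields terminated by ','
  stored as textfile
  location '/tmp/csv_staging';

-- Cast each field to the Parquet table's column type; values that fail to parse become NULL.
insert into table parquet_file
select cast(id as bigint), cast(dt as timestamp), cast(sz as int)
from csv_staging;

-- Check the result: the Location and InputFormat lines show where and how the data is stored.
describe formatted parquet_file;
select * from parquet_file limit 10;
```

If the file does have a header row, adding tblproperties ("skip.header.line.count"="1") to the staging table makes Hive skip it.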
