Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Store data from CSV in a Parquet file and export the Parquet file to HDFS

Hi experts,

I have a .csv file stored in HDFS and I need to do 3 steps:

a) Create a Parquet file format
b) Load the data from the .csv into the Parquet file
c) Store the Parquet file in a new HDFS directory

The first step I had completed using Apache Hive:

create external table parquet_file (ID BIGINT, Date TimeStamp, Size Int)
  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
  STORED AS 
    INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
    OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
    LOCATION '.../filedirectory';

How can I complete tasks b) and c)??? Many thanks!
1 ACCEPTED SOLUTION

B) Create a Hive staging table. It should have all the columns in your hive2parquet.csv file; assume (col1, col2, col3). Also assume your CSV file is in the /tmp directory in HDFS.

1- Log into Hive and, at the Hive command prompt, execute 2-, 3- and C- below.

// create the hive table

2- create table temp_txt (col1 string,col2 string, col3 string) row format delimited fields terminated by ',';

// load the hive table with hive2parquet.csv file

3- load data inpath '/tmp/hive2parquet.csv' into table temp_txt;

// Insert from table 'temp_txt' into the Parquet table from your question ('parquet_file')

C- insert into table parquet_file select * from temp_txt;
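
For reference, steps B and C can be sketched as one HiveQL script. This is only a minimal sketch, assuming the target is the parquet_file table from the question (ID BIGINT, Date TIMESTAMP, Size INT), that the CSV has no header row, and that its second column looks like yyyy-MM-dd HH:mm:ss so Hive can cast it to a timestamp. The explicit casts are my addition, not part of the steps above; the column names col1, col2, col3 are placeholders.

```sql
-- Staging table over the raw CSV; every column is read as a string.
CREATE TABLE temp_txt (col1 STRING, col2 STRING, col3 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- LOAD DATA INPATH moves (does not copy) the file from its HDFS path
-- into the staging table's directory.
LOAD DATA INPATH '/tmp/hive2parquet.csv' INTO TABLE temp_txt;

-- Rewrite the rows as Parquet. The casts assume the CSV columns line up
-- with (ID, Date, Size); values that cannot be cast become NULL.
INSERT INTO TABLE parquet_file
SELECT CAST(col1 AS BIGINT),
       CAST(col2 AS TIMESTAMP),
       CAST(col3 AS INT)
FROM temp_txt;
```

Because parquet_file is an external table, the insert writes the Parquet files under the directory named in its LOCATION clause, which should cover step c). Keep a copy of the CSV first if you still need it at /tmp after the load.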
