Created 08-25-2016 12:26 PM
Hi experts,
I have a .csv file stored in HDFS and I need to do 3 steps:
a) Create a Parquet file format
b) Load the data from the .csv file into the Parquet file
c) Store the Parquet file in a new HDFS directory
I have completed the first step using Apache Hive:
create external table parquet_file (ID BIGINT, Date TimeStamp, Size Int)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '.../filedirectory';
How can I complete tasks b) and c)? Many thanks!
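(Side note on step a): on Hive 0.13 and later the same table can usually be declared with the built-in shorthand STORED AS PARQUET, which resolves to the equivalent Parquet SerDe and input/output formats. This is only a sketch; the column list and location are copied from the statement above, with the Date column backtick-quoted since date is a reserved word in newer Hive versions:
create external table parquet_file (ID BIGINT, `Date` TIMESTAMP, Size INT)
STORED AS PARQUET
LOCATION '.../filedirectory';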
Created 08-25-2016 01:13 PM
B) Create a Hive staging table. It should have all the columns present in your hive2parquet.csv file; assume they are (col1, col2, col3), and assume the csv file is in the /tmp directory inside HDFS.
1- Log into Hive and, at the Hive command prompt, execute steps 2-, 3-, and C) below:
// create the hive table
2- create table temp_txt (col1 string,col2 string, col3 string) row format delimited fields terminated by ',';
// load the hive table with hive2parquet.csv file
3- load data inpath '/tmp/hive2parquet.csv' into table temp_txt;
// Insert from table 'temp_txt' into the Parquet table created in step a)
C- insert into table parquet_file select * from temp_txt;
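Putting a) through c) together, a minimal end-to-end sketch might look like the following. The target directory /user/hive/parquet_output, the staging table name temp_txt, and the assumption that the timestamp values use Hive's default yyyy-MM-dd HH:mm:ss format are all illustrative; adjust them to your environment.
-- a) + c): external Parquet table whose LOCATION is the new HDFS directory (assumed path)
create external table parquet_file (ID BIGINT, `Date` TIMESTAMP, Size INT)
STORED AS PARQUET
LOCATION '/user/hive/parquet_output';
-- b) staging text table over the raw csv (typed columns are parsed from the delimited text)
create table temp_txt (ID BIGINT, `Date` TIMESTAMP, Size INT)
row format delimited fields terminated by ',';
-- load the csv from /tmp into the staging table
load data inpath '/tmp/hive2parquet.csv' into table temp_txt;
-- convert: Hive writes Parquet files into the LOCATION directory above
insert into table parquet_file select * from temp_txt;
Note that LOAD DATA INPATH moves (rather than copies) the file from /tmp into the staging table's directory; use LOAD DATA LOCAL INPATH instead if the source file is on the local filesystem rather than HDFS.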