Reply
New Contributor
Posts: 1
Registered: ‎01-25-2018

HIVE Parquet output always uses index regardless of settings

[ Edited ]

Hello,

 

I'm using HIVE (v1.2.1) to convert our data files from CSV into Parquet for use in AWS Athena.

However, no mater what I try the resulting Parquet always has columns titles [_col0, _col1, ..., _colN]

 

After researching, I read that the line SET parquet.column.index.access=false was supposed to allow for Parquet to use the column titles of my HIVE table; however, it has been unsuccessful so far.

 

Below is an example script I use to create the Parquet from data

 

SET parquet.column.index.access=false;
CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS( `release` STRING, `customer` STRING, `cookie` STRING, `category` STRING, `end_time` STRING, `start_time` STRING, `first_name` STRING, `email` STRING, `phone` STRING, `last_name` STRING, `site` STRING, `source` STRING, `subject` STRING, `raw` STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION '${INPUT}'; INSERT OVERWRITE DIRECTORY '${OUTPUT}/parquet' STORED AS PARQUET SELECT * FROM EVENTS;

 

Using parquet-tools, I read the resulting file and below is an example output:

 
_col0 = 0.1.2
_col1 = customer1
_col2 = NULL
_col3 = api
_col4 = 2018-01-21T06:57:57Z 
_col5 = 2018-01-21T06:57:56Z 
_col6 = Brandon
_col7 = brandon@fakesite.com
_col8 = 999-999-9999
_col9 = Pompei
_col10 = Boston
_col11 = Wifi
_col12 = NULL
_col13 = eyJlbmdhZ2VtZW50TWVkaXVtIjoibm9uZSIsImVudHJ5UG9pbnRJZCI6ImQ5YjYwN2UzLTFlN2QtNGY1YS1iZWQ4LWQ4Yjk3NmRkZTQ3MiIsIkVDWF9FVkVOVF9DQVRFR09SWV9BUElfTkFNRSI6IkVDWF9FQ19TSVRFVFJBQ0tfU0lURV9WSVNJVCIsIkVDWF9TSVRFX1JFR0lPTl9BUElfTkFNRSI
 

This is problematic because it is impossible to transfer it to an Athena table (or even back to HIVE) without using these index-based column titles. I need HIVE's column titles to transfer over to the Parquet file. 

 

I've search for a very long time and have come up short. Am I doing something wrong? 
Please let me know if I can provide more information. Thank you!

Announcements