Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

HIVE Parquet output always uses index regardless of settings

Highlighted

HIVE Parquet output always uses index regardless of settings

New Contributor

Hello,

 

I'm using HIVE (v1.2.1) to convert our data files from CSV into Parquet for use in AWS Athena.

However, no mater what I try the resulting Parquet always has columns titles [_col0, _col1, ..., _colN]

 

After researching, I read that the line SET parquet.column.index.access=false was supposed to allow for Parquet to use the column titles of my HIVE table; however, it has been unsuccessful so far.

 

Below is an example script I use to create the Parquet from data

 

SET parquet.column.index.access=false;
CREATE EXTERNAL TABLE IF NOT EXISTS EVENTS( `release` STRING, `customer` STRING, `cookie` STRING, `category` STRING, `end_time` STRING, `start_time` STRING, `first_name` STRING, `email` STRING, `phone` STRING, `last_name` STRING, `site` STRING, `source` STRING, `subject` STRING, `raw` STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' LOCATION '${INPUT}'; INSERT OVERWRITE DIRECTORY '${OUTPUT}/parquet' STORED AS PARQUET SELECT * FROM EVENTS;

 

Using parquet-tools, I read the resulting file and below is an example output:

 
_col0 = 0.1.2
_col1 = customer1
_col2 = NULL
_col3 = api
_col4 = 2018-01-21T06:57:57Z 
_col5 = 2018-01-21T06:57:56Z 
_col6 = Brandon
_col7 = brandon@fakesite.com
_col8 = 999-999-9999
_col9 = Pompei
_col10 = Boston
_col11 = Wifi
_col12 = NULL
_col13 = eyJlbmdhZ2VtZW50TWVkaXVtIjoibm9uZSIsImVudHJ5UG9pbnRJZCI6ImQ5YjYwN2UzLTFlN2QtNGY1YS1iZWQ4LWQ4Yjk3NmRkZTQ3MiIsIkVDWF9FVkVOVF9DQVRFR09SWV9BUElfTkFNRSI6IkVDWF9FQ19TSVRFVFJBQ0tfU0lURV9WSVNJVCIsIkVDWF9TSVRFX1JFR0lPTl9BUElfTkFNRSI
 

This is problematic because it is impossible to transfer it to an Athena table (or even back to HIVE) without using these index-based column titles. I need HIVE's column titles to transfer over to the Parquet file. 

 

I've search for a very long time and have come up short. Am I doing something wrong? 
Please let me know if I can provide more information. Thank you!