Explorer

PARQUET_FALLBACK_SCHEMA_RESOLUTION

Is it possible to make Impala default to 'name' instead of 'position', so I don't have to run the following in every Hue session?

set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;

My Parquet files don't always have the same set of columns, so I have to look up columns by name.

Cloudera Employee

Re: PARQUET_FALLBACK_SCHEMA_RESOLUTION

You can change the default query options with an impalad command-line option:

-default_query_options='PARQUET_FALLBACK_SCHEMA_RESOLUTION=name' 

The same can be done in Cloudera Manager: https://www.cloudera.com/documentation/enterprise/5-12-x/topics/impala_config_options.html

Explorer

Re: PARQUET_FALLBACK_SCHEMA_RESOLUTION

Thanks, I was able to add it in Cloudera Manager under Configuration > Impala Daemon > Impala Daemon Query Options Advanced Configuration Snippet (Safety Valve) by entering PARQUET_FALLBACK_SCHEMA_RESOLUTION=name. It seems to work.

Do you know if there are any significant consequences to changing this? It seems like name would be a better default if performance is about the same.

Cloudera Employee

Re: PARQUET_FALLBACK_SCHEMA_RESOLUTION

It's generally safe to use name-based resolution by default. Performance should be about the same. I agree name-based resolution may be a better choice because it's more intuitive.

Index-based (position) and name-based resolution have different tradeoffs in terms of which schema-evolution operations are allowed. For example, with index-based resolution you can safely rename a column in your table schema, because each table column is still matched to the file column at the same position. With name-based resolution you can safely add or drop columns in the middle of your table schema, whereas with index-based resolution you can generally only add new columns at the end. So it's really all about tradeoffs.
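To make the tradeoff concrete, here is a small Python sketch that models the two resolution modes on a single row. It is a simplified model, not Impala's implementation: a "file" is just a list of column names plus one row of values, None stands in for NULL, and name matching is exact string comparison (an assumption of the sketch). The function names resolve_by_position and resolve_by_name are hypothetical.

```python
from typing import Dict, List, Optional

# Simplified model only: mimics the column-matching rule, not Impala's Parquet scanner.


def resolve_by_position(table_cols: List[str], file_cols: List[str],
                        file_row: List[object]) -> Dict[str, Optional[object]]:
    """The i-th table column reads the i-th file column; table columns past the
    end of the file read as None (standing in for NULL)."""
    return {col: (file_row[i] if i < len(file_cols) else None)
            for i, col in enumerate(table_cols)}


def resolve_by_name(table_cols: List[str], file_cols: List[str],
                    file_row: List[object]) -> Dict[str, Optional[object]]:
    """Each table column reads the file column with the same name (exact match,
    a simplifying assumption); names missing from the file read as None."""
    position_of = {name: i for i, name in enumerate(file_cols)}
    return {col: (file_row[position_of[col]] if col in position_of else None)
            for col in table_cols}


if __name__ == "__main__":
    # Scenario A: the table gained "name" in the middle, but an older file
    # was written with only (id, city).
    table_a = ["id", "name", "city"]
    old_cols, old_row = ["id", "city"], [1, "Oslo"]
    print(resolve_by_position(table_a, old_cols, old_row))  # name reads "Oslo", city reads None
    print(resolve_by_name(table_a, old_cols, old_row))      # name reads None, city reads "Oslo"

    # Scenario B: the file has (id, name, city) and the table renamed "city" to "town".
    table_b = ["id", "name", "town"]
    file_cols, file_row = ["id", "name", "city"], [1, "Ann", "Oslo"]
    print(resolve_by_position(table_b, file_cols, file_row))  # town still reads "Oslo"
    print(resolve_by_name(table_b, file_cols, file_row))      # town reads None
```

Scenario A matches the situation in the original question (files that don't all have the same columns): only name-based resolution returns the right value for city. Scenario B is the reverse case, where a rename breaks name-based resolution but leaves position-based resolution intact.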