PARQUET_FALLBACK_SCHEMA_RESOLUTION

Contributor

Is it possible to make Impala default to 'name' instead of 'position' so I don't have to do this in every Hue session?

set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;

My Parquet files don't always have the same set of columns, so I have to look up columns by name.

1 ACCEPTED SOLUTION

Cloudera Employee

You can change the default query options with an impalad command-line option:

-default_query_options='PARQUET_FALLBACK_SCHEMA_RESOLUTION=name' 

The same can be done in Cloudera Manager: https://www.cloudera.com/documentation/enterprise/5-12-x/topics/impala_config_options.html

Contributor

Thanks. In Cloudera Manager I opened the Impala configuration, went to Impala Daemon, found Impala Daemon Query Options Advanced Configuration Snippet (Safety Valve), and added PARQUET_FALLBACK_SCHEMA_RESOLUTION=name. It seems to work.

Do you know if there are any significant consequences to changing this? It seems like name would be a better default if performance is about the same.

It's generally safe to use name-based resolution by default. Performance should be about the same. I agree name-based resolution may be a better choice because it's more intuitive.

Index-based (position) and name-based resolution differ in which schema-evolution operations they allow. For example, with index-based resolution you can safely rename a column in your table schema. With name-based resolution you can safely add or drop columns in the middle of your table schema, whereas with index-based resolution you can generally only add new columns at the end. So it's really all about tradeoffs.
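
To make the tradeoff concrete, here is a small Python sketch of the two lookup strategies. It is a toy model, not Impala's implementation: the helper names (resolve_by_position, resolve_by_name), the sample columns, and the single-row "file" are all made up for illustration, and column types are ignored. In a real table, a positional mismatch may surface as a type error rather than as wrong values.

```python
from typing import Any, Dict, List

# Hypothetical Parquet file: the column names it was written with, plus one row.
FILE_COLS: List[str] = ["id", "name", "city"]
FILE_ROW: List[Any] = [1, "ann", "oslo"]


def resolve_by_position(table_cols: List[str], file_row: List[Any]) -> Dict[str, Any]:
    """Table column i reads the i-th file column; column names are ignored."""
    return {
        col: (file_row[i] if i < len(file_row) else None)
        for i, col in enumerate(table_cols)
    }


def resolve_by_name(
    table_cols: List[str], file_cols: List[str], file_row: List[Any]
) -> Dict[str, Any]:
    """Each table column reads the file column with the same name, else None."""
    by_name = dict(zip(file_cols, file_row))
    return {col: by_name.get(col) for col in table_cols}


# Case 1: the table column "name" was renamed to "full_name".
renamed = ["id", "full_name", "city"]
print(resolve_by_position(renamed, FILE_ROW))
# {'id': 1, 'full_name': 'ann', 'city': 'oslo'}   <- rename is safe
print(resolve_by_name(renamed, FILE_COLS, FILE_ROW))
# {'id': 1, 'full_name': None, 'city': 'oslo'}    <- old data no longer found

# Case 2: a new column "age" was added in the middle of the table schema.
inserted = ["id", "age", "name", "city"]
print(resolve_by_position(inserted, FILE_ROW))
# {'id': 1, 'age': 'ann', 'name': 'oslo', 'city': None}  <- values shifted
print(resolve_by_name(inserted, FILE_COLS, FILE_ROW))
# {'id': 1, 'age': None, 'name': 'ann', 'city': 'oslo'}  <- add is safe
```

The rename only survives positional lookup, and the mid-schema add only survives name lookup, which is the tradeoff described above.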