Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

PARQUET_FALLBACK_SCHEMA_RESOLUTION

Contributor

Is it possible to make Impala default to 'name' instead of 'position' so I don't have to run this in every Hue session?

set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;

My Parquet files don't always have the same set of columns, so I have to look up columns by name.

1 ACCEPTED SOLUTION

Cloudera Employee

You can change the default query options via impalad command line options:

-default_query_options='PARQUET_FALLBACK_SCHEMA_RESOLUTION=name' 

The same can be done in Cloudera Manager: https://www.cloudera.com/documentation/enterprise/5-12-x/topics/impala_config_options.html

Contributor
Thanks. I was able to add it in the Impala Daemon configuration, under Impala Daemon Query Options Advanced Configuration Snippet (Safety Valve), by entering PARQUET_FALLBACK_SCHEMA_RESOLUTION=name. It seems to work.

Do you know if there are any significant consequences to changing this? It seems like name would be the better default if performance is about the same.


It's generally safe to use name-based resolution by default, and performance should be about the same. I agree that name-based resolution may be the better choice because it's more intuitive.

Index-based (position) and name-based resolution differ in which schema-evolution operations they allow. With index-based resolution, you can safely rename a column in your table schema. With name-based resolution, you can safely add or drop columns in the middle of your table schema, whereas with index-based resolution you can generally only add new columns at the end. So it comes down to which kinds of schema changes you expect to make.
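
To make the tradeoff concrete, here is a toy Python model of the two lookup strategies. It is only a sketch, not Impala's actual implementation: the helper names (resolve_by_position, resolve_by_name), the sample columns, and the choice to model a column missing from the file as None are all assumptions made for illustration.

```python
from typing import Any, Dict, List

def resolve_by_position(table_cols: List[str], file_row: List[Any]) -> Dict[str, Any]:
    """Toy model: table column i reads the i-th value stored in the file."""
    return {
        col: file_row[i] if i < len(file_row) else None  # assumption: missing -> None
        for i, col in enumerate(table_cols)
    }

def resolve_by_name(table_cols: List[str], file_cols: List[str],
                    file_row: List[Any]) -> Dict[str, Any]:
    """Toy model: each table column reads the file column with the same name."""
    by_name = dict(zip(file_cols, file_row))
    return {col: by_name.get(col) for col in table_cols}  # assumption: missing -> None

# A hypothetical data file written with columns (id, name, city).
file_cols = ["id", "name", "city"]
file_row = [1, "Ann", "Oslo"]

# Case 1: the table schema renames "city" to "town".
renamed = ["id", "name", "town"]
rename_by_position = resolve_by_position(renamed, file_row)        # still finds "Oslo"
rename_by_name = resolve_by_name(renamed, file_cols, file_row)     # "town" not in file

# Case 2: the table schema drops "name" from the middle.
dropped = ["id", "city"]
drop_by_position = resolve_by_position(dropped, file_row)          # "city" reads slot 1
drop_by_name = resolve_by_name(dropped, file_cols, file_row)       # "city" found by name

print(rename_by_position, rename_by_name)
print(drop_by_position, drop_by_name)
```

In this model, the rename is harmless under position lookup but loses the column under name lookup, while the mid-schema drop is harmless under name lookup but makes position lookup read the wrong value for "city".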