Created on 11-28-2017 03:57 PM - edited 09-16-2022 05:34 AM
Is it possible to make Impala default to 'name' instead of 'position' so I don't have to do this every Hue session?
set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
My Parquet files don't always have the same set of columns, so I have to look up columns by name.
Created on 11-28-2017 04:14 PM - edited 11-28-2017 04:21 PM
You can change the default query options with an impalad startup flag (the daemons need a restart to pick it up):
-default_query_options='PARQUET_FALLBACK_SCHEMA_RESOLUTION=name'
The same can be done in Cloudera Manager: https://www.cloudera.com/documentation/enterprise/5-12-x/topics/impala_config_options.html
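To set more than one default, the flag value is a comma-separated list of OPTION=value pairs. Below is a minimal Python sketch that assembles the flag in the same single-quoted form as the command above. The helper name default_query_options_flag is hypothetical and is not part of any Impala or Cloudera tool.

```python
from typing import Mapping


def default_query_options_flag(options: Mapping[str, str]) -> str:
    """Build an impalad -default_query_options flag from OPTION -> value pairs.

    Hypothetical helper: assumes the flag value is 'OPT1=v1,OPT2=v2,...'.
    """
    for key, value in options.items():
        # ',' separates pairs, '=' separates a key from its value, and a
        # single quote would break the shell quoting used below.
        if any(ch in key for ch in ",='") or any(ch in value for ch in ",'"):
            raise ValueError(f"cannot encode {key!r}={value!r}")
    pairs = ",".join(f"{key}={value}" for key, value in options.items())
    # Single-quote the whole value so the shell passes it as one argument.
    return f"-default_query_options='{pairs}'"


if __name__ == "__main__":
    print(default_query_options_flag({"PARQUET_FALLBACK_SCHEMA_RESOLUTION": "name"}))
```

For example, passing {"PARQUET_FALLBACK_SCHEMA_RESOLUTION": "name", "MEM_LIMIT": "2g"} yields -default_query_options='PARQUET_FALLBACK_SCHEMA_RESOLUTION=name,MEM_LIMIT=2g'.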
Created 11-28-2017 04:32 PM
Created 11-28-2017 05:00 PM
It's generally safe to use name-based resolution by default, and performance should be about the same either way. I agree name-based resolution may be the better choice because it's more intuitive.
Index-based (position) and name-based resolution have different tradeoffs in terms of which schema-evolution operations are allowed. For example, with index-based resolution you can safely rename a column in your table schema. With name-based resolution you can safely add or drop columns in the middle of your table schema, whereas with index-based resolution you can generally only add new columns at the end. So it's really all about tradeoffs.
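To make those tradeoffs concrete, here is a toy Python model of the two strategies. It is a simplified sketch, not Impala's implementation: a "Parquet file" is just a list of column names plus one row, missing columns read as None (standing in for NULL), and types, nested columns and name case-sensitivity are ignored. The function names resolve_by_position and resolve_by_name are hypothetical.

```python
from typing import Dict, List, Optional

# Toy data file written when the table schema was (id, name, city).
FILE_COLUMNS = ["id", "name", "city"]
FILE_ROW = [1, "ann", "Oslo"]


def resolve_by_position(table_cols: List[str], file_cols: List[str],
                        row: List[object]) -> Dict[str, Optional[object]]:
    # The i-th table column reads the i-th file column; positions past
    # the end of the file read as NULL (None).
    return {col: (row[i] if i < len(file_cols) else None)
            for i, col in enumerate(table_cols)}


def resolve_by_name(table_cols: List[str], file_cols: List[str],
                    row: List[object]) -> Dict[str, Optional[object]]:
    # Each table column reads the file column with the same name; names
    # the file does not contain read as NULL (None).
    index = {name: i for i, name in enumerate(file_cols)}
    return {col: (row[index[col]] if col in index else None)
            for col in table_cols}


# Table schemas after three different schema-evolution operations.
SCENARIOS = {
    "drop middle column": ["id", "city"],
    "add column at end": ["id", "name", "city", "zip"],
    "rename column": ["id", "full_name", "city"],
}

if __name__ == "__main__":
    for label, table_cols in SCENARIOS.items():
        print(label)
        print("  position:", resolve_by_position(table_cols, FILE_COLUMNS, FILE_ROW))
        print("  name:    ", resolve_by_name(table_cols, FILE_COLUMNS, FILE_ROW))
```

Dropping name makes position-based resolution return 'ann' for city, while name-based resolution correctly returns 'Oslo'. Adding zip at the end works with both (zip is None). Renaming name to full_name flips the outcome: position-based resolution still finds 'ann', but name-based resolution returns None because the file has no column called full_name.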