We are running CDH 5.9.0 (Impala 2.7.0, Hive 1.1.0).
We know that when querying TIMESTAMP columns of a Hive-generated Parquet table with Impala, the results may differ from Hive's because of time zone handling. Our Impala startup arguments are:
convert_legacy_hive_parquet_utc_timestamps=false
use_local_tz_for_unix_timestamp_conversions.
What confuses us is that, whether we set hive.parquet.timestamp.skip.conversion to true or false when generating the Parquet tables in Hive, Impala returns the same timestamp values from both generated tables. We expected the results to differ when the value of hive.parquet.timestamp.skip.conversion differs, but that is not what happens.
We are really confused about this; any reply would be appreciated.
The following steps reproduce the test:
CREATE TABLE test_timestamp (ts TIMESTAMP) STORED AS TEXTFILE;
CREATE TABLE test_ts_skip_conversion_true_parquet (ts TIMESTAMP) STORED AS PARQUET;
CREATE TABLE test_ts_skip_conversion_false_parquet (ts TIMESTAMP) STORED AS PARQUET;
Step 1: load data into test_timestamp and query it.
Step 2: select the data into test_ts_skip_conversion_true_parquet
(hive.parquet.timestamp.skip.conversion=true).
Step 3: select the data into test_ts_skip_conversion_false_parquet
(hive.parquet.timestamp.skip.conversion=false).
Step 4: query test_ts_skip_conversion_true_parquet and test_ts_skip_conversion_false_parquet with Impala. We got the same result from both tables, but we expected different results here!
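To make the steps easier to reproduce, here is a rough sketch of the statements we ran. The input file path is hypothetical, and "select data into" is written here as INSERT OVERWRITE ... SELECT, which is an assumption about the exact form. Steps 1 to 3 run in Hive, step 4 in Impala.

```sql
-- Step 1 (Hive): load sample rows into the text table and check them.
-- The path below is a hypothetical placeholder for our data file.
LOAD DATA LOCAL INPATH '/tmp/test_timestamp.txt' INTO TABLE test_timestamp;
SELECT ts FROM test_timestamp;

-- Step 2 (Hive): fill the first table with the flag set to true.
SET hive.parquet.timestamp.skip.conversion=true;
INSERT OVERWRITE TABLE test_ts_skip_conversion_true_parquet
SELECT ts FROM test_timestamp;

-- Step 3 (Hive): fill the second table with the flag set to false.
SET hive.parquet.timestamp.skip.conversion=false;
INSERT OVERWRITE TABLE test_ts_skip_conversion_false_parquet
SELECT ts FROM test_timestamp;

-- Step 4 (Impala): pick up the tables created in Hive, then compare.
INVALIDATE METADATA test_ts_skip_conversion_true_parquet;
INVALIDATE METADATA test_ts_skip_conversion_false_parquet;
SELECT ts FROM test_ts_skip_conversion_true_parquet;
SELECT ts FROM test_ts_skip_conversion_false_parquet;
```

Both SELECT statements in step 4 return identical values for us.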