Created 06-21-2018 08:13 AM
We have a small problem with the table property tblproperties ("skip.header.line.count"="1").
If we run a basic query such as select * from tableabc, the header row is not returned, which is what we expect.
But as soon as we run select distinct columnname from tableabc, the header row comes back!
Of course we do not want this for obvious reasons.
Has anybody else run into this issue? If so, did you find a fix for it?
----technical information----
Hive2 version: 2.1
running on: Azure HDInsight hive interactive query cluster
The data set is very small: 48 records (header included).
Create Statement:
-----------------------------------
--test_type--
-----------------------------------
CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in (
  test_type string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
STORED AS TEXTFILE
LOCATION 'adl://{adlslocation}data/data2/test'
tblproperties ("skip.header.line.count"="1")
Select statement:
select * from test_type_in;
Distinct statement:
select distinct test_type from test_type_in ORDER BY test_type;
I cannot show the exact statements because of an NDA, so I changed those values to test.
Thanks in advance.
Created 07-12-2018 01:26 AM
Did you get any solution for this issue? Thanks.
Created 07-13-2018 09:38 AM
So far, the one thing I know for sure is that the problem goes away when vectorization is disabled (because in that case the reader is not vectorized):
set hive.vectorized.execution.enabled=false;
Corresponding Hive ticket: https://issues.apache.org/jira/browse/HIVE-19943
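If only a few queries are affected, one option might be to switch vectorization off just around them rather than for everything. The following is a minimal sketch, not a tested fix. It reuses the table and query from the original post and assumes that vectorization was enabled in the session beforehand. The set command only affects the current session.

```sql
-- Disable vectorized execution for this session only,
-- so the non-vectorized reader is used.
set hive.vectorized.execution.enabled=false;

-- The query that returned the header row in the original post.
select distinct test_type from test_type_in ORDER BY test_type;

-- Re-enable vectorization for the rest of the session.
-- Assumption: it was set to true before the first statement.
set hive.vectorized.execution.enabled=true;
```

Queries run after the last statement use vectorized execution again, so the workaround only costs performance on the queries placed between the two set commands.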
Created 07-23-2018 08:04 AM
Hi,
Indeed, disabling vectorization solves the problem, as described in the ticket. However, we went with removing the headers from the data files instead, since we really need vectorization.
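One thing to watch when removing the headers from the files: if the table still carries "skip.header.line.count"="1", the first data line of each file would presumably be skipped in its place. A minimal sketch of resetting the property, assuming the table name from the original post and that every file under the table location has already had its header line removed:

```sql
-- Assumption: the header line has already been removed from every
-- data file under the table's LOCATION.
-- Stop Hive from skipping the first line of each file.
ALTER TABLE ext.test_type_in
  SET TBLPROPERTIES ("skip.header.line.count"="0");

-- Check the result: this should return only real values, no header row.
select distinct test_type from ext.test_type_in ORDER BY test_type;
```

Because the table is external, the ALTER TABLE statement changes only the table metadata and leaves the data files untouched.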