Hive tblproperties ("skip.header.line.count"="1") not working with select distinct

New Contributor

We have a problem with the table property ("skip.header.line.count"="1"). A basic query such as `select * from tableabc` correctly skips the header line, but as soon as we run `select distinct columnname from tableabc`, the header comes back as one of the values!

Obviously, we do not want the header to show up in query results.

Has anyone else run into this? If so, did you find a fix?

----technical information----

Hive2 version: 2.1

Running on: Azure HDInsight Interactive Query cluster

Data set: already very small, only 48 records (including the header)

Create statement:

```sql
-----------------------------------
--test_type--
-----------------------------------
-- Fields are semicolon-separated (\073 is the octal escape for a semicolon).
-- skip.header.line.count=1 skips the first line of every file under LOCATION.
CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in (
  test_type STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
STORED AS TEXTFILE
LOCATION 'adl://{adlslocation}data/data2/test'
TBLPROPERTIES ("skip.header.line.count"="1");
```

Select statement (header correctly skipped):

```sql
SELECT * FROM ext.test_type_in;
```

Distinct statement (header returned as a value):

```sql
SELECT DISTINCT test_type FROM ext.test_type_in ORDER BY test_type;
```

I cannot share the exact statements because of an NDA, so I replaced the real names with "test".

Thanks in advance.


Expert Contributor
@Liam De Lee

Did you find a solution for this issue? Thanks.

Contributor

So far, I know for sure that the problem goes away when vectorization is disabled, because the table is then read by the non-vectorized reader:

```sql
set hive.vectorized.execution.enabled=false;
```

Corresponding Hive ticket: https://issues.apache.org/jira/browse/HIVE-19943
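If only a few queries are affected, a possible variant (a sketch, assuming you can run session-level `SET` statements, for example from Beeline) is to disable vectorization just for the affected `DISTINCT` query and switch it back on afterwards. Restoring the value to `true` is an assumption about the cluster's original setting; check it first. The table and column names are the anonymized ones from the question.

```sql
-- Turn off vectorized execution for this session only
-- (workaround described in HIVE-19943).
SET hive.vectorized.execution.enabled=false;

SELECT DISTINCT test_type
FROM ext.test_type_in
ORDER BY test_type;

-- Restore vectorization for the rest of the session.
-- Assumption: the cluster default was true. Print the current value
-- beforehand with: SET hive.vectorized.execution.enabled
SET hive.vectorized.execution.enabled=true;
```

This keeps the performance cost of disabling vectorization limited to the statements that actually hit the bug.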

New Contributor

Hi

Indeed, disabling vectorization solves the problem, as described in the ticket. However, because we really need vectorization, we chose to remove the header rows from the data files instead.
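The reply does not say how the table definition was changed after the headers were removed. A minimal sketch, assuming the header line has been stripped from every file under the table's `LOCATION`: reset the property so Hive stops skipping a line, otherwise the first real data row of each file would be dropped, since `skip.header.line.count` applies per file.

```sql
-- Headers are gone from the files, so skip zero lines per file.
ALTER TABLE ext.test_type_in SET TBLPROPERTIES ("skip.header.line.count"="0");

-- Vectorization can now stay enabled for the DISTINCT query.
SELECT DISTINCT test_type
FROM ext.test_type_in
ORDER BY test_type;
```

Recreating the table without the `TBLPROPERTIES` clause would have the same effect.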