
Hive tblproperties ("skip.header.line.count"="1") not working with select distinct


New Contributor

We have a small problem with our tblproperties ("skip.header.line.count"="1"). With a basic select such as select * from tableabc, the header row is skipped as expected. But as soon as we run select distinct columnname from tableabc, the header row shows up in the result!

Of course we do not want this for obvious reasons.

Has anybody else run into this issue? If so, did you find a fix?

----technical information----

Hive2 version: 2.1

running on: Azure HDInsight hive interactive query cluster

The data set is very small: 48 records (header included).

Create Statement:

-----------------------------------
--test_type--
-----------------------------------
CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
  (
    test_type      string
    )
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\073'
STORED AS TEXTFILE
LOCATION 'adl://{adlslocation}data/data2/test'
tblproperties ("skip.header.line.count"="1")

Select statement:

select * from test_type_in;

Distinct statement:

select distinct test_type from test_type_in ORDER BY test_type;

I cannot show the exact statements because of an NDA, so I changed those values to test.

Thanks in advance.

3 REPLIES

Re: Hive tblproperties ("skip.header.line.count"="1") not working with select distinct

Cloudera Employee
@Liam De Lee

Did you find a solution for this issue? Thanks.

Re: Hive tblproperties ("skip.header.line.count"="1") not working with select distinct

New Contributor

So far, I know for sure that the problem goes away when vectorization is disabled (because in that case the reader is not vectorized):

set hive.vectorized.execution.enabled=false;

Corresponding Hive ticket: https://issues.apache.org/jira/browse/HIVE-19943
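
A minimal sketch of how this workaround might be scoped to a single query, using the table and column names from the question. Switching vectorization back on afterwards is an assumption about how one could limit the performance impact; it is not something the ticket prescribes.

```sql
-- Disable vectorized execution for this session so that, per the reply above,
-- the non-vectorized reader is used and the header line is skipped.
set hive.vectorized.execution.enabled=false;

select distinct test_type from test_type_in ORDER BY test_type;

-- Assumption: vectorization was enabled before; restore it for later queries.
set hive.vectorized.execution.enabled=true;
```

The setting only applies to the current session, so other users and jobs on the cluster keep running vectorized.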

Re: Hive tblproperties ("skip.header.line.count"="1") not working with select distinct

New Contributor

Hi

Indeed, disabling vectorization solves the problem, as described in the ticket. However, we went with the option of removing the headers from the files, since we really need vectorization.
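
If the headers are removed from the source files, the table property presumably has to change as well. Otherwise the first real data row of each file would be skipped on the code paths that do honor the property. A minimal sketch, assuming the table from the question and that the files under its location no longer contain header lines:

```sql
-- Assumption: the header lines have already been stripped from the data files.
-- Stop Hive from skipping the first line, which is now a real record.
ALTER TABLE ext.test_type_in SET TBLPROPERTIES ("skip.header.line.count"="0");

-- Sanity check: the former header value should no longer appear,
-- with vectorization left enabled.
select distinct test_type from ext.test_type_in ORDER BY test_type;
```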
