Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

HIve query issue with distinct.

HIve query issue with distinct.

I am using hdp 2.5.6.0-40.

I am trying to run following command.

select count(distinct a21.company_code) WJXBFS1 from dim_investment a21;.

After ran query expected output is 1. But it is giving 0.

Tried with follwoing properties.

set hive.exec.dynamic.partition=true;

set hive.exec.dynamic.partition.mode=nostrict;

set hive.vectorized.execution.enabled=true;

1 REPLY 1

Re: HIve query issue with distinct.

Super Collaborator

Hi @Ashnee Sharma,

can you please try by setting "hive.groupby.skewindata=true" to make randomized shuffle before reduce.

In any case the following syntax should produce the accurate result

select count(1) from ( select a21.company_code from dim_investment a21 group by a21.company_code) aa

inner group by ensure that there should be a mapper executed before counting the distinct rows.

Don't have an account?
Coming from Hortonworks? Activate your account here