Hive select from tables with join don't give results depend on rows count

Contributor

Hi Community,

 

We have a script, collmx_consents_snp.hql (attached).

This script joins the following tables:

  • consent_service_consent_hst
  • consent_service_consent_subject_hst
  • consent_service_client_hst

The DDL for these tables is attached as well.

All of these tables are partitioned by date.

The core problem is that the join of two tables does not work in production (the tables contain consistent data, but the query returns nothing):

SELECT *
FROM (
    SELECT consent_uid,
           CASE WHEN for_contract = true THEN evid_srv ELSE NULL END AS evid_srv,
           entity_type,
           to_date(modif_time) AS apply_date,
           id_client
    FROM consent_service_consent_snp
) csc
JOIN consent_service_consent_subject_snp cscs ON (csc.consent_uid = cscs.consent_uid)

In the test environment everything works fine.

 

1.png

When we add a partition filter on the consent_service_consent_snp table, the query returns results:

 2.jpg

When we run select count(*) on these tables, we do not get any errors. Our test environment has less data than production, and when we add a date constraint to the select everything works fine, so we think the problem may depend on the number of rows in the tables. The HiveServer2 and Hive Metastore logs are attached.

When the query fails, we see the following in the HiveServer2 log:

2017-03-09 20:00:30,406 INFO  org.apache.hadoop.hive.ql.plan.ConditionalResolverCommonJoin: [HiveServer2-Background-Pool: Thread-7269]: Failed to resolve driver alias (threshold : 25000000, length mapping : {cscs:consent_service_consent_subject_hst=571829172, csc:consent_service_consent_snp:consent_service_consent_hst=434475747})
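
One observation on the log line above, offered as a hedged reading rather than a confirmed diagnosis: the threshold of 25000000 matches the default of hive.mapjoin.smalltable.filesize, and both reported input sizes (roughly 570 MB and 430 MB) exceed it, so the message appears to say only that neither side qualified for automatic map-join conversion. A minimal sketch for checking the relevant settings in the session and, as an experiment, disabling the automatic conversion before rerunning the query:

```sql
-- Print the current values of the settings involved in automatic map-join conversion.
SET hive.auto.convert.join;
SET hive.mapjoin.smalltable.filesize;

-- Experiment (assumption: session-level overrides are permitted on this cluster):
-- disable automatic map-join conversion so the join is planned as a common (shuffle) join.
SET hive.auto.convert.join=false;
```

If the result does not change with the conversion disabled, the log line is probably unrelated to the empty result.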

Re: Hive select from tables with join don't give results depend on rows count

Contributor

The attachments mentioned above can be found at:

https://drive.croc.ru/display/data/list?dataId=02745bf5-e54d-47a9-8797-15f108fc057e
login: 024556
password: 0B908ECFE563

Re: Hive select from tables with join don't give results depend on rows count

Champion

 

Hi  

 

I believe this setting could be the issue:

 

hive.join.cache.size
Default Value: 25000

How many rows in the joining tables (except the streaming table) should be cached in memory.
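
For anyone following along, a property like this can be inspected and overridden for the current session with SET. The value below is only an illustration (ten times the default quoted above), not a recommendation:

```sql
-- Show the value currently in effect for this session.
SET hive.join.cache.size;

-- Override it for the current session only.
-- 250000 is an illustrative value (10x the default of 25000).
SET hive.join.cache.size=250000;
```

The override lasts only for the session, so the query has to be rerun in the same session to see any effect.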

 

Re: Hive select from tables with join don't give results depend on rows count

Contributor

Hi,

 

I tried increasing this property 10x, but it made no difference.

 

Regards,

Ramil.

Re: Hive select from tables with join don't give results depend on rows count

Champion

Can you run an EXPLAIN statement for this query, if possible, and share the output?
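
A minimal sketch of what that would look like, using the query from the original post. Prefixing the statement with EXPLAIN prints the execution plan without running the query:

```sql
-- Print the execution plan for the failing join instead of executing it.
EXPLAIN
SELECT *
FROM (
    SELECT consent_uid,
           CASE WHEN for_contract = true THEN evid_srv ELSE NULL END AS evid_srv,
           entity_type,
           to_date(modif_time) AS apply_date,
           id_client
    FROM consent_service_consent_snp
) csc
JOIN consent_service_consent_subject_snp cscs
  ON (csc.consent_uid = cscs.consent_uid);
```

EXPLAIN EXTENDED can be used instead if more detail about the plan is needed.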

Re: Hive select from tables with join don't give results depend on rows count

Contributor

Hi,

 

The EXPLAIN output is very large, so you can download it from our share:

https://drive.croc.ru/display/data/list?dataId=c43e16e0-e0af-40f1-935e-1c44e4b01f91
login: 024741
password: E804F9487956
