Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive query issue

Solved Go to solution

Hive query issue

Rising Star

Goodmorning everyone.

A customer has a problem with a query on hive (with Tez).

The query is as follows:

select a.hashedaddress from x a join x b on (a.hashedaddress = b.hashedaddress) where a.guid != b.guid;

The process is stopped at the first step of mapping:

VERTICES	STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1           FAILED     62          0        0       62      43       7 

Map 3           KILLED     62          0        0       62       0       0 

Reducer 2       KILLED   1009          0        0     1009       0       0 

VERTICES: 00/03  [>>--------------------------] 0%    ELAPSED TIME: 26.69 s

Log error is this: vertexerror.txt

The table schema is as follows:

CREATE TABLE `x`(
  `guid` string,
  `brandname` string, `hashedaddress` string, [and more]) PARTITIONED BY (
  `calendardate` date) CLUSTERED BY (
  brandname) 

INTO 5 BUCKETS ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 

STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 

OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 

LOCATION
  '*****' 

TBLPROPERTIES (
  'transient_lastDdlTime'='1445504632')

Table contains some 20 million records, to do the tests we tried to apply a limit to the tables:

select a.hashedaddress from (select * from x limit 10000) a join (select * from x limit 10000) b on (a.hashedaddress = b.hashedaddress) where a.guid != b.guid;

The query runs without problems

VERTICES      	STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1 ..........SUCCEEDED     62         62        0        0       0       0 

Map 3 ..........SUCCEEDED     62         62        0        0       0       0 

Reducer 2 ......SUCCEEDED      1          1        0        0       0       0 

Reducer 4 ......SUCCEEDED      1          1        0        0       0       0 

VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 55.13 s
Then we have increased the limits up to the cap (100000. 1000000, 10000000) and it works correctly, After this changing the query removing the limitations and it works:
select a.hashedaddress from (select * from x) a join (select * from x) b on (a.hashedaddress = b.hashedaddress) where a.guid != b.guid;
VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
Map 1 ..........SUCCEEDED     62         62        0        0       0       0 

Map 3 ..........SUCCEEDED     62         62        0        0       0       0 

Reducer 2 ......SUCCEEDED     253        253       0        0       0       0 

VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 

Can you help me understand why not work the first query? The join does it with the same data. Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Hive query issue

Rising Star

Finally the customer has applied the change (hive.vectorized.execution.enabled=false). Everything works correctly now.

Thanks so much

10 REPLIES 10

Re: Hive query issue

Master Collaborator

What is the HDP or Hive version?

Re: Hive query issue

Rising Star

hive 0.14.0.2.2 on HDP-2.2.7.1-10 ps: is hdinsight distribution.

Re: Hive query issue

Master Collaborator

Please provide the explain plan of the failing query. Also can you try the failing query with hive.vectorized.execution.enabled=false?

Re: Hive query issue

Rising Star

Hi Deepesh.

Tonight the customer tries to disable the vectorization, when give me a feedback I update you.

For the moment, thank you!

Re: Hive query issue

@Alessio Ubaldi

Just to test ...can you run the query withouth vectorization?

set hive.vectorized.execution.enabled=false

Re: Hive query issue

Rising Star

@Deepesh suggested this solution.

Friday I informed the customer, I still waiting for a response. Just answer me I'll know you.

Thank you!

Re: Hive query issue

Guru

Hive doesn't support non-equijoins yet. This is coming soon.

Re: Hive query issue

Rising Star

Thanks for answer!

Re: Hive query issue

Rising Star

Finally the customer has applied the change (hive.vectorized.execution.enabled=false). Everything works correctly now.

Thanks so much