
HIVE MR VS TEZ difference in output


Re: HIVE MR VS TEZ difference in output

New Contributor

@Hari Rongali,

Thanks for the suggestion. I am not currently using the enforce option; I will add it and run again.

I am using the options below:

hive.exec.dynamic.partition --> true

hive.exec.dynamic.partition.mode --> nonstrict

hive.execution.engine --> tez
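
For reference, the three options above would typically be applied per session with SET statements. This is a minimal sketch, assuming they are set in a Hive CLI or Beeline session rather than in hive-site.xml:

```sql
-- Allow partitions to be created dynamically during INSERT.
SET hive.exec.dynamic.partition=true;
-- Do not require any static partition column in the INSERT.
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Run queries on Tez instead of MapReduce.
SET hive.execution.engine=tez;
```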

However, the linked page says specifically of the enforce option: "Not needed in Hive 2.x onward", and I am using 2.3.2.0.


Re: HIVE MR VS TEZ difference in output

Rising Star
@S. Sali

I am pretty sure HDP 2.3.2.0 does not include Hive 2.x. Hive 2.x is GA only in later HDP releases, probably HDP 2.6 or later.

If you are okay with the solution provided, can you please upvote and accept the answer? Thanks.

Re: HIVE MR VS TEZ difference in output

New Contributor

Thanks, yes, it worked after setting hive.enforce.bucketing = true.

But I don't understand the background: how does this setting affect queries run through Tez versus MR?

Re: HIVE MR VS TEZ difference in output

New Contributor

I am on HDP 2.3 and Hive 1.2.

hive.enforce.bucketing is true by default.

Why does it need to be set?

Re: HIVE MR VS TEZ difference in output

New Contributor

I have the same issue here. I will test with hive.enforce.bucketing=true set while inserting data. But does anyone know why this setting helps here?

Re: HIVE MR VS TEZ difference in output

Hi @kerra

Bucketing is always enforced in Hive 2.x and above. On earlier versions, set:

set hive.enforce.bucketing = true;

The main reason is that it lets Hive automatically select the correct number of reducers and the CLUSTER BY column based on the table definition. Otherwise, you would need to set the number of reducers to match the number of buckets, as in set mapred.reduce.tasks = 256;, and add a CLUSTER BY ... clause to the SELECT.
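
The two approaches described above can be sketched roughly as follows. The table names demo_bucketed and demo_source, their columns, and the bucket count of 256 are hypothetical and chosen only for illustration; they do not come from this thread.

```sql
-- Hypothetical bucketed target table with 256 buckets on user_id.
CREATE TABLE demo_bucketed (id STRING, user_id INT)
CLUSTERED BY (user_id) INTO 256 BUCKETS
STORED AS ORC;

-- Option 1: let Hive pick the reducer count and clustering column
-- from the table definition.
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE demo_bucketed
SELECT id, user_id FROM demo_source;

-- Option 2: without enforcement, match the reducer count to the
-- bucket count by hand and cluster explicitly on the bucket column.
SET hive.enforce.bucketing = false;
SET mapred.reduce.tasks = 256;
INSERT OVERWRITE TABLE demo_bucketed
SELECT id, user_id FROM demo_source
CLUSTER BY user_id;
```

In option 2 the reducer count and the CLUSTER BY column both have to be kept in sync with the table DDL manually, which is the bookkeeping the enforce setting is meant to remove.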

Re: HIVE MR VS TEZ difference in output

New Contributor

I am on HDP 2.3 with Hive 1.2. My query does a UNION ALL of a table with itself.

With Tez on the ORC table the result is correct,

but with MR the count is 0.

This is my DDL:

CREATE TABLE `test.web`
( `id` string , `uid` string , `user_id` int ) 
PARTITIONED BY (`p_date` string) 
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':' 
LINES TERMINATED BY '\n' 
NULL DEFINED AS '' 
STORED AS ORC 
TBLPROPERTIES('orc.compress'='SNAPPY')

This is my query:

SELECT
    count(*)
FROM
    (
        SELECT
            id,
            user_id
        FROM
            test.web
        WHERE
            p_date = 20171129
            AND user_id > 0
        UNION ALL
        SELECT
            id,
            user_id
        FROM
            test.web
        WHERE
            p_date = 20171129
            AND stat_id = 'adm'
            AND user_id > 0
    ) a
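
One way to narrow down a discrepancy like this is to run a single branch of the query under each engine in the same session and compare the counts. This is only a sketch, assuming a Hive CLI or Beeline session with access to test.web; it uses just the first branch of the UNION ALL and quotes the partition value because p_date is declared as a string.

```sql
-- Count the first branch under MapReduce.
SET hive.execution.engine=mr;
SELECT count(*)
FROM test.web
WHERE p_date = '20171129'
  AND user_id > 0;

-- Count the identical branch under Tez and compare the two results.
SET hive.execution.engine=tez;
SELECT count(*)
FROM test.web
WHERE p_date = '20171129'
  AND user_id > 0;
```

If the single-branch counts agree across engines, the difference is more likely tied to how the UNION ALL is executed than to how the partition is read.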

On Hive 1.2, hive.enforce.bucketing is true by default.

Do I need to set any other parameters?
