Created 10-22-2016 08:19 PM
Hi,
Hive table created using normal MR is not accessing records when using TEZ as execution engine and vice versa.
I created one table using normal MR and the same table I am trying to access by setting engine as TEZ, when using tez, zero records fetching and if i remove tez then getting all records and same is applicable in reverse condition. Why such difference?
,
Created 10-22-2016 10:22 PM
I am pretty sure it might be something related to the way the bucketed table is loaded. Did you have necessary properties set when loading the Hive table ? Especially hive.enforce.bucketing =
true
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
I would recommend recreate the table by setting necessary properties and let me know if that works.
Created 10-22-2016 08:36 PM
Can you post the transcript of the commands you have executed ? Also what version of HDP are you on and what is the Hive version ?
Created on 10-22-2016 08:44 PM - edited 08-18-2019 06:30 AM
Hi,
I have uploaded the screenshot here, first one using MR and second using the TEZ.
Version 2.3.2.0-2950
Created 10-22-2016 08:50 PM
What version of HDP and Hive you are using ? Also can you upload the output of
describe formatted grdm.fnl_glbl_currency
Created 10-22-2016 08:59 PM
hadoop and hive version - 2.3.2.0-2950
hive> describe formatted grdm.fnl_glbl_currency > ; OK # col_name data_type comment currency_sk int currency_cd varchar(55) currency_dsc varchar(4000) country_nm varchar(255) row_effective_start_dt date row_effective_end_dt date created_job_run_id int updated_job_run_id int created_dts timestamp modified_dts timestamp source_system_join_key varchar(1000) natural_key_col_checksum_str varchar(255) type1_column_checksum_str varchar(255) type2_column_checksum_str varchar(255) system_gen_rec_flg varchar(1) # Partition Information # col_name data_type comment source_system_sk int current_rec_flg string #
Detailed Table Information Database: grdm Owner: cmanyar CreateTime: Mon Oct 03 15:51:54 CDT 2016 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0
Location: hdfs://DenaliQA/dl/hive/warehouse/final/be/global/grdm/grdm.db/fnl_glbl_currency
Table Type: MANAGED_TABLE Table Parameters: orc.compress SNAPPY transient_lastDdlTime 1475527914 # Storage Information SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat Compressed: No Num Buckets: 10 Bucket Columns: [source_system_join_key] Sort Columns: [] Storage Desc Params: field.delim \u0001 serialization.format \u0001 Time taken: 1.992 seconds, Fetched: 48 row(s) hive>
Created 10-22-2016 09:46 PM
Do you see this kind of behavior with any other tables or just the bucketed tables ? Can you please check ?
Created 10-22-2016 10:01 PM
Hi,
I checked the some temp table which is input for final table (grdm.fnl_glbl_currency) and it don't have any bucket specified
Bucket Columns: []
For those temp table, can see the count from both(MR and TEZ) and count is same.
So is it because of bucketing?
How it differs from firing the query from TEZ and from MR?
What is the solution in such situation. Anything I need to specify, do you have any idea?
Created 10-22-2016 10:22 PM
I am pretty sure it might be something related to the way the bucketed table is loaded. Did you have necessary properties set when loading the Hive table ? Especially hive.enforce.bucketing =
true
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables
I would recommend recreate the table by setting necessary properties and let me know if that works.
Created 11-30-2017 07:20 AM
my hdp is 2.3 hive 1.2 sql union all itself
use tez and orc is right
bug use mr is 0
this is my ddl
CREATE TABLE `test.web` ( `id` string , `uid` string , `user_id` int ) PARTITIONED BY (`p_date` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':' LINES TERMINATED BY '\n' NULL DEFINED AS '' STORED AS ORC TBLPROPERTIES('orc.compress'='SNAPPY')
sql
SELECT count(*) FROM ( SELECT id, user_id FROM test.web WHERE p_date = 20171129 AND user_id > 0 UNION ALL SELECT id, user_id FROM test.web WHERE p_date = 20171129 AND stat_id = 'adm' AND user_id > 0 ) a
can help me ??
Created 11-30-2017 07:24 AM
hive 1.2 hive.enforce.bucketing default is true
Do need other parameters?
Created 10-22-2016 10:31 PM
@Hari Rongali,
Thanks for suggestion, but currently I am not using enforce option. I will try by including with this option and run again.
I am using below options,
hive.exec.dynamic.partition --> true
hive.exec.dynamic.partition.mode --> nonstrict
hive.execution.engine --> tez
But in the given link, about enforce option, they specifically mentioned, Not needed in Hive
2
.x onward
and I am using 2.3.2.0
Created 10-22-2016 10:36 PM
I am pretty sure HDP 2.3.2.0 does not have Hive2.x and Hive 2.x is GA in future releases of HDP, probably from HDP 2.6 or later.
If you are okay with the solution provided, can you please upvote and accept the answer ? Thanks
Created 10-22-2016 11:13 PM
Thanks , yes, it worked by setting hive.enforce.bucketing = true.
but I didn't understand the background how it affects while querying thru tez and MR?
Created 11-30-2017 07:55 AM
hdp 2.3 and hive 1.2
the hive.enforce.bucketing is default true
What is the need to set?
Created 04-09-2017 05:44 PM
I have the same issue here. Will test with setting hive.enforce.bucketing=true while inserting data. But does anyone know why this setting will help here?
Created 04-09-2017 06:44 PM
Hi @kerra
Bucketing is supported for hive 2.x and above.
set hive.enforce.bucketing =
true
;
The main reason is that it allows the correct number of reducers and the cluster by column to be automatically selected based on the table. Otherwise, you would need to set the number of reducers to be the same as the number of buckets as in set mapred.reduce.tasks = 256;
and have a CLUSTER BY ...
clause in the select.
Created 11-30-2017 07:33 AM
my hdp is 2.3 hive 1.2 sql union all itself
use tez and orc is right
bug use mr is 0
this is my ddl
CREATE TABLE `test.web` ( `id` string , `uid` string , `user_id` int ) PARTITIONED BY (`p_date` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':' LINES TERMINATED BY '\n' NULL DEFINED AS '' STORED AS ORC TBLPROPERTIES('orc.compress'='SNAPPY')
sql
SELECT count(*) FROM ( SELECT id, user_id FROM test.web WHERE p_date = 20171129 AND user_id > 0 UNION ALL SELECT id, user_id FROM test.web WHERE p_date = 20171129 AND stat_id = 'adm' AND user_id > 0 ) a
hive 1.2 hive.enforce.bucketing default is true
Do need other parameters?