
insert overwrite of 2 GB data

insert overwrite of 2 GB data

New Contributor

I am running: insert overwrite table emp_rc partition(year) select *, year_num from table1;

table1 is 1.8 GB and stored as a text file, while emp_rc is an RCFile table. When I first ran this SQL it took 1 hour; after setting mapreduce.job.reduces = 30 it takes 15 minutes.

Can you advise what else I can do to improve performance?

Can you explain how to analyse the explain plan for this type of HQL?

4 REPLIES

Re: insert overwrite of 2 GB data

Re: insert overwrite of 2 GB data

New Contributor

Sorry Neeraj, I asked a practical question and I don't see the answer to it in the link above.

Re: insert overwrite of 2 GB data

@sandeep agarwal

set hive.execution.engine=tez;

Create the emp_rc table as ORC with ZLIB compression.
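A minimal DDL sketch of the ORC + ZLIB suggestion above. The column names here are illustrative placeholders (the thread never shows emp_rc's schema), but STORED AS ORC and the orc.compress table property are standard Hive DDL:

```sql
-- Hypothetical columns; substitute emp_rc's actual schema.
CREATE TABLE emp_rc (
  emp_id   INT,
  emp_name STRING
)
PARTITIONED BY (year INT)     -- matches the partition(year) in the INSERT
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "ZLIB");
```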

set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;

analyze table table_name compute statistics for columns; (run this for all tables involved)

set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;

Now run: insert overwrite table emp_rc partition(year) select *, year_num from table1;
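On the original question about reading the plan: one way to inspect it is to prefix the statement with EXPLAIN, which prints the stage graph (on Tez, the vertex/operator tree) without executing the query:

```sql
-- Shows the query plan only; nothing is written to emp_rc.
EXPLAIN
INSERT OVERWRITE TABLE emp_rc PARTITION (year)
SELECT *, year_num FROM table1;
```

In the output, look for whether Map operators report vectorized execution and how many reducers the final stage uses.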

Re: insert overwrite of 2 GB data

Mentor

@sandeep agarwal are you still having issues with this? Can you accept best answer or provide your own solution?