insert overwrite of 2 GB data

New Contributor

I am running: insert overwrite table emp_rc partition(year) select *, year_num from table1;

table1 is 1.8 GB and stored as a text file, while emp_rc is an RCFile table. This query originally took 1 hour; after setting mapreduce.job.reduces = 30 it takes 15 minutes.

Can you advise what else I can do to improve performance?

Can you also explain how to analyse the explain plan for this type of HQL?

4 REPLIES

Re: insert overwrite of 2 GB data

New Contributor

Sorry Neeraj, I asked a practical question and I don't see an answer to it in the link above.

Re: insert overwrite of 2 GB data

@sandeep agarwal

set hive.execution.engine=tez;

Create the emp_rc table as ORC with ZLIB compression instead of RCFile.
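A minimal DDL sketch of that step. The thread never shows the actual schema of emp_rc, so the column names here are placeholders; only the partition column (year) and the ORC + ZLIB storage choice come from the thread:

```sql
-- Hypothetical columns; substitute the real emp_rc schema.
CREATE TABLE emp_rc (
  emp_id   INT,
  emp_name STRING
)
PARTITIONED BY (year INT)          -- matches partition(year) in the INSERT
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "ZLIB");
```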

set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;

analyze table table_name compute statistics for columns; (run this for every table involved in the query)

set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;

Now run: insert overwrite table emp_rc partition(year) select *, year_num from table1;
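On the original question about reading the plan: prefixing the statement with EXPLAIN is standard HiveQL and prints the plan instead of executing the query. A sketch:

```sql
-- Shows the stage graph and the operator tree (TableScan, Select,
-- FileSink, the dynamic-partition write, etc.). With vectorization
-- enabled, the plan marks which operators run vectorized.
EXPLAIN
INSERT OVERWRITE TABLE emp_rc PARTITION (year)
SELECT *, year_num FROM table1;
```

EXPLAIN EXTENDED adds low-level detail such as file paths and serde information, which is useful for confirming the ORC output format is actually being used.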

Re: insert overwrite of 2 GB data

Mentor

@sandeep agarwal are you still having issues with this? Can you accept the best answer or provide your own solution?