Increase hive insert performance

Super Guru

I have about 3 million records that I want to insert into an ORC table. The table is neither partitioned nor bucketed, and it is a simple insert. I have tried various numbers of mappers but can't seem to improve performance by much. I have tried both MR and Tez, and both take a long time. I have also run stats on the table. Any pointers for improving insert performance would be helpful.

6 REPLIES

Re: Increase hive insert performance

@Sunile Manjee

Have you tried vectorization and compression?

http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/
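For readers who have not used these options, here is a minimal sketch of what vectorization and Snappy compression look like for an ORC insert. The table and column names (`target_orc`, `source_table`, `id`, `payload`) are hypothetical and not taken from the original question.

```sql
-- Enable vectorized query execution for the current session.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

-- Hypothetical target table: ORC storage with Snappy compression.
CREATE TABLE target_orc (
  id      BIGINT,
  payload STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Simple insert from a hypothetical source table.
INSERT INTO TABLE target_orc
SELECT id, payload
FROM source_table;
```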

Thanks and Regards,

Sindhu

Re: Increase hive insert performance

Super Guru

Yes, I am using vectorization and Snappy compression.

Re: Increase hive insert performance

@Sunile Manjee

Did you try enabling parallel execution in Hive, compression of intermediate results, and automatic join conversion?

  • For map output compression: mapred.compress.map.output=true
  • For job output compression: mapred.output.compress=true
  • For automatic map-join conversion: hive.auto.convert.join=true
  • For parallel execution of independent stages: hive.exec.parallel=true
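As a sketch, the settings listed above can be applied per session with `SET` statements before running the insert. The `mapreduce.*` names shown in the comments are the Hadoop 2 equivalents of the older `mapred.*` properties; which form your cluster expects depends on its version, so treat this as an assumption to verify.

```sql
-- Compress map output (Hadoop 2 name: mapreduce.map.output.compress).
SET mapred.compress.map.output=true;

-- Compress job output (Hadoop 2 name: mapreduce.output.fileoutputformat.compress).
SET mapred.output.compress=true;

-- Let Hive convert joins against small tables into map joins.
SET hive.auto.convert.join=true;

-- Run independent stages of a query in parallel.
SET hive.exec.parallel=true;
```

These only affect the current session; the same properties can also be set cluster-wide in the Hive configuration.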

However, the major performance factor would be using partitioning and bucketing.
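To illustrate what partitioning and bucketing would look like, here is a minimal sketch. The table name, columns, partition key (`load_date`), and bucket count are all hypothetical; appropriate choices depend on the data and the queries run against it.

```sql
-- Hypothetical partitioned and bucketed ORC table.
CREATE TABLE target_orc_part (
  id      BIGINT,
  payload STRING
)
PARTITIONED BY (load_date STRING)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- Allow partitions to be created from the values in the data.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- On Hive versions before 2.0, bucketing is only enforced on insert
-- when this is set; later versions always enforce it.
SET hive.enforce.bucketing=true;

-- The partition column must come last in the SELECT list.
INSERT INTO TABLE target_orc_part PARTITION (load_date)
SELECT id, payload, load_date
FROM source_table;
```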

Thanks and Regards,

Sindhu

Re: Increase hive insert performance

Super Guru

I have tried all of those parameters. I think the problem is that my question is too vague. I need to close this question and ask a more specific one about settings and their impact on performance during inserts.

Re: Increase hive insert performance

New Contributor

Sunile,

Hortonworks just announced Hive 2.0 with the LLAP feature. Please try that and let us know if you still see low performance.

Re: Increase hive insert performance

@Sunile Manjee

Is your script taking longer in the mapper phase or the reducer phase?

If the mappers are taking longer, I believe the SELECT and WHERE conditions in your Hive script need to be modified.

Did you add "distribute by"?
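For context, a `DISTRIBUTE BY` clause adds a reduce stage and controls which reducer each row is sent to. A minimal sketch, assuming hypothetical tables `target_orc` and `source_table` with an `id` column to distribute on:

```sql
-- Rows with the same id go to the same reducer; each reducer
-- then writes its own ORC file(s) into the target table.
INSERT INTO TABLE target_orc
SELECT id, payload
FROM source_table
WHERE payload IS NOT NULL   -- hypothetical filter
DISTRIBUTE BY id;
```

Whether this helps depends on how evenly the chosen column spreads rows across reducers; a skewed column can make one reducer the bottleneck.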

Can I see your Hive script?