
HDFS caching for Hive performance

Super Collaborator

One of our customers is exploring ways to improve their Hive query performance, and they are wondering about HDFS caching. They want to know whether this is something we recommend.

1 ACCEPTED SOLUTION

Master Guru

HDFS memory as storage is in technical preview (link here). I recommend reviewing the Hive performance tuning recommendations here.

Super Collaborator

Thanks, exactly what I was looking for.

New Contributor

HDFS caching helps, but only a bit: it saves only the cost of moving bytes off disk, while you still pay the cost of deserialization and still don't benefit from JVM JIT optimizations, etc. With technologies like Hive LLAP (coming in Hive 2), you will get significantly better performance because LLAP caches deserialized vectors in memory-efficient formats (e.g., 2 bits per value for certain integer ranges rather than 4 bytes), applies CPU-efficient filters (vectorized query processing), removes the JVM startup cost for tasks (100s of ms), provides JIT-enhanced CPU performance, and so on. Rather excited about it!
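To make the "2 bits instead of 4 bytes" point concrete, here is a toy Python sketch. It is not LLAP's actual cache format or code. It packs integers in the range 0..3 at 2 bits each and then filters them one batch at a time. The encoding, the batch size of 1024, and the function names (`pack_2bit`, `unpack_2bit`, `filter_equals`) are all illustrative assumptions.

```python
# Toy illustration only: not LLAP's real cache format or API.
BITS = 2
MASK = (1 << BITS) - 1   # 0b11, the largest value that fits in 2 bits
PER_BYTE = 8 // BITS     # 4 packed values per byte
BATCH = 1024             # hypothetical batch size for batch-at-a-time filtering


def pack_2bit(values):
    """Pack integers in [0, 3] into a bytearray, 4 values per byte."""
    out = bytearray((len(values) + PER_BYTE - 1) // PER_BYTE)
    for i, v in enumerate(values):
        if not 0 <= v <= MASK:
            raise ValueError(f"value {v} does not fit in {BITS} bits")
        out[i // PER_BYTE] |= v << (BITS * (i % PER_BYTE))
    return out


def unpack_2bit(packed, start, stop):
    """Decode the values at positions [start, stop) from a 2-bit packed buffer."""
    return [(packed[i // PER_BYTE] >> (BITS * (i % PER_BYTE))) & MASK
            for i in range(start, stop)]


def filter_equals(packed, n, target):
    """Return the positions whose value equals target, decoding one batch at a time."""
    selected = []
    for start in range(0, n, BATCH):
        stop = min(start + BATCH, n)
        batch = unpack_2bit(packed, start, stop)
        selected.extend(start + j for j, v in enumerate(batch) if v == target)
    return selected


if __name__ == "__main__":
    column = [0, 3, 1, 2, 3, 3, 0, 1, 2, 2] * 1000  # 10,000 small integers
    packed = pack_2bit(column)
    print("4-byte ints:", 4 * len(column), "bytes")
    print("2-bit packed:", len(packed), "bytes")
    print("rows equal to 3:", len(filter_equals(packed, len(column), 3)))
```

For 10,000 such values, a fixed 4-byte-per-value layout needs 40,000 bytes, while the packed buffer needs 2,500, a 16x reduction. A real columnar cache would also have to handle wider value ranges, nulls, and other encodings.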

Super Collaborator

Thank you! I am excited about LLAP too. Do we have a timeline yet for which HDP release will include Hive 2?

Master Mentor

Watch Tim Hall's town hall from this Friday.