Created 03-31-2016 01:45 AM
One of our customers is exploring ways to improve their Hive query performance, and they are wondering about HDFS caching. They want to know whether this is something we recommend.
Created 03-31-2016 01:50 AM
Created 03-31-2016 02:15 AM
Thanks.. exactly what I was looking for.
Created 04-01-2016 10:27 PM
HDFS caching helps, but only a bit: you save only the cost of moving bytes off disk, you still pay the cost of deserialization, and you don't get the benefit of JVM JIT etc. With technologies like Hive LLAP (coming in hive-2) you will get significantly better performance, because LLAP caches deserialized vectors in memory-efficient formats (2 bits for certain integer ranges rather than 4 bytes), uses CPU-efficient filters (vectorized query processing etc.), removes the JVM startup cost for tasks (100s of ms), provides JIT-enhanced CPU performance etc. Rather excited about it!
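To give a feel for the "2 bits rather than 4 bytes" point, here is a minimal Python sketch of bit-packing small integers. This is only an illustration of the general idea, not LLAP's actual encoding; the helper names `pack_2bit` and `unpack_2bit` are made up for this example, and it assumes every value falls in the range 0..3.

```python
def pack_2bit(values):
    """Pack integers in the range 0..3 into bytes, four values per byte."""
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        if not 0 <= v <= 3:
            raise ValueError(f"value {v} does not fit in 2 bits")
        # Each value occupies 2 bits at offset 2 * (i % 4) within its byte.
        packed[i // 4] |= v << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, count):
    """Recover `count` integers from a buffer produced by pack_2bit."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


# Hypothetical column of 8000 small integers.
values = [0, 3, 1, 2, 2, 1, 0, 3] * 1000
packed = pack_2bit(values)
plain_size = 4 * len(values)  # size if each value were stored as a 4-byte int

print(f"packed: {len(packed)} bytes, plain 4-byte ints: {plain_size} bytes")
```

The packed buffer holds the same values in one sixteenth of the space, which is why a cache that stores data this way fits far more rows in the same memory than one that stores raw file bytes or plain 4-byte integers.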
Created 04-02-2016 11:52 PM
Thank you! I am excited about LLAP too. Do we have a timeline yet for which HDP release the new hive-2 will be packaged in?
Created 04-02-2016 11:57 PM
Watch the Tim Hall townhall from this Friday.