HIVE positioning

Hi All,

HIVE has been established as an analytics engine (SQL query processing) for large file-based data. How do the newer features added to HIVE, such as ACID, streaming, and updates, fit into its overall positioning?

Is the idea to create an all-in-one DB on HIVE?

Thanks,

Avijeet

1 ACCEPTED SOLUTION

In my opinion it is best to still regard Hive as an analytical DB. With the ACID (updates) and streaming features, the community is stretching the tool toward things it wasn't designed for. These features are not meant to be used at very large scale or under very heavy load. ACID and streaming will put tremendous strain on the Hive metastore.

In the end, the native storage model of Hive is still based on streaming through whole HDFS files, even with ORC. Without true indexes, Hive will never be a really good match for highly transactional workloads. Large analytical sweeps/scans through data are still at odds with high-speed random read/write/update/delete.
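
To make the scan-versus-random-access point concrete, here is a toy sketch in plain Python. It is not Hive code and says nothing about Hive internals: the list of rows is only a stand-in for a file that has to be read front to back, the dict is a stand-in for a key-based index, and all function names are made up for the illustration.

```python
# Toy model only (assumption): a list of rows stands in for a file that is
# read front to back, and a dict stands in for a key-based index.
rows = [{"id": i, "amount": i * 10} for i in range(1000)]


def scan_total(rows):
    """Analytical sweep: touch every row once and aggregate."""
    return sum(row["amount"] for row in rows)


def scan_lookup(rows, wanted_id):
    """Point lookup without an index: read rows until the key turns up.

    Returns the matching row (or None) and the number of rows read.
    """
    rows_read = 0
    for row in rows:
        rows_read += 1
        if row["id"] == wanted_id:
            return row, rows_read
    return None, rows_read


def build_index(rows):
    """Build a key -> row mapping so later lookups skip the scan."""
    return {row["id"]: row for row in rows}


def indexed_lookup(index, wanted_id):
    """Point lookup with an index: go straight to the row by key."""
    return index.get(wanted_id)


if __name__ == "__main__":
    total = scan_total(rows)
    row, rows_read = scan_lookup(rows, 999)
    index = build_index(rows)
    print("total:", total)
    print("scan lookup read", rows_read, "rows to find id 999")
    print("indexed lookup found:", indexed_lookup(index, 999))
```

The sweep reads every row either way, so a scan-oriented layout costs it nothing. The single-row lookup is where the difference shows: without an index it may have to read the whole data set to find one record.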

But that is not a bad thing; there are other components in HDP that do those other jobs well.
