
Hive Streaming or Hive JDBC?

What is the advantage of Hive Streaming over Hive JDBC? If we batch inserts over JDBC, what extra advantage does Hive Streaming have?



Hi @PremKumar Karunakaran

I believe the primary advantage of Hive Streaming is that data becomes available for querying much more quickly, allowing for more up-to-date analysis.

Please see the Hive Streaming wiki page for more info.


The advantages depend on your particular use case. If you use a single SQL INSERT over JDBC to add a large batch once an hour, for example, versus using the Streaming API to do the same, there won't be any advantage. If you have batches that you want to insert every minute, however, streaming will be much better. Generally, if your data arrives as a continuous stream, streaming lets you land it in the target table at very short intervals and make it immediately visible to readers. The same can't be done efficiently with JDBC.
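To make the "every minute" case concrete, here is a toy Python model (not Hive code and not the Streaming API itself) that counts the ACID delta directories each approach leaves behind for an hour of one-per-minute micro-batches. It assumes that each JDBC INSERT runs in its own transaction and writes its own delta directory, and that a Streaming API TransactionBatch writes all of its transactions into a single delta directory while each commit still makes that transaction's rows visible. The function names and the `txns_per_batch` parameter are hypothetical, chosen only for illustration.

```python
"""Toy model (not Hive code): delta directories left by two ingestion styles.

Assumptions for illustration only:
- every JDBC INSERT runs in its own transaction and writes its own delta dir;
- a streaming TransactionBatch groups `txns_per_batch` transactions into one
  delta dir, and each commit makes that transaction's rows visible.
"""
import math


def jdbc_delta_dirs(num_micro_batches: int) -> int:
    # One INSERT statement per micro-batch -> one new delta dir per micro-batch.
    return num_micro_batches


def streaming_delta_dirs(num_micro_batches: int, txns_per_batch: int) -> int:
    # Each micro-batch is committed as one streaming transaction; a
    # TransactionBatch holds `txns_per_batch` of them in a single delta dir.
    if txns_per_batch < 1:
        raise ValueError("txns_per_batch must be >= 1")
    return math.ceil(num_micro_batches / txns_per_batch)


if __name__ == "__main__":
    minutes = 60  # one micro-batch per minute for one hour
    print("JDBC, one INSERT per minute:", jdbc_delta_dirs(minutes))
    print("JDBC, one INSERT per hour:", jdbc_delta_dirs(1))
    print("Streaming, 10 txns per batch:", streaming_delta_dirs(minutes, 10))
```

Running it prints 60, 1 and 6 delta directories respectively. Under these assumptions both approaches can expose new rows every minute, but per-minute JDBC inserts leave ten times as many small deltas for readers to merge until compaction runs, while a single hourly INSERT leaves no more than streaming does.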

The Streaming API has been integrated with NiFi, Flume, and Storm, so there are existing tools for ingesting event streams into Hive, versus the build-it-yourself JDBC approach.

Is the Streaming API integrated with Spark? When we tried to use the HiveEndPoint classes within a Spark context, many strange class loader issues came up.

@Eugene Koifman We have tested both Hive JDBC and Hive Streaming. The behavior seems to be the same when we run compaction along with Hive JDBC; with compaction, we don't see much difference between the two. It would be of great help if you could share more details on the advantages of Hive Streaming over Hive JDBC.