Created 05-23-2017 11:03 AM
What is the advantage of Hive Streaming over Hive JDBC? If we batch our inserts over JDBC, what extra advantage does Hive Streaming offer?
Created 05-23-2017 01:04 PM
I believe the primary advantage of Hive Streaming is that the data becomes available for query much sooner, allowing for more up-to-date analysis.
Please see the Hive Streaming wiki page for more info.
Created 05-23-2017 03:57 PM
The advantages depend on your particular use case. If, for example, you add a large batch once an hour, there won't be any advantage to using the streaming API over a single SQL INSERT via JDBC. If you have batches that you want to insert every minute, then streaming will be much better. Generally, if your data is available as a continuous stream, streaming lets you land it in the target table at very small time intervals and make it immediately visible to readers. The same can't be done efficiently with JDBC.
The streaming API has also been integrated with NiFi, Flume and Storm, so there are ready-made tools for ingesting event streams into Hive, as opposed to the build-it-yourself JDBC approach.
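To put rough numbers on the hourly vs. per-minute comparison, here is a minimal sketch. It is not Hive code and calls no Hive API; it only models how long an event waits before the next commit makes it visible to readers. The function names and the one-event-per-second workload are assumptions made up for illustration.

```python
def visibility_lags(arrival_times_s, commit_interval_s):
    """Seconds each event waits until the next commit makes it visible.

    Assumes (hypothetically) that commits happen at exact multiples of
    commit_interval_s and that all times are whole seconds.
    """
    lags = []
    for t in arrival_times_s:
        # Integer ceiling: first commit boundary at or after the arrival time.
        next_commit = ((t + commit_interval_s - 1) // commit_interval_s) * commit_interval_s
        lags.append(next_commit - t)
    return lags


def summarize(arrival_times_s, commit_interval_s):
    """Return (worst-case lag, mean lag) in seconds."""
    lags = visibility_lags(arrival_times_s, commit_interval_s)
    return max(lags), sum(lags) / len(lags)


if __name__ == "__main__":
    # Hypothetical workload: one event per second for an hour.
    arrivals = list(range(1, 3601))
    for label, interval in [("hourly batch", 3600), ("commit every minute", 60)]:
        worst, mean = summarize(arrivals, interval)
        print(f"{label}: worst lag {worst}s, mean lag {mean}s")
```

With this workload, an hourly load leaves readers up to almost an hour behind, while a one-minute commit interval keeps them within about a minute. The model says nothing about the cost of each commit, which is where the two approaches actually differ.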
Created 08-03-2017 04:31 AM
Is the streaming API integrated with Spark? When we tried to use the HiveEndPoint classes within a Spark context, we ran into many odd class loader issues.
Created 08-17-2017 08:24 AM
@Eugene Koifman We have tested with both Hive JDBC and Hive Streaming. When we run compaction alongside Hive JDBC, the behavior seems to be the same, and we don't see much difference between the two. It would be a great help if you could share more details on the advantages of Hive Streaming compared to Hive JDBC.