What is the advantage of Hive streaming over Hive JDBC? If we batch our inserts over JDBC, what extra advantage does Hive streaming offer?
The advantages depend on your particular use case. If you use a single SQL INSERT over JDBC to add a large batch once an hour, for example, then using the streaming API to do the same won't give you any advantage. If you have batches that you want to insert every minute, then streaming will be much better. Generally, if your data is available as a continuous stream, streaming lets you land it in the target table at very small time intervals and makes it immediately visible to readers. The same can't be done efficiently with JDBC.
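For reference, the "single SQL INSERT over JDBC" case could look roughly like the sketch below. It uses only the standard java.sql interfaces. The class name JdbcBatchInsertSketch, the helper names and the all-string column layout are made up for illustration, and it assumes the Hive JDBC driver in use accepts a multi-row INSERT ... VALUES statement with ? parameters. As far as I know, each such call is one statement and, on a transactional table, one transaction, which is why this pattern suits an hourly batch better than writes every minute.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper class, not part of Hive or of the JDBC driver.
class JdbcBatchInsertSketch {

    // Builds "INSERT INTO <table> VALUES (?, ?), (?, ?), ..." with one
    // parenthesised group of placeholders per row.
    static String buildMultiRowInsert(String table, int columnCount, int rowCount) {
        if (columnCount < 1 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" VALUES ");
        for (int r = 0; r < rowCount; r++) {
            if (r > 0) {
                sql.append(", ");
            }
            sql.append('(');
            for (int c = 0; c < columnCount; c++) {
                if (c > 0) {
                    sql.append(", ");
                }
                sql.append('?');
            }
            sql.append(')');
        }
        return sql.toString();
    }

    // Sends the whole batch as a single INSERT statement.
    // Assumption: every row has the same number of columns, all bound as strings.
    static void insertBatch(Connection conn, String table, List<String[]> rows) throws SQLException {
        if (rows.isEmpty()) {
            return; // nothing to insert
        }
        int columnCount = rows.get(0).length;
        String sql = buildMultiRowInsert(table, columnCount, rows.size());
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int index = 1; // JDBC parameter indexes start at 1
            for (String[] row : rows) {
                if (row.length != columnCount) {
                    throw new IllegalArgumentException("all rows must have " + columnCount + " columns");
                }
                for (String value : row) {
                    ps.setString(index++, value);
                }
            }
            ps.executeUpdate();
        }
    }
}
```

You would call insertBatch(conn, "events", rows) with a Connection obtained from DriverManager and your own HiveServer2 JDBC URL; the table name "events" here is just a placeholder.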
The streaming API has been integrated with NiFi, Flume and Storm, so there are ready-made tools for ingesting event streams into Hive, as opposed to the build-it-yourself JDBC approach.
Is the streaming API integrated with Spark? When we tried to use the HiveEndPoint classes within a Spark context, we ran into many odd class loader issues.
@Eugene Koifman We have tested with both Hive JDBC and Hive streaming. When we run compaction alongside Hive JDBC, the behavior seems to be the same, and we don't see much difference between the two. It would be a great help if you could share more details on the advantages of Hive streaming compared to Hive JDBC.