Looking for a low-latency query framework to expose streamed events: what's the best choice?

Hi,

We have the following use case:

We ingest 3.5 billion log transactions a day, which we need to process and expose to our front end, with reports built on top of the data.

The reports can be dynamic and can use any of the data's dimensions.

Responses should come back in a reasonable time (2-3 seconds at most).

Users can query the data (aggregations, top reports) over a period of up to 1 year.
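
To put those numbers in perspective, here is a quick back-of-envelope calculation. It is only a sketch: it assumes traffic is spread evenly across the day and takes "1 year" as 365 days, neither of which is stated in the question. Under those assumptions the average ingest rate comes to roughly 40,500 events per second, and a year of raw data is about 1.28 trillion rows.

```python
# Back-of-envelope sizing from the figures given in the question.
EVENTS_PER_DAY = 3_500_000_000   # "3.5 billion log transactions a day"
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 365             # assumption: "up to 1 year" taken as 365 days

# Assumption: uniform traffic; real peaks will be higher than this average.
avg_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
rows_in_one_year = EVENTS_PER_DAY * RETENTION_DAYS

print(f"average ingest rate: {avg_events_per_second:,.0f} events/s")
print(f"raw rows over one year: {rows_in_one_year:,}")
```

A query that has to scan raw rows at that volume is unlikely to finish in 2-3 seconds, which is why the rest of the thread is about indexing or aggregating before query time.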

The data is persisted to HDFS.

We thought about doing it with Spark Structured Streaming, but Spark SQL performs poorly at this scale without pre-aggregation (which is not dynamic).

The obvious choice is Vertica, a columnar MS SQL database, or a similar solution, but they are all expensive.

I thought of ingesting the data with Spark and indexing it in another layer that gives us fast response times.
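
One possible reading of the "index it in another layer" idea is a rollup that keeps every dimension but collapses raw events into one row per distinct dimension combination and time bucket. Reports can then re-aggregate those rows on any subset of dimensions, so the pre-aggregation stays dynamic. The sketch below is plain Python for illustration only; the field names (`hour`, `country`, `device`, `bytes`) and the helpers `rollup` and `query` are hypothetical and not taken from any particular product.

```python
from collections import defaultdict

# Hypothetical log events; the field names are illustrative only.
events = [
    {"hour": "2018-01-01T10", "country": "US", "device": "mobile", "bytes": 120},
    {"hour": "2018-01-01T10", "country": "US", "device": "mobile", "bytes": 80},
    {"hour": "2018-01-01T10", "country": "DE", "device": "desktop", "bytes": 50},
    {"hour": "2018-01-01T11", "country": "US", "device": "desktop", "bytes": 30},
]
DIMENSIONS = ("hour", "country", "device")


def rollup(raw_events):
    """Collapse raw events into one row per distinct dimension combination."""
    cube = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in raw_events:
        key = tuple(event[d] for d in DIMENSIONS)
        cube[key]["count"] += 1
        cube[key]["bytes"] += event["bytes"]
    return dict(cube)


def query(cube, group_by):
    """Re-aggregate the rolled-up rows on any subset of the dimensions."""
    positions = [DIMENSIONS.index(d) for d in group_by]
    result = defaultdict(lambda: {"count": 0, "bytes": 0})
    for key, metrics in cube.items():
        group = tuple(key[i] for i in positions)
        result[group]["count"] += metrics["count"]
        result[group]["bytes"] += metrics["bytes"]
    return dict(result)


cube = rollup(events)                  # 4 raw events become 3 rolled-up rows
by_country = query(cube, ["country"])  # the dimension is chosen at query time
print(by_country)
```

This only works for additive metrics such as counts and sums, and the size reduction depends on how many distinct dimension combinations appear per time bucket; high-cardinality dimensions shrink the benefit.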

Is there an open-source solution for that? I looked at the SnappyData example, but according to the benchmark they present against Spark, it doesn't seem to shorten the response time by that order of magnitude.

Please help, people...


Expert Contributor

Did you try ingesting directly into Hive LLAP?


Hi,

Have you looked at Apache Kafka, Apache Storm, and Apache Kudu?

They may help you handle the streaming faster.



Are there any benchmarks available for those frameworks in terms of data size and query response time?


Hi, you can check the example use cases that each of these projects publishes on its website. That may give you a better idea. Regards, Fahim