Support Questions (Cloudera Community)

Looking for a low-latency query framework to expose streamed events: what's the best choice?


Hi,

We have the following use case:

Ingestion of 3.5 billion log transactions a day, which we need to process and expose to front-end reports built on top of the data.

The reports can be dynamic, on any of the dimensions of the data.

The response time should be reasonable (2-3 seconds max).

Users can query the data (aggregated and top reports) over a window of up to 1 year.

The data is persisted to HDFS.
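For a sense of scale, the figures above can be turned into an average ingest rate and a one-year row count. This is a back-of-the-envelope sketch, assuming events arrive uniformly over the day and a 365-day query window; the constant and function names are illustrative only.

```python
# Back-of-the-envelope sizing for the use case described in the post.
# Assumptions: uniform arrival over the day, 365-day query window.
EVENTS_PER_DAY = 3_500_000_000  # "3.5 billion log transactions a day"
RETENTION_DAYS = 365            # "up to 1 year"
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day: int) -> float:
    """Average ingest rate if events were spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def rows_in_window(events_per_day: int, days: int) -> int:
    """Raw rows a query may have to touch if nothing is pre-aggregated."""
    return events_per_day * days


if __name__ == "__main__":
    print(f"~{average_events_per_second(EVENTS_PER_DAY):,.0f} events/s on average")
    print(f"~{rows_in_window(EVENTS_PER_DAY, RETENTION_DAYS):,} raw rows in a 1-year window")
```

Roughly 40,500 events per second on average and about 1.28 trillion raw rows per year is the volume a query layer would have to answer in 2-3 seconds, which frames the trade-off between pre-aggregation and an indexing layer discussed below.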

We thought of doing it with Spark Structured Streaming, but Spark SQL performs poorly at this scale without pre-aggregation (which is not dynamic).
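To see why static pre-aggregation is "not dynamic": precomputing every combination of dimensions a report might group by (a full cube, similar in spirit to SQL `GROUP BY CUBE`) needs 2**d grouping sets for d dimensions. Below is a minimal pure-Python sketch; the dimension names and events are hypothetical, and only counts are aggregated.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def grouping_sets(dimensions: List[str]) -> List[Tuple[str, ...]]:
    """Every subset of the dimensions: the 2**d grouping sets of a full cube."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


def build_cube(rows: Iterable[Row], dimensions: List[str]) -> Dict[Tuple[str, ...], Counter]:
    """Pre-aggregate event counts for every grouping set (one pass per set)."""
    materialized = list(rows)
    cube: Dict[Tuple[str, ...], Counter] = {}
    for gset in grouping_sets(dimensions):
        counts: Counter = Counter()
        for row in materialized:
            counts[tuple(row[d] for d in gset)] += 1
        cube[gset] = counts
    return cube


# Hypothetical log events with three dimensions.
events = [
    {"country": "US", "device": "mobile", "status": "ok"},
    {"country": "US", "device": "desktop", "status": "error"},
    {"country": "DE", "device": "mobile", "status": "ok"},
]
dims = ["country", "device", "status"]
cube = build_cube(events, dims)

print(len(cube))                                      # 8 grouping sets for 3 dimensions
print(cube[("country",)][("US",)])                    # 2
print(cube[("country", "device")][("US", "mobile")])  # 1
```

The number of grouping sets doubles with every added dimension (2**20 = 1,048,576 for 20 dimensions), and each set can grow up to the product of its dimensions' cardinalities, which is why fully pre-aggregating for arbitrary reports tends to become impractical.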

The obvious choices are Vertica, an MS SQL columnar database, or other similar solutions, but they are all expensive.

I thought of ingesting the data with Spark and indexing it in another layer to get fast response times.
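One way to read "index it in another layer" is an inverted index from each (dimension, value) pair to the rows that contain it, the general idea behind inverted and bitmap indexes. The toy in-memory sketch below uses hypothetical class and method names and is not a production design: a filter becomes an intersection of precomputed row-id sets instead of a full scan.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


class DimensionIndex:
    """Toy inverted index: (dimension, value) -> set of row ids."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.postings: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add(self, row: Row) -> None:
        """Store the row and register its id under every dimension value."""
        row_id = len(self.rows)
        self.rows.append(row)
        for dim, value in row.items():
            self.postings[(dim, value)].add(row_id)

    def match(self, **filters: str) -> Set[int]:
        """Row ids matching every dimension=value filter (set intersection)."""
        if not filters:
            return set(range(len(self.rows)))
        sets = [self.postings.get((dim, value), set()) for dim, value in filters.items()]
        return set.intersection(*sets)

    def count_by(self, dimension: str, **filters: str) -> Dict[str, int]:
        """Group the matching rows by one dimension and count them."""
        counts: Dict[str, int] = defaultdict(int)
        for row_id in self.match(**filters):
            counts[self.rows[row_id][dimension]] += 1
        return dict(counts)


idx = DimensionIndex()
for event in [
    {"country": "US", "device": "mobile", "status": "ok"},
    {"country": "US", "device": "desktop", "status": "error"},
    {"country": "DE", "device": "mobile", "status": "ok"},
]:
    idx.add(event)

print(sorted(idx.match(status="ok")))                        # [0, 2]
print(sorted(idx.count_by("device", country="US").items()))  # [('desktop', 1), ('mobile', 1)]
```

Real engines compress these row-id sets (for example as bitmaps) and partition them across nodes, which this sketch omits; it only shows why arbitrary dimension filters can be answered without rescanning the raw data.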

Is there any open-source solution for that? I looked at the SnappyData example, but according to the benchmark they present against Spark, it doesn't seem to shorten response times by that magnitude.

Please help, people...


Re: Looking for a low-latency query framework to expose streamed events: what's the best choice?

Expert Contributor

Did you try ingesting directly into Hive LLAP?


Re: Looking for a low-latency query framework to expose streamed events: what's the best choice?

Explorer

Hi,

Did you look at Apache Kafka, Apache Storm, and Apache Kudu?

They may help you handle faster streaming.

Regards,

Fahim


Re: Looking for a low-latency query framework to expose streamed events: what's the best choice?

Are there any available benchmarks of those frameworks in terms of data size and query response time?


Re: Looking for a low-latency query framework to expose streamed events: what's the best choice?

Explorer

Hi,

You can check the example use cases published on each project's website. That may give you a better idea.

Regards,

Fahim
