Running a Druid Realtime node under YARN/Ambari

I've got Druid 0.10.1 running successfully and ingesting data, which gets committed to deep storage after the default interval of one hour.

Next, I need to start a realtime node so that some streaming jobs can write to Druid while other jobs query that realtime data immediately. As far as I can tell, data is only available that quickly from a realtime node.

I know how to start a Druid realtime node; however, I don't see a realtime node "built" into the Ambari deployment.

Do I simply build a config and start it as an independent process inside the Hadoop cluster?

Thanks for any and all advice.

Accepted solution:

OK, after doing some extensive reading about Druid, here's what I've determined.

If you run a Tranquility job, it automatically submits realtime indexing tasks to the indexing service. In effect, those tasks do what the "Realtime" node does.

So you only need a Realtime node if you want to get data into Druid through a mechanism that Tranquility does not offer.

Since I am bringing data in from Kafka via Tranquility, I have no need to run an actual Realtime node (which supports things like HTTP input).
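For the querying side of the original question, a rough sketch: clients normally send queries to the Druid broker, which fans them out to both the realtime indexing tasks and the historical nodes, so recent data should be visible before it is handed off to deep storage. The Python below builds a native `timeseries` query over the last few minutes and POSTs it to the broker's `/druid/v2/` endpoint using only the standard library. The broker address (`localhost:8082`), the datasource name `pageviews`, and the helper names are hypothetical placeholders, not taken from this thread; adjust them for your cluster.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical broker address; replace with your broker's host and port.
BROKER_URL = "http://localhost:8082/druid/v2/"


def recent_timeseries_query(datasource, minutes=10, now=None):
    """Build a native Druid timeseries query covering the last `minutes` minutes."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(minutes=minutes)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "minute",
        # ISO-8601 interval "start/end"; recent data here is typically still
        # held by the realtime indexing tasks rather than historical nodes.
        "intervals": [f"{start.strftime(fmt)}/{now.strftime(fmt)}"],
        # Counts stored Druid rows (after any ingestion-time rollup).
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def run_query(query, url=BROKER_URL, timeout=30):
    """POST a query to the broker and return the decoded JSON response."""
    body = json.dumps(query).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    q = recent_timeseries_query("pageviews")
    print(json.dumps(q, indent=2))
    # print(run_query(q))  # requires a running broker
```

Because the broker hides whether a segment is still being built by a realtime task or has already been handed off, the querying jobs should not need to know which node type holds the data.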
