example of spark - kafka - Hbase for real time data like twitter tweets

Hello,

I want to build a demo that uses Kafka, Spark, and HBase: pull tweets from Twitter and store them in a Kafka topic, have Spark Streaming read the tweets from that topic and perform sentiment analysis, and then store the analysis results in HBase.

Can anyone please provide a step-by-step guide? I am new to Hadoop.

Thank you.


Re: example of spark - kafka - Hbase for real time data like twitter tweets

@heta desai

You already have a good idea of how to implement it. However, I'll suggest an easier design.

1. Download the latest versions, HDF 3.0 and HDP 2.6.1, from the Hortonworks website. After installation, create the Kafka topics that will hold the data ingested from Twitter.

2. Use NiFi to ingest data from Twitter. Here is a link to the NiFi Twitter processor:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-social-media-nar/1.3.0/org.ap...

3. Use the NiFi PublishKafka processor to push the data ingested from Twitter into the Kafka topic:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-0-10-nar/1.3.0/org.apac...

4. Use Streaming Analytics Manager (SAM) to build a flow by simple drag and drop: it reads from the Kafka topic, performs sentiment analysis using processors that SAM already provides, and then uses another processor to push the results to HBase, all without writing a single line of code. Under the hood, SAM uses Apache Storm rather than Spark Streaming, but what matters is that your problem gets solved, not which tool solves it.

If you cannot use Streaming Analytics Manager, you will have to write your own Spark Streaming code to read data from Kafka and push it to HBase. Here is the documentation for integrating Spark Streaming with Kafka:

https://spark.apache.org/docs/latest/streaming-kafka-integration.html
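
As a rough illustration of the Spark side, here is a minimal Python sketch. It is not the approach from the linked guide: that guide covers the DStream API, while this sketch swaps in Structured Streaming's Kafka source because it is usable from Python. The broker address, the topic name `tweets`, the helper names, and the toy word lists are all assumptions made for illustration, and a real demo would replace the word counting with a proper sentiment model.

```python
import json
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "happy", "awesome"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def score_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tweet_text(raw_json):
    """Extract the 'text' field from a tweet JSON string; '' if missing or invalid."""
    try:
        text = json.loads(raw_json).get("text")
    except (ValueError, TypeError, AttributeError):
        return ""
    return text if isinstance(text, str) else ""


def build_scored_stream(spark, brokers="localhost:9092", topic="tweets"):
    """Read tweets from Kafka and add a 'sentiment' column.

    Assumes pyspark is installed and the job is submitted with the
    spark-sql-kafka package on the classpath; brokers and topic are placeholders.
    """
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    sentiment_udf = udf(lambda raw: score_sentiment(tweet_text(raw)), StringType())
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        .option("subscribe", topic)
        .load()
    )
    # Kafka delivers the message body as bytes in the 'value' column.
    tweets = raw.select(col("value").cast("string").alias("raw_json"))
    return tweets.withColumn("sentiment", sentiment_udf(col("raw_json")))
```

The scoring functions are kept free of Spark imports so they can be tried on a few sample tweets in a plain Python shell before the streaming job is wired up.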

The following link has an example of the Java HBase context being used to write to HBase:

https://github.com/tmalaska/SparkOnHBase/blob/master/src/main/scala/org/apache/hadoop/hbase/spark/Ja...
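
For a sense of what the HBase write involves, here is a small Python sketch. It does not use the HBase context from the linked project; it swaps in the third-party `happybase` client, which talks to the HBase Thrift server. The table name `tweet_sentiment`, the column family `t`, the host, and the function names are assumptions made for illustration.

```python
def to_hbase_put(tweet_id, created_at, text, sentiment, family="t"):
    """Build (row_key, cells) as bytes, the form HBase clients expect."""
    row_key = str(tweet_id).encode("utf-8")
    prefix = family.encode("utf-8") + b":"
    cells = {
        prefix + b"created_at": created_at.encode("utf-8"),
        prefix + b"text": text.encode("utf-8"),
        prefix + b"sentiment": sentiment.encode("utf-8"),
    }
    return row_key, cells


def write_rows(rows, host="localhost", table_name="tweet_sentiment"):
    """Write an iterable of (row_key, cells) pairs in one batch.

    Assumes happybase is installed, the HBase Thrift server is running on
    `host`, and the table already exists with the column family used above.
    """
    import happybase  # third-party client, imported lazily

    connection = happybase.Connection(host)
    try:
        table = connection.table(table_name)
        with table.batch() as batch:
            for row_key, cells in rows:
                batch.put(row_key, cells)
    finally:
        connection.close()
```

Using the tweet id as the row key keeps the example simple; each analysed tweet becomes one row with its text, timestamp, and sentiment label stored in a single column family.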

If you follow my suggestion and use Streaming Analytics Manager, you are done at step 4 without writing any code.
