
Not able to read tweets format, never seen such behaviour- Flume


Explorer

I am getting tweets in an unfamiliar format through the Flume stream and am not able to parse them. I suspect the source type parameter is the issue:

TwitterAgent.sources.Twitter1.type = org.apache.flume.source.twitter.TwitterSource 

(This downloads tweets, but the format is abnormal; I have never come across it.)

TwitterAgent.sources.Twitter1.type = poc.hortonworks.flume.source.twitter.TwitterSource 

(This does not download anything. ERROR: Unable to load source type: poc.hortonworks.flume.source.twitter.TwitterSource, class: poc.hortonworks.flume.source.twitter.TwitterSource)

I need urgent guidance. Please share anything you think might help, or point me to something I can read through.

The tweets are not even close to any of the keywords that I am providing.

It's all junk.

4 REPLIES 4

Re: Not able to read tweets format, never seen such behaviour- Flume

Mentor

Please provide a sample of the output. Note that the Twitter source is labeled experimental. Also consider looking at Apache NiFi; here is a great tutorial for tweets: https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.h...


Re: Not able to read tweets format, never seen such behaviour- Flume

Mentor

Your problem is the "type" setting. Look at this example:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.twitter.TwitterSource
a1.sources.r1.channels = c1
a1.sources.r1.consumerKey = YOUR_TWITTER_CONSUMER_KEY
a1.sources.r1.consumerSecret = YOUR_TWITTER_CONSUMER_SECRET
a1.sources.r1.accessToken = YOUR_TWITTER_ACCESS_TOKEN
a1.sources.r1.accessTokenSecret = YOUR_TWITTER_ACCESS_TOKEN_SECRET
a1.sources.r1.maxBatchSize = 10
a1.sources.r1.maxBatchDurationMillis = 200
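
The example above defines only the source. A Flume agent also needs a channel definition and a sink before any data is written out. What follows is a minimal sketch, assuming a memory channel and an HDFS sink; the sink name k1, the HDFS path, and the capacity values are placeholders chosen for illustration and do not come from this thread.

```properties
# Assumption: sink name k1 is hypothetical; c1 matches the channel named above
a1.sinks = k1

# In-memory channel between the Twitter source and the sink
a1.channels.c1.type = memory
# Placeholder sizes; tune for your own load
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# HDFS sink that drains channel c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Placeholder path; replace with a directory you can write to
a1.sinks.k1.hdfs.path = /tmp/flume/tweets
# DataStream writes the event bodies without wrapping them in a SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
```

Note that a source lists its channels with the plural "channels" property, while a sink is bound to exactly one channel with the singular "channel" property.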

Re: Not able to read tweets format, never seen such behaviour- Flume

Expert Contributor
org.apache.flume.source.twitter.TwitterSource

is an experimental Flume source; it is really just an example. It downloads the sample stream, which is not filtered by any keyword.

It also converts each event to Avro format, which is why the output is not human-readable. You can create a Hive Avro table on top of it, but be aware that it returns a very limited set of fields.

I can't find the source code of the Hortonworks library, but I suspect the overall idea is the same.

In my view it is easier to adjust the Flume example to your needs: remove the Avro transformation and use the twitter4j filter stream instead of the "sample" stream that it currently uses.


Re: Not able to read tweets format, never seen such behaviour- Flume

Mentor
@valent pawar

Thanks to @Michael M for pointing out that the output is in Avro; I didn't know that. You can download avro-tools.jar and convert the unreadable binary Avro file to JSON using the following command:

java -jar ~/avro-tools-1.7.4.jar tojson twitter.avro > twitter.json

You can download the latest avro-tools jar from the Avro website. Pull one of your result files to the local filesystem and run the command above.