I'm getting tweets in an unexpected format through a Flume stream and am not able to parse them. I suspect the source type parameter is the issue:
TwitterAgent.sources.Twitter1.type = org.apache.flume.source.twitter.TwitterSource
(this downloads, but the format is abnormal; I have never come across this format)
TwitterAgent.sources.Twitter1.type = poc.hortonworks.flume.source.twitter.TwitterSource
(not able to download. ERROR: Unable to load source type: poc.hortonworks.flume.source.twitter.TwitterSource, class: poc.hortonworks.flume.source.twitter.TwitterSource)
I need urgent guidance, guys. Anything you would like to share, or something that I can go through, would help.
The tweets are not even close to any of the keywords that I am providing.
It's all junk.
Please provide the sample output. You know that the Twitter source is labeled experimental, right? Please consider looking at Apache NiFi; here's a great tutorial for tweets: https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.h...
Your problem is the "type"; look at this example:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.apache.flume.source.twitter.TwitterSource
a1.sources.r1.channels = c1
a1.sources.r1.consumerKey = YOUR_TWITTER_CONSUMER_KEY
a1.sources.r1.consumerSecret = YOUR_TWITTER_CONSUMER_SECRET
a1.sources.r1.accessToken = YOUR_TWITTER_ACCESS_TOKEN
a1.sources.r1.accessTokenSecret = YOUR_TWITTER_ACCESS_TOKEN_SECRET
a1.sources.r1.maxBatchSize = 10
a1.sources.r1.maxBatchDurationMillis = 200
org.apache.flume.source.twitter.TwitterSource is an experimental Flume source, and really just an example. It downloads the sample stream (not filtered by any keyword).
It also transforms each event to Avro format, which is why the output is not human-readable. You can create a Hive Avro table on top of it, but be aware that it returns a very limited set of fields.
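To confirm that an unreadable result file really is Avro rather than corrupted data, one option is to check its first bytes: Avro object container files begin with the four bytes `Obj` followed by the byte 1. The sketch below is only an illustration; the function names are made up here, and it assumes the file has already been copied from HDFS to the local filesystem.

```python
# Minimal sketch: detect an Avro object container file by its magic bytes.
# Function names are hypothetical and not part of Flume or Avro.

AVRO_MAGIC = b"Obj\x01"  # 'O', 'b', 'j', then the byte 0x01


def looks_like_avro(header: bytes) -> bool:
    """Return True if the given leading bytes start with the Avro magic."""
    return header[:4] == AVRO_MAGIC


def file_looks_like_avro(path: str) -> bool:
    """Read the first four bytes of a local file and test them."""
    with open(path, "rb") as f:
        return looks_like_avro(f.read(4))
```

If `file_looks_like_avro("twitter.avro")` returns True, the "junk" is most likely valid binary Avro and can be decoded with Avro tooling instead of being parsed as text.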
I can't find the source code of the Hortonworks lib, but I suspect the overall idea is the same.
From my point of view, it is easier to adjust the Flume example to your needs: remove the Avro transformation and use the twitter4j filter stream instead of the "sample" stream it currently uses.
Thanks to @Michael M for pointing out that the output is in Avro; I didn't know that. You can download avro-tools.jar and convert the unreadable binary Avro file to JSON using the following command:
java -jar ~/avro-tools-1.7.4.jar tojson twitter.avro > twitter.json
You can download the latest Avro tools jar from the Avro website. Pull one of your result files to the local filesystem and run the command above.
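Since the sample stream is not filtered by keyword, one workaround is to filter the converted JSON afterwards. The sketch below is a rough illustration only. It assumes that the `tojson` output has one JSON record per line and that each record carries the tweet body in a field named `text`, possibly wrapped as `{"string": "..."}` when the schema declares the field as nullable. Check your own output before relying on either assumption; the function names are hypothetical.

```python
import json


def extract_text(record: dict) -> str:
    """Return the tweet text, unwrapping a {"string": ...} union if present."""
    value = record.get("text")  # assumed field name, verify against your data
    if isinstance(value, dict):
        value = value.get("string")
    return value if isinstance(value, str) else ""


def filter_tweets(lines, keywords):
    """Yield records whose text contains any keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = extract_text(record).lower()
        if any(k in text for k in wanted):
            yield record
```

For example, opening twitter.json and passing the file object to `filter_tweets(f, ["hadoop", "flume"])` would yield only the records that mention one of those words. This filters after download, so it does not reduce what Flume pulls from Twitter.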