Created 10-25-2016 06:18 PM
I have come to the conclusion that the properties file is bad and is therefore producing the bad JSON file. Can someone point out how I can correct it? I am uploading the JSON file it produces; can someone confirm it is bad?
flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1 -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080

[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1

agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream

agent1.channels.channel1.type = file
Created 10-25-2016 06:36 PM
Sami, I don't see keywords listed as a property for the TwitterSource
https://flume.apache.org/FlumeUserGuide.html#twitter-1-firehose-source-experimental
However, your upload looks to be an avro file, which is what the documentation says you will receive from the source. What is it about your result that you think is incorrect?
Created 10-25-2016 06:51 PM
Because I can't read it into Hive in the standard way that I see many people using on the web, where it works for them.
I have also tried 4 different SerDes; all give errors.
CREATE EXTERNAL TABLE tweetdata3 (
  created_at STRING,
  text STRING,
  person STRUCT<
    screen_name:STRING,
    name:STRING,
    locations:STRING,
    description:STRING,
    created_at:STRING,
    followers_count:INT,
    url:STRING>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/flume/tweets';

hive> select person.name, person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>
Created 10-25-2016 07:50 PM
Look at my configuration below. I did exactly what you want to do and it works; just copy it, substitute the values to correspond with your environment, and it should work.
And this is how you launch it; substitute the values to fit your setup:
/usr/bin/flume-ng agent -c /etc/flume-ng/conf -f /etc/flume-ng/conf/flume.conf -n agent
#######################################################
# This is a test configuration created the 31/07/2016
# by Geoffrey Shelton Okot
#######################################################
# Twitter Agent
########################################################
# Twitter agent for collecting Twitter data to HDFS.
########################################################
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

########################################################
# Describing and configuring the sources
########################################################
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxx
TwitterAgent.sources.Twitter.keywords = hadoop,Data Scientist,BigData,Trump,computing,flume,Nifi

#######################################################
# Twitter configuring HDFS sink
########################################################
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode.com:8020/user/flume
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

#######################################################
# Twitter Channel
########################################################
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 20000
#TwitterAgent.channels.MemChannel.DataDirs =
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
#######################################################
# Binding the Source and the Sink to the Channel
########################################################
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
########################################################
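One thing worth knowing when debugging configs like these: Flume matches property keys case-sensitively, so a typo such as "Keywords" instead of "keywords", or "Channels" instead of "channels", is silently ignored rather than rejected at startup. If it helps, here is a minimal sketch of a checker for that failure mode; the known-key list is illustrative, not exhaustive:

```python
# Sketch: flag Flume property keys that differ from a known-good key
# only by letter case. Flume matches keys case-sensitively, so
# "Keywords" is silently ignored while "keywords" takes effect.
# KNOWN_KEYS below is a small illustrative subset, not a full list.

KNOWN_KEYS = {"channels", "channel", "keywords", "type",
              "consumerKey", "consumerSecret",
              "accessToken", "accessTokenSecret"}

def suspect_keys(config_text):
    """Return (found, expected) pairs for keys with suspicious case."""
    lower_map = {k.lower(): k for k in KNOWN_KEYS}
    bad = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # The property key is the last dotted component before '='.
        key = line.split("=", 1)[0].strip().split(".")[-1]
        good = lower_map.get(key.lower())
        if good and key != good:
            bad.append((key, good))
    return bad
```

For example, suspect_keys("TwitterAgent.sources.Twitter.Keywords = x") reports that "Keywords" should be "keywords".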
Created 10-25-2016 08:21 PM
Hi Geoffrey, I created new files based on your template, but I still get the same error when querying through Hive.
Did you try creating the table in Hive and querying it? Please try it, and also try it against my file; I am uploading it.
Thanks.
CREATE EXTERNAL TABLE tweetdata3 (
  created_at STRING,
  text STRING,
  person STRUCT<
    screen_name:STRING,
    name:STRING,
    locations:STRING,
    description:STRING,
    created_at:STRING,
    followers_count:INT,
    url:STRING>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/user/flume/tweets';

hive> describe tweetdata3;
OK
created_at    string    from deserializer
text          string    from deserializer
person        struct<screen_name:string,name:string,locations:string,description:string,created_at:string,followers_count:int,url:string>    from deserializer
Time taken: 0.311 seconds, Fetched: 3 row(s)

hive> select person.name, person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@4d5fa2a; line: 1, column: 2]
Time taken: 0.333 seconds
hive>
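The unexpected character 'O' (code 79) in this exception is consistent with the files being Avro containers, which begin with the bytes "Obj": the JSON SerDe reads the first "line" of the file and trips over that 'O' immediately. The same failure mode can be reproduced with any JSON parser; a small sketch using stand-in bytes rather than an actual Flume output file:

```python
import json

# Stand-in for the first bytes of an Avro container file. The real file
# continues with a binary header, but "Obj" alone is enough to show why
# a JSON parser rejects it on the very first character.
avro_ish_line = "Obj\x01..."

try:
    json.loads(avro_ish_line)
    parsed = True
except json.JSONDecodeError as e:
    parsed = False
    # Like Jackson in the Hive stack trace, Python complains that the
    # first character cannot start a valid JSON value.
    assert e.pos == 0
```

So the SerDe and the table definition may both be fine; the input simply isn't JSON.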
Created 10-25-2016 08:43 PM
Can you give me your Hive CREATE TABLE statement, please? And the SELECT command against the table?
Created 10-25-2016 08:50 PM
I found the issue thanks to a post on Stack Overflow; please see below.
Created 01-29-2018 09:37 AM
@Sami Ahmad, did you finally get a solution? After a year, I'm still getting the same error.