bad HDFS sink property
- Labels:
Apache Flume
Apache Hadoop
Created ‎10-25-2016 06:18 PM
I have come to the conclusion that the properties file is bad and is therefore producing a bad JSON file. Can someone point out how I can correct it? I am uploading the JSON file it is producing so that someone can confirm whether it is bad.
flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1 -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080

[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file
Created ‎10-25-2016 06:36 PM
Sami, I don't see keywords listed as a property for the TwitterSource.
However, your upload looks like an Avro file, which is what the documentation says you will receive from that source. What is it about your result that you think is incorrect?
Created ‎10-25-2016 06:51 PM
Because I can't read it into Hive using the standard approach that I see many people using on the web, and which works for them.
I have tried four different SerDes, and all of them give an error.
CREATE EXTERNAL TABLE tweetdata3(
  created_at STRING,
  text STRING,
  person STRUCT<
    screen_name:STRING,
    name:STRING,
    locations:STRING,
    description:STRING,
    created_at:STRING,
    followers_count:INT,
    url:STRING>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location '/user/flume/tweets';

hive> select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>
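The 'O' (code 79) in the exception above fits the earlier observation that the file is Avro rather than JSON. Avro object container files begin with the four magic bytes `Obj` followed by the byte 0x01, so a JSON SerDe would hit a letter where it expects a JSON value. The following is only a minimal sketch for checking this, assuming one of the sink's output files has first been copied to the local filesystem. The function names are hypothetical and not part of Flume, Hive, or Avro.

```python
# First four bytes of an Avro object container file: 'O', 'b', 'j', 0x01.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(data: bytes) -> bool:
    """Return True if the given bytes start with the Avro container magic."""
    return data[:4] == AVRO_MAGIC


def file_looks_like_avro(path: str) -> bool:
    """Read only the first four bytes of a local file and check them.

    `path` is assumed to be a local copy of one of the files written
    by the HDFS sink (hypothetical location, chosen by the reader).
    """
    with open(path, "rb") as handle:
        return looks_like_avro_container(handle.read(4))
```

If the check returns True, the data is Avro and a JSON SerDe is unlikely to read it, whatever the table definition looks like.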
Created ‎10-25-2016 07:50 PM
Look at my configuration below. I did exactly what you want to do and it works; just copy it and substitute the values to match your environment, and it should work.
And this is how you launch it! Substitute the values to fit your setup:
/usr/bin/flume-ng agent -c /etc/flume-ng/conf -f /etc/flume-ng/conf/flume.conf -n agent
#######################################################
# This is a test configuration created the 31/07/2016
# by Geoffrey Shelton Okot
#######################################################
# Twitter Agent
########################################################
# Twitter agent for collecting Twitter data to HDFS.
########################################################
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
########################################################
# Describing and configuring the sources
########################################################
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.Channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxx
TwitterAgent.sources.Twitter.Keywords = hadoop,Data Scientist,BigData,Trump,computing,flume,Nifi
#######################################################
# Twitter configuring HDFS sink
########################################################
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode.com:8020/user/flume
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.WriteFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#######################################################
# Twitter Channel
########################################################
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 20000
#TwitterAgent.channels.MemChannel.DataDirs =
TwitterAgent.channels.MemChannel.transactionCapacity =1000
#######################################################
# Binding the Source and the Sink to the Channel
########################################################
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channels = MemChannel
########################################################
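The configuration above binds the channel more than once, with different spellings (`Channels`, `channels`, `channel`). As a hedged aid for spotting such variants, here is a small sketch that assumes, per the Flume user guide, that sources bind with `channels` (plural) and sinks with `channel` (singular), and that property keys are case-sensitive. The helper names are hypothetical, and the parser handles only plain `key = value` lines.

```python
def parse_properties(text: str) -> dict:
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def lint_channel_keys(props: dict) -> list:
    """Flag channel-binding keys whose spelling or case looks off.

    Assumption: sources use `channels`, sinks use `channel`.
    """
    warnings = []
    for key in props:
        parts = key.split(".")
        # Only look at <agent>.<sources|sinks>.<component>.<property> keys.
        if len(parts) != 4 or parts[1] not in ("sources", "sinks"):
            continue
        expected = "channels" if parts[1] == "sources" else "channel"
        prop = parts[3]
        if prop.lower() in ("channel", "channels") and prop != expected:
            warnings.append(f"{key}: expected '{expected}'")
    return warnings
```

Running `lint_channel_keys(parse_properties(text))` on the file contents returns a list of suspicious keys. An empty list only means the binding keys are spelled as assumed, not that the agent will produce the desired output format.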
Created ‎10-25-2016 08:21 PM
Hi Geoffrey, I created new files based on your template, but I still get the same error when querying through Hive.
Did you try creating the table in Hive and querying it? Please try it, and also try it against my file, which I am uploading.
CREATE EXTERNAL TABLE tweetdata3(
  created_at STRING,
  text STRING,
  person STRUCT<
    screen_name:STRING,
    name:STRING,
    locations:STRING,
    description:STRING,
    created_at:STRING,
    followers_count:INT,
    url:STRING>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
location '/user/flume/tweets';

hive> describe tweetdata3;
OK
created_at    string    from deserializer
text          string    from deserializer
person        struct<screen_name:string,name:string,locations:string,description:string,created_at:string,followers_count:int,url:string>    from deserializer
Time taken: 0.311 seconds, Fetched: 3 row(s)

hive> select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@4d5fa2a; line: 1, column: 2]
Time taken: 0.333 seconds
hive>
Created ‎10-25-2016 08:43 PM
Can you give me your Hive CREATE TABLE statement, please, and the SELECT command you run against the table?
Created ‎10-25-2016 08:50 PM
I found the issue thanks to a post on Stack Overflow; please see below.
Created ‎01-29-2018 09:37 AM
@Sami Ahmad, did you finally find a solution? More than a year later, I'm getting the same error.