Support Questions
Find answers, ask questions, and share your expertise

bad HDFS sink property

Solved


Master Collaborator

events1476284674520.zip

I have come to the conclusion that the properties file is bad and is therefore producing a bad JSON file. Can someone point out how I can correct it? I am uploading the JSON file it produces; can someone confirm that it is bad?

flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1  -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080
[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources =source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file
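As a quick sanity check on an agent file like the one above, the declared source/sink names and their channel bindings can be cross-checked mechanically. A minimal sketch (this is not Flume's own validation logic, just an illustration; the helper name is made up):

```python
# Sketch: sanity-check channel bindings in a Flume agent properties file.
# Not Flume's own validator; a minimal illustration of the binding rules:
# each source binds via ".channels" (plural), each sink via ".channel".

def check_flume_props(text: str, agent: str) -> list:
    """Return a list of binding problems found in the properties text."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()

    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        bound = set(props.get(f"{agent}.sources.{source}.channels", "").split())
        if not bound or not bound <= channels:
            problems.append(f"source {source} not bound to a declared channel")
    for sink in props.get(f"{agent}.sinks", "").split():
        if props.get(f"{agent}.sinks.{sink}.channel") not in channels:
            problems.append(f"sink {sink} not bound to a declared channel")
    return problems
```

Run against the bindings in the file above, this returns an empty list, which suggests the wiring itself is fine and the problem lies elsewhere.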

1 ACCEPTED SOLUTION

7 REPLIES

Re: bad HDFS sink property

Expert Contributor

Sami, I don't see keywords listed as a property for the TwitterSource

https://flume.apache.org/FlumeUserGuide.html#twitter-1-firehose-source-experimental

However, your upload looks to be an avro file, which is what the documentation says you will receive from the source. What is it about your result that you think is incorrect?
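If it helps confirm that, the writer schema embedded in an Avro container file can be pulled out of its header with nothing but the Python standard library. A sketch (not part of Flume or Hive; feed it the first few kilobytes of one of the uploaded files):

```python
# Sketch: read the metadata map from an Avro container file header to confirm
# the format and recover the writer schema. Avro data files start with the
# magic bytes "Obj\x01", followed by a metadata map (which includes the key
# "avro.schema") encoded with zigzag varints.

def read_long(buf: bytes, pos: int):
    """Decode one zigzag-varint-encoded long; return (value, new_pos)."""
    shift, result = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos

def read_avro_metadata(data: bytes) -> dict:
    if data[:4] != b"Obj\x01":
        raise ValueError("not an Avro container file")
    pos, meta = 4, {}
    while True:
        count, pos = read_long(data, pos)
        if count == 0:
            break
        if count < 0:           # negative count: a block byte-size follows
            count = -count
            _, pos = read_long(data, pos)
        for _ in range(count):
            klen, pos = read_long(data, pos)
            key = data[pos:pos + klen].decode()
            pos += klen
            vlen, pos = read_long(data, pos)
            meta[key] = data[pos:pos + vlen]
            pos += vlen
    return meta
```

If the files really come from the TwitterSource, the returned map should contain an `avro.schema` entry describing the tweet records, which would explain why a JSON SerDe cannot read them.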


Re: bad HDFS sink property

Master Collaborator

Because I can't read it into Hive in the standard way that I see many people using on the web, where it works for them.

I have also tried four different SerDes; all give errors.

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT< 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive>
    >
    > select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>
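For what it's worth, that parse error is consistent with the files being Avro rather than JSON: every Avro container file begins with the bytes `Obj\x01`, and an `'O'` (code 79) where Jackson expects a JSON value is exactly what that header would look like to a JSON parser. A minimal sniff (a sketch, not part of Hive or Flume):

```python
# Sketch: distinguish an Avro container file from JSON text by its first bytes.
AVRO_MAGIC = b"Obj\x01"   # every Avro data file starts with these 4 bytes

def sniff_format(first_bytes: bytes) -> str:
    if first_bytes.startswith(AVRO_MAGIC):
        return "avro"
    if first_bytes.lstrip()[:1] in (b"{", b"["):
        return "json"
    return "unknown"
```

Piping the first bytes of one of the files under /user/flume/tweets into a check like this would settle whether the JsonSerDe is even being given JSON.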

Re: bad HDFS sink property

Mentor

@Sami Ahmad

Look at my configuration below. I did exactly what you wanted to do, and it works. Just copy it and substitute the values to match your environment.

And this is how you launch it! Substitute the values to fit your setup:

/usr/bin/flume-ng agent -c /etc/flume-ng/conf -f /etc/flume-ng/conf/flume.conf -n agent
#######################################################
# This is a test configuration created on 31/07/2016
#    by Geoffrey Shelton Okot
#######################################################
# Twitter Agent
########################################################
# Twitter agent for collecting Twitter data to HDFS.
########################################################
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
########################################################
# Describing and configuring the sources
########################################################
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxx
TwitterAgent.sources.Twitter.keywords = hadoop,Data Scientist,BigData,Trump,computing,flume,Nifi
#######################################################
# Twitter configuring  HDFS sink
########################################################
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode.com:8020/user/flume
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#######################################################
# Twitter Channel
########################################################
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 20000
#TwitterAgent.channels.MemChannel.DataDirs =
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
#######################################################
# Binding the Source and the Sink to the Channel
########################################################
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
########################################################

Re: bad HDFS sink property

Master Collaborator

flumedata.zip

Hi Geoffrey, I created new files based on your template, but I still get the same error when querying through Hive.

Did you try creating the table in Hive and querying it? Please try it, and also try it against my file; I am uploading it.

Thanks.

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT< 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';


hive> describe tweetdata3;
OK
created_at              string                  from deserializer
text                    string                  from deserializer
person                  struct<screen_name:string,name:string,locations:string,description:string,created_at:string,followers_count:int,url:string>     from deserializer
Time taken: 0.311 seconds, Fetched: 3 row(s)
hive> select person.name, person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@4d5fa2a; line: 1, column: 2]
Time taken: 0.333 seconds
hive>


Re: bad HDFS sink property

Master Collaborator

Can you give me your Hive CREATE TABLE statement, please, and the SELECT command for the table?

Re: bad HDFS sink property

Explorer

@Sami Ahmad, did you finally get a solution? A year later, I'm getting the same error too.
