
bad HDFS sink property

Super Collaborator

events1476284674520.zip

I have come to the conclusion that the properties file is bad and is therefore producing the bad JSON file. Can someone point out how I can correct it? I am uploading the JSON file it is producing; can someone confirm it is bad?

flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1  -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080
[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources =source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file
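
The file channel above is running on its defaults, so the checkpoint and data directories fall back to Flume's built-in locations under the Flume user's home directory. Pinning them down explicitly would look something like this (the paths are examples only, not from my setup):

# Example only: explicit file-channel directories (paths illustrative)
agent1.channels.channel1.checkpointDir = /var/flume/checkpoint
agent1.channels.channel1.dataDirs = /var/flume/data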

1 ACCEPTED SOLUTION

7 REPLIES

Super Collaborator

Sami, I don't see keywords listed as a property for the TwitterSource.

https://flume.apache.org/FlumeUserGuide.html#twitter-1-firehose-source-experimental

However, your upload looks to be an Avro file, which is what the documentation says you will receive from the source. What is it about your result that you think is incorrect?
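
If you want to eyeball the records, avro-tools can dump an Avro container file as JSON, one record per line. The jar name and location vary by distribution, and the file name below is illustrative:

# Pull one rolled file out of HDFS (name illustrative), then dump it as JSON
hdfs dfs -get /user/flume/tweets/events.1476284674520.log .
java -jar avro-tools.jar tojson events.1476284674520.log | head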

Super Collaborator

Because I can't read it into Hive in the standard way, which I see many people using on the web where it works for them.

I have tried four different SerDes; all give errors.

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT< 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive>
    >
    > select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>

Master Mentor

@Sami Ahmad

Look at my code below. I did exactly what you wanted to do, and it works. Just copy it and substitute the values to correspond with your environment; it should work.

And this is how you launch it! Substitute the values to fit your setup:

/usr/bin/flume-ng agent -c /etc/flume-ng/conf -f /etc/flume-ng/conf/flume.conf -n agent
#######################################################
# This is a test configuration created on 31/07/2016
#    by Geoffrey Shelton Okot
#######################################################
# Twitter Agent
########################################################
# Twitter agent for collecting Twitter data to HDFS.
########################################################
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
########################################################
# Describing and configuring the sources
########################################################
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxx
TwitterAgent.sources.Twitter.keywords = hadoop,Data Scientist,BigData,Trump,computing,flume,Nifi
#######################################################
# Twitter configuring  HDFS sink
########################################################
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode.com:8020/user/flume
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#######################################################
# Twitter Channel
########################################################
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 20000
#TwitterAgent.channels.MemChannel.DataDirs =
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
#######################################################
# Binding the Source and the Sink to the Channel
########################################################
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
########################################################
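
One caveat on the Hive side: the TwitterSource writes Avro container files regardless of how the sink is configured, so a JSON SerDe will not parse them. If that is the error you are seeing, reading them through the AvroSerDe is the usual route. A sketch only; the table name and schema path are examples, not from a tested setup:

-- Sketch: first pull the writer schema out of one rolled file and park it
-- on HDFS, outside the table directory so Hive does not try to read the
-- .avsc file as data:
--   java -jar avro-tools.jar getschema events.1476284674520.log > tweets.avsc
--   hdfs dfs -put tweets.avsc /user/flume/schemas/tweets.avsc
CREATE EXTERNAL TABLE tweets_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/flume/tweets'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/flume/schemas/tweets.avsc');

The columns are taken from the Avro schema itself, so the fields will come out under whatever names the TwitterSource schema actually uses.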

Super Collaborator

flumedata.zip

Hi Geoffrey, I created new files based on your template, but I still get the same error when querying through Hive.

Did you try creating the table in Hive and querying it? Please try it, and also try it against my file; I am uploading it.

Thanks.

CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT< 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';


hive> describe tweetdata3;
OK
created_at              string                  from deserializer
text                    string                  from deserializer
person                  struct<screen_name:string,name:string,locations:string,description:string,created_at:string,followers_count:int,url:string>     from deserializer
Time taken: 0.311 seconds, Fetched: 3 row(s)
hive> select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@4d5fa2a; line: 1, column: 2]
Time taken: 0.333 seconds
hive>
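
One thing I notice in the trace: the parser fails on an 'O' at the very start of the stream, and an Avro container file begins with the ASCII bytes "Obj" (then a 0x01 version byte), while plain JSON would start with '{' or '['. So possibly these files are still Avro, not JSON. A quick check (file name illustrative):

# An Avro container file starts with the ASCII magic "Obj";
# JSON output would start with '{' or '['.
hdfs dfs -cat /user/flume/tweets/events.1476284674520.log | head -c 3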

Super Collaborator

Can you give me your Hive CREATE TABLE statement please, and the SELECT command from the table?

Super Collaborator

I found the issue thanks to a post on Stack Overflow; please see below.

http://stackoverflow.com/questions/30657983/type-error-string-from-deserializer-instead-of-int-when-...

Explorer

@Sami Ahmad, did you finally get a solution? A year later I'm still getting the same error.