Member since: 04-22-2016
Posts: 931
Kudos Received: 46
Solutions: 26
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1498 | 10-11-2018 01:38 AM |
| | 1867 | 09-26-2018 02:24 AM |
| | 1826 | 06-29-2018 02:35 PM |
| | 2418 | 06-29-2018 02:34 PM |
| | 5365 | 06-20-2018 04:30 PM |
10-27-2016
05:08 PM
flumedata.zip If that is the case, then it's not matching the JSON format. Please see the attached file.
10-27-2016
03:54 PM
1 Kudo
With the commands below, what type of file is being produced: JSON or AVRO?
flume-ng agent --conf ./conf/ -f conf/twitter-to-hdfs.properties --name TwitterAgent -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=proxy.server.com -Dtwitter4j.http.proxyPort=8080
[flume@hadoop1 conf]$ pwd
/home/flume/conf
[flume@hadoop1 conf]$
[flume@hadoop1 conf]$ more twitter-to-hdfs.properties
########################################################
# Twitter agent for collecting Twitter data to HDFS.
########################################################
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
########################################################
# Describing and configuring the sources
########################################################
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxx
TwitterAgent.sources.Twitter.keywords = hadoop,Data Scientist,BigData,Trump,computing,flume,Nifi
#######################################################
# Twitter configuring HDFS sink
########################################################
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hadoop1:8020/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#######################################################
# Twitter Channel
########################################################
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 20000
#TwitterAgent.channels.MemChannel.DataDirs =
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
#######################################################
# Binding the Source and the Sink to the Channel
########################################################
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
[flume@hadoop1 conf]$
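To answer the JSON-or-Avro question directly from the files landing in HDFS: Avro object container files always begin with the magic bytes `Obj\x01`, while JSON text starts with `{` or `[`. A minimal sketch (the file path and function name are illustrative, not part of the original post) that inspects the first bytes of a downloaded file:

```python
# Distinguish an Avro container file from a JSON text file by its magic bytes.
# Avro object container files always start with b"Obj\x01" (the Avro magic).
def detect_format(path):
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(b"Obj\x01"):
        return "avro"
    if head[:1] in (b"{", b"["):
        return "json"
    return "unknown"
```

Running this against a file pulled down with `hdfs dfs -get` would show whether the sink is really writing JSON, even though `fileType = DataStream` is set.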
Labels:
- Apache Flume
- Apache Hadoop
10-25-2016
08:50 PM
I found the issue thanks to a post on Stack Overflow; please see below. http://stackoverflow.com/questions/30657983/type-error-string-from-deserializer-instead-of-int-when-load-csv-to-table
10-25-2016
08:43 PM
Can you give me your Hive CREATE TABLE statement, please? And the SELECT command from the table?
10-25-2016
08:21 PM
flumedata.zip Hi Geoffery, I created new files based on your template, but I still get the same error when querying through Hive. Did you try creating the table in Hive and querying it? Please try it, and also try it against my file, which I am uploading. Thanks.
CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
person STRUCT<
screen_name:STRING,
name:STRING,
locations:STRING,
description:STRING,
created_at:STRING,
followers_count:INT,
url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/flume/tweets';
hive> describe tweetdata3;
OK
created_at string from deserializer
text string from deserializer
person struct<screen_name:string,name:string,locations:string,description:string,created_at:string,followers_count:int,url:string> from deserializer
Time taken: 0.311 seconds, Fetched: 3 row(s)
hive> select person.name, person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.ByteArrayInputStream@4d5fa2a; line: 1, column: 2]
Time taken: 0.333 seconds
hive>
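The "Unexpected character ('O' (code 79))" at line 1, column 2 is consistent with the SerDe being handed an Avro container file rather than JSON: such a file begins with the bytes "Obj", so the very first token the JSON parser sees is the letter O. The same failure can be reproduced in plain Python (the sample string here is just the Avro header prefix, not real tweet data):

```python
import json

# An Avro container file begins with the bytes "Obj\x01"; feeding such a
# line to a JSON parser fails immediately, just as the Hive JsonSerDe does.
avro_header_line = "Obj\x01..."  # not valid JSON

try:
    json.loads(avro_header_line)
    parsed = True
except json.JSONDecodeError:
    parsed = False
```

This suggests the problem is in the files Flume is writing, not in the CREATE TABLE statement.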
10-25-2016
06:51 PM
Because I can't read it into Hive in the standard way, which I see many people using on the web, and it works for them. And I have tried 4 different SerDes; all give errors.
CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
person STRUCT<
screen_name:STRING,
name:STRING,
locations:STRING,
description:STRING,
created_at:STRING,
followers_count:INT,
url:STRING>
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' location '/user/flume/tweets';
hive>
>
> select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive>
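Before pointing Hive at the directory, it can help to validate the landed file line by line as JSON; the first line that fails to parse (typically the Avro header, if the source is emitting Avro) pinpoints why every SerDe rejects the data. A hedged sketch, assuming one JSON document per line as the JsonSerDe expects:

```python
import json

def first_bad_json_line(lines):
    """Return the 1-based index of the first line that is not valid JSON,
    or None if every non-blank line parses."""
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return i
    return None
```

If the very first line is reported bad, the file as a whole is not newline-delimited JSON, and no choice of SerDe will fix it.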
10-25-2016
06:19 PM
I also created a new one just for finding out how to create the Twitter JSON file from Flume: https://community.hortonworks.com/questions/63419/bad-hdfs-sink-property.html
10-25-2016
06:18 PM
events1476284674520.zip I have come to the conclusion that the properties file is bad and is therefore producing the bad JSON file. Can someone point out how I can correct it? I am uploading the JSON file it is producing; can someone confirm it is bad?
flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1 -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080
[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file
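One likely explanation for the "bad JSON": the stock `org.apache.flume.source.twitter.TwitterSource` serializes events as Avro regardless of the sink's `fileType` setting, so the HDFS files are Avro containers rather than JSON text. A commonly used alternative for landing raw status JSON is Cloudera's example TwitterSource; the fragment below is a sketch, assuming that example's flume-sources jar has been built and placed on Flume's classpath:

```properties
# Hypothetical alternative: Cloudera's example source emits the raw tweet JSON.
# Requires the flume-sources example jar on the Flume classpath.
agent1.sources.source1.type = com.cloudera.flume.source.TwitterSource
agent1.sources.source1.keywords = Clinton, Trump
```

With a JSON-emitting source, the existing `fileType = DataStream` sink settings would then write newline-delimited JSON that the Hive JsonSerDe can read.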
Labels:
- Apache Flume
- Apache Hadoop
10-25-2016
05:49 PM
I already have another thread open but am not getting many responses there. Can you please follow up on that thread? I also attached my output file, and how I generate it, in this thread and the other. Can you help me identify why the file is bad? https://community.hortonworks.com/questions/61181/reading-json-files.html#comment-63383
10-25-2016
01:45 PM
Artem, any advice? I am anxiously waiting for your or anyone's feedback.