<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: bad HDFS sink property in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149353#M44494</link>
    <description>&lt;P&gt;Because I can't read it into Hive in the standard way that I see many people using on the web, where it works for them.&lt;/P&gt;&lt;P&gt;I have also tried 4 different SerDes; all give an error.&lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT&amp;lt; 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING&amp;gt;
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive&amp;gt;
    &amp;gt;
    &amp;gt; select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive&amp;gt;
&lt;/PRE&gt;</description>
    <pubDate>Wed, 26 Oct 2016 01:51:35 GMT</pubDate>
    <dc:creator>aliyesami</dc:creator>
    <dc:date>2016-10-26T01:51:35Z</dc:date>
    <item>
      <title>bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149351#M44492</link>
      <description>&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/8875-events1476284674520.zip"&gt;events1476284674520.zip&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I have come to the conclusion that the properties file is bad and is therefore producing the bad JSON file. Can someone point out how I can correct it? I am uploading the JSON file it's producing; can someone confirm it's bad?&lt;/P&gt;&lt;PRE&gt;flume-ng agent --conf-file twitter-to-hdfs.properties --name agent1  -Dflume.root.logger=WARN,console -Dtwitter4j.http.proxyHost=dotatofwproxy.tolls.dot.state.fl.us -Dtwitter4j.http.proxyPort=8080
[root@hadoop1 ~]# more twitter-to-hdfs.properties
agent1.sources =source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = org.apache.flume.source.twitter.TwitterSource
agent1.sources.source1.consumerKey = xxxxxxxxxxxxxxxxxxxxxxxxxTaz
agent1.sources.source1.consumerSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCI9
agent1.sources.source1.accessToken = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxwov
agent1.sources.source1.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxY5H3
agent1.sources.source1.keywords = Clinton Trump
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/flume/tweets
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file

&lt;/PRE&gt;</description>
      <pubDate>Wed, 26 Oct 2016 01:18:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149351#M44492</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-26T01:18:12Z</dc:date>
    </item>
    <item>
      <title>Re: bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149352#M44493</link>
      <description>&lt;P&gt;Sami, I don't see keywords listed as a property for the TwitterSource&lt;/P&gt;&lt;P&gt;&lt;A href="https://flume.apache.org/FlumeUserGuide.html#twitter-1-firehose-source-experimental" target="_blank"&gt;https://flume.apache.org/FlumeUserGuide.html#twitter-1-firehose-source-experimental&lt;/A&gt;&lt;/P&gt;&lt;P&gt;However, your upload looks to be an avro file, which is what the documentation says you will receive from the source. What is it about your result that you think is incorrect?&lt;/P&gt;</description>
      <pubDate>Wed, 26 Oct 2016 01:36:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149352#M44493</guid>
      <dc:creator>bhagan</dc:creator>
      <dc:date>2016-10-26T01:36:11Z</dc:date>
    </item>
    <item>
      <title>Re: bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149353#M44494</link>
      <description>&lt;P&gt;Because I can't read it into Hive in the standard way that I see many people using on the web, where it works for them.&lt;/P&gt;&lt;P&gt;I have also tried 4 different SerDes; all give an error.&lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT&amp;lt; 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING&amp;gt;
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';

hive&amp;gt;
    &amp;gt;
    &amp;gt; select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2bc779ed; line: 1, column: 2]
Time taken: 0.274 seconds
hive&amp;gt;
&lt;/PRE&gt;</description>
      <pubDate>Wed, 26 Oct 2016 01:51:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149353#M44494</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-26T01:51:35Z</dc:date>
    </item>
    <item>
      <title>Re: bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149354#M44495</link>
      <description>&lt;P&gt;
	@&lt;A href="https://community.hortonworks.com/users/10115/sahmad43.html"&gt;Sami Ahmad&lt;/A&gt;&lt;/P&gt;&lt;P&gt;
	Look at my code below. I did exactly what you wanted to do, and it works. Just copy it and substitute the values to match your environment, and it should work.&lt;/P&gt;&lt;P&gt;And this is how you launch it! Substitute the values to fit your setup.&lt;/P&gt;&lt;PRE&gt;/usr/bin/flume-ng agent -c /etc/flume-ng/conf -f /etc/flume-ng/conf/flume.conf -n agent&lt;/PRE&gt;&lt;PRE&gt;#######################################################
# This is a test configuration created the 31/07/2016
#    by Geoffrey Shelton Okot
#######################################################
# Twitter Agent
########################################################
# Twitter agent for collecting Twitter data to HDFS.
########################################################
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
########################################################
# Describing and configuring the sources
########################################################
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxxxxxxx
TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxx
TwitterAgent.sources.Twitter.accessToken = xxxxxxxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxx
TwitterAgent.sources.Twitter.keywords = hadoop,Data Scientist,BigData,Trump,computing,flume,Nifi
#######################################################
# Twitter configuring  HDFS sink
########################################################
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode.com:8020/user/flume
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
#######################################################
# Twitter Channel
########################################################
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 20000
#TwitterAgent.channels.MemChannel.DataDirs =
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
#######################################################
# Binding the Source and the Sink to the Channel
########################################################
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
########################################################&lt;/PRE&gt;</description>
      <pubDate>Wed, 26 Oct 2016 02:50:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149354#M44495</guid>
      <dc:creator>Shelton</dc:creator>
      <dc:date>2016-10-26T02:50:13Z</dc:date>
    </item>
    <item>
      <title>Re: bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149355#M44496</link>
      <description>&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/8877-flumedata.zip"&gt;flumedata.zip&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hi Geoffrey, I created new files based on your template, but I still get the same error when querying through Hive.&lt;/P&gt;&lt;P&gt;Did you try creating the table in Hive and querying it? Please try it, and also try it against my file; I am uploading it.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;&lt;PRE&gt;CREATE EXTERNAL TABLE tweetdata3(created_at STRING,
text STRING,
  person STRUCT&amp;lt; 
     screen_name:STRING,
     name:STRING,
     locations:STRING,
     description:STRING,
     created_at:STRING,
     followers_count:INT,
     url:STRING&amp;gt;
) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'  location '/user/flume/tweets';


hive&amp;gt; describe tweetdata3;
OK
created_at              string                  from deserializer
text                    string                  from deserializer
person                  struct&amp;lt;screen_name:string,name:string,locations:string,description:string,created_at:string,followers_count:int,url:string&amp;gt;     from deserializer
Time taken: 0.311 seconds, Fetched: 3 row(s)
hive&amp;gt; select person.name,person.locations, person.created_at, text from tweetdata3;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@4d5fa2a; line: 1, column: 2]
Time taken: 0.333 seconds
hive&amp;gt;

&lt;/PRE&gt;</description>
      <pubDate>Wed, 26 Oct 2016 03:21:51 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149355#M44496</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-26T03:21:51Z</dc:date>
    </item>
    <item>
      <title>Re: bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149356#M44497</link>
      <description>&lt;P&gt;Can you give me your Hive CREATE TABLE statement, please? And the SELECT command against the table?&lt;/P&gt;</description>
      <pubDate>Wed, 26 Oct 2016 03:43:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149356#M44497</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-26T03:43:16Z</dc:date>
    </item>
    <item>
      <title>Re: bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149357#M44498</link>
      <description>&lt;P&gt;I found the issue thanks to a post on Stack Overflow; please see below.&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/30657983/type-error-string-from-deserializer-instead-of-int-when-load-csv-to-table"&gt;http://stackoverflow.com/questions/30657983/type-error-string-from-deserializer-instead-of-int-when-load-csv-to-table&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Oct 2016 03:50:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149357#M44498</guid>
      <dc:creator>aliyesami</dc:creator>
      <dc:date>2016-10-26T03:50:05Z</dc:date>
    </item>
    <item>
      <title>Re: bad HDFS sink property</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149358#M44499</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10115/sahmad43.html" nodeid="10115"&gt;@Sami Ahmad&lt;/A&gt; did you finally find a solution? A year later, I'm still getting the same error.&lt;/P&gt;</description>
      <pubDate>Mon, 29 Jan 2018 17:37:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/bad-HDFS-sink-property/m-p/149358#M44499</guid>
      <dc:creator>bedantaguru</dc:creator>
      <dc:date>2018-01-29T17:37:39Z</dc:date>
    </item>
  </channel>
</rss>