<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Use Flume to get a webpage data. How to configure, how to use it to stream data in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48118#M49206</link>
    <description>&lt;P&gt;Since you are going to use a third-party webpage, I am assuming that you won't be able to integrate or deploy the Flume SDK. If the webpage is OK with sending data via HTTP rather than using Flume's RPC, then I think the HTTP source would be a good fit. From a client's point of view, the HTTP source acts like a web server that accepts Flume events. You can either write your own handler or use HTTPSourceXMLHandler in your configuration; the default handler accepts JSON format.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The format that&amp;nbsp;&lt;SPAN&gt;HTTPSourceXMLHandler accepts is stated below:&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;&amp;lt;events&amp;gt;
&amp;lt;event 1 2 3 ..&amp;gt;
&amp;lt;headers&amp;gt;
&amp;lt;header 1 2 3 ..&amp;gt;
&amp;lt;/header&amp;gt;
&amp;lt;/headers&amp;gt;
&amp;lt;body&amp;gt; &amp;lt;/body&amp;gt;
&amp;lt;/event..&amp;gt;
&amp;lt;/events&amp;gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;The handler parses the XML into Flume events and passes them on to the HTTP source, which then passes them to the channel and on to a sink or another agent, depending on the flow.&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 29 Nov 2016 13:42:33 GMT</pubDate>
    <dc:creator>csguna</dc:creator>
    <dc:date>2016-11-29T13:42:33Z</dc:date>
    <item>
      <title>Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48080#M49205</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a 3-node cluster using the latest Cloudera parcels for version 5.9. The OS is CentOS 6.7 on all three nodes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am using Flume for the first time. I have just used the 'Add Service' option in the Cloudera GUI to add Flume.&lt;/P&gt;&lt;P&gt;My goal is to get data from a webpage into HDFS/HBase.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you please help me with how to do it? What else do I need to make data streaming from a webpage possible?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, I have seen an example on the net for Twitter, where we need to create a token on the Twitter page to get the data. However, the webpage I am referring to is a third-party one, and I am not sure how to configure Flume to get its data onto my cluster. I guess it would be over HTTP.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please help me to get this done.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;Shilpa&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:49:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48080#M49205</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2022-09-16T10:49:54Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48118#M49206</link>
      <description>&lt;P&gt;Since you are going to use a third-party webpage, I am assuming that you won't be able to integrate or deploy the Flume SDK. If the webpage is OK with sending data via HTTP rather than using Flume's RPC, then I think the HTTP source would be a good fit. From a client's point of view, the HTTP source acts like a web server that accepts Flume events. You can either write your own handler or use HTTPSourceXMLHandler in your configuration; the default handler accepts JSON format.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The format that&amp;nbsp;&lt;SPAN&gt;HTTPSourceXMLHandler accepts is stated below:&lt;/SPAN&gt;&lt;/P&gt;&lt;PRE&gt;&amp;lt;events&amp;gt;
&amp;lt;event 1 2 3 ..&amp;gt;
&amp;lt;headers&amp;gt;
&amp;lt;header 1 2 3 ..&amp;gt;
&amp;lt;/header&amp;gt;
&amp;lt;/headers&amp;gt;
&amp;lt;body&amp;gt; &amp;lt;/body&amp;gt;
&amp;lt;/event..&amp;gt;
&amp;lt;/events&amp;gt;&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;The handler parses the XML into Flume events and passes them on to the HTTP source, which then passes them to the channel and on to a sink or another agent, depending on the flow.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Nov 2016 13:42:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48118#M49206</guid>
      <dc:creator>csguna</dc:creator>
      <dc:date>2016-11-29T13:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48149#M49207</link>
      <description>&lt;P&gt;Thanks so much for the reply. It really helps me to understand how it can work.&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, as I told you, I am using Flume for the first time, and I have no idea how to change its configuration to make it work. My configs are currently set to the default values.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would really appreciate it if you could help me with them. Please find them below:&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Flume configs.PNG" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/2425i55584095EAEDD02A/image-size/large?v=v2&amp;amp;px=999" role="button" title="Flume configs.PNG" alt="Flume configs.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="flume configs 2.PNG" style="width: 600px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/2428i9AD78C232FFA2760/image-size/large?v=v2&amp;amp;px=999" role="button" title="flume configs 2.PNG" alt="flume configs 2.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I look forward to your reply.&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;P&gt;Shilpa&lt;/P&gt;</description>
      <pubDate>Tue, 29 Nov 2016 20:11:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48149#M49207</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-11-29T20:11:56Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48211#M49208</link>
      <description>&lt;P&gt;Please help me with the Flume configurations. I am running out of time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Shilpa&lt;/P&gt;</description>
      <pubDate>Wed, 30 Nov 2016 22:57:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48211#M49208</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-11-30T22:57:32Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48287#M49209</link>
      <description>&lt;P&gt;Hey guys!&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am still waiting for a reply. Please help me with those configs or point me to a document.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Shilpa&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2016 17:26:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48287#M49209</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-02T17:26:36Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48290#M49210</link>
      <description>Flume doesn't have the ability to poll an HTTP service; however, it can act as an HTTP service itself (&lt;A href="http://flume.apache.org/FlumeUserGuide.html#http-source" target="_blank"&gt;http://flume.apache.org/FlumeUserGuide.html#http-source&lt;/A&gt;) that you can post JSON data (or other formats) to.&lt;BR /&gt;&lt;BR /&gt;I would suggest reviewing the documentation here to see some examples and the different configuration options: &lt;A href="http://flume.apache.org/FlumeUserGuide.html" target="_blank"&gt;http://flume.apache.org/FlumeUserGuide.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;In Cloudera Manager, you will be editing the Configuration File section; that is the configuration that is read when Flume starts up.&lt;BR /&gt;&lt;BR /&gt;-pd</description>
      <pubDate>Fri, 02 Dec 2016 20:05:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48290#M49210</guid>
      <dc:creator>pdvorak</dc:creator>
      <dc:date>2016-12-02T20:05:22Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48291#M49211</link>
      <description>&lt;P&gt;Thanks pd. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I will go through the links and will get back if I am stuck anywhere.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;-Shilpa&lt;/P&gt;</description>
      <pubDate>Fri, 02 Dec 2016 20:51:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48291#M49211</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-02T20:51:41Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48582#M49212</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/2139"&gt;@pdvorak&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My flume.conf looks like the one below. However, I have a question, as I want to get data from a third-party website, in my case a news website.&lt;/P&gt;
&lt;P&gt;I don't know which port to use in the configuration; please see the highlighted part below. I thought of using 8080, but it is not open.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="flume conf.PNG" style="width: 466px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/2454iAA7125C40C26D9E6/image-size/large?v=v2&amp;amp;px=999" role="button" title="flume conf.PNG" alt="flume conf.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;[root@LnxMasterNode01 ~]# nc -l 132.247.1.32 8080&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="andale mono,times"&gt;nc: Cannot assign requested address&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you tell me how to deal with this part?&lt;/P&gt;
&lt;P&gt;Also, I am looking for specific keywords on this website, such as fashion and sports, and I want only the data related to them. Can you tell me if this can be done using multiple sinks, one per header value, such as:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;&lt;EM&gt;# list the sources, sinks and channels in the agent&lt;/EM&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;avro-AppSrv-source1&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sinks&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;hdfs-Cluster1-sink1 avro-forward-sink2&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.channels&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-1 mem-channel-2&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;&lt;EM&gt;# set channels for source&lt;/EM&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources.avro-AppSrv-source1.channels&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-1 mem-channel-2&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;&lt;EM&gt;# set channel for sinks&lt;/EM&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
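&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sinks.hdfs-Cluster1-sink1.channel&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-1&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sinks.avro-forward-sink2.channel&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-2&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;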
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;&lt;EM&gt;# channel selector configuration&lt;/EM&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources.avro-AppSrv-source1.selector.type&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;multiplexing&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources.avro-AppSrv-source1.selector.header&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN&gt;Category&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources.avro-AppSrv-source1.selector.mapping.Fashion&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-1&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources.avro-AppSrv-source1.selector.mapping.Baseball&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-2&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources.avro-AppSrv-source1.selector.mapping.Basketball&lt;/SPAN&gt;&amp;nbsp;&lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-1 mem-channel-2&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="andale mono,times"&gt;&lt;SPAN&gt;agent_foo.sources.avro-AppSrv-source1.selector.default&lt;/SPAN&gt; &lt;SPAN&gt;&lt;STRONG&gt;=&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN&gt;mem-channel-1&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks in Advance!&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Shilpa&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2016 13:36:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48582#M49212</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-16T13:36:40Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48605#M49213</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/2139"&gt;@pdvorak&lt;/a&gt;/ All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please help. I am not sure how to deal with it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Shilpa&lt;/P&gt;</description>
      <pubDate>Fri, 16 Dec 2016 17:49:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48605#M49213</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-16T17:49:13Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48655#M49214</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/185"&gt;@hshreedharan&lt;/a&gt;&amp;nbsp;/&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/2139"&gt;@pdvorak&lt;/a&gt;&amp;nbsp;/ All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My project is in crisis and I really need help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;My flume.conf is as follows:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;tier1.sources&amp;nbsp; = http-source&lt;/P&gt;&lt;P&gt;tier1.channels = mem-channel-1&lt;/P&gt;&lt;P&gt;tier1.sinks&amp;nbsp;&amp;nbsp;&amp;nbsp; = hdfs-sink&lt;/P&gt;&lt;P&gt;# For each source, channel, and sink, set&lt;/P&gt;&lt;P&gt;# standard properties.&lt;/P&gt;&lt;P&gt;tier1.sources.http-source.type&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = http&lt;/P&gt;&lt;P&gt;tier1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler&lt;/P&gt;&lt;P&gt;tier1.sources.http-source.bind&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = 132.247.1.32&lt;/P&gt;&lt;P&gt;tier1.sources.http-source.port&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = 21&lt;/P&gt;&lt;P&gt;tier1.sources.http-source.channels = mem-channel-1&lt;/P&gt;&lt;P&gt;tier1.channels.mem-channel-1.type&amp;nbsp;&amp;nbsp; = memory&lt;/P&gt;&lt;P&gt;tier1.sinks.hdfs-sink.type&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = hdfs&lt;/P&gt;&lt;P&gt;tier1.sinks.hdfs-sink.channel&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = mem-channel-1&lt;/P&gt;&lt;P&gt;tier1.sinks.hdfs-sink.hdfs.path&amp;nbsp;&amp;nbsp;&amp;nbsp; = /flume/events/%y-%m-%d/%H%M/%S&lt;/P&gt;&lt;P&gt;# Other properties are specific to each type of&lt;/P&gt;&lt;P&gt;# source, channel, or sink. 
In this case, we&lt;/P&gt;&lt;P&gt;# specify the capacity of the memory channel.&lt;/P&gt;&lt;P&gt;tier1.channels.mem-channel-1.capacity = 100&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;The error I am getting while starting flume-ng:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;2016-12-19 15:52:27,356 WARN org.mortbay.log: failed SelectChannelConnector@132.247.1.32:21: java.net.BindException: Cannot assign requested address&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;2016-12-19 15:52:27,356 WARN org.mortbay.log: failed Server@9439eee: java.net.BindException: Cannot assign requested address&lt;/FONT&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;FONT size="2"&gt;2016-12-19 15:52:27,357 ERROR org.apache.flume.source.http.HTTPSource: Error while starting HTTPSource. Exception follows.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;FONT size="2"&gt;java.net.BindException: Cannot assign requested address&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.Net.bind0(Native Method)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.Net.bind(Net.java:444)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.Net.bind(Net.java:436)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.jetty.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:315)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.jetty.Server.doStart(Server.java:235)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:207)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.lang.Thread.run(Thread.java:745)&lt;/FONT&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;FONT size="2"&gt;2016-12-19 15:52:27,367 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:IDLE} } - Exception follows.&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;FONT size="2"&gt;java.lang.RuntimeException: java.net.BindException: Cannot assign requested address&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;FONT size="2"&gt;at com.google.common.base.Throwables.propagate(Throwables.java:156)&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;FONT 
size="2"&gt;at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:211)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I guess the source is not able to connect to the IP and port I provided. This is because the IP I provided belongs to a third party, a news webpage, and I don't know which port is open to connect to.&lt;/P&gt;&lt;P&gt;Please guide me on how I can enable data streaming from a third-party webpage to HDFS using Flume.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, I cannot see a plugins.d directory for Flume. Can this be one of the reasons? Do I have to download its jars separately? I have installed Flume from the Cloudera Manager UI.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;[root@LnxMasterNode01 CDH-5.9.0-1.cdh5.9.0.p0.23]# cd /var/lib/flume-ng/&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;[root@LnxMasterNode01 flume-ng]# ll&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;total 0&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;[root@LnxMasterNode01 flume-ng]# cd /&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;[root@LnxMasterNode01 /]# find . -name plugins.d&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;./etc/audisp/plugins.d&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;[root@LnxMasterNode01 /]#&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/185"&gt;@hshreedharan&lt;/a&gt;&amp;nbsp;I saw your post on Flume on a separate forum, hence looped you in. 
Please help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Shilpa&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2016 22:26:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48655#M49214</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-19T22:26:57Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48659#M49215</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/2139"&gt;@pdvorak&lt;/a&gt;&amp;nbsp;/&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/185"&gt;@hshreedharan&lt;/a&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I ran curl against the IP and saw that it uses port 80 to connect to the news webpage. Even telnet works on port 80.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;[root@LnxMasterNode01 /]# telnet 132.247.1.32 80&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;Trying 132.247.1.32...&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;Connected to 132.247.1.32.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;Escape character is '^]'.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;^CConnection closed by foreign host.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;However, when restarting Flume, I get the same error as earlier. Can this be related ONLY to the absence of plugins.d (see the previous post)?&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;&lt;FONT size="2"&gt;&lt;STRONG&gt;2016-12-19 16:45:00,353 WARN org.mortbay.log: failed SelectChannelConnector@132.247.1.32:80: java.net.BindException: Cannot assign requested address&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&lt;STRONG&gt;2016-12-19 16:45:00,353 WARN org.mortbay.log: failed Server@36772002: java.net.BindException: Cannot assign requested address&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&lt;STRONG&gt;2016-12-19 16:45:00,353 ERROR org.apache.flume.source.http.HTTPSource: Error while starting HTTPSource. 
Exception follows.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&lt;STRONG&gt;java.net.BindException: Cannot assign requested address&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.Net.bind0(Native Method)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.Net.bind(Net.java:444)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.Net.bind(Net.java:436)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.jetty.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:315)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.jetty.Server.doStart(Server.java:235)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:207)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;at java.lang.Thread.run(Thread.java:745)&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&lt;STRONG&gt;2016-12-19 16:45:00,364 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&lt;STRONG&gt;name:http-source,state:IDLE} } - Exception follows.&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&lt;STRONG&gt;java.lang.RuntimeException: java.net.BindException: Cannot assign requested address&amp;nbsp;&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;Please help me resolve this issue.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;Thanks,&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;Shilpa&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Dec 2016 22:56:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48659#M49215</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-19T22:56:41Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48810#M49216</link>
      <description>&lt;P&gt;I have edited my flume.conf to&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;# Please paste flume.conf here.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;# Sources, channels, and sinks are defined per&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;# agent name, in this case 'tier1'.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sources = http-source&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.channels = mem-channel-1&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sinks = hdfs-sink&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;# For each source, channel, and sink, set&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;# standard properties.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sources.http-source.type = http&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sources.http-source.bind = localhost&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sources.http-source.url = &lt;A href="http://www.jornada.unam.mx/ultimas" target="_blank"&gt;http://www.jornada.unam.mx/ultimas&lt;/A&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sources.http-source.port = 5440&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sources.http-source.channels = mem-channel-1&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.channels.mem-channel-1.type = memory&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sinks.hdfs-sink.type = hdfs&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sinks.hdfs-sink.channel = mem-channel-1&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.sinks.hdfs-sink.hdfs.path = hdfs://lnxmasternode01.centralus.cloudapp.azure.com/flume/events/%y-%m-%d/%H%M/%S&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;# Other properties are specific to each type of&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;# source, channel, or sink. 
In this case, we&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;# specify the capacity of the memory channel.&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;tier1.channels.mem-channel-1.capacity = 100&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;Now I can see in the Flume logs that http-source has started.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;However, no data is being streamed to the HDFS path I specified in the config. Can anyone suggest what to do next?&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;&lt;FONT size="2"&gt;-bash-4.1$ hadoop fs -ls /flume&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;Found 1 items&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;drwxr-xr-x - flume hdfs 0 2016-12-23 11:49 /flume/events&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-bash-4.1$ hadoop fs -ls /flume/events&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-bash-4.1$&lt;/FONT&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Dec 2016 19:31:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48810#M49216</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-26T19:31:29Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48818#M49217</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I checked the Flume source jars, and these are the only ones I can find in the Cloudera bundle:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;FONT size="2"&gt;[hadoop@LnxMasterNode01 jars]$ ll flume*source*&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-rw-r--r-- 1 root root 20586 Oct 21 04:58 flume-avro-source-1.6.0-cdh5.9.0.jar&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-rw-r--r-- 1 root root 26893 Oct 21 04:58 flume-jms-source-1.6.0-cdh5.9.0.jar&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-rw-r--r-- 1 root root 22843 Oct 21 04:58 flume-kafka-source-1.6.0-cdh5.9.0.jar&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-rw-r--r-- 1 root root 61447 Oct 21 04:58 flume-scribe-source-1.6.0-cdh5.9.0.jar&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-rw-r--r-- 1 root root 34830 Oct 21 04:58 flume-taildir-source-1.6.0-cdh5.9.0.jar&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-rw-r--r-- 1 root root 80709 Oct 21 04:58 flume-thrift-source-1.6.0-cdh5.9.0.jar&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;-rw-r--r-- 1 root root 14540 Oct 21 04:58 flume-twitter-source-1.6.0-cdh5.9.0.jar&lt;/FONT&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;Could this be why the HTTP source is not working, i.e. why no data is streaming even though flume.log shows no errors and says &lt;/SPAN&gt;&lt;STRONG&gt;http-source started&lt;/STRONG&gt;&lt;SPAN&gt;?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;How do I get the jars for the HTTP source?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Shilpa&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 27 Dec 2016 00:56:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/48818#M49217</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2016-12-27T00:56:13Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/49158#M49218</link>
      <description>As I stated before, Flume can't consume from a remote HTTP server. You would need something that consumes from the remote server and then posts to Flume.&lt;BR /&gt;&lt;BR /&gt;-pd</description>
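As a rough illustration of the point above (something has to pull from the remote server and then post to Flume), here is a minimal Python sketch. It assumes a Flume HTTP source with the default JSONHandler is already listening at a hypothetical address such as http://localhost:5440. The function names and the source_url header are made up for illustration and are not part of Flume. JSONHandler expects a JSON array of events, each with a headers map and a body string.

```python
import json
import urllib.request


def build_flume_payload(page_text, source_url):
    """Wrap fetched text as a one-event JSON array for Flume's JSONHandler."""
    # "source_url" is an illustrative header name, not something Flume requires.
    event = {"headers": {"source_url": source_url}, "body": page_text}
    return json.dumps([event]).encode("utf-8")


def fetch_and_post(page_url, flume_url):
    """Pull a remote page, then push it to a listening Flume HTTP source."""
    # Step 1: this script, not Flume, is the HTTP client of the remote site.
    with urllib.request.urlopen(page_url, timeout=30) as response:
        page_text = response.read().decode("utf-8", errors="replace")

    # Step 2: POST the wrapped event to the Flume agent's HTTP source.
    request = urllib.request.Request(
        flume_url,
        data=build_flume_payload(page_text, page_url),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return reply.status


# Example call (hypothetical Flume address, assumes the agent is running):
# fetch_and_post("http://www.jornada.unam.mx/ultimas", "http://localhost:5440")
```

Run on a schedule (cron, for example), a script like this plays the role of the external puller, while the HTTP source only has to accept the incoming POSTs.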
      <pubDate>Fri, 06 Jan 2017 20:13:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/49158#M49218</guid>
      <dc:creator>pdvorak</dc:creator>
      <dc:date>2017-01-06T20:13:19Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/49163#M49219</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/2139"&gt;@pdvorak&lt;/a&gt;&amp;nbsp;thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, I wrote Java code to pull the RSS feed, then used an Exec source and an Avro sink on two nodes, with an Avro source as the collector and an HDFS sink on the third node.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 07 Jan 2017 00:05:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/49163#M49219</guid>
      <dc:creator>ShilpaSinha</dc:creator>
      <dc:date>2017-01-07T00:05:05Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/50517#M49220</link>
      <description>&lt;P&gt;Hi Shilpa,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Were you able to get webpage data into HDFS via Flume? Please let me know what steps you took.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2017 23:21:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/50517#M49220</guid>
      <dc:creator>aparnak</dc:creator>
      <dc:date>2017-02-06T23:21:37Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/55451#M49221</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/19629"&gt;@ShilpaSinha&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you share how you got that Java code to pull the RSS feed?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;David&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jun 2017 14:32:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/55451#M49221</guid>
      <dc:creator>Dboudart</dc:creator>
      <dc:date>2017-06-06T14:32:29Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/56179#M49222</link>
      <description>Here is an example of creating a simple Java RSS reader and setting Flume up to read its output:&lt;BR /&gt;&lt;BR /&gt;&lt;A href="http://www.ibm.com/developerworks/library/bd-flumews/" target="_blank"&gt;http://www.ibm.com/developerworks/library/bd-flumews/&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;-pd</description>
      <pubDate>Tue, 20 Jun 2017 16:52:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/56179#M49222</guid>
      <dc:creator>pdvorak</dc:creator>
      <dc:date>2017-06-20T16:52:00Z</dc:date>
    </item>
    <item>
      <title>Re: Use Flume to get a webpage data. How to configure, how to use it to stream data</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/56182#M49223</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/2139"&gt;@pdvorak&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks a lot for your answer. I've already checked that page and it helped.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks again.&lt;/P&gt;&lt;P&gt;DB&lt;/P&gt;</description>
      <pubDate>Tue, 20 Jun 2017 16:58:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Use-Flume-to-get-a-webpage-data-How-to-configure-how-to-use/m-p/56182#M49223</guid>
      <dc:creator>Dboudart</dc:creator>
      <dc:date>2017-06-20T16:58:53Z</dc:date>
    </item>
  </channel>
</rss>

