Expert Contributor
Posts: 63
Registered: ‎11-17-2016

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

@pdvorak / @hshreedharan

 

I ran curl against the IP and saw that it connects to the news webpage on port 80. Telnet to port 80 works as well.

[root@LnxMasterNode01 /]# telnet 132.247.1.32 80
Trying 132.247.1.32...
Connected to 132.247.1.32.
Escape character is '^]'.
^CConnection closed by foreign host.

 

However, when I restart Flume I get the same error as before. Could this be related only to the missing plugins.d directory (see the previous post)?

2016-12-19 16:45:00,353 WARN org.mortbay.log: failed SelectChannelConnector@132.247.1.32:80: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 WARN org.mortbay.log: failed Server@36772002: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 ERROR org.apache.flume.source.http.HTTPSource: Error while starting HTTPSource. Exception follows.
java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.mortbay.jetty.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:315)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.Server.doStart(Server.java:235)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:207)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-12-19 16:45:00,364 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:IDLE} } - Exception follows.
java.lang.RuntimeException: java.net.BindException: Cannot assign requested address

Please help me resolve this issue.

Thanks,

Shilpa
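
A note on the BindException above: Flume's HTTP source is a listener. It opens a server socket on the configured bind address and port and waits for clients to POST events to it, so the bind address has to belong to the local machine. The log shows the source trying to bind 132.247.1.32:80, which appears to be the remote web server rather than an interface on LnxMasterNode01, so the operating system refuses with "Cannot assign requested address". The Python sketch below reproduces that check outside Flume. The helper name can_bind is made up for illustration and is not part of Flume.

```python
import errno
import socket


def can_bind(host, port):
    """Return True if this machine can open a listening socket on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # EADDRNOTAVAIL is the OS-level error that Java reports as
            # "BindException: Cannot assign requested address".
            if exc.errno == errno.EADDRNOTAVAIL:
                print(f"{host} is not an address of this machine")
            else:
                print(f"bind failed: {exc}")
            return False
        return True
```

Calling can_bind("132.247.1.32", 0) on the Flume host would be one way to confirm the diagnosis (port 0 lets the OS pick any free port, so only the address is tested), while can_bind("127.0.0.1", 0) should succeed on any machine with a loopback interface.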

Expert Contributor
Posts: 63
Registered: ‎11-17-2016

I have edited my flume.conf as follows:

# Please paste flume.conf here.

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = http-source
tier1.channels = mem-channel-1
tier1.sinks = hdfs-sink
# For each source, channel, and sink, set
# standard properties.
tier1.sources.http-source.type = http
tier1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler
tier1.sources.http-source.bind = localhost
tier1.sources.http-source.url = http://www.jornada.unam.mx/ultimas
tier1.sources.http-source.port = 5440
tier1.sources.http-source.channels = mem-channel-1
tier1.channels.mem-channel-1.type = memory
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.channel = mem-channel-1
tier1.sinks.hdfs-sink.hdfs.path = hdfs://lnxmasternode01.centralus.cloudapp.azure.com/flume/events/%y-%m-%d/%H%M/%S
# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.mem-channel-1.capacity = 100

 

Now I can see http-source reported as started in the Flume logs.

However, no data is being streamed to the HDFS path I specified in the config. Can anyone suggest what to do next?

-bash-4.1$ hadoop fs -ls /flume
Found 1 items
drwxr-xr-x - flume hdfs 0 2016-12-23 11:49 /flume/events
-bash-4.1$ hadoop fs -ls /flume/events
-bash-4.1$

Expert Contributor
Posts: 63
Registered: ‎11-17-2016

I checked the Flume source jars. These are the only ones I can find in the Cloudera bundle:

[hadoop@LnxMasterNode01 jars]$ ll flume*source*
-rw-r--r-- 1 root root 20586 Oct 21 04:58 flume-avro-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 26893 Oct 21 04:58 flume-jms-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 22843 Oct 21 04:58 flume-kafka-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 61447 Oct 21 04:58 flume-scribe-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 34830 Oct 21 04:58 flume-taildir-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 80709 Oct 21 04:58 flume-thrift-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 14540 Oct 21 04:58 flume-twitter-source-1.6.0-cdh5.9.0.jar

 

Could this be why the HTTP source is not working, i.e. no data is being streamed even though flume.log shows no errors and reports http-source as started?

How do I get the jars for the HTTP source?

Thanks,

Shilpa

Cloudera Employee
Posts: 273
Registered: ‎01-09-2014

As I stated before, Flume can't consume from a remote HTTP server. You would need something that consumes from the remote server and then posts the data to Flume.

-pd
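
To illustrate the "consume, then post" idea, here is a minimal Python sketch, not anything from the posters' actual setup. It assumes the HTTP source from the flume.conf above is listening on localhost:5440 with the JSONHandler, which expects a JSON array of events, each with a "headers" map of strings and a "body" string. It also sets a "timestamp" header (epoch milliseconds), since the time escapes in the HDFS path (%y-%m-%d and so on) need one unless hdfs.useLocalTimeStamp is enabled. The function names are hypothetical. As far as I know the HTTP source has no url property, so that line in the config is most likely ignored.

```python
import json
import time
import urllib.request


def build_flume_payload(bodies, now_ms=None):
    """Encode text bodies as the JSON array that Flume's JSONHandler expects."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    events = [
        # Header values must be strings; "timestamp" is epoch milliseconds.
        {"headers": {"timestamp": str(now_ms)}, "body": body}
        for body in bodies
    ]
    return json.dumps(events).encode("utf-8")


def fetch_and_forward(page_url, flume_url):
    """Download page_url and POST its text to the Flume HTTP source."""
    with urllib.request.urlopen(page_url, timeout=30) as response:
        page = response.read().decode("utf-8", errors="replace")
    request = urllib.request.Request(
        flume_url,
        data=build_flume_payload([page]),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return reply.status
```

Running fetch_and_forward("http://www.jornada.unam.mx/ultimas", "http://localhost:5440") on a schedule (from cron, for instance) would be one way to feed the page into the agent. The whole page is sent as a single event here. Splitting it into smaller events is left out of the sketch.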
Expert Contributor
Posts: 63
Registered: ‎11-17-2016

@pdvorak thanks!

Yes, I wrote Java code to pull the RSS feed. I used an Exec source and an Avro sink on two nodes, and an Avro source (as the collector) with an HDFS sink on the third node.
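
The Java code itself was not posted in this thread, so the following is only a rough Python sketch of the same idea, under stated assumptions: a script that turns an RSS 2.0 document into one line of JSON per item on stdout, so that it could be used as the command of an Exec source (which reads the command's output line by line). The function name, the chosen fields, and the sample feed are all made up for illustration. A real script would download the feed first.

```python
import json
import xml.etree.ElementTree as ET


def rss_items_to_lines(rss_xml):
    """Turn an RSS 2.0 document into one JSON line per <item>."""
    root = ET.fromstring(rss_xml)
    lines = []
    for item in root.iter("item"):
        record = {
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        }
        # One event per line: the Exec source splits stdout on newlines.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


# Hypothetical sample feed, standing in for the downloaded RSS document.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Sample feed</title>
<item><title>First story</title><link>http://example.com/1</link>
<pubDate>Mon, 19 Dec 2016 16:45:00 GMT</pubDate></item>
<item><title>Second story</title><link>http://example.com/2</link></item>
</channel></rss>"""

for line in rss_items_to_lines(SAMPLE_FEED):
    print(line)
```

Keeping each item on a single line matters here, because a multi-line record would be split into several Flume events before it reaches the Avro sink.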

 

New Contributor
Posts: 1
Registered: ‎02-06-2017

Hi Shilpa,

Were you able to get the webpage data into HDFS via Flume? Please let me know what you did.

Thanks.

Explorer
Posts: 10
Registered: ‎02-16-2017

Hi @ShilpaSinha,

Can you share the Java code you used to pull the RSS feed?

Regards,

David

Cloudera Employee
Posts: 273
Registered: ‎01-09-2014

Here is an example of creating a simple Java RSS reader and setting Flume up to read its output:

http://www.ibm.com/developerworks/library/bd-flumews/

-pd
Explorer
Posts: 10
Registered: ‎02-16-2017

Hi @pdvorak,

Thanks a lot for your answer. I've already checked that page and it helped.

Thanks again.

DB
