
Use Flume to get a webpage data. How to configure, how to use it to stream data

Solved

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Expert Contributor

@pdvorak / @hshreedharan

I ran curl against the IP and saw that it uses port 80 to connect to the news webpage. Telnet to port 80 also works:

[root@LnxMasterNode01 /]# telnet 132.247.1.32 80
Trying 132.247.1.32...
Connected to 132.247.1.32.
Escape character is '^]'.
^CConnection closed by foreign host.

However, when I restart Flume, I get the same error as before. Could this be caused solely by the missing plugins.d directory (see my previous post)?

2016-12-19 16:45:00,353 WARN org.mortbay.log: failed SelectChannelConnector@132.247.1.32:80: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 WARN org.mortbay.log: failed Server@36772002: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 ERROR org.apache.flume.source.http.HTTPSource: Error while starting HTTPSource. Exception follows.
java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
at org.mortbay.jetty.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:315)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.Server.doStart(Server.java:235)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:207)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2016-12-19 16:45:00,364 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{
name:http-source,state:IDLE} } - Exception follows.
java.lang.RuntimeException: java.net.BindException: Cannot assign requested address 
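The BindException above comes from the HTTP source trying to open a listening socket on 132.247.1.32, which is the remote news server's address rather than an interface on this host. The source's bind property has to name a local address; the revised config later in this thread, with bind = localhost and port 5440, starts cleanly. The minimal sketch below reproduces the same bind outside Flume, assuming you run it on the Flume host. The class name BindCheck and its defaults are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: can this host open a listening socket on host:port,
// i.e. would that pair be usable as the HTTP source's bind/port values?
class BindCheck {

    static boolean canBind(String host, int port) {
        try (ServerSocket socket = new ServerSocket()) {
            // Essentially the bind the HTTP source's Jetty connector performs
            // at startup (see ServerSocketAdaptor.bind in the stack trace).
            socket.bind(new InetSocketAddress(InetAddress.getByName(host), port));
            return true;
        } catch (IOException e) {
            // "Cannot assign requested address": no local interface owns this IP.
            // "Permission denied": port below 1024 without root privileges.
            // "Address already in use": another process already holds the port.
            System.err.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "0.0.0.0";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5440;
        System.out.println(host + ":" + port + " bindable: " + canBind(host, port));
    }
}
```

After compiling, `java BindCheck 132.247.1.32 80` on the Flume host would be expected to fail the same way the agent did, while `java BindCheck 0.0.0.0 5440` should succeed if the port is free. On Linux, ports below 1024 such as 80 also need root (or CAP_NET_BIND_SERVICE), so a non-root flume user could not bind port 80 even on a local address.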

Please help me resolve this issue.

Thanks,

Shilpa

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Expert Contributor

I have edited my flume.conf as follows:

# Please paste flume.conf here.

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = http-source
tier1.channels = mem-channel-1
tier1.sinks = hdfs-sink
# For each source, channel, and sink, set
# standard properties.
tier1.sources.http-source.type = http
tier1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler
tier1.sources.http-source.bind = localhost
tier1.sources.http-source.url = http://www.jornada.unam.mx/ultimas
tier1.sources.http-source.port = 5440
tier1.sources.http-source.channels = mem-channel-1
tier1.channels.mem-channel-1.type = memory
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.channel = mem-channel-1
tier1.sinks.hdfs-sink.hdfs.path = hdfs://lnxmasternode01.centralus.cloudapp.azure.com/flume/events/%y-%m-%d/%H%M/%S
# Other properties are specific to each type of
# source, channel, or sink. In this case, we
# specify the capacity of the memory channel.
tier1.channels.mem-channel-1.capacity = 100

Now I can see in the Flume logs that http-source has started.

However, no data is being streamed to the HDFS path I specified in the config. Can anyone suggest what to do next?

-bash-4.1$ hadoop fs -ls /flume
Found 1 items
drwxr-xr-x - flume hdfs 0 2016-12-23 11:49 /flume/events
-bash-4.1$ hadoop fs -ls /flume/events
-bash-4.1$

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Expert Contributor

I checked the Flume source jars; these are the only ones I can find in the Cloudera bundle:

[hadoop@LnxMasterNode01 jars]$ ll flume*source*
-rw-r--r-- 1 root root 20586 Oct 21 04:58 flume-avro-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 26893 Oct 21 04:58 flume-jms-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 22843 Oct 21 04:58 flume-kafka-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 61447 Oct 21 04:58 flume-scribe-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 34830 Oct 21 04:58 flume-taildir-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 80709 Oct 21 04:58 flume-thrift-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 14540 Oct 21 04:58 flume-twitter-source-1.6.0-cdh5.9.0.jar

Could this be why the HTTP source is not working, i.e. why no data is streamed even though flume.log shows no errors and says http-source has started?

How do I get the jars for the HTTP source?
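There is probably no separate HTTP source jar to look for: the stack trace in the earlier post already shows org.apache.flume.source.http.HTTPSource.start executing, so the class is on Flume's classpath. In Apache Flume 1.6 that class ships in the core jar (flume-ng-core), which the flume*source* glob does not match. The hypothetical diagnostic below, WhichJar, prints where a class is loaded from so you can confirm this on your own installation.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: report which jar or directory a class is loaded from.
class WhichJar {

    // Returns the class's code location, "(bootstrap/JDK)" for JDK classes,
    // or null when the class is not on the classpath.
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/JDK)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.flume.source.http.HTTPSource";
        String where = locate(name);
        System.out.println(name + " -> " + (where == null ? "NOT FOUND on classpath" : where));
    }
}
```

Run it with Flume's jars on the classpath, e.g. `java -cp "<flume lib dir>/*:." WhichJar`, where the directory is installation-specific and left as a placeholder. We would expect the output to point at a flume-ng-core jar. If the class were missing, the agent would have failed to create the source rather than reporting that http-source started.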

Thanks,

Shilpa

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Super Collaborator
As I stated before, Flume can't consume from a remote HTTP server. You need something that consumes from the remote server and then posts to Flume.
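Below is a minimal sketch of such a bridge, assuming the tier1 config posted earlier in this thread (HTTP source with JSONHandler listening on localhost:5440). The url line in that config is not an HTTP source property, so Flume ignores it; the source only accepts events that clients POST to it. JSONHandler expects a JSON array of objects with headers and body fields. The sketch also sets a timestamp header (epoch milliseconds), because the HDFS sink needs one to expand the %y-%m-%d/%H%M/%S escapes in hdfs.path unless hdfs.useLocalTimeStamp = true is set. The class name PageToFlume, the url header, and the 60-second polling interval are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical bridge: poll a remote page and forward it to a Flume HTTP source
// configured with org.apache.flume.source.http.JSONHandler.
class PageToFlume {

    // Minimal JSON string escaping (quotes, backslashes, control characters).
    static String jsonEscape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // One event in JSONHandler's format: [{"headers": {...}, "body": "..."}].
    static String toEventJson(String body, String sourceUrl, long epochMillis) {
        return "[{\"headers\":{\"timestamp\":\"" + epochMillis
                + "\",\"url\":\"" + jsonEscape(sourceUrl)
                + "\"},\"body\":\"" + jsonEscape(body) + "\"}]";
    }

    // Downloads the page; assumes it is UTF-8 encoded.
    static String fetch(String url) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // POSTs the JSON to the HTTP source and returns the HTTP status code.
    static int post(String flumeUrl, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(flumeUrl).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        String page = args.length > 0 ? args[0] : "http://www.jornada.unam.mx/ultimas";
        String flume = args.length > 1 ? args[1] : "http://localhost:5440";
        while (true) {
            String json = toEventJson(fetch(page), page, System.currentTimeMillis());
            System.out.println("POST " + flume + " -> HTTP " + post(flume, json));
            Thread.sleep(60_000L); // hypothetical polling interval
        }
    }
}
```

Each poll sends the whole page as a single event, so repeated polls store duplicate copies; a real bridge would parse or diff the page, or read an RSS feed instead. A 200 response means the source accepted the event; a 503 typically means the channel is full.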

-pd

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Expert Contributor

@pdvorak, thanks!

Yes. I wrote Java code to pull the RSS feed, ran an Exec source with an Avro sink on two of the nodes, and used an Avro source as the collector with an HDFS sink on the third node.
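A minimal sketch of the RSS-puller half of this setup, assuming an RSS 2.0 feed. It prints one title<TAB>link line per item, because the Exec source turns each line the command writes to stdout into one event. The class name RssToStdout and the output format are hypothetical; this is not the code used in this thread.

```java
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical RSS puller for a Flume Exec source: prints one line per feed item.
class RssToStdout {

    // Returns "title<TAB>link" for every <item> element in an RSS 2.0 document.
    static List<String> itemLines(InputStream rss) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(rss);
        NodeList items = doc.getElementsByTagName("item");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            lines.add(text(item, "title") + "\t" + text(item, "link"));
        }
        return lines;
    }

    // Text of the first descendant with the given tag, whitespace collapsed so
    // that each item stays on one line (one Exec source event per item).
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return "";
        }
        return nodes.item(0).getTextContent().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: the feed URL to read (supplied by the caller).
        try (InputStream in = new URL(args[0]).openStream()) {
            for (String line : itemLines(in)) {
                System.out.println(line);
            }
        }
    }
}
```

Wired to an Exec source, the command would be something like `java -cp /path/to/classes RssToStdout <feed-url>` (path and feed URL are placeholders). Because the program exits after one pass, the Exec source's restart = true (with restartThrottle) would be needed to re-run it periodically, and already-seen items will be emitted again on each run unless you deduplicate them.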

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

New Contributor

Hi Shilpa,

Were you able to get webpage data into HDFS via Flume? Please let me know what you did.

Thanks.

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Explorer

Hi @ShilpaSinha,

Can you share the Java code you used to pull the RSS feed?

Regards,

David

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Super Collaborator
Here is an example of creating a simple Java RSS reader and setting up Flume to read its output:

http://www.ibm.com/developerworks/library/bd-flumews/

-pd

Re: Use Flume to get a webpage data. How to configure, how to use it to stream data

Explorer

Hi @pdvorak,

Thanks a lot for your answer. I had already checked that page, and it helped.

Thanks again.

DB
