Member since: 11-17-2016
Posts: 63
Kudos Received: 7
Solutions: 5

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3212 | 11-23-2017 10:50 AM
 | 6348 | 05-12-2017 02:13 PM
 | 19379 | 01-11-2017 04:20 PM
 | 13737 | 01-06-2017 04:03 PM
 | 7634 | 01-06-2017 03:49 PM
01-03-2017
08:56 AM
Thanks a lot @mbigelow. I thought earlier that if Spark is in YARN mode those gateways should be running, which is why I was trying to start them. I can see the Spark entries in the RM UI because I opened spark-shell.

Just one question, though: in my Spark directory (specifically spark/sbin) I can see various stop/start scripts for the Spark master, slaves, history server, etc. I started the Spark master from the command line and now the UI opens. Do you think I did the right thing, or is it not required? Also, if I run the start-slave script, the worker UI does not open. Can you help me again, please?

Thanks, Shilpa
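(A hedged editorial sketch, not from the original thread: the scripts in spark/sbin manage a *standalone* Spark cluster, which is a separate deployment mode from Spark on YARN, so under that assumption neither start-master.sh nor start-slave.sh should be needed here. Paths and ports below are Spark 1.6 defaults, not taken from this cluster.)

```bash
# Standalone mode only -- this is what the sbin scripts are for:
#   ./sbin/start-master.sh                             # master web UI on :8080 by default
#   ./sbin/start-slave.sh spark://<master-host>:7077   # worker registers with the master

# On YARN there are no Spark master/worker daemons to start; the driver and
# executors run inside YARN containers, so submitting is enough:
spark-shell --master yarn-client   # Spark 1.6 syntax, as used in these posts
```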
01-02-2017
01:11 PM
1 Kudo
Hi All, I have a 3-node cluster running Cloudera 5.9, and I am new to Spark. I installed it using the Cloudera Add Service feature. However, I notice the Spark gateway is not running on any of the nodes. When I try to start it, I get the error "Command Start is not currently available for execution," and I am not sure why. Do I have to choose the "Install Spark JARs" option from the CM UI?

I am using Spark with the YARN Resource Manager, i.e. it is not standalone Spark. I have ports 7077, 7078, 18080, 18081, 18088, and 4040 open for it. The History Server UI opens, and I am able to launch and use spark-shell. However, I cannot open the Spark UI at http://lnxmasternode01.centralus.cloudapp.azure.com:4040/ ; it gives the error "the site can't be reached." Please advise what I should do.

-bash-4.1$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1483386427562_0003).
SQL context available as sqlContext.

scala>

Thanks, Shilpa
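(A hedged note on the 4040 error: the Spark application UI on port 4040 is served by the driver, and only while an application is running; in yarn-client mode the driver is whatever host spark-shell was launched on, which may not be lnxmasternode01. The commands below are a sketch; the only hostname assumed is localhost on the driver machine.)

```bash
# 1. Keep spark-shell open in one terminal, then check from the driver host:
curl -I http://localhost:4040/   # should answer while the shell is running

# 2. Alternatively, open the YARN ResourceManager UI (port 8088 by default)
#    and follow the running application's ApplicationMaster / tracking URL,
#    which proxies to the same Spark UI.
```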
Labels:
- Apache HBase
- Apache Spark
12-26-2016
04:56 PM
I checked the Flume jars for sources; these are the only ones I can find in the Cloudera bundle:

[hadoop@LnxMasterNode01 jars]$ ll flume*source*
-rw-r--r-- 1 root root 20586 Oct 21 04:58 flume-avro-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 26893 Oct 21 04:58 flume-jms-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 22843 Oct 21 04:58 flume-kafka-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 61447 Oct 21 04:58 flume-scribe-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 34830 Oct 21 04:58 flume-taildir-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 80709 Oct 21 04:58 flume-thrift-source-1.6.0-cdh5.9.0.jar
-rw-r--r-- 1 root root 14540 Oct 21 04:58 flume-twitter-source-1.6.0-cdh5.9.0.jar

Could this be the reason the HTTP source is not working, i.e. no data is streamed even though flume.log shows no errors and says the http-source started? How do I get the jars for the HTTP source?

Thanks, Shilpa
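(A hedged aside: the HTTP source does not ship as a separate flume-*-source jar; the class org.apache.flume.source.http.HTTPSource is part of flume-ng-core, so a missing jar is unlikely to be the cause here. The jar name and path below are assumptions; match whatever core jar the CDH parcel actually ships.)

```bash
# Confirm the HTTP source class is already on the Flume classpath.
cd /opt/cloudera/parcels/CDH/jars   # assumed parcel jar dir; adjust as needed
jar tf flume-ng-core-1.6.0-cdh5.9.0.jar | grep -i 'source/http'
# Expect entries like org/apache/flume/source/http/HTTPSource.class
```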
12-26-2016
11:31 AM
I have edited my flume.conf to:

# Sources, channels, and sinks are defined per
# agent name, in this case 'tier1'.
tier1.sources = http-source
tier1.channels = mem-channel-1
tier1.sinks = hdfs-sink

# For each source, channel, and sink, set standard properties.
tier1.sources.http-source.type = http
tier1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler
tier1.sources.http-source.bind = localhost
tier1.sources.http-source.url = http://www.jornada.unam.mx/ultimas
tier1.sources.http-source.port = 5440
tier1.sources.http-source.channels = mem-channel-1

# Other properties are specific to each type of source, channel, or sink.
# In this case, we specify the capacity of the memory channel.
tier1.channels.mem-channel-1.type = memory
tier1.channels.mem-channel-1.capacity = 100

tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.channel = mem-channel-1
tier1.sinks.hdfs-sink.hdfs.path = hdfs://lnxmasternode01.centralus.cloudapp.azure.com/flume/events/%y-%m-%d/%H%M/%S

Now I can see the http-source reported as started in the Flume logs. However, no data is streamed to the HDFS path I set in the config. Can anyone suggest what to do next?

-bash-4.1$ hadoop fs -ls /flume
Found 1 items
drwxr-xr-x - flume hdfs 0 2016-12-23 11:49 /flume/events
-bash-4.1$ hadoop fs -ls /flume/events
-bash-4.1$
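(A hedged note on why nothing arrives: Flume's HTTP source only *listens* for incoming HTTP POSTs; it does not fetch a URL, and the url property above is not one the HTTP source defines. With JSONHandler the source expects a JSON array of events, so one way to confirm the rest of the pipeline works is to hand it a test event yourself; localhost and port 5440 are taken from the config above.)

```bash
# Hand-deliver one test event to the listening HTTP source.
# JSONHandler expects a JSON array of {headers, body} objects.
curl -X POST http://localhost:5440 \
  -H 'Content-Type: application/json' \
  -d '[{"headers": {"test": "1"}, "body": "hello flume"}]'

# Then check whether the event reached HDFS:
hadoop fs -ls -R /flume/events
```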
12-19-2016
02:56 PM
@pdvorak / @hshreedharan I ran a curl against the IP and saw that it uses port 80 to serve the news webpage. Even telnet works on port 80:

[root@LnxMasterNode01 /]# telnet 132.247.1.32 80
Trying 132.247.1.32...
Connected to 132.247.1.32.
Escape character is '^]'.
^CConnection closed by foreign host.

However, when I restart Flume I get the same error as earlier. Can this really be related only to the absence of plugins.d (see the previous post)?

2016-12-19 16:45:00,353 WARN org.mortbay.log: failed SelectChannelConnector@132.247.1.32:80: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 WARN org.mortbay.log: failed Server@36772002: java.net.BindException: Cannot assign requested address
2016-12-19 16:45:00,353 ERROR org.apache.flume.source.http.HTTPSource: Error while starting HTTPSource. Exception follows.
java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:444)
    at sun.nio.ch.Net.bind(Net.java:436)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
    at org.mortbay.jetty.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:315)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.Server.doStart(Server.java:235)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:207)
    at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-12-19 16:45:00,364 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{ name:http-source,state:IDLE} } - Exception follows.
java.lang.RuntimeException: java.net.BindException: Cannot assign requested address

Please help me resolve this issue.

Thanks, Shilpa
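(A hedged reading of the error: java.net.BindException: Cannot assign requested address means the source is trying to *listen* on 132.247.1.32, an address that belongs to the remote news site rather than to any interface on the Flume host; it is unrelated to plugins.d. The bind property must be a local address. A minimal sketch, reusing the tier1 names from these posts:)

```properties
# bind must be an address owned by the Flume host itself;
# 0.0.0.0 listens on all local interfaces, and the port just needs to be free.
tier1.sources.http-source.type = http
tier1.sources.http-source.bind = 0.0.0.0
tier1.sources.http-source.port = 5440
```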
12-19-2016
02:21 PM
Hi @hshreedharan / @pdvorak / All, my project is in crisis and I really need help. My flume.conf is as follows:

tier1.sources = http-source
tier1.channels = mem-channel-1
tier1.sinks = hdfs-sink

# For each source, channel, and sink, set standard properties.
tier1.sources.http-source.type = http
tier1.sources.http-source.handler = org.apache.flume.source.http.JSONHandler
tier1.sources.http-source.bind = 132.247.1.32
tier1.sources.http-source.port = 21
tier1.sources.http-source.channels = mem-channel-1

# Other properties are specific to each type of source, channel, or sink.
# In this case, we specify the capacity of the memory channel.
tier1.channels.mem-channel-1.type = memory
tier1.channels.mem-channel-1.capacity = 100

tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.channel = mem-channel-1
tier1.sinks.hdfs-sink.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S

The error I am getting while starting flume-ng:

2016-12-19 15:52:27,356 WARN org.mortbay.log: failed SelectChannelConnector@132.247.1.32:21: java.net.BindException: Cannot assign requested address
2016-12-19 15:52:27,356 WARN org.mortbay.log: failed Server@9439eee: java.net.BindException: Cannot assign requested address
2016-12-19 15:52:27,357 ERROR org.apache.flume.source.http.HTTPSource: Error while starting HTTPSource. Exception follows.
java.net.BindException: Cannot assign requested address
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:444)
    at sun.nio.ch.Net.bind(Net.java:436)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
    at org.mortbay.jetty.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:315)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.Server.doStart(Server.java:235)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:207)
    at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-12-19 15:52:27,367 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:IDLE} } - Exception follows.
java.lang.RuntimeException: java.net.BindException: Cannot assign requested address
    at com.google.common.base.Throwables.propagate(Throwables.java:156)
    at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:211)
    at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

I guess the source is not able to connect to the IP and port I provided. That is because the IP I provided belongs to a third party, a news webpage, and I don't know which port is open to connect to. Please guide me on how I can enable data streaming from a third-party webpage to HDFS using Flume.

Also, I cannot see a plugins.d directory for Flume. Can this be one of the reasons? Do I have to download its jars separately? I installed Flume from the Cloudera Manager UI.

[root@LnxMasterNode01 CDH-5.9.0-1.cdh5.9.0.p0.23]# cd /var/lib/flume-ng/
[root@LnxMasterNode01 flume-ng]# ll
total 0
[root@LnxMasterNode01 flume-ng]# cd /
[root@LnxMasterNode01 /]# find . -name plugins.d
./etc/audisp/plugins.d
[root@LnxMasterNode01 /]#

@hshreedharan I saw your post on Flume on a separate forum, hence I looped you in. Please help.

Thanks, Shilpa
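(A hedged sketch of what this post is ultimately after: the HTTP source only listens, so pulling content from a third-party site needs a source that performs the fetch itself. Assuming periodic polling of the page is acceptable, Flume's exec source can wrap a curl loop; the source name and 5-minute interval are illustrative, not tested against this site.)

```properties
# Hypothetical pull-based alternative: poll the page with curl via an exec
# source and feed the raw HTML through the existing channel to the HDFS sink.
tier1.sources = fetch-source
tier1.sources.fetch-source.type = exec
tier1.sources.fetch-source.command = /bin/bash -c 'while true; do curl -s http://www.jornada.unam.mx/ultimas; sleep 300; done'
tier1.sources.fetch-source.channels = mem-channel-1
```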
12-16-2016
09:49 AM
Hi @pdvorak / All, please help; I am not sure how to deal with this. Thanks, Shilpa
12-15-2016
03:31 PM
Hey @pdvorak,
My flume.conf looks like the below. However, I have a question, as I want to get data from a third-party website, in my case a news website.
I don't know which port to use in the configuration; please see the highlighted part below. I thought of using 8080, but it is not open:

[root@LnxMasterNode01 ~]# nc -l 132.247.1.32 8080
nc: Cannot assign requested address

Can you tell me how to deal with this part?
Also, I am looking for specific keywords on this website, such as fashion and sports, and I want only the data related to them. Can you tell me if this can be done using multiple sinks, one per header value, as in the config below? (A note on how the Category header gets set follows this post.)
# list the sources, sinks and channels in the agent
agent_foo.sources = avro-AppSrv-source1
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 mem-channel-2
# set channels for source
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 mem-channel-2
# set channel for sinks (a sink reads from exactly one channel)
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
agent_foo.sinks.avro-forward-sink2.channel = mem-channel-2
# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = Category
agent_foo.sources.avro-AppSrv-source1.selector.mapping.Fashion = mem-channel-1
agent_foo.sources.avro-AppSrv-source1.selector.mapping.Baseball = mem-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.mapping.Basketball = mem-channel-1 mem-channel-2
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
Thanks in Advance!
Shilpa
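(A hedged note on the multiplexing config above: the selector routes on a header that must already be present on each event, so something upstream has to stamp Category first, typically an interceptor. A sketch using Flume's regex_extractor interceptor; the interceptor name and regex are illustrative, chosen to match the mappings above, and not tested against real data.)

```properties
# Hypothetical interceptor that sets a Category header by matching keywords
# in the event body; the multiplexing selector can then route on it.
agent_foo.sources.avro-AppSrv-source1.interceptors = cat-tagger
agent_foo.sources.avro-AppSrv-source1.interceptors.cat-tagger.type = regex_extractor
agent_foo.sources.avro-AppSrv-source1.interceptors.cat-tagger.regex = (Fashion|Baseball|Basketball)
agent_foo.sources.avro-AppSrv-source1.interceptors.cat-tagger.serializers = s1
agent_foo.sources.avro-AppSrv-source1.interceptors.cat-tagger.serializers.s1.name = Category
```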
12-02-2016
12:51 PM
Thanks pd. 🙂 I will go through the links and will get back if I am stuck anywhere. -Shilpa
12-02-2016
09:26 AM
Hey guys! I am still waiting for a reply. Please help me with those configs or point me to a document. Thanks, Shilpa