Member since: 03-08-2016
Posts: 33
Kudos Received: 0
Solutions: 2

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 12316 | 04-25-2016 05:45 AM
 | 13388 | 04-07-2016 04:13 AM
02-19-2020 11:23 AM
The connectivity between the DataNode (DN) and the NameNode (NN) has to be checked. Verify that the DN can resolve the NN hostname in both directions (forward and reverse lookup).
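A minimal sketch of such a check from the DataNode host, assuming a hypothetical NameNode hostname namenode.example.com (replace with your own), using plain java.net.InetAddress from Scala:

import java.net.InetAddress

// Hypothetical NameNode hostname; substitute the real one.
val nnHost = "namenode.example.com"

// Forward lookup: hostname -> IP address
val addr = InetAddress.getByName(nnHost)
println(s"forward: $nnHost -> ${addr.getHostAddress}")

// Reverse lookup: IP address -> hostname (should map back to the NN host)
val reverse = InetAddress.getByAddress(addr.getAddress).getCanonicalHostName
println(s"reverse: ${addr.getHostAddress} -> $reverse")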
03-29-2017 09:17 AM
Cloudera now works fine, but in the Host Monitor log file I get an error: "Could not fetch descriptor after 5 tries, exiting." I can't restart this service, and when I try to restart the Cloudera Management Service I get: "Cannot restart service when Host Monitor (master) is in STOPPING state."
02-16-2017 12:40 PM
That seems to be the most common fix, but as pointed out, there are other alternatives that could apply.
11-21-2016 12:51 PM
Hi,

The "select *" query can run "locally" in a streaming mode, so no MapReduce job is needed. The "select count(*)" query, on the other hand, starts a MapReduce job, so it involves more nodes. The error message says that the "master1" node cannot connect to the "slave1" node because of networking problems. Please check that the network connectivity from every host to every other host is fine (the network interfaces are up and running), that you are not using multiple network interfaces, and that DNS resolution is working. If you use Cloudera Manager, please run the "Host Inspector" to check these easily.

Regards,
Miklos Szurap
Customer Operations Engineer
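As a rough spot-check of the connectivity point (a sketch only, no substitute for the Host Inspector), one can test DNS resolution and a TCP connection from one node to the others. The host names below come from the error in this thread, but the port is an assumption (the default DataNode transfer port) and should be adjusted to the service reported in your error:

import java.net.{InetAddress, InetSocketAddress, Socket}

// "master1" and "slave1" are from the reported error; port 50010 is assumed.
val hosts = Seq("master1", "slave1")
val port = 50010

for (h <- hosts) {
  val ip = InetAddress.getByName(h).getHostAddress        // forward DNS lookup
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(h, port), 3000)  // 3-second timeout
    println(s"$h ($ip): port $port reachable")
  } catch {
    case e: Exception => println(s"$h ($ip): cannot connect to port $port - ${e.getMessage}")
  } finally {
    socket.close()
  }
}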
05-12-2016 03:10 AM
I am using Spark Streaming (the event count example) with Flume as the source. This is my config file:

FileFileAgent.sources = File
FileFileAgent.channels = MemChannel
FileFileAgent.sinks = spark
# Configuring the source
FileFileAgent.sources.File.type = spooldir
FileFileAgent.sources.File.spoolDir = /usr/lib/flume/spooldir
#FileFileAgent.sources.File.fileHeader = true
# Describing/Configuring the sink
FileAgent.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
FileAgent.sinks.spark.hostname = 192.168.1.31
FileAgent.sinks.spark.port = 18088
FileAgent.sinks.spark.channel = MemoryChannel
# Describing/Configuring the channel
FileFileAgent.channels.MemChannel.type = memory
FileFileAgent.channels.MemChannel.capacity = 10000
FileFileAgent.channels.MemChannel.transactionCapacity = 100
# Binding the source and sink to the channel
FileFileAgent.sources.File.channels = MemChannel

When I run the command line:

bin/flume-ng agent --conf conf --conf-file flumeSpark.conf --name FileAgent -Dflume.root.logger=INFO,console

I get these warnings:

16/05/12 10:44:52 WARN conf.FlumeConfiguration: Agent configuration for 'FileAgent' does not contain any channels. Marking it as invalid.
16/05/12 10:44:52 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'FileAgent'. It will be removed.
16/05/12 10:44:52 WARN conf.FlumeConfiguration: no context for sinkspark
16/05/12 10:44:52 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [FileFileAgent]
16/05/12 10:44:52 WARN node.AbstractConfigurationProvider: No configuration found for this host:FileAgent
16/05/12 10:44:52 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }

And when I execute my Spark application:

val conf = new SparkConf()
.setAppName("File Count")
.setMaster("local[2]")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(10))
val flumeStream = FlumeUtils.createPollingStream(ssc, "192.168.1.31", 18088)
flumeStream.count().map(cnt => "Received " + cnt + " flume events." ).print()
ssc.start()
ssc.awaitTermination()

I get this warning:

16/05/12 10:46:59 WARN FlumeBatchFetcher: Error while receiving data from Flume
java.io.IOException: NettyTransceiver closed
at org.apache.avro.ipc.NettyTransceiver.disconnect(NettyTransceiver.java:363)
at org.apache.avro.ipc.NettyTransceiver.access$200(NettyTransceiver.java:61)
at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:580)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
at org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:58)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
at org.jboss.netty.channel.Channels.close(Channels.java:812)
at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decodePackHeader(NettyTransportCodec.java:166)
at org.apache.avro.ipc.NettyTransportCodec$NettyFrameDecoder.decode(NettyTransportCodec.java:139)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Any help, please? How can I execute my application? I don't know how to resolve this problem.
05-11-2016 03:56 AM
Yes it's working now, thanks for your help.
04-25-2016 05:45 AM
I have found the solution:

var addedRDD : org.apache.spark.rdd.RDD[(String,Int)] = sc.emptyRDD
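As a rough usage sketch (assuming a SparkContext named sc already exists; the sample batches below are made up), the empty RDD can then be grown incrementally by union:

import org.apache.spark.rdd.RDD

// Start from an empty RDD of the desired pair type...
var addedRDD: RDD[(String, Int)] = sc.emptyRDD

// ...and union new batches into it as they arrive (made-up sample data).
addedRDD = addedRDD.union(sc.parallelize(Seq(("world", 2), ("count", 1))))
addedRDD = addedRDD.union(sc.parallelize(Seq(("earth", 1))))

println(addedRDD.reduceByKey(_ + _).collect().mkString(", "))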
04-14-2016 07:50 AM
I'm new to Spark and I want to process files in streaming. I have CSV files arriving non-stop. Example CSV file:

world world
count world
world earth
count world

I want to do two treatments on them. The first should produce a result like this:

(world,2,2) // "world" appears twice in the first column, and the second-column values (world, earth) are distinct, hence (2,2)
(count,2,1) // "count" appears twice in the first column, and the second-column values (world, world) are not distinct, hence (2,1)

The second treatment should give me that result every hour. In our example:

(world,1) // 1 = 2/2
(count,2) // 2 = 2/1

This is my code:

val conf = new SparkConf()
.setAppName("File Count")
.setMaster("local[2]")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(10))
val file = ssc.textFileStream("hdfs://192.168.1.31:8020/user/sparkStreaming/input")
var result = file.map(x => (x.split(" ")(0)+";"+x.split(" ")(1), 1)).reduceByKey((x,y) => x+y)
val window = result.reduceByKeyAndWindow((a:Int,b:Int) => (a + b), Seconds(60), Seconds(20))
val result1 = window.map(x => x.toString )
val result2 = result1.map(line => line.split(";")(0)+","+line.split(",")(1))
val result3 = result2.map(line => line.substring(1, line.length-1))
val result4 = result3.map(line => (line.split(",")(0),line.split(",")(1).toInt ) )
val result5 = result4.reduceByKey((x,y) => x+y )
val result6 = result3.map(line => (line.split(",")(0), 1 ))
val result7 = result6.reduceByKey((x,y) => x+y )
val result8 = result7.join(result5) // (world,2,2)
val finalResult = result8.mapValues(x => x._1.toFloat / x._2 ) // (world,1), I want this result after every one hour
ssc.start()
ssc.awaitTermination()

Thanks in advance!
04-12-2016 07:26 AM
Hi, I have a simple Spark Streaming job that does some processing on CSV files:

val conf = new SparkConf()
.setAppName("File Count")
.setMaster("local[2]")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(10))
val file = ssc.textFileStream("hdfs://192.168.1.31:8020/user/sparkStreaming/input")
val test = file.map(x => (x.split(" ")(0)+";"+x.split(" ")(1), 1)).reduceByKey((x,y) => x+y)
val windowed = test.reduceByKeyAndWindow((a:Int,b:Int) => (a + b), Seconds(60), Seconds(20))
windowed.foreachRDD(rdd=>{
rdd.saveAsTextFile("hdfs://192.168.1.31:8020/user/sparkStreaming/output")
})
ssc.start()
ssc.awaitTermination()
}
}

In the output directory I only have the result of the last file processed. I want to keep the results of all the files, not only the last one.
04-07-2016 04:26 AM
1 Kudo
That's because no new files are arriving in the directory after the streaming application starts. You can try "cp" to drop files into the directory after starting the streaming application.
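As a rough illustration (not part of the original reply), files can also be dropped in programmatically with the Hadoop FileSystem API; the local path below is hypothetical, while the HDFS URI and input directory are the ones from the question:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Connect to the HDFS instance the streaming job is watching.
val fs = FileSystem.get(new URI("hdfs://192.168.1.31:8020"), new Configuration())

// Copy a (hypothetical) local file into the monitored input directory;
// textFileStream only picks up files that appear after the job has started.
fs.copyFromLocalFile(
  new Path("/tmp/sample.csv"),
  new Path("/user/sparkStreaming/input/sample.csv"))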
03-31-2016 09:27 PM
For the filename, you can use basenameHeader or fileHeader [1]. There isn't currently a header value that is populated with an MD5, however, so you'd have to modify the spooldir source if you needed to pull the MD5 value of the file. -PD [1] http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
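A minimal spooling-directory configuration sketch with both headers enabled (the agent, source, and channel names a1, r1, c1 and the spool directory are hypothetical, not taken from this thread):

# Hypothetical agent/source/channel names; adjust to your setup.
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/spool
# Adds the absolute path of the ingested file as a "file" header.
a1.sources.r1.fileHeader = true
# Adds the file's basename as a "basename" header.
a1.sources.r1.basenameHeader = true
a1.sources.r1.channels = c1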