Member since: 03-08-2016
Posts: 33
Kudos Received: 0
Solutions: 2

My Accepted Solutions
Views | Posted
---|---
13389 | 04-25-2016 05:45 AM
19460 | 04-07-2016 04:13 AM
03-29-2017
09:17 AM
Now Cloudera works fine, but in the Host Monitor log file I get an error: "Could not fetch descriptor after 5 tries, exiting." I can't restart this service, and when I try to restart the Cloudera Management Service I get: "Cannot restart service when Host Monitor (master) is in STOPPING state."
03-29-2017
08:45 AM
I do not know why the number of mgmt_mgmt-NAVIGATORMETASERVER* files keeps increasing (for example mgmt_mgmt-NAVIGATORMETASERVER-9a89af62abe8393b48c78926720ffe2c_pid19656.hprof), even though I have increased the Java heap size.
03-29-2017
07:40 AM
After configuring the Navigator Metadata Server heap, I am trying to restart the Cloudera Management Service but I can't. I get: "Cannot restart service when Host Monitor (master) is in STOPPING state." In the Host Monitor log file:
mars 29, 14:05:55.133 ERROR com.cloudera.cmon.firehose.Main
Could not fetch descriptor after 5 tries, exiting.
And the number of mgmt_mgmt-NAVIGATORMETASERVER* files keeps increasing.
03-29-2017
06:28 AM
Hi Jim, how do I change the Navigator configuration to allocate enough memory to the JVM?
03-29-2017
12:57 AM
I found the files using that space:
-rw-------. 1 cloudera-scm cloudera-scm 359M Mar 27 14:40 mgmt_mgmt-NAVIGATOR-9a89af62abe8393b48c78926720ffe2c_pid28766.hprof
This entry is repeated 40 times. And:
-rw-------. 1 cloudera-scm cloudera-scm 761M Mar 27 15:10 mgmt_mgmt-NAVIGATORMETASERVER-9a89af62abe8393b48c78926720ffe2c_pid11739.hprof
This entry is repeated 12 times. How do I resolve this?
03-28-2017
07:46 AM
When I run du -sh / I get:
du: cannot access ‘/proc/4982/task/4982/fd/4’: No such file or directory
du: cannot access ‘/proc/4982/task/4982/fdinfo/4’: No such file or directory
du: cannot access ‘/proc/4982/fd/4’: No such file or directory
du: cannot access ‘/proc/4982/fdinfo/4’: No such file or directory
34G /
03-28-2017
06:46 AM
I installed Cloudera using the Path B installation on 4 machines (VMs, CentOS 7), 1 master and 3 slaves. After the installation I got a clock synchronization error on every slave, which I resolved by running systemctl start ntpd. After a few minutes I got an error on the master node and I can no longer open the Cloudera page (master:7180), although cloudera-scm-server status shows it is running. I noticed afterwards that the master node's hard drive is full. When I run df -h I get:
[root@master ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 34G 34G 20K 100% /
devtmpfs 4.1G 0 4.1G 0% /dev
tmpfs 4.1G 0 4.1G 0% /dev/shm
tmpfs 4.1G 8.7M 4.1G 1% /run
tmpfs 4.1G 0 4.1G 0% /sys/fs/cgroup
/dev/sda1 497M 212M 286M 43% /boot
/dev/mapper/centos-home 17G 36M 17G 1% /home
tmpfs 833M 0 833M 0% /run/user/0
I thought that maybe the ntpd log is behind all that. If the / directory is full (Use% = 100%), the master can't display anything. Any help to resolve this and keep the master node's disk from filling up would be appreciated. This is the third time I'm trying to install Cloudera, and every time I have the same problem.
Tags: Installation
Labels: Cloudera Manager
05-11-2016
03:56 AM
Yes it's working now, thanks for your help.
05-11-2016
03:15 AM
Thanks, but I have another error, with the sink machine hostname. I do: FlumeUtils.createPollingStream(ssc,198.168.1.31,8020) and the error is: overloaded method value createPollingStream with alternatives. The official site shows: val flumeStream = FlumeUtils.createPollingStream(streamingContext, [sink machine hostname], [sink port]). How can I pass the sink machine hostname?
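For what it's worth, a minimal sketch of that call with the hostname passed as a quoted String and the port as an Int, which is the overload the compiler is looking for (ssc being the StreamingContext already created; "sink-host" is a placeholder for the machine running the Flume Spark sink):
// Hostname must be a String literal; note that 8020 is typically the HDFS NameNode port, so double-check the sink's configured port.
val flumeStream = FlumeUtils.createPollingStream(ssc, "sink-host", 8020)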
05-11-2016
01:20 AM
Hi, I'm using Spark Streaming to analyse data arriving from Flume, but I have an error with FlumeUtils: not found: value FlumeUtils. This is my code:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume._
import org.apache.spark.SparkContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.{ Seconds, StreamingContext }

object WordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("File Count")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))
    val flumeStream = FlumeUtils.createPollingStream(ssc,198.168.1.31,8020) // not found: value FlumeUtils
    ............
    ssc.start()
    ssc.awaitTermination()
  }
}
And this is the pom.xml dependencies section:
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.10.4</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-flume-sink_2.10</artifactId>
<version>1.5.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.5.0</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.3.2</version>
</dependency>
Thanks in advance for your reply!
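A hedged note on the "not found: value FlumeUtils" error: in Spark 1.x the FlumeUtils class (package org.apache.spark.streaming.flume) ships in the spark-streaming-flume artifact, while the pom above only pulls in spark-streaming-flume-sink_2.10, which carries the custom sink side rather than the receiver utilities; the Spark artifacts should also all share one version. A minimal sketch under that assumption:
// Assumes spark-streaming-flume_2.10 (same version as spark-core and spark-streaming)
// has been added to the pom alongside the -sink artifact; that is where FlumeUtils lives.
import org.apache.spark.streaming.flume.FlumeUtils

val flumeStream = FlumeUtils.createPollingStream(ssc, "sink-host", 8020) // ssc as defined in the code above; host is a placeholder
flumeStream.count().print() // prints the number of Flume events received in each batch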
Labels: Apache Flume, Apache Spark
04-25-2016
05:45 AM
I have found the solution:
var addedRDD : org.apache.spark.rdd.RDD[(String,Int)] = sc.emptyRDD
04-25-2016
04:53 AM
I want to make a union of the RDDs that I receive in streaming. This is my code:
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(10))
val file = ssc.textFileStream("hdfs://192.168.1.20:8020/user/sparkStreaming/input")
var test = file.map(x => (x.split(";")(0)+";"+x.split(";")(1), 1)).reduceByKey((x,y) => x+y)
var addedRDD = sc.emptyRDD
test.foreachRDD{ rdd =>
  addedRDD = addedRDD union rdd
  addedRDD.cache()
}
But I have this error: type mismatch; found: org.apache.spark.rdd.RDD[(String, Int)] required: org.apache.spark.rdd.RDD[Nothing]
And when I try to create an empty RDD with a given type, I have this error: type mismatch; found: org.apache.spark.rdd.RDD[(String, Int)] required: org.apache.spark.rdd.EmptyRDD[(String, Int)]
How can I fix this problem? Thanks in advance!
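A minimal sketch of the pattern with the element type declared on the accumulator, reusing the sc and test defined above: declaring the var as RDD[(String, Int)] keeps sc.emptyRDD from being inferred as RDD[Nothing], and from being typed as the narrower EmptyRDD, so the union then type-checks.
import org.apache.spark.rdd.RDD

// Declare the element type up front; sc.emptyRDD picks it up from the expected type.
var addedRDD: RDD[(String, Int)] = sc.emptyRDD

test.foreachRDD { rdd =>
  addedRDD = addedRDD.union(rdd) // both sides are now RDD[(String, Int)]
  addedRDD.cache()
}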
Labels: Apache Spark, HDFS
04-07-2016
04:13 AM
I don't know why, but I re-ran it and it works. However, I get an empty _SUCCESS file in the file1 directory. Here is the complete code:
def main(args: Array[String]) {
  val conf = new SparkConf()
    .setAppName("File Count")
    .setMaster("local[2]")
  val sc = new SparkContext(conf)
  val ssc = new StreamingContext(sc, Seconds(1))
  val file = ssc.textFileStream("/root/file/test/file")
  file.foreachRDD(t=> {
    val test = t.map(x => (x.split(" ")(0)+";"+x.split(" ")(1), 1)).reduceByKey((x,y) => x+y)
    test.saveAsTextFile("/root/file/file1")
  })
  ssc.start()
  ssc.awaitTermination()
}
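For context, the _SUCCESS file is a zero-byte marker that the Hadoop output committer drops into an output directory once a job's output is committed, so it being empty is expected. A hedged sketch of one common refinement, using the file stream defined above: skip batches with no new input, and write a separate directory per batch so a later batch does not collide with an existing output path.
file.foreachRDD { rdd =>
  if (!rdd.isEmpty()) { // skip batch intervals where no new file arrived
    val counts = rdd.map(x => (x.split(" ")(0) + ";" + x.split(" ")(1), 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("/root/file/file1-" + System.currentTimeMillis()) // one directory per non-empty batch
  }
}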
04-07-2016
03:34 AM
16/04/06 14:09:52 INFO FileInputDStream: Duration for remembering RDDs set to 60000 ms for org.apache.spark.streaming.dstream.FileInputDStream@4bf57335
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.util.ThreadUtils$.runInNewThread$default$2()Z
at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:606)
at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:600)
at com.org.file.filecount.FileCount$.main(FileCount.scala:52)
at com.org.file.filecount.FileCount.main(FileCount.scala)
There is a mismatch between the dependency versions and the runtime, so I changed the dependencies to:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.1</version>
</dependency>
And now I am getting an error like the following:
16/04/07 11:23:56 WARN FileInputDStream: Error finding new files
java.io.IOException: Incomplete HDFS URI, no host: "/root/file/test"
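The "Incomplete HDFS URI, no host" message suggests the path is being resolved against an hdfs:// scheme that has no NameNode host. A hedged sketch of passing a fully qualified URI to the stream, reusing the ssc from the earlier code; "namenode-host" and 8020 are placeholders for the cluster's actual NameNode address, and file:///root/file/test would be the form for a purely local directory:
// Scheme + NameNode host + port + path, instead of a bare "/root/file/test".
val file = ssc.textFileStream("hdfs://namenode-host:8020/root/file/test")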
04-06-2016
06:20 AM
I tried it, and I get:
16/04/06 14:09:52 INFO FileInputDStream: Duration for remembering RDDs set to 60000 ms for org.apache.spark.streaming.dstream.FileInputDStream@4bf57335
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.util.ThreadUtils$.runInNewThread$default$2()Z
at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:606)
at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:600)
at com.org.file.filecount.FileCount$.main(FileCount.scala:52)
at com.org.file.filecount.FileCount.main(FileCount.scala)
04-06-2016
04:59 AM
This is my code:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.SparkListener
import org.apache.spark.scheduler.SparkListenerStageCompleted
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileCount {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("File Count")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))
    val file = ssc.textFileStream("/root/file/test/f3")
    file.foreachRDD(t=> {
      val test = t.map(x => (x.split(" ")(0)+";"+x.split(" ")(1), 1)).reduceByKey((x,y) => x+y)
      test.saveAsTextFile("/root/file/file1")
    })
    sc.stop()
  }
}
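One hedged observation on the code above: it stops the SparkContext right after wiring up the stream and never calls ssc.start(), so no batch is ever processed, which would match the immediate shutdown sequence in the console output below. A minimal sketch of the same job with the streaming context actually started (paths as above; local[2] instead of local just to leave a spare thread, and a per-batch output directory as an assumption, not the original layout):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("File Count").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))

    // Only files dropped into the directory after the job starts are picked up.
    val file = ssc.textFileStream("/root/file/test/f3")
    file.foreachRDD { rdd =>
      val counts = rdd.map(x => (x.split(" ")(0) + ";" + x.split(" ")(1), 1)).reduceByKey(_ + _)
      counts.saveAsTextFile("/root/file/file1-" + System.currentTimeMillis()) // new directory per batch
    }

    ssc.start()            // start the streaming computation
    ssc.awaitTermination() // block the main thread until the job is stopped
  }
}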
04-06-2016
04:51 AM
It does not work; what is the problem? Here is my console output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/04/06 12:44:42 INFO SparkContext: Running Spark version 1.5.0
16/04/06 12:44:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/06 12:44:48 INFO SecurityManager: Changing view acls to: root
16/04/06 12:44:48 INFO SecurityManager: Changing modify acls to: root
16/04/06 12:44:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/04/06 12:45:00 INFO Slf4jLogger: Slf4jLogger started
16/04/06 12:45:00 INFO Remoting: Starting remoting
16/04/06 12:45:04 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.31:38825]
16/04/06 12:45:04 INFO Utils: Successfully started service 'sparkDriver' on port 38825.
16/04/06 12:45:04 INFO SparkEnv: Registering MapOutputTracker
16/04/06 12:45:04 INFO SparkEnv: Registering BlockManagerMaster
16/04/06 12:45:05 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1b896884-d84a-4c39-b9dd-93decdb6ee0b
16/04/06 12:45:05 INFO MemoryStore: MemoryStore started with capacity 1027.3 MB
16/04/06 12:45:06 INFO HttpFileServer: HTTP File server directory is /tmp/spark-14a1c553-e160-4b93-8822-3b943e27edd1/httpd-849fa48d-e2de-46de-845a-a68a02f76b94
16/04/06 12:45:06 INFO HttpServer: Starting HTTP Server
16/04/06 12:45:08 INFO Utils: Successfully started service 'HTTP file server' on port 50992.
16/04/06 12:45:08 INFO SparkEnv: Registering OutputCommitCoordinator
16/04/06 12:45:11 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/04/06 12:45:11 INFO SparkUI: Started SparkUI at http://192.168.1.31:4040
16/04/06 12:45:12 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/04/06 12:45:12 INFO Executor: Starting executor ID driver on host localhost
16/04/06 12:45:15 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42498.
16/04/06 12:45:15 INFO NettyBlockTransferService: Server created on 42498
16/04/06 12:45:15 INFO BlockManagerMaster: Trying to register BlockManager
16/04/06 12:45:15 INFO BlockManagerMasterEndpoint: Registering block manager localhost:42498 with 1027.3 MB RAM, BlockManagerId(driver, localhost, 42498)
16/04/06 12:45:15 INFO BlockManagerMaster: Registered BlockManager
16/04/06 12:45:18 WARN StreamingContext: spark.master should be set as local[n], n > 1 in local mode if you have receivers to get data, otherwise Spark jobs will not get resources to process the received data.
16/04/06 12:45:22 INFO FileInputDStream: Duration for remembering RDDs set to 60000 ms for org.apache.spark.streaming.dstream.FileInputDStream@11fb9657
16/04/06 12:45:23 INFO SparkUI: Stopped Spark web UI at http://192.168.1.31:4040
16/04/06 12:45:23 INFO DAGScheduler: Stopping DAGScheduler
16/04/06 12:45:23 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/06 12:45:23 INFO MemoryStore: MemoryStore cleared
16/04/06 12:45:23 INFO BlockManager: BlockManager stopped
16/04/06 12:45:23 INFO BlockManagerMaster: BlockManagerMaster stopped
16/04/06 12:45:23 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/06 12:45:23 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/04/06 12:45:23 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/04/06 12:45:23 INFO SparkContext: Successfully stopped SparkContext
16/04/06 12:45:23 INFO ShutdownHookManager: Shutdown hook called
16/04/06 12:45:23 INFO ShutdownHookManager: Deleting directory /tmp/spark-14a1c553-e160-4b93-8822-3b943e27edd1
No file is created; nothing happens. What's wrong?
04-06-2016
01:22 AM
I tried:
val file = ssc.textFileStream("/root/file/test")
file.foreachRDD(t=> {
  var test = file.map(x => (x.split(" ")(0)+";"+x.split(" ")(1), 1)).reduceByKey((x,y) => x+y)
  test.saveAsTextFiles("/root/file/file1")
})
sc.stop()
But it doesn't work.
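A hedged note on the snippet above: inside foreachRDD the closure is handed each batch as an RDD (the t parameter), so mapping over the DStream file there, and calling the DStream method saveAsTextFiles on the result, is probably not what was intended. A small sketch of the per-batch variant, reusing the file stream defined above:
file.foreachRDD { rdd =>
  // Work on the batch RDD passed to the closure, not on the DStream itself.
  val test = rdd.map(x => (x.split(" ")(0) + ";" + x.split(" ")(1), 1)).reduceByKey(_ + _)
  test.saveAsTextFile("/root/file/file1-" + System.currentTimeMillis()) // RDD method; one directory per batch
}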
04-05-2016
08:26 AM
Hi, it is simple to display the result of an RDD, for example:
val sc = new SparkContext(conf)
val textFile = sc.textFile("/root/file/test")
val apps = textFile.map (line => line.split(";")(0))
  .map(p=>(p,1)) // convert to countable tuples
  .reduceByKey(_+_) // count keys
  .collect() // collect the result
apps.foreach(println)
And I have the result in my console. And if I want to save the output to a file I do: apps.saveAsTextFiles("/root/file/file1")
But how can I do it now with a DStream? This is my code:
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(10))
val file = ssc.textFileStream("/root/file/test")
var test = file.map(x => (x.split(" ")(0)+";"+x.split(" ")(1), 1)).reduceByKey((x,y) => x+y)
test.saveAsTextFiles("/root/file/file1")
sc.stop()
}
}
But it doesn't work. Any help please!
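A hedged sketch of the streaming counterpart, reusing the sc defined above: DStream.saveAsTextFiles writes a time-suffixed directory (prefix-<batch time in ms>) for every batch, and the StreamingContext has to be started, rather than the SparkContext stopped, for any batch to run.
val ssc = new StreamingContext(sc, Seconds(10))
val file = ssc.textFileStream("/root/file/test")
val test = file.map(x => (x.split(" ")(0) + ";" + x.split(" ")(1), 1)).reduceByKey(_ + _)

// Produces directories such as /root/file/file1-<timestamp>, once per batch interval.
test.saveAsTextFiles("/root/file/file1")

ssc.start()            // nothing is processed until the context is started
ssc.awaitTermination()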
Labels: Apache Spark