Member since: 06-16-2014
Posts: 2
Kudos Received: 0
Solutions: 0
10-22-2014 12:52 PM
Set the port number in the spark-env.sh file, via SPARK_WORKER_PORT.
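For reference, a minimal sketch of the relevant line (assuming the default conf/spark-env.sh location under your Spark install, and 7078 as an example port; any free port works):

# conf/spark-env.sh -- read by the Spark daemons at startup
export SPARK_WORKER_PORT=7078   # bind the worker to a fixed port instead of a random one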
08-29-2014 11:53 AM
Case 1: when using the java command line to run:

# $JAVA_HOME/bin/java -cp $CLASSPATH -Dspark.master=spark://10.xxx.xxx.xxx:43191 com.cloudera.sparkwordcount.SparkWordCount hdfs://xxxxxx.com:8020/user/hdfs/spark/LICENSE 2

I got java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom...

14/08/29 18:37:16 INFO spark.SecurityManager: Changing view acls to: root
14/08/29 18:37:16 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
14/08/29 18:37:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/08/29 18:37:17 INFO Remoting: Starting remoting
14/08/29 18:37:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@Exxxxy-head.amers1.ciscloud:52049]
14/08/29 18:37:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@Exxxxxx.ciscloud:52049]
14/08/29 18:37:17 INFO spark.SparkEnv: Registering MapOutputTracker
14/08/29 18:37:17 INFO spark.SparkEnv: Registering BlockManagerMaster
14/08/29 18:37:17 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140829183717-3369
14/08/29 18:37:17 INFO storage.MemoryStore: MemoryStore started with capacity 2.0 GB.
14/08/29 18:37:17 INFO network.ConnectionManager: Bound socket to port 45604 with id = ConnectionManagerId(xxxxx,45604)
14/08/29 18:37:17 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/08/29 18:37:17 INFO storage.BlockManagerInfo: Registering block manager ETSInterDay-head.amers1.ciscloud:45604 with 2.0 GB RAM
14/08/29 18:37:17 INFO storage.BlockManagerMaster: Registered BlockManager
14/08/29 18:37:17 INFO spark.HttpServer: Starting HTTP Server
14/08/29 18:37:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/29 18:37:17 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:54315
14/08/29 18:37:17 INFO broadcast.HttpBroadcast: Broadcast server started at http://xxxxx:54315
14/08/29 18:37:17 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-19eccd14-bc32-4112-9e97-2197e059456b
14/08/29 18:37:17 INFO spark.HttpServer: Starting HTTP Server
14/08/29 18:37:17 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/29 18:37:17 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50747
14/08/29 18:37:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/08/29 18:37:18 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/08/29 18:37:18 INFO ui.SparkUI: Started SparkUI at http://xxxd:4040
14/08/29 18:37:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/29 18:37:18 INFO client.AppClient$ClientActor: Connecting to master spark://1xxxx...
14/08/29 18:37:18 WARN storage.BlockManager: Putting block broadcast_0 failed
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashInt(I)Lcom/google/common/hash/HashCode;
    at org.apache.spark.util.collection.OpenHashSet.org$apache$spark$util$collection$OpenHashSet$$hashcode(OpenHashSet.scala:261)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.getPos$mcI$sp(OpenHashSet.scala:165)
    at org.apache.spark.util.collection.OpenHashSet$mcI$sp.contains$mcI$sp(OpenHashSet.scala:102)
    at org.apache.spark.util.SizeEstimator$$anonfun$visitArray$2.apply$mcVI$sp(SizeEstimator.scala:214)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:210)
    at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:169)
    at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:161)
    at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:155)
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:75)
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:661)
    at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:812)
    at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:776)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:545)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:457)
    at com.cloudera.sparkwordcount.SparkWordCount$.main(SparkWordCount.scala:17)
    at com.cloudera.sparkwordcount.SparkWordCount.main(SparkWordCount.scala)

Case 2
======
When using:

./spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master spark://xx.xxx.xxx.xxx:43191 /hadoop/cloudera/parcels/CDH/lib/spark/m/wordcount/target/sparkwordcount-0.0.1-SNAPSHOT.jar hdfs://xxxx.xxx.xxx:8020//user//hdfs//spark//LICENSE 2

I got:

14/08/29 18:41:59 INFO client.AppClient$ClientActor: Executor updated: app-20140829184159-0005/0 is now RUNNING
14/08/29 18:41:59 INFO client.AppClient$ClientActor: Executor updated: app-20140829184159-0005/1 is now RUNNING
14/08/29 18:41:59 INFO mapred.FileInputFormat: Total input paths to process : 1
14/08/29 18:42:00 INFO spark.SparkContext: Starting job: collect at SparkWordCount.scala:28
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Registering RDD 4 (reduceByKey at SparkWordCount.scala:20)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Registering RDD 10 (reduceByKey at SparkWordCount.scala:26)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Got job 0 (collect at SparkWordCount.scala:28) with 2 output partitions (allowLocal=false)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at SparkWordCount.scala:28)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Submitting Stage 2 (MapPartitionsRDD[4] at reduceByKey at SparkWordCount.scala:20), which has no missing parents
14/08/29 18:42:00 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 2 (MapPartitionsRDD[4] at reduceByKey at SparkWordCount.scala:20)
14/08/29 18:42:00 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
14/08/29 18:42:01 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@xxx:44712/user/Executor#-1200084333] with ID 1
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 0 on executor 1: ETSInterDay-worker1.amers1.ciscloud (PROCESS_LOCAL)
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 2192 bytes in 2 ms
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Starting task 2.0:1 as TID 1 on executor 1: ETSInterDay-worker1.amers1.ciscloud (PROCESS_LOCAL)
14/08/29 18:42:01 INFO scheduler.TaskSetManager: Serialized task 2.0:1 as 2192 bytes in 0 ms
14/08/29 18:42:01 INFO storage.BlockManagerInfo: Registering block manager ETSInterDay-worker1.amers1.ciscloud:41977 with 294.9 MB RAM
14/08/29 18:42:01 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@xxxxx:49084/user/Executor#593030937] with ID 0
14/08/29 18:42:02 INFO storage.BlockManagerInfo: Registering block manager xxxxxx:55303 with 294.9 MB RAM
14/08/29 18:42:02 WARN scheduler.TaskSetManager: Lost TID 0 (task 2.0:0)
14/08/29 18:42:02 WARN scheduler.TaskSetManager: Loss was due to java.lang.RuntimeException
java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:657)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:389)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
    at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
    at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:145)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:189)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:184)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:93)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.Task.run(Task.scala:51)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Please help!
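For anyone reproducing this, a quick way to inspect what the classpath from the commands above actually resolves (a sketch only; it reuses the $CLASSPATH variable from Case 1, and jar names depend on your CDH layout):

# Case 1: list any Guava jars on the classpath. A NoSuchMethodError on
# HashFunction.hashInt usually points at an older Guava shadowing the one
# Spark was built against (an assumption here, not a confirmed diagnosis).
echo "$CLASSPATH" | tr ':' '\n' | grep -i guava

# Case 2: check whether a hadoop-hdfs jar is present. "No FileSystem for
# scheme: hdfs" typically means the hdfs:// filesystem implementation is
# missing from the classpath the tasks run with.
echo "$CLASSPATH" | tr ':' '\n' | grep -i hadoop-hdfs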
Labels:
Apache Spark