
Connection timeout in Spark program (Eclipse)

Contributor

Hi All,

 

I created a simple select query using HiveContext in Spark, but it seems I have a connectivity issue between my cluster and the Eclipse IDE. Is there any configuration I should apply to resolve this?
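For reference, a minimal sketch of the kind of driver program described here (the query string is taken from the log below; the rest of the code is an assumption about the setup):

// Minimal sketch (assumption): a Spark 1.6 driver that queries Hive through HiveContext.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveSelectExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HiveSelectExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Table metadata comes from the remote metastore; the rows themselves are read
    // directly from the DataNodes over HDFS, which is where the timeout below occurs.
    hiveContext.sql("select * from flume_data limit 1").show()

    sc.stop()
  }
}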

 

Logs:

 

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/29 21:04:43 INFO SparkContext: Running Spark version 1.6.0
16/02/29 21:04:54 INFO SecurityManager: Changing view acls to: Orson
16/02/29 21:04:54 INFO SecurityManager: Changing modify acls to: Orson
16/02/29 21:04:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Orson); users with modify permissions: Set(Orson)
16/02/29 21:04:55 INFO Utils: Successfully started service 'sparkDriver' on port 60476.
16/02/29 21:04:55 INFO Slf4jLogger: Slf4jLogger started
16/02/29 21:04:55 INFO Remoting: Starting remoting
16/02/29 21:04:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.181.1:60489]
16/02/29 21:04:55 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 60489.
16/02/29 21:04:55 INFO SparkEnv: Registering MapOutputTracker
16/02/29 21:04:55 INFO SparkEnv: Registering BlockManagerMaster
16/02/29 21:04:55 INFO DiskBlockManager: Created local directory at C:\Users\Orson\AppData\Local\Temp\blockmgr-7fdfa330-9d04-4bdc-a933-30b63c7a1710
16/02/29 21:04:55 INFO MemoryStore: MemoryStore started with capacity 6.4 GB
16/02/29 21:04:55 INFO SparkEnv: Registering OutputCommitCoordinator
16/02/29 21:04:56 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/02/29 21:04:56 INFO SparkUI: Started SparkUI at http://192.168.181.1:4040
16/02/29 21:04:56 INFO Executor: Starting executor ID driver on host localhost
16/02/29 21:04:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60496.
16/02/29 21:04:56 INFO NettyBlockTransferService: Server created on 60496
16/02/29 21:04:56 INFO BlockManagerMaster: Trying to register BlockManager
16/02/29 21:04:56 INFO BlockManagerMasterEndpoint: Registering block manager localhost:60496 with 6.4 GB RAM, BlockManagerId(driver, localhost, 60496)
16/02/29 21:04:56 INFO BlockManagerMaster: Registered BlockManager
16/02/29 21:04:57 INFO HiveContext: Initializing execution hive, version 1.2.1
16/02/29 21:04:57 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
16/02/29 21:04:57 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
16/02/29 21:04:57 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/02/29 21:04:57 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/02/29 21:04:57 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/02/29 21:04:57 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
16/02/29 21:04:57 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
16/02/29 21:04:57 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
16/02/29 21:04:57 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/02/29 21:04:57 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
16/02/29 21:04:58 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
16/02/29 21:04:58 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/02/29 21:04:58 INFO ObjectStore: ObjectStore, initialize called
16/02/29 21:04:58 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/02/29 21:04:58 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/02/29 21:05:07 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
16/02/29 21:05:07 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/02/29 21:05:09 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/02/29 21:05:09 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/02/29 21:05:16 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/02/29 21:05:16 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/02/29 21:05:18 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/02/29 21:05:18 INFO ObjectStore: Initialized ObjectStore
16/02/29 21:05:18 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/02/29 21:05:19 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/02/29 21:05:21 WARN : Your hostname, solvento-orson resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:4801%42, but we couldn't find any external IP address!
16/02/29 21:05:27 INFO HiveMetaStore: Added admin role in metastore
16/02/29 21:05:27 INFO HiveMetaStore: Added public role in metastore
16/02/29 21:05:28 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/02/29 21:05:28 INFO HiveMetaStore: 0: get_all_databases
16/02/29 21:05:28 INFO audit: ugi=Orson ip=unknown-ip-addr cmd=get_all_databases
16/02/29 21:05:28 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/02/29 21:05:28 INFO audit: ugi=Orson ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/02/29 21:05:28 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/02/29 21:05:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/29 21:05:30 INFO SessionState: Created local directory: C:/Users/Orson/AppData/Local/Temp/5746f851-1a41-433f-9183-380cc74b23e9_resources
16/02/29 21:05:30 INFO SessionState: Created HDFS directory: /tmp/hive/Orson/5746f851-1a41-433f-9183-380cc74b23e9
16/02/29 21:05:30 INFO SessionState: Created local directory: C:/Users/Orson/AppData/Local/Temp/Orson/5746f851-1a41-433f-9183-380cc74b23e9
16/02/29 21:05:30 INFO SessionState: Created HDFS directory: /tmp/hive/Orson/5746f851-1a41-433f-9183-380cc74b23e9/_tmp_space.db
16/02/29 21:05:31 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
16/02/29 21:05:31 INFO HiveContext: default warehouse location is /user/hive/warehouse
16/02/29 21:05:31 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/02/29 21:05:31 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
16/02/29 21:05:31 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
16/02/29 21:05:31 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/02/29 21:05:31 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/02/29 21:05:31 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/02/29 21:05:31 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
16/02/29 21:05:31 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
16/02/29 21:05:31 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
16/02/29 21:05:31 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/02/29 21:05:31 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
16/02/29 21:05:31 WARN HiveConf: HiveConf of name hive.enable.spark.execution.engine does not exist
16/02/29 21:05:31 INFO metastore: Trying to connect to metastore with URI thrift://quickstart.cloudera:9083
16/02/29 21:05:31 INFO metastore: Connected to metastore.
16/02/29 21:05:32 INFO SessionState: Created local directory: C:/Users/Orson/AppData/Local/Temp/634cfd84-fe30-4c5b-bce3-629f998e4c07_resources
16/02/29 21:05:32 INFO SessionState: Created HDFS directory: /tmp/hive/Orson/634cfd84-fe30-4c5b-bce3-629f998e4c07
16/02/29 21:05:32 INFO SessionState: Created local directory: C:/Users/Orson/AppData/Local/Temp/Orson/634cfd84-fe30-4c5b-bce3-629f998e4c07
16/02/29 21:05:32 INFO SessionState: Created HDFS directory: /tmp/hive/Orson/634cfd84-fe30-4c5b-bce3-629f998e4c07/_tmp_space.db
16/02/29 21:05:32 INFO ParseDriver: Parsing command: select * from flume_data limit 1
16/02/29 21:05:33 INFO ParseDriver: Parse Completed
16/02/29 21:05:34 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 423.8 KB, free 423.8 KB)
16/02/29 21:05:34 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 37.4 KB, free 461.2 KB)
16/02/29 21:05:34 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:60496 (size: 37.4 KB, free: 6.4 GB)
16/02/29 21:05:34 INFO SparkContext: Created broadcast 0 from show at SparkPi.scala:23
16/02/29 21:05:35 INFO FileInputFormat: Total input paths to process : 5
16/02/29 21:05:35 INFO SparkContext: Starting job: show at SparkPi.scala:23
16/02/29 21:05:35 INFO DAGScheduler: Got job 0 (show at SparkPi.scala:23) with 1 output partitions
16/02/29 21:05:35 INFO DAGScheduler: Final stage: ResultStage 0 (show at SparkPi.scala:23)
16/02/29 21:05:35 INFO DAGScheduler: Parents of final stage: List()
16/02/29 21:05:35 INFO DAGScheduler: Missing parents: List()
16/02/29 21:05:35 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at show at SparkPi.scala:23), which has no missing parents
16/02/29 21:05:35 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.8 KB, free 467.0 KB)
16/02/29 21:05:35 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.3 KB, free 470.2 KB)
16/02/29 21:05:35 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:60496 (size: 3.3 KB, free: 6.4 GB)
16/02/29 21:05:35 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/02/29 21:05:35 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at show at SparkPi.scala:23)
16/02/29 21:05:35 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/02/29 21:05:35 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,ANY, 2108 bytes)
16/02/29 21:05:35 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/02/29 21:05:35 INFO HadoopRDD: Input split: hdfs://quickstart.cloudera:8020/user/cloudera/flume/landing/FlumeData.1455241486989.log:0+28
16/02/29 21:05:35 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/02/29 21:05:35 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/02/29 21:05:35 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/02/29 21:05:35 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/02/29 21:05:35 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/02/29 21:05:56 WARN DFSClient: Failed to connect to /172.31.1.118:50010 for block, add to deadNodes and continue. java.net.ConnectException: Connection timed out: no further information
java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.hdfs.DFSInputStream.newTcpPeer(DFSInputStream.java:955)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1107)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:533)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:206)
at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:45)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:246)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:350)
at scala.collection.Iterator$class.foreach(Iterator.scala:750)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1202)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:295)
at scala.collection.AbstractIterator.to(Iterator.scala:1202)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:287)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1202)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:274)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1202)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
16/02/29 21:05:56 INFO DFSClient: Could not obtain BP-1614789257-127.0.0.1-1447880472993:blk_1073744408_3606 from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...
16/02/29 21:05:56 WARN DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 882.0029599386166 msec.

 

Thanks!

1 ACCEPTED SOLUTION

Contributor

I moved my development environment into the same network as the Hadoop cluster to avoid the issue.


6 REPLIES

Contributor

It appears that it uses the private IP of the Cloudera VM instead of the EC2 Elastic IP. Is there a way we can reroute this? Thanks!
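For context, one client-side option sometimes used when the NameNode hands back private DataNode IPs is to have the HDFS client connect to DataNodes by hostname, and then map that hostname to the reachable (elastic) address on the development machine, for example in the hosts file. The property name is standard HDFS; whether it fits this particular setup is an assumption, and sc refers to the SparkContext from the sketch in the question:

// Hedged sketch: ask the HDFS client to use DataNode hostnames instead of the
// IPs reported by the NameNode; the hostname can then be resolved locally to a
// reachable address (e.g. via a hosts file entry on the development machine).
sc.hadoopConfiguration.set("dfs.client.use.datanode.hostname", "true")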

Contributor

I moved my development environment into the same network as the Hadoop cluster to avoid the issue.


Hi Orson, 

 

I am facing a similar issue on my POC environment edge node. I didn't understand your solution. Can you please be more specific about it?

 

Thanks
Kishore

Contributor
Hi Kishore,
Make sure that your Spark program can resolve and reach the IP of the Hadoop cluster.
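In practice that means the cluster hostname has to resolve from the development machine, and the metastore, NameNode, and DataNode ports have to be reachable from it. A minimal sketch of driver-side settings, reusing the hostnames and ports that appear in the log above (the code itself is an assumption, using the sc and hiveContext names from the sketch in the question):

// Hedged sketch: point the driver explicitly at the cluster's metastore and HDFS.
// The hostname must also resolve from the development machine (e.g. a hosts file entry).
hiveContext.setConf("hive.metastore.uris", "thrift://quickstart.cloudera:9083")
sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")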

Contributor

Does that mean the program needs to be moved into the cluster network? In my case I can't always move my program inside the cluster, so how do I fix the issue then?

Expert Contributor

Hi Ranan,

 

Because this is an older thread and already marked as solved, let's keep this conversation on the other thread you opened: http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Debug-Spark-program-in-Eclipse-Data...