YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Contributor

Hi,

I am working on an exercise as part of HDPCS Spark using Python. I am able to start the Spark session using pyspark and create the orders RDD; however, when I try to call the first() method on it, it throws the error below:

WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Below is the list of commands I executed on the Hortonworks sandbox running locally. Please guide me on resolving this issue.

>>> from pyspark import SparkConf, SparkContext
>>> conf = SparkConf().setMaster("yarn-client").setAppName("Testing").set("spark.ui.port","12356")
>>> sc = SparkContext(conf=conf)
18/03/12 22:34:20 INFO SparkContext: Running Spark version 1.6.0
18/03/12 22:34:20 INFO SecurityManager: Changing view acls to: root
18/03/12 22:34:20 INFO SecurityManager: Changing modify acls to: root
18/03/12 22:34:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/03/12 22:34:20 INFO Utils: Successfully started service 'sparkDriver' on port 39811.
18/03/12 22:34:21 INFO Slf4jLogger: Slf4jLogger started
18/03/12 22:34:21 INFO Remoting: Starting remoting
18/03/12 22:34:21 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:53335]
18/03/12 22:34:21 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53335.
18/03/12 22:34:21 INFO SparkEnv: Registering MapOutputTracker
18/03/12 22:34:21 INFO SparkEnv: Registering BlockManagerMaster
18/03/12 22:34:21 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b66cea24-c68d-47e2-b34c-1fa0025732ac
18/03/12 22:34:21 INFO MemoryStore: MemoryStore started with capacity 511.5 MB
18/03/12 22:34:21 INFO SparkEnv: Registering OutputCommitCoordinator
18/03/12 22:34:21 INFO Server: jetty-8.y.z-SNAPSHOT
18/03/12 22:34:21 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:12356
18/03/12 22:34:21 INFO Utils: Successfully started service 'SparkUI' on port 12356.
18/03/12 22:34:21 INFO SparkUI: Started SparkUI at http://10.0.2.15:12356
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
18/03/12 22:34:21 INFO TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
18/03/12 22:34:21 INFO RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
18/03/12 22:34:21 INFO Client: Requesting a new application from cluster with 1 NodeManagers
18/03/12 22:34:21 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2250 MB per container)
18/03/12 22:34:21 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/03/12 22:34:21 INFO Client: Setting up container launch context for our AM
18/03/12 22:34:21 INFO Client: Setting up the launch environment for our AM container
18/03/12 22:34:21 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
18/03/12 22:34:21 INFO Client: Preparing resources for our AM container
18/03/12 22:34:21 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
18/03/12 22:34:21 INFO Client: Source and destination file systems are the same. Not copying hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
18/03/12 22:34:21 INFO Client: Uploading resource file:/usr/hdp/2.4.0.0-169/spark/python/lib/pyspark.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1520892237425_0004/pyspark.zip
18/03/12 22:34:21 INFO Client: Uploading resource file:/usr/hdp/2.4.0.0-169/spark/python/lib/py4j-0.9-src.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1520892237425_0004/py4j-0.9-src.zip
18/03/12 22:34:21 INFO Client: Uploading resource file:/tmp/spark-d191e566-891d-4e3f-b9ff-33e90cb075ad/__spark_conf__7367311365358338483.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1520892237425_0004/__spark_conf__7367311365358338483.zip
18/03/12 22:34:21 INFO SecurityManager: Changing view acls to: root
18/03/12 22:34:21 INFO SecurityManager: Changing modify acls to: root
18/03/12 22:34:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/03/12 22:34:21 INFO Client: Submitting application 4 to ResourceManager
18/03/12 22:34:21 INFO YarnClientImpl: Submitted application application_1520892237425_0004
18/03/12 22:34:21 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1520892237425_0004 and attemptId None
18/03/12 22:34:22 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:22 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1520894061798
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1520892237425_0004/
user: root
18/03/12 22:34:23 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:24 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:25 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:26 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:27 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
18/03/12 22:34:27 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/application_1520892237425_0004), /proxy/application_1520892237425_0004
18/03/12 22:34:27 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/03/12 22:34:27 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:28 INFO Client: Application report for application_1520892237425_0004 (state: RUNNING)
18/03/12 22:34:28 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: 10.0.2.15
ApplicationMaster RPC port: 0
queue: default
start time: 1520894061798
final status: UNDEFINED
tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1520892237425_0004/
user: root
18/03/12 22:34:28 INFO YarnClientSchedulerBackend: Application application_1520892237425_0004 has started running.
18/03/12 22:34:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58749.
18/03/12 22:34:28 INFO NettyBlockTransferService: Server created on 58749
18/03/12 22:34:28 INFO BlockManagerMaster: Trying to register BlockManager
18/03/12 22:34:28 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:58749 with 511.5 MB RAM, BlockManagerId(driver, 10.0.2.15, 58749)
18/03/12 22:34:28 INFO BlockManagerMaster: Registered BlockManager
18/03/12 22:34:28 INFO EventLoggingListener: Logging events to hdfs:///spark-history/application_1520892237425_0004
18/03/12 22:34:51 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>>> orders = sc.textFile("/user/root/data/retail_db/orders")
18/03/12 22:36:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 221.0 KB, free 221.0 KB)
18/03/12 22:36:48 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.3 KB, free 247.3 KB)
18/03/12 22:36:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:58749 (size: 26.3 KB, free: 511.5 MB)
18/03/12 22:36:48 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
>>> orders.first()
18/03/12 22:36:58 INFO FileInputFormat: Total input paths to process : 1
18/03/12 22:36:58 INFO SparkContext: Starting job: runJob at PythonRDD.scala:393
18/03/12 22:36:58 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:393) with 1 output partitions
18/03/12 22:36:58 INFO DAGScheduler: Final stage: ResultStage 0 (runJob at PythonRDD.scala:393)
18/03/12 22:36:58 INFO DAGScheduler: Parents of final stage: List()
18/03/12 22:36:58 INFO DAGScheduler: Missing parents: List()
18/03/12 22:36:58 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at RDD at PythonRDD.scala:43), which has no missing parents
18/03/12 22:36:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 252.1 KB)
18/03/12 22:36:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.0 KB, free 255.1 KB)
18/03/12 22:36:58 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.2.15:58749 (size: 3.0 KB, free: 511.5 MB)
18/03/12 22:36:58 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/03/12 22:36:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (PythonRDD[2] at RDD at PythonRDD.scala:43)
18/03/12 22:36:58 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
18/03/12 22:37:13 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

1 REPLY

Expert Contributor

Most likely you don't have enough RAM available; double-check the capacity of your YARN queue. This warning usually means YARN accepted the application but cannot allocate containers for the executors, so the job sits waiting for resources.
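One way to confirm this is to shrink the resource request so it fits on the sandbox's single NodeManager (your log shows a ceiling of 2250 MB per container). Below is a minimal sketch of the same session with explicitly small resource settings; the exact values are illustrative, not prescriptive, and should be tuned to whatever your queue actually has free:

>>> from pyspark import SparkConf, SparkContext
>>> # Request one small executor so the allocation fits the
>>> # sandbox's limited YARN capacity (values are illustrative)
>>> conf = SparkConf().setMaster("yarn-client").setAppName("Testing") \
...     .set("spark.ui.port", "12356") \
...     .set("spark.executor.instances", "1") \
...     .set("spark.executor.memory", "512m") \
...     .set("spark.executor.cores", "1")
>>> sc = SparkContext(conf=conf)

You can also open the ResourceManager UI at http://sandbox.hortonworks.com:8088 (the tracking URL from your log) to see how much memory the default queue has available and whether executor containers are actually being allocated.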