Member since: 02-09-2017
Posts: 12
Kudos Received: 1
Solutions: 0
03-12-2018
10:44 PM
Hi, I am working on an exercise for the HDPCS Spark certification using Python. I am able to start the Spark session using pyspark and create the RDD collection orders; however, when I try to execute the first() method on the collection, it throws the warning below:

WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Below is the list of commands I executed on the Hortonworks sandbox running locally. Please guide me to resolve this issue.
>>> from pyspark import SparkConf, SparkContext
>>> conf = SparkConf().setMaster("yarn-client").setAppName("Testing").set("spark.ui.port","12356")
>>> sc = SparkContext(conf=conf)
18/03/12 22:34:20 INFO SparkContext: Running Spark version 1.6.0
18/03/12 22:34:20 INFO SecurityManager: Changing view acls to: root
18/03/12 22:34:20 INFO SecurityManager: Changing modify acls to: root
18/03/12 22:34:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/03/12 22:34:20 INFO Utils: Successfully started service 'sparkDriver' on port 39811.
18/03/12 22:34:21 INFO Slf4jLogger: Slf4jLogger started
18/03/12 22:34:21 INFO Remoting: Starting remoting
18/03/12 22:34:21 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.0.2.15:53335]
18/03/12 22:34:21 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 53335.
18/03/12 22:34:21 INFO SparkEnv: Registering MapOutputTracker
18/03/12 22:34:21 INFO SparkEnv: Registering BlockManagerMaster
18/03/12 22:34:21 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-b66cea24-c68d-47e2-b34c-1fa0025732ac
18/03/12 22:34:21 INFO MemoryStore: MemoryStore started with capacity 511.5 MB
18/03/12 22:34:21 INFO SparkEnv: Registering OutputCommitCoordinator
18/03/12 22:34:21 INFO Server: jetty-8.y.z-SNAPSHOT
18/03/12 22:34:21 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:12356
18/03/12 22:34:21 INFO Utils: Successfully started service 'SparkUI' on port 12356.
18/03/12 22:34:21 INFO SparkUI: Started SparkUI at http://10.0.2.15:12356
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
18/03/12 22:34:21 INFO TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
18/03/12 22:34:21 INFO RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
18/03/12 22:34:21 INFO Client: Requesting a new application from cluster with 1 NodeManagers
18/03/12 22:34:21 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2250 MB per container)
18/03/12 22:34:21 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/03/12 22:34:21 INFO Client: Setting up container launch context for our AM
18/03/12 22:34:21 INFO Client: Setting up the launch environment for our AM container
18/03/12 22:34:21 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
18/03/12 22:34:21 INFO Client: Preparing resources for our AM container
18/03/12 22:34:21 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
18/03/12 22:34:21 INFO Client: Source and destination file systems are the same. Not copying hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
18/03/12 22:34:21 INFO Client: Uploading resource file:/usr/hdp/2.4.0.0-169/spark/python/lib/pyspark.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1520892237425_0004/pyspark.zip
18/03/12 22:34:21 INFO Client: Uploading resource file:/usr/hdp/2.4.0.0-169/spark/python/lib/py4j-0.9-src.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1520892237425_0004/py4j-0.9-src.zip
18/03/12 22:34:21 INFO Client: Uploading resource file:/tmp/spark-d191e566-891d-4e3f-b9ff-33e90cb075ad/__spark_conf__7367311365358338483.zip -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1520892237425_0004/__spark_conf__7367311365358338483.zip
18/03/12 22:34:21 INFO SecurityManager: Changing view acls to: root
18/03/12 22:34:21 INFO SecurityManager: Changing modify acls to: root
18/03/12 22:34:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/03/12 22:34:21 INFO Client: Submitting application 4 to ResourceManager
18/03/12 22:34:21 INFO YarnClientImpl: Submitted application application_1520892237425_0004
18/03/12 22:34:21 INFO SchedulerExtensionServices: Starting Yarn extension services with app application_1520892237425_0004 and attemptId None
18/03/12 22:34:22 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:22 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1520894061798
     final status: UNDEFINED
     tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1520892237425_0004/
     user: root
18/03/12 22:34:23 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:24 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:25 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:26 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:27 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
18/03/12 22:34:27 INFO YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> sandbox.hortonworks.com, PROXY_URI_BASES -> http://sandbox.hortonworks.com:8088/proxy/application_1520892237425_0004), /proxy/application_1520892237425_0004
18/03/12 22:34:27 INFO JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/03/12 22:34:27 INFO Client: Application report for application_1520892237425_0004 (state: ACCEPTED)
18/03/12 22:34:28 INFO Client: Application report for application_1520892237425_0004 (state: RUNNING)
18/03/12 22:34:28 INFO Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.0.2.15
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1520894061798
     final status: UNDEFINED
     tracking URL: http://sandbox.hortonworks.com:8088/proxy/application_1520892237425_0004/
     user: root
18/03/12 22:34:28 INFO YarnClientSchedulerBackend: Application application_1520892237425_0004 has started running.
18/03/12 22:34:28 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58749.
18/03/12 22:34:28 INFO NettyBlockTransferService: Server created on 58749
18/03/12 22:34:28 INFO BlockManagerMaster: Trying to register BlockManager
18/03/12 22:34:28 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:58749 with 511.5 MB RAM, BlockManagerId(driver, 10.0.2.15, 58749)
18/03/12 22:34:28 INFO BlockManagerMaster: Registered BlockManager
18/03/12 22:34:28 INFO EventLoggingListener: Logging events to hdfs:///spark-history/application_1520892237425_0004
18/03/12 22:34:51 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>>> orders = sc.textFile("/user/root/data/retail_db/orders")
18/03/12 22:36:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 221.0 KB, free 221.0 KB)
18/03/12 22:36:48 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.3 KB, free 247.3 KB)
18/03/12 22:36:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.0.2.15:58749 (size: 26.3 KB, free: 511.5 MB)
18/03/12 22:36:48 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
>>> orders.first()
18/03/12 22:36:58 INFO FileInputFormat: Total input paths to process : 1
18/03/12 22:36:58 INFO SparkContext: Starting job: runJob at PythonRDD.scala:393
18/03/12 22:36:58 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:393) with 1 output partitions
18/03/12 22:36:58 INFO DAGScheduler: Final stage: ResultStage 0 (runJob at PythonRDD.scala:393)
18/03/12 22:36:58 INFO DAGScheduler: Parents of final stage: List()
18/03/12 22:36:58 INFO DAGScheduler: Missing parents: List()
18/03/12 22:36:58 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at RDD at PythonRDD.scala:43), which has no missing parents
18/03/12 22:36:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 252.1 KB)
18/03/12 22:36:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.0 KB, free 255.1 KB)
18/03/12 22:36:58 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.2.15:58749 (size: 3.0 KB, free: 511.5 MB)
18/03/12 22:36:58 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/03/12 22:36:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (PythonRDD[2] at RDD at PythonRDD.scala:43)
18/03/12 22:36:58 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
18/03/12 22:37:13 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
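The warning at the end means no executor ever registered with the driver. The log shows a cluster with a single NodeManager and a 2250 MB maximum container size, so executors requested at default sizes may simply never fit. A minimal sketch of the same session with the executor footprint capped to sandbox scale; the specific values below are illustrative assumptions, not verified settings:

from pyspark import SparkConf, SparkContext

# Keep the executor request small enough for the single sandbox NodeManager.
# The numbers are illustrative; check the YARN ResourceManager UI for the
# memory/vcores actually available before picking values.
conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("Testing")
        .set("spark.ui.port", "12356")
        .set("spark.executor.instances", "1")
        .set("spark.executor.memory", "512m")
        .set("spark.executor.cores", "1"))
sc = SparkContext(conf=conf)

orders = sc.textFile("/user/root/data/retail_db/orders")
print(orders.first())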
03-02-2017
08:14 AM
1 Kudo
[root@sandbox spark2-client]# echo $SPARK_HOME
/usr/hdp/current/spark2-client
[root@sandbox spark2-client]# echo $SPARK_MAJOR_VERSION
2
[root@sandbox spark2-client]# spark-submit examples/src/main/python/sql/hive.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/context.py:477: DeprecationWarning: HiveContext is deprecated in Spark 2.0.0. Please use SparkSession.builder.enableHiveSupport().getOrCreate() instead.
17/03/02 05:49:39 INFO SparkContext: Running Spark version 2.0.0.2.5.0.0-1245
17/03/02 05:49:39 INFO SecurityManager: Changing view acls to: root
17/03/02 05:49:39 INFO SecurityManager: Changing modify acls to: root
17/03/02 05:49:39 INFO SecurityManager: Changing view acls groups to:
17/03/02 05:49:39 INFO SecurityManager: Changing modify acls groups to:
17/03/02 05:49:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/03/02 05:49:40 INFO Utils: Successfully started service 'sparkDriver' on port 45030.
17/03/02 05:49:40 INFO SparkEnv: Registering MapOutputTracker
17/03/02 05:49:40 INFO SparkEnv: Registering BlockManagerMaster
17/03/02 05:49:40 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f8e6d5f0-b6c0-4a75-b907-80c673e542e6
17/03/02 05:49:40 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/03/02 05:49:40 INFO SparkEnv: Registering OutputCommitCoordinator
17/03/02 05:49:40 INFO log: Logging initialized @2924ms
17/03/02 05:49:40 INFO Server: jetty-9.2.z-SNAPSHOT
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@59be7bdf{/jobs,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2751e757{/jobs/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@443ba0f8{/jobs/job,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@31f66cbf{/jobs/job/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c89dd{/stages,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@337faeb2{/stages/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2a5ae445{/stages/stage,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@284bf625{/stages/stage/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5746e090{/stages/pool,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1902def4{/stages/pool/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6407695d{/storage,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@138a4126{/storage/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@73a861a7{/storage/rdd,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7c52e458{/storage/rdd/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2340270e{/environment,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6282d131{/environment/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5217219f{/executors,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@398c7fa1{/executors/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1bd647c9{/executors/threadDump,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6fa906db{/executors/threadDump/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@459d968{/static,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5ef2df35{/,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7733232d{/api,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@500b7cce{/stages/stage/kill,null,AVAILABLE}
17/03/02 05:49:40 INFO ServerConnector: Started ServerConnector@47de86bf{HTTP/1.1}{0.0.0.0:4040}
17/03/02 05:49:40 INFO Server: Started @3123ms
17/03/02 05:49:40 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/03/02 05:49:40 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.2:4040
17/03/02 05:49:41 INFO Utils: Copying /usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py to /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b/userFiles-8cb69f6f-def0-4cd1-bf2a-bb0d2b188789/hive.py
17/03/02 05:49:41 INFO SparkContext: Added file file:/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py at file:/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py with timestamp 1488433781202
17/03/02 05:49:41 INFO Executor: Starting executor ID driver on host localhost
17/03/02 05:49:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58192.
17/03/02 05:49:41 INFO NettyBlockTransferService: Server created on 172.17.0.2:58192
17/03/02 05:49:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.2:58192 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@21d06d8d{/metrics/json,null,AVAILABLE}
17/03/02 05:49:42 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/local-1488433781315
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6d88da34{/SQL,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@648a572{/SQL/json,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@8fcafb4{/SQL/execution,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@55585ab9{/SQL/execution/json,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@586ddab5{/static/sql,null,AVAILABLE}
17/03/02 05:49:43 INFO HiveSharedState: Warehouse path is '/usr/hdp/2.5.0.0-1245/spark2/spark-warehouse'.
17/03/02 05:49:43 INFO SparkSqlParser: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive
17/03/02 05:49:45 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/03/02 05:49:46 INFO metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
17/03/02 05:49:46 INFO metastore: Connected to metastore.
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/49a0b52b-7510-447c-ac41-888f1984cfb5_resources
17/03/02 05:49:46 INFO SessionState: Created HDFS directory: /tmp/hive/root/49a0b52b-7510-447c-ac41-888f1984cfb5
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/root/49a0b52b-7510-447c-ac41-888f1984cfb5
17/03/02 05:49:46 INFO SessionState: Created HDFS directory: /tmp/hive/root/49a0b52b-7510-447c-ac41-888f1984cfb5/_tmp_space.db
17/03/02 05:49:46 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /usr/hdp/2.5.0.0-1245/spark2/spark-warehouse
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/367a3eb3-d83b-46ba-96ae-7763226e38ef_resources
17/03/02 05:49:47 INFO SessionState: Created HDFS directory: /tmp/hive/root/367a3eb3-d83b-46ba-96ae-7763226e38ef
17/03/02 05:49:47 INFO SessionState: Created local directory: /tmp/root/367a3eb3-d83b-46ba-96ae-7763226e38ef
17/03/02 05:49:47 INFO SessionState: Created HDFS directory: /tmp/hive/root/367a3eb3-d83b-46ba-96ae-7763226e38ef/_tmp_space.db
17/03/02 05:49:47 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /usr/hdp/2.5.0.0-1245/spark2/spark-warehouse
Traceback (most recent call last):
File "/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py", line 85, in <module>
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o40.sql.
: java.lang.ClassNotFoundException: Failed to find data source: hive. Please find packages at http://spark-packages.org
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:145)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:78)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:78)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:310)
at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:103)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: hive.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:130)
... 30 more
17/03/02 05:49:48 INFO SparkContext: Invoking stop() from shutdown hook
17/03/02 05:49:48 INFO ServerConnector: Stopped ServerConnector@47de86bf{HTTP/1.1}{0.0.0.0:4040}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@500b7cce{/stages/stage/kill,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7733232d{/api,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ef2df35{/,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@459d968{/static,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6fa906db{/executors/threadDump/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1bd647c9{/executors/threadDump,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@398c7fa1{/executors/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5217219f{/executors,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6282d131{/environment/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2340270e{/environment,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7c52e458{/storage/rdd/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@73a861a7{/storage/rdd,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@138a4126{/storage/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6407695d{/storage,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1902def4{/stages/pool/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5746e090{/stages/pool,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@284bf625{/stages/stage/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2a5ae445{/stages/stage,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@337faeb2{/stages/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b3c89dd{/stages,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@31f66cbf{/jobs/job/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@443ba0f8{/jobs/job,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2751e757{/jobs/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@59be7bdf{/jobs,null,UNAVAILABLE}
17/03/02 05:49:48 INFO SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
17/03/02 05:49:48 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/02 05:49:48 INFO MemoryStore: MemoryStore cleared
17/03/02 05:49:48 INFO BlockManager: BlockManager stopped
17/03/02 05:49:48 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/02 05:49:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/02 05:49:48 INFO SparkContext: Successfully stopped SparkContext
17/03/02 05:49:48 INFO ShutdownHookManager: Shutdown hook called
17/03/02 05:49:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b
17/03/02 05:49:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b/pyspark-c7630113-ce82-4002-ae65-9f9a4f7edc07
I am trying to follow the Spark tutorial on Hive tables using Python, available at http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables. When I run the Python program hive.py, it errors out with the traceback above. Can someone guide me to resolve this issue? I am new to Spark and Hadoop. I suspect the issue is related to spark-warehouse, but I am not able to make the connection. Thanks, Ram G.
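For context, the failure happens on the CREATE TABLE ... USING hive statement: Spark 2.0.0 treats hive in the USING clause as the name of a data-source class (hence ClassNotFoundException: hive.DefaultSource), and that clause appears to be recognized only by later Spark releases, while the linked guide documents the newest version. A minimal sketch of the same step written for Spark 2.0, assuming the tutorial's src table; dropping the USING hive clause routes the statement through the Hive metastore:

from pyspark.sql import SparkSession

# Hive support must be enabled for Spark SQL to reach the Hive metastore.
spark = (SparkSession.builder
         .appName("PythonHiveExample")
         .enableHiveSupport()
         .getOrCreate())

# Plain CREATE TABLE (no "USING hive") is accepted by Spark 2.0.0.
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
spark.sql("SHOW TABLES").show()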
Labels: Apache Spark
02-12-2017
05:14 AM
[spark@sandbox spark2-client]$ echo $SPARK_HOME
/usr/hdp/current/spark2-client
[spark@sandbox spark2-client]$ echo $SPARK_MAJOR_VERSION
2
[spark@sandbox spark2-client]$ ./bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/02/12 03:58:15 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://172.17.0.2:4040
Spark context available as 'sc' (master = local[*], app id = local-1486871892126).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0.2.5.0.0-1245
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = sqlContext.jsonFile("people.json")
<console>:23: error: not found: value sqlContext
val df = sqlContext.jsonFile("people.json")
^
@Predrag Minovic I tried setting SPARK_MAJOR_VERSION=2; it's not working.
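The root cause here is an API change rather than the environment variable: in Spark 2.x the shell pre-creates spark (a SparkSession) instead of sqlContext, and SQLContext.jsonFile was removed in favor of the DataFrameReader API, so the tutorial's Spark 1.x line cannot work as written. In the Scala shell the equivalent would be val df = spark.read.json("people.json"); a minimal pyspark sketch of the same call, assuming people.json sits in the user's HDFS home directory as listed earlier in this thread:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("JsonExample").getOrCreate()

# Spark 2.x replacement for the removed sqlContext.jsonFile(...)
df = spark.read.json("people.json")  # relative path resolves under /user/<user> in HDFS
df.show()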
02-12-2017
05:00 AM
Yes, I used export SPARK MAJOR VERSION=2, without the "_". I guess that is the mistake. I will try export SPARK_MAJOR_VERSION=2.
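Once the underscored variable is exported, the selection can be verified from inside the session itself; a tiny pyspark check (the Scala shell also prints the version in its startup banner):

# Inside pyspark: confirm which Spark the SPARK_MAJOR_VERSION switch selected.
print(sc.version)  # expect a 2.x string such as '2.0.0.2.5.0.0-1245' when Spark2 is active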
02-12-2017
04:42 AM
[spark@sandbox spark2-client]$ hdfs dfs -ls /user/spark
Found 3 items
drwxr-xr-x - spark hdfs 0 2017-02-11 17:04 /user/spark/.sparkStaging
-rwxrwxrwx 1 spark hdfs 73 2017-02-12 00:21 /user/spark/people.json
-rwxrwxrwx 1 spark hdfs 32 2017-02-12 00:18 /user/spark/people.txt
[spark@sandbox spark2-client]$ ./bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
17/02/12 03:00:11 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://172.17.0.2:4040
Spark context available as 'sc' (master = local[*], app id = local-1486868408718).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0.2.5.0.0-1245
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val df = sqlContext.jsonFile("people.json")
<console>:23: error: not found: value sqlContext
val df = sqlContext.jsonFile("people.json")
02-12-2017
04:36 AM
I am following the tutorial a-lap-around-apache-spark.
For the section "Using the Spark DataFrame API", I am stuck at the step below:
val df = sqlContext.jsonFile("people.json")
It throws the error:
<console>:23: error: not found: value sqlContext
Please guide me if you have come across a similar issue. I tried a few posts but had no luck. The actual steps are below.
Labels: Hortonworks Data Platform (HDP)
02-11-2017
11:05 PM
Please see the set of commands I am trying to execute.
[root@sandbox tmp]# pwd
/tmp
[root@sandbox tmp]# ls -ltr | grep word
-rw-r--r-- 1 root root 2 Oct 25 08:13 words.txt
-rw-r--r-- 1 root root 128 Feb 11 19:39 wordFile.txt
[root@sandbox tmp]# hdfs dfs -ls /tmp/data/
-rwxrwxrwx 1 root hdfs 10411 2017-02-10 03:55 /tmp/data
[root@sandbox tmp]# su spark
[spark@sandbox tmp]$ hdfs dfs -put wordFile.txt /tmp/data/
put: `/tmp/data': File exists
[spark@sandbox tmp]$ hdfs dfs -copyFromLocal wordFile.txt /tmp/data/
copyFromLocal: `/tmp/data': File exists
[spark@sandbox tmp]$ sudo -u spark hdfs dfs -ls /tmp/data/
spark is not in the sudoers file. This incident will be reported.
[spark@sandbox tmp]$ exit
exit
[root@sandbox tmp]# sudo -u spark hdfs dfs -ls /tmp/data/
-rwxrwxrwx 1 root hdfs 10411 2017-02-10 03:55 /tmp/data
[root@sandbox tmp]# su -u spark hdfs dfs -ls /tmp/data/
su: invalid option -- 'u'
... View more