Created 03-02-2017 08:14 AM
[root@sandbox spark2-client]# echo $SPARK_HOME
/usr/hdp/current/spark2-client
[root@sandbox spark2-client]# echo $SPARK_MAJOR_VERSION
2
[root@sandbox spark2-client]# spark-submit examples/src/main/python/sql/hive.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/context.py:477: DeprecationWarning: HiveContext is deprecated in Spark 2.0.0. Please use SparkSession.builder.enableHiveSupport().getOrCreate() instead.
17/03/02 05:49:39 INFO SparkContext: Running Spark version 2.0.0.2.5.0.0-1245
17/03/02 05:49:39 INFO SecurityManager: Changing view acls to: root
17/03/02 05:49:39 INFO SecurityManager: Changing modify acls to: root
17/03/02 05:49:39 INFO SecurityManager: Changing view acls groups to:
17/03/02 05:49:39 INFO SecurityManager: Changing modify acls groups to:
17/03/02 05:49:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/03/02 05:49:40 INFO Utils: Successfully started service 'sparkDriver' on port 45030.
17/03/02 05:49:40 INFO SparkEnv: Registering MapOutputTracker
17/03/02 05:49:40 INFO SparkEnv: Registering BlockManagerMaster
17/03/02 05:49:40 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f8e6d5f0-b6c0-4a75-b907-80c673e542e6
17/03/02 05:49:40 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/03/02 05:49:40 INFO SparkEnv: Registering OutputCommitCoordinator
17/03/02 05:49:40 INFO log: Logging initialized @2924ms
17/03/02 05:49:40 INFO Server: jetty-9.2.z-SNAPSHOT
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@59be7bdf{/jobs,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2751e757{/jobs/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@443ba0f8{/jobs/job,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@31f66cbf{/jobs/job/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c89dd{/stages,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@337faeb2{/stages/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2a5ae445{/stages/stage,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@284bf625{/stages/stage/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5746e090{/stages/pool,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1902def4{/stages/pool/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6407695d{/storage,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@138a4126{/storage/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@73a861a7{/storage/rdd,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7c52e458{/storage/rdd/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2340270e{/environment,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6282d131{/environment/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5217219f{/executors,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@398c7fa1{/executors/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1bd647c9{/executors/threadDump,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6fa906db{/executors/threadDump/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@459d968{/static,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5ef2df35{/,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7733232d{/api,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@500b7cce{/stages/stage/kill,null,AVAILABLE}
17/03/02 05:49:40 INFO ServerConnector: Started ServerConnector@47de86bf{HTTP/1.1}{0.0.0.0:4040}
17/03/02 05:49:40 INFO Server: Started @3123ms
17/03/02 05:49:40 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/03/02 05:49:40 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.2:4040
17/03/02 05:49:41 INFO Utils: Copying /usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py to /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b/userFiles-8cb69f6f-def0-4cd1-bf2a-bb0d2b188789/hive.py
17/03/02 05:49:41 INFO SparkContext: Added file file:/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py at file:/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py with timestamp 1488433781202
17/03/02 05:49:41 INFO Executor: Starting executor ID driver on host localhost
17/03/02 05:49:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58192.
17/03/02 05:49:41 INFO NettyBlockTransferService: Server created on 172.17.0.2:58192
17/03/02 05:49:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.2:58192 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@21d06d8d{/metrics/json,null,AVAILABLE}
17/03/02 05:49:42 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/local-1488433781315
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6d88da34{/SQL,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@648a572{/SQL/json,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@8fcafb4{/SQL/execution,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@55585ab9{/SQL/execution/json,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@586ddab5{/static/sql,null,AVAILABLE}
17/03/02 05:49:43 INFO HiveSharedState: Warehouse path is '/usr/hdp/2.5.0.0-1245/spark2/spark-warehouse'.
17/03/02 05:49:43 INFO SparkSqlParser: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive
17/03/02 05:49:45 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/03/02 05:49:46 INFO metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
17/03/02 05:49:46 INFO metastore: Connected to metastore.
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/49a0b52b-7510-447c-ac41-888f1984cfb5_resources
17/03/02 05:49:46 INFO SessionState: Created HDFS directory: /tmp/hive/root/49a0b52b-7510-447c-ac41-888f1984cfb5
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/root/49a0b52b-7510-447c-ac41-888f1984cfb5
17/03/02 05:49:46 INFO SessionState: Created HDFS directory: /tmp/hive/root/49a0b52b-7510-447c-ac41-888f1984cfb5/_tmp_space.db
17/03/02 05:49:46 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /usr/hdp/2.5.0.0-1245/spark2/spark-warehouse
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/367a3eb3-d83b-46ba-96ae-7763226e38ef_resources
17/03/02 05:49:47 INFO SessionState: Created HDFS directory: /tmp/hive/root/367a3eb3-d83b-46ba-96ae-7763226e38ef
17/03/02 05:49:47 INFO SessionState: Created local directory: /tmp/root/367a3eb3-d83b-46ba-96ae-7763226e38ef
17/03/02 05:49:47 INFO SessionState: Created HDFS directory: /tmp/hive/root/367a3eb3-d83b-46ba-96ae-7763226e38ef/_tmp_space.db
17/03/02 05:49:47 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /usr/hdp/2.5.0.0-1245/spark2/spark-warehouse
Traceback (most recent call last):
  File "/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py", line 85, in <module>
    spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o40.sql.
: java.lang.ClassNotFoundException: Failed to find data source: hive. Please find packages at http://spark-packages.org
	at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:145)
	at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:78)
	at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:78)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:310)
	at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:103)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:211)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: hive.DefaultSource
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
	at scala.util.Try$.apply(Try.scala:192)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
	at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
	at scala.util.Try.orElse(Try.scala:84)
	at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:130)
	... 30 more
17/03/02 05:49:48 INFO SparkContext: Invoking stop() from shutdown hook
17/03/02 05:49:48 INFO ServerConnector: Stopped ServerConnector@47de86bf{HTTP/1.1}{0.0.0.0:4040}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@500b7cce{/stages/stage/kill,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7733232d{/api,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ef2df35{/,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@459d968{/static,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6fa906db{/executors/threadDump/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1bd647c9{/executors/threadDump,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@398c7fa1{/executors/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5217219f{/executors,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6282d131{/environment/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2340270e{/environment,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7c52e458{/storage/rdd/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@73a861a7{/storage/rdd,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@138a4126{/storage/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6407695d{/storage,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1902def4{/stages/pool/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5746e090{/stages/pool,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@284bf625{/stages/stage/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2a5ae445{/stages/stage,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@337faeb2{/stages/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b3c89dd{/stages,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@31f66cbf{/jobs/job/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@443ba0f8{/jobs/job,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2751e757{/jobs/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@59be7bdf{/jobs,null,UNAVAILABLE}
17/03/02 05:49:48 INFO SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
17/03/02 05:49:48 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/02 05:49:48 INFO MemoryStore: MemoryStore cleared
17/03/02 05:49:48 INFO BlockManager: BlockManager stopped
17/03/02 05:49:48 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/02 05:49:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/02 05:49:48 INFO SparkContext: Successfully stopped SparkContext
17/03/02 05:49:48 INFO ShutdownHookManager: Shutdown hook called
17/03/02 05:49:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b
17/03/02 05:49:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b/pyspark-c7630113-ce82-4002-ae65-9f9a4f7edc07
I am trying to follow the Spark tutorial for Hive tables using Python, available at http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables. I am running the example program hive.py, but it errors out with the traceback shown above. Can someone guide me to resolving this issue? I am new to Spark and Hadoop.
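For reference, the part of hive.py that fails (line 85 in the traceback) looks roughly like this in the current tutorial:

from pyspark.sql import SparkSession

# warehouse_location points to the default location for managed databases and tables
warehouse_location = 'spark-warehouse'

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()

# This is the statement that raises the Py4JJavaError above
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")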
I feel this issue has something to do with the spark-warehouse location, but I am not able to connect the dots.
Thanks
Ram G.
Created 03-08-2017 07:14 PM
You are trying to follow a demo written for Spark 2.1, but your sandbox is at best on Spark 2.0. You should follow tutorials supported by the version of Spark deployed on the HDP 2.5 sandbox, which is Spark 1.6.2. Spark 2.0 is also possible, but I would wait for the HDP 2.6 sandbox, which will probably be released next month.
The error itself is self-explanatory: Spark cannot find a data source named hive, because the CREATE TABLE ... USING hive syntax in the example is, as far as I know, only recognized by Spark 2.1 and later. If you wish to address it on your current version, you could add the missing libraries.
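If you just want the example to run on the Spark 2.0 client that ships with the sandbox, a minimal sketch (untested on the sandbox, assuming the default layout and Hive services running) is to drop the USING hive clause so the statement goes through plain HiveQL instead of the data source resolver:

from pyspark.sql import SparkSession

# Minimal sketch, assuming the Spark 2.0 client on the HDP 2.5 sandbox.
# Without the Spark 2.1-only "USING hive" clause, CREATE TABLE is handled
# as plain HiveQL, so the data source lookup that throws the
# ClassNotFoundException above is never attempted.
spark = SparkSession.builder \
    .appName("hive-example-spark20") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

# kv1.txt ships with the Spark examples; the path is relative to $SPARK_HOME
spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
spark.sql("SELECT * FROM src").show()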