
java.lang.ClassNotFoundException: Failed to find data source: hive. Please find packages at http://spark-packages.org

Contributor
[root@sandbox spark2-client]# echo $SPARK_HOME
/usr/hdp/current/spark2-client
[root@sandbox spark2-client]# echo $SPARK_MAJOR_VERSION
2
[root@sandbox spark2-client]# spark-submit examples/src/main/python/sql/hive.py
SPARK_MAJOR_VERSION is set to 2, using Spark2
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/context.py:477: DeprecationWarning: HiveContext is deprecated in Spark 2.0.0. Please use SparkSession.builder.enableHiveSupport().getOrCreate() instead.
17/03/02 05:49:39 INFO SparkContext: Running Spark version 2.0.0.2.5.0.0-1245
17/03/02 05:49:39 INFO SecurityManager: Changing view acls to: root
17/03/02 05:49:39 INFO SecurityManager: Changing modify acls to: root
17/03/02 05:49:39 INFO SecurityManager: Changing view acls groups to:
17/03/02 05:49:39 INFO SecurityManager: Changing modify acls groups to:
17/03/02 05:49:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
17/03/02 05:49:40 INFO Utils: Successfully started service 'sparkDriver' on port 45030.
17/03/02 05:49:40 INFO SparkEnv: Registering MapOutputTracker
17/03/02 05:49:40 INFO SparkEnv: Registering BlockManagerMaster
17/03/02 05:49:40 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f8e6d5f0-b6c0-4a75-b907-80c673e542e6
17/03/02 05:49:40 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/03/02 05:49:40 INFO SparkEnv: Registering OutputCommitCoordinator
17/03/02 05:49:40 INFO log: Logging initialized @2924ms
17/03/02 05:49:40 INFO Server: jetty-9.2.z-SNAPSHOT
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@59be7bdf{/jobs,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2751e757{/jobs/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@443ba0f8{/jobs/job,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@31f66cbf{/jobs/job/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c89dd{/stages,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@337faeb2{/stages/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2a5ae445{/stages/stage,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@284bf625{/stages/stage/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5746e090{/stages/pool,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1902def4{/stages/pool/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6407695d{/storage,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@138a4126{/storage/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@73a861a7{/storage/rdd,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7c52e458{/storage/rdd/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@2340270e{/environment,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6282d131{/environment/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5217219f{/executors,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@398c7fa1{/executors/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@1bd647c9{/executors/threadDump,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6fa906db{/executors/threadDump/json,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@459d968{/static,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@5ef2df35{/,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@7733232d{/api,null,AVAILABLE}
17/03/02 05:49:40 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@500b7cce{/stages/stage/kill,null,AVAILABLE}
17/03/02 05:49:40 INFO ServerConnector: Started ServerConnector@47de86bf{HTTP/1.1}{0.0.0.0:4040}
17/03/02 05:49:40 INFO Server: Started @3123ms
17/03/02 05:49:40 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/03/02 05:49:40 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.17.0.2:4040
17/03/02 05:49:41 INFO Utils: Copying /usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py to /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b/userFiles-8cb69f6f-def0-4cd1-bf2a-bb0d2b188789/hive.py
17/03/02 05:49:41 INFO SparkContext: Added file file:/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py at file:/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py with timestamp 1488433781202
17/03/02 05:49:41 INFO Executor: Starting executor ID driver on host localhost
17/03/02 05:49:41 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58192.
17/03/02 05:49:41 INFO NettyBlockTransferService: Server created on 172.17.0.2:58192
17/03/02 05:49:41 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO BlockManagerMasterEndpoint: Registering block manager 172.17.0.2:58192 with 366.3 MB RAM, BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 172.17.0.2, 58192)
17/03/02 05:49:41 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@21d06d8d{/metrics/json,null,AVAILABLE}
17/03/02 05:49:42 INFO EventLoggingListener: Logging events to hdfs:///spark2-history/local-1488433781315
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@6d88da34{/SQL,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@648a572{/SQL/json,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@8fcafb4{/SQL/execution,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@55585ab9{/SQL/execution/json,null,AVAILABLE}
17/03/02 05:49:43 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@586ddab5{/static/sql,null,AVAILABLE}
17/03/02 05:49:43 INFO HiveSharedState: Warehouse path is '/usr/hdp/2.5.0.0-1245/spark2/spark-warehouse'.
17/03/02 05:49:43 INFO SparkSqlParser: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive
17/03/02 05:49:45 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/03/02 05:49:46 INFO metastore: Trying to connect to metastore with URI thrift://sandbox.hortonworks.com:9083
17/03/02 05:49:46 INFO metastore: Connected to metastore.
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/49a0b52b-7510-447c-ac41-888f1984cfb5_resources
17/03/02 05:49:46 INFO SessionState: Created HDFS directory: /tmp/hive/root/49a0b52b-7510-447c-ac41-888f1984cfb5
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/root/49a0b52b-7510-447c-ac41-888f1984cfb5
17/03/02 05:49:46 INFO SessionState: Created HDFS directory: /tmp/hive/root/49a0b52b-7510-447c-ac41-888f1984cfb5/_tmp_space.db
17/03/02 05:49:46 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /usr/hdp/2.5.0.0-1245/spark2/spark-warehouse
17/03/02 05:49:46 INFO SessionState: Created local directory: /tmp/367a3eb3-d83b-46ba-96ae-7763226e38ef_resources
17/03/02 05:49:47 INFO SessionState: Created HDFS directory: /tmp/hive/root/367a3eb3-d83b-46ba-96ae-7763226e38ef
17/03/02 05:49:47 INFO SessionState: Created local directory: /tmp/root/367a3eb3-d83b-46ba-96ae-7763226e38ef
17/03/02 05:49:47 INFO SessionState: Created HDFS directory: /tmp/hive/root/367a3eb3-d83b-46ba-96ae-7763226e38ef/_tmp_space.db
17/03/02 05:49:47 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /usr/hdp/2.5.0.0-1245/spark2/spark-warehouse
Traceback (most recent call last):
  File "/usr/hdp/2.5.0.0-1245/spark2/examples/src/main/python/sql/hive.py", line 85, in <module>
    spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o40.sql.
: java.lang.ClassNotFoundException: Failed to find data source: hive. Please find packages at http://spark-packages.org
        at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:145)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:78)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:78)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:310)
        at org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:103)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:211)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: hive.DefaultSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:130)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:130)
        at scala.util.Try.orElse(Try.scala:84)
        at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:130)
        ... 30 more
17/03/02 05:49:48 INFO SparkContext: Invoking stop() from shutdown hook
17/03/02 05:49:48 INFO ServerConnector: Stopped ServerConnector@47de86bf{HTTP/1.1}{0.0.0.0:4040}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@500b7cce{/stages/stage/kill,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7733232d{/api,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ef2df35{/,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@459d968{/static,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6fa906db{/executors/threadDump/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1bd647c9{/executors/threadDump,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@398c7fa1{/executors/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5217219f{/executors,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6282d131{/environment/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2340270e{/environment,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7c52e458{/storage/rdd/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@73a861a7{/storage/rdd,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@138a4126{/storage/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6407695d{/storage,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1902def4{/stages/pool/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5746e090{/stages/pool,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@284bf625{/stages/stage/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2a5ae445{/stages/stage,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@337faeb2{/stages/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b3c89dd{/stages,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@31f66cbf{/jobs/job/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@443ba0f8{/jobs/job,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2751e757{/jobs/json,null,UNAVAILABLE}
17/03/02 05:49:48 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@59be7bdf{/jobs,null,UNAVAILABLE}
17/03/02 05:49:48 INFO SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
17/03/02 05:49:48 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/03/02 05:49:48 INFO MemoryStore: MemoryStore cleared
17/03/02 05:49:48 INFO BlockManager: BlockManager stopped
17/03/02 05:49:48 INFO BlockManagerMaster: BlockManagerMaster stopped
17/03/02 05:49:48 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/03/02 05:49:48 INFO SparkContext: Successfully stopped SparkContext
17/03/02 05:49:48 INFO ShutdownHookManager: Shutdown hook called
17/03/02 05:49:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b
17/03/02 05:49:48 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f3c621f-df4b-4a65-9415-837b1061251b/pyspark-c7630113-ce82-4002-ae65-9f9a4f7edc07

I am trying to follow the Spark SQL tutorial for Hive tables using Python, available at http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables. When I run the example program hive.py, it errors out with the traceback shown above. Can someone guide me to resolve this issue? I am new to Spark and Hadoop.

I suspect this issue has something to do with the spark-warehouse path, but I am not able to pin it down.

Thanks

Ram G.

1 ACCEPTED SOLUTION

Super Guru

@Ram Ghase

You are trying to follow a tutorial written for Spark 2.1, but your sandbox ships at most Spark 2.0. You should follow tutorials supported by the version of Spark deployed on the HDP 2.5 sandbox, which is Spark 1.6.2. Spark 2.0 is also possible, but I would wait for the HDP 2.6 sandbox, which will probably be released next month.
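
As a side note, you can confirm the exact Spark build from inside PySpark (your log already reports 2.0.0.2.5.0.0-1245); this is just a quick check, not part of the fix:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# On the HDP 2.5 sandbox this prints the 2.0.0.x build string seen in the log
print(spark.version)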

The error itself points at the cause: in Spark 2.0, CREATE TABLE ... USING hive is parsed as a request for a data source named "hive", so the resolver looks for a class called hive.DefaultSource (visible in the "Caused by" line of your stack trace) and fails. The USING hive clause is only recognized from Spark 2.1 onward. If you want to stay on the Spark 2.0 client, drop the USING hive clause from the CREATE TABLE statement.
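
Concretely, a minimal Spark 2.0 version of the failing part of hive.py could look like the sketch below. This assumes the Hive metastore is reachable (as your log shows it is) and uses the kv1.txt sample file shipped with the Spark examples:

from pyspark.sql import SparkSession

# Hive-enabled session, as the deprecation warning in your log recommends
# in place of the old HiveContext.
spark = SparkSession.builder \
    .appName("Spark 2.0 Hive example") \
    .enableHiveSupport() \
    .getOrCreate()

# Same statement as the tutorial, minus the Spark 2.1-only "USING hive" clause.
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
spark.sql("SELECT * FROM src").show()

On Spark 2.1 and later, the original CREATE TABLE ... USING hive statement from the tutorial works as-is.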
