
Error with Zeppelin/SparkR when querying hive table with JSON Serde format..help?


Contributor

Hi All,

I'm using SparkR in Zeppelin to query a Hive table as follows:

y <- sql(sqlContext, "select * from db.node_sample") 

head(y)

The Hive table uses the JSON SerDe format (the data is stored in JSON files under the hood), but the query fails with the following error:

INFO [2016-09-12 15:33:02,285] ({pool-2-thread-2} ZeppelinR.java[createRScript]:362) - File /tmp/zeppelin_sparkr-3550873602551042380.R created
 INFO [2016-09-12 15:33:03,084] ({nioEventLoopGroup-2-2} ParseDriver.java[parse]:185) - Parsing command: select * from smartclean_raw.node_sample LIMIT 1
 INFO [2016-09-12 15:33:03,877] ({nioEventLoopGroup-2-2} ParseDriver.java[parse]:209) - Parse Completed
ERROR [2016-09-12 15:33:04,015] ({nioEventLoopGroup-2-2} MetaStoreUtils.java[getDeserializer]:397) - error in initSerDe: java.lang.ClassNotFoundException Class org.apache.hive.hcatalog.data.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.data.JsonSerDe not found

I assume I need to specify a dependency on the Spark interpreter in Zeppelin? I tried setting org.apache.hive:hive-jdbc:0.14.0 as a dependency on the Spark interpreter, but that did not resolve the issue.

Any thoughts?

Mike

2 REPLIES

Re: Error with Zeppelin/SparkR when querying hive table with JSON Serde format..help?

Contributor
Using Python version 2.7.6 (default, Jun 22 2015 17:58:13)
SparkContext available as sc, HiveContext available as sqlContext.
>>> sample = sqlContext.table("db.node_sample")
16/09/13 10:24:55 INFO HiveContext: Initializing execution hive, version 1.2.1
16/09/13 10:24:55 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.4.0.0-169
16/09/13 10:24:55 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.4.0.0-169
16/09/13 10:24:55 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/09/13 10:24:55 INFO ObjectStore: ObjectStore, initialize called
16/09/13 10:24:56 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/09/13 10:24:56 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/09/13 10:24:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/09/13 10:24:56 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/09/13 10:24:57 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/09/13 10:24:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/09/13 10:24:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/09/13 10:24:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/09/13 10:24:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/09/13 10:24:59 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/09/13 10:24:59 INFO ObjectStore: Initialized ObjectStore
16/09/13 10:24:59 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/09/13 10:24:59 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/09/13 10:24:59 INFO HiveMetaStore: Added admin role in metastore
16/09/13 10:24:59 INFO HiveMetaStore: Added public role in metastore
16/09/13 10:24:59 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/09/13 10:24:59 INFO HiveMetaStore: 0: get_all_databases
16/09/13 10:24:59 INFO audit: ugi=x ip=unknown-ip-addr cmd=get_all_databases
16/09/13 10:24:59 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/09/13 10:24:59 INFO audit: ugi=x ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/09/13 10:24:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/09/13 10:24:59 INFO SessionState: Created local directory: /tmp/x/
16/09/13 10:24:59 INFO SessionState: Created local directory: /tmp/b0e77e51-1c8e-4e53-87c0-13eeb8db13fd_resources
16/09/13 10:25:00 INFO SessionState: Created HDFS directory: /tmp/hive/x/b0e77e51-1c8e-4e53-87c0-13eeb8db13fd
16/09/13 10:25:00 INFO SessionState: Created local directory: /tmp/x/b0e77e51-1c8e-4e53-87c0-13eeb8db13fd
16/09/13 10:25:00 INFO SessionState: Created HDFS directory: /tmp/hive/x/b0e77e51-1c8e-4e53-87c0-13eeb8db13fd/_tmp_space.db
16/09/13 10:25:00 INFO HiveContext: default warehouse location is /user/hive/warehouse
16/09/13 10:25:00 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/09/13 10:25:00 INFO ClientWrapper: Inspected Hadoop version: 2.7.1.2.4.0.0-169
16/09/13 10:25:00 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.1.2.4.0.0-169
16/09/13 10:25:00 INFO metastore: Trying to connect to metastore with URI thrift://<domainname>:9083
16/09/13 10:25:00 INFO metastore: Connected to metastore.
16/09/13 10:25:00 INFO SessionState: Created local directory: /tmp/24904629-f2ca-42f0-8cc0-a67c96f14580_resources
16/09/13 10:25:00 INFO SessionState: Created HDFS directory: /tmp/hive/x/24904629-f2ca-42f0-8cc0-a67c96f14580
16/09/13 10:25:01 INFO SessionState: Created local directory: /tmp/x/24904629-f2ca-42f0-8cc0-a67c96f14580
16/09/13 10:25:01 INFO SessionState: Created HDFS directory: /tmp/hive/x/24904629-f2ca-42f0-8cc0-a67c96f14580/_tmp_space.db
16/09/13 10:25:01 ERROR log: error in initSerDe: java.lang.ClassNotFoundException Class org.apache.hive.hcatalog.data.JsonSerDe not found
java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.data.JsonSerDe not found



The issue isn't specific to Zeppelin/SparkR: I've just tried running the same command in the pyspark CLI and get the same error. Any ideas?


Re: Error with Zeppelin/SparkR when querying hive table with JSON Serde format..help?

Expert Contributor
@mike harding

Try adding hive-hcatalog-core.jar, which contains the org.apache.hive.hcatalog.data.JsonSerDe class. If your Zeppelin version is newer than 0.6, you can add this jar under Dependencies in the Spark interpreter settings.
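Since the same error reproduces in the pyspark CLI, a quick way to confirm the jar fixes it is to pass it directly on the command line. The path below is an assumption based on a typical HDP layout; adjust it to wherever hive-hcatalog-core.jar actually lives on your cluster:

```shell
# Locate the jar first (path varies by distribution/version):
find /usr/hdp -name 'hive-hcatalog-core*.jar' 2>/dev/null

# Then launch pyspark with it on the classpath
# (hypothetical location shown -- substitute your own):
pyspark --jars /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
```

If `sqlContext.table("db.node_sample")` works in that shell, adding the same jar as a dependency on Zeppelin's Spark interpreter should resolve the notebook as well.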
