Jug
New Contributor
Posts: 6
Registered: ‎02-06-2015

I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error:

Exception in thread "Thread-2194" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at py4j.reflection.TypeUtil.getClass(TypeUtil.java:265)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:245)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:153)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:82)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 
Articles I've read online say I need to add the Hive jar path to these Spark conf settings:
spark.executor.extraClassPath
spark.executor.extraLibraryPath
 
What is the Hive jar path, and what should the values of the above configuration settings be?
Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

Coincidentally, at virtually the same moment in another conversation, Juliet here solved a nearly identical problem. I'm posting her reply:

I ran into a similar problem last week when running a Spark program on the data science cluster. To use the Hive context in Spark, I had to add the Hive jars to the classpath of my driver program. For the spark-shell, I think this amounts to using the `--jars` flag when you launch the shell. The following command generates a comma-separated list of Hive jars (excluding guava, which I was warned is a conflicting version that would cause trouble) for me on my cluster. The path in the VM may vary, but this is a way to parse it out:
 
# Need to explicitly set Hive jars on the driver, and seemingly also the
# executors. Grab everything in the Hive lib directory *except guava*;
# the final s/,$// strips the trailing comma left by the last NUL.
HIVE_CLASSPATH=$(find /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hive/lib/ -name '*.jar' \
  -not -name 'guava*' -print0 | sed 's/\x0/,/g; s/,$//')
 
Then your launch command would be `spark-shell --master yarn-client --jars $HIVE_CLASSPATH`.
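For anyone who prefers doing this from Python rather than a find/sed pipeline, here is a minimal sketch of the same filtering. The function name is mine, and the path is only an example that varies per install; this just mirrors the logic of the command above (all jars except guava, joined with commas for `--jars`):

```python
import os

def hive_jar_list(lib_dir, sep=",", exclude=("guava",)):
    """Collect .jar files under lib_dir, skipping excluded name prefixes.

    Mirrors the find/sed pipeline above: comma-separated, suitable for --jars.
    """
    jars = []
    for name in sorted(os.listdir(lib_dir)):
        if name.endswith(".jar") and not name.startswith(tuple(exclude)):
            jars.append(os.path.join(lib_dir, name))
    return sep.join(jars)

# Example (path is an assumption; adjust to your install):
# hive_jar_list("/usr/lib/hive/lib/")
```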
Jug
New Contributor
Posts: 6
Registered: ‎02-06-2015

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

I don't have the parcels folder shown in the reply's path:

[root@quickstart cloudera]# pwd
/opt/cloudera
[root@quickstart cloudera]# ls
csd parcel-repo

Jug
New Contributor
Posts: 6
Registered: ‎02-06-2015

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

I tried the following, but I still get the same error. Am I doing something incorrectly?


HIVE_CLASSPATH=$(find /usr/lib/hive/lib/ -name '*.jar' \
-not -name 'guava*' -print0 | sed 's/\x0/,/g')

echo $HIVE_CLASSPATH
/usr/lib/hive/lib/hive-shims-common.jar,/usr/lib/hive/lib/snappy-java-1.0.4.1.jar,/usr/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/hive/lib/bonecp-0.8.0.RELEASE.jar,/usr/lib/hive/lib/log4j-1.2.16.jar,/usr/lib/hive/lib/hive-exec.jar,/usr/lib/hive/lib/hive-ant.jar,/usr/lib/hive/lib/htrace-core.jar,/usr/lib/hive/lib/jta-1.1.jar,/usr/lib/hive/lib/hive-testutils-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-jdbc-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/oro-2.0.8.jar,/usr/lib/hive/lib/hive-shims-scheduler-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-beeline-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/asm-commons-3.1.jar,/usr/lib/hive/lib/avro.jar,/usr/lib/hive/lib/jetty-util-6.1.26.jar,/usr/lib/hive/lib/hive-hwi.jar,/usr/lib/hive/lib/hbase-hadoop-compat.jar,/usr/lib/hive/lib/commons-compress-1.4.1.jar,/usr/lib/hive/lib/datanucleus-rdbms-3.2.9.jar,/usr/lib/hive/lib/jetty-all-7.6.0.v20120127.jar,/usr/lib/hive/lib/hive-service.jar,/usr/lib/hive/lib/hive-testutils.jar,/usr/lib/hive/lib/zookeeper.jar,/usr/lib/hive/lib/hive-jdbc.jar,/usr/lib/hive/lib/servlet-api-2.5.jar,/usr/lib/hive/lib/mysql-connector-java.jar,/usr/lib/hive/lib/jersey-server-1.14.jar,/usr/lib/hive/lib/httpclient-4.2.5.jar,/usr/lib/hive/lib/hive-cli.jar,/usr/lib/hive/lib/servlet-api-2.5-20081211.jar,/usr/lib/hive/lib/junit-4.10.jar,/usr/lib/hive/lib/hive-shims-common-secure-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-hwi-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar,/usr/lib/hive/lib/hive-metastore.jar,/usr/lib/hive/lib/hbase-client.jar,/usr/lib/hive/lib/hive-serde.jar,/usr/lib/hive/lib/hbase-hadoop2-compat.jar,/usr/lib/hive/lib/tempus-fugit-1.1.jar,/usr/lib/hive/lib/hive-shims-common-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/stringtemplate-3.2.1.jar,/usr/lib/hive/lib/hive-ant-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/datanucleus-core-3.2.10.jar,/usr/lib/hive/lib/commons-httpclient-3.0.1.jar,/usr/lib/hive/lib/hive-cli-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/commons-codec-1.4.jar,/usr/lib/hive/lib
/hive-shims-common-secure.jar,/usr/lib/hive/lib/hive-shims-0.23-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-contrib.jar,/usr/lib/hive/lib/groovy-all-2.1.6.jar,/usr/lib/hive/lib/hive-common.jar,/usr/lib/hive/lib/ant-1.9.1.jar,/usr/lib/hive/lib/httpcore-4.2.5.jar,/usr/lib/hive/lib/hive-beeline.jar,/usr/lib/hive/lib/jline-0.9.94.jar,/usr/lib/hive/lib/jersey-servlet-1.14.jar,/usr/lib/hive/lib/asm-3.2.jar,/usr/lib/hive/lib/asm-tree-3.1.jar,/usr/lib/hive/lib/hive-service-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/activation-1.1.jar,/usr/lib/hive/lib/hive-shims-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hamcrest-core-1.1.jar,/usr/lib/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar,/usr/lib/hive/lib/paranamer-2.3.jar,/usr/lib/hive/lib/hive-serde-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-metastore-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-hbase-handler-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hbase-protocol.jar,/usr/lib/hive/lib/commons-cli-1.2.jar,/usr/lib/hive/lib/jpam-1.1.jar,/usr/lib/hive/lib/derby-10.10.1.1.jar,/usr/lib/hive/lib/velocity-1.5.jar,/usr/lib/hive/lib/super-csv-2.2.0.jar,/usr/lib/hive/lib/hive-shims-scheduler.jar,/usr/lib/hive/lib/hive-exec-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-shims.jar,/usr/lib/hive/lib/commons-logging-1.1.3.jar,/usr/lib/hive/lib/hive-shims-0.23.jar,/usr/lib/hive/lib/hbase-server.jar,/usr/lib/hive/lib/ant-launcher-1.9.1.jar,/usr/lib/hive/lib/commons-lang3-3.1.jar,/usr/lib/hive/lib/opencsv-2.3.jar,/usr/lib/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar,/usr/lib/hive/lib/jsr305-1.3.9.jar,/usr/lib/hive/lib/libfb303-0.9.0.jar,/usr/lib/hive/lib/antlr-2.7.7.jar,/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/xz-1.0.jar,/usr/lib/hive/lib/mail-1.4.1.jar,/usr/lib/hive/lib/jdo-api-3.0.1.jar,/usr/lib/hive/lib/stax-api-1.0.1.jar,/usr/lib/hive/lib/hive-contrib-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/ST4-4.0.4.jar,/usr/lib/hive/lib/hbase-common.jar,/usr/lib/hive/lib/antlr-runtime-3.4.jar,/usr/lib/hive/lib/commons-io-2.4.jar,/usr/lib/hive/lib/jetty-6.1.26
.jar,/usr/lib/hive/lib/libthrift-0.9.0-cdh5-2.jar,/usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hive/lib/commons-lang-2.6.jar,


spark-submit xyz.py --jars $HIVE_CLASSPATH

I get the same error.
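One possible culprit here (my observation, not confirmed in the thread): `spark-submit` stops parsing its own options at the application file, and everything after it is passed as arguments *to the application*. So in `spark-submit xyz.py --jars $HIVE_CLASSPATH`, the `--jars` flag never reaches Spark. A small sketch of building the argument list in the right order (the helper name and `xyz.py` are illustrative):

```python
def spark_submit_argv(app, jars=None, app_args=()):
    """Build a spark-submit command list with options *before* the app file.

    spark-submit hands everything after the application file to the
    application itself, so flags like --jars must precede it.
    """
    argv = ["spark-submit"]
    if jars:
        argv += ["--jars", jars]
    argv.append(app)          # application file comes after all Spark options
    argv += list(app_args)    # these go to the application, not to Spark
    return argv
```

So the command above would need to be `spark-submit --jars $HIVE_CLASSPATH xyz.py` instead.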

Cloudera Employee
Posts: 366
Registered: ‎07-29-2013

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

Try this instead, courtesy of Diana, who just said this should work:

export SPARK_CLASSPATH=$(find /usr/lib/hive/lib/ -name '*.jar' -print0 | sed 's/\x0/:/g')
spark-shell
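Note the separator difference between the two approaches above: `--jars` takes a comma-separated list, while `SPARK_CLASSPATH` is a JVM-style classpath joined with colons. A tiny sketch of the environment-variable route from Python (the function name is mine):

```python
import os

def export_spark_classpath(jars):
    """Join jar paths with the platform classpath separator (':' on Linux)
    and export SPARK_CLASSPATH, matching the shell one-liner above.
    Unlike --jars (commas), classpath variables use colons."""
    os.environ["SPARK_CLASSPATH"] = os.pathsep.join(jars)
    return os.environ["SPARK_CLASSPATH"]
```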
Jug
New Contributor
Posts: 6
Registered: ‎02-06-2015

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

It seems the error is gone, but I still don't see the result of the Hive context execution.

/usr/lib/spark/python/pyspark/sql.py:1691: DeprecationWarning: hql() is deprecated as the sql function now parses using HiveQL by default. The SQL dialect for parsing can be set using 'spark.sql.dialect'
DeprecationWarning)
/usr/lib/spark/python/pyspark/sql.py:1682: DeprecationWarning: hiveql() is deprecated as the sql function now parses using HiveQL by default. The SQL dialect for parsing can be set using 'spark.sql.dialect'
DeprecationWarning)
2015-02-06 11:51:36,971 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)
2015-02-06 11:51:37,394 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(206)) - Parse Completed
2015-02-06 11:51:37,877 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(502)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-02-06 11:51:37,924 INFO [Thread-2] metastore.ObjectStore (ObjectStore.java:initialize(247)) - ObjectStore, initialize called
2015-02-06 11:51:38,425 INFO [Thread-2] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will be ignored
2015-02-06 11:51:38,425 INFO [Thread-2] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2015-02-06 11:51:40,847 INFO [Thread-2] metastore.ObjectStore (ObjectStore.java:getPMF(318)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-02-06 11:51:41,057 INFO [Thread-2] metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(110)) - MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
2015-02-06 11:51:44,281 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:44,282 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:45,039 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:45,040 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:45,326 INFO [Thread-2] DataNucleus.Query (Log4JLogger.java:info(77)) - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
2015-02-06 11:51:45,329 INFO [Thread-2] metastore.ObjectStore (ObjectStore.java:setConf(230)) - Initialized ObjectStore
2015-02-06 11:51:45,902 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added admin role in metastore
2015-02-06 11:51:45,904 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(569)) - Added public role in metastore
2015-02-06 11:51:46,174 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(597)) - No user is added in admin role, since config is empty
2015-02-06 11:51:46,670 INFO [Thread-2] session.SessionState (SessionState.java:start(381)) - No Tez session required at this point. hive.execution.engine=mr.
2015-02-06 11:51:48,819 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,820 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,820 INFO [Thread-2] ql.Driver (Driver.java:checkConcurrency(165)) - Concurrency mode is disabled, not creating a lock manager
2015-02-06 11:51:48,824 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,875 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,876 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)
2015-02-06 11:51:48,876 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(206)) - Parse Completed
2015-02-06 11:51:48,877 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=parse start=1423252308875 end=1423252308877 duration=2 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,877 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,952 INFO [Thread-2] parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(9333)) - Starting Semantic Analysis
2015-02-06 11:51:48,967 INFO [Thread-2] parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeCreateTable(9991)) - Creating table src24 position=27
2015-02-06 11:51:49,013 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 0: get_table : db=default tbl=src24
2015-02-06 11:51:49,014 INFO [Thread-2] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=src24
2015-02-06 11:51:49,114 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 0: get_database: default
2015-02-06 11:51:49,115 INFO [Thread-2] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=root ip=unknown-ip-addr cmd=get_database: default
2015-02-06 11:51:49,153 INFO [Thread-2] ql.Driver (Driver.java:compile(446)) - Semantic Analysis Completed
2015-02-06 11:51:49,200 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=semanticAnalyze start=1423252308877 end=1423252309200 duration=323 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,228 INFO [Thread-2] ql.Driver (Driver.java:getSchema(245)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2015-02-06 11:51:49,231 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=compile start=1423252308824 end=1423252309231 duration=407 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,234 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,234 INFO [Thread-2] ql.Driver (Driver.java:execute(1243)) - Starting command: CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)
2015-02-06 11:51:49,250 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=TimeToSubmit start=1423252308820 end=1423252309250 duration=430 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,251 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,251 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,644 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 0: create_table: Table(tableName:src24, dbName:default, owner:root, createTime:1423252309, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
2015-02-06 11:51:49,650 INFO [Thread-2] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=root ip=unknown-ip-addr cmd=create_table: Table(tableName:src24, dbName:default, owner:root, createTime:1423252309, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
2015-02-06 11:51:49,679 INFO [Thread-2] common.FileUtils (FileUtils.java:mkdir(532)) - Creating directory if it doesn't exist: file:/user/hive/warehouse/src24
2015-02-06 11:51:50,644 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=runTasks start=1423252309251 end=1423252310644 duration=1393 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,645 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=Driver.execute start=1423252309234 end=1423252310645 duration=1411 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,658 INFO [Thread-2] ql.Driver (SessionState.java:printInfo(556)) - OK
2015-02-06 11:51:50,659 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,659 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=releaseLocks start=1423252310659 end=1423252310659 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,661 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=Driver.run start=1423252308819 end=1423252310661 duration=1842 from=org.apache.hadoop.hive.ql.Driver>
MapPartitionsRDD[5] at mapPartitions at SerDeUtil.scala:143

from pyspark.sql import HiveContext  # sc is the SparkContext the pyspark shell provides

hiveCtx = HiveContext(sc)
hiveCtx.hql("CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)")

hive> show tables;
OK
src
src22

Is there anything else I am missing?
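A hint in the log output above (my reading, not confirmed in the thread): the line `Creating directory if it doesn't exist: file:/user/hive/warehouse/src24` uses the `file:` scheme rather than `hdfs:`, which suggests Spark fell back to an embedded local metastore and warehouse because hive-site.xml was not on its classpath. The Hive CLI talks to the real metastore, so `show tables` there will not see `src24`. A trivial check of a warehouse location pulled from the logs:

```python
from urllib.parse import urlparse

def warehouse_is_local(location):
    """True if a warehouse path from the logs points at the local
    filesystem (file: or no scheme) instead of HDFS."""
    return urlparse(location).scheme in ("", "file")

# The log line above shows a local path:
# warehouse_is_local("file:/user/hive/warehouse/src24")
```

If that returns True, putting the Hive conf directory (e.g. /etc/hive/conf) on the driver classpath is the usual fix, as in the reply further down.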

Contributor
Posts: 56
Registered: ‎02-09-2015

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

I guess this might help, as I am having the exact same problem. I am going to try this fix now and hope it fixes our problem:

http://stackoverflow.com/questions/13333519/error-starting-hive-java-lang-noclassdeffounderror-org-a...

Contributor
Posts: 56
Registered: ‎02-09-2015

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

I tried this possible solution, but it failed.

Here's the command I ran:

(I am using a Flume spooling directory as the streaming source, with Spark processing the data and saving it to HDFS.)

sudo spark-submit --class "WordCount" --master local[*] --jars /usr/local/WordCount/target/scala-2.10/spark-streaming-flume_2.11-1.2.0.jar,/usr/lib/avro/avro-ipc-1.7.6-cdh5.3.0.jar,/usr/lib/flume-ng/lib/flume-ng-sdk-1.5.0-cdh5.3.0.jar,/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/local/WordCount/target/scala-2.10/spark-hive_2.10-1.2.0-cdh5.3.0.jar /usr/local/WordCount/target/scala-2.10/wordcount_2.10-1.0.jar 127.0.0.1 9999

And this is the error I get:


Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at WordCount$.main(WordCount.scala:68)
at WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more

My Scala code is:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam
import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.spark.sql.hive.HiveContext

object WordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <host> <port>")
      System.exit(1)
    }

    val Array(host, port) = args

    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, batchInterval)

    // Create a flume stream
    val stream = FlumeUtils.createStream(ssc, host, port.toInt)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received !!!:::::" + cnt + " flume events.").print()
    val body = stream.map(e => new String(e.event.getBody.array))
    val counts = body.flatMap(line => line.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)  // This line gives the above error
    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    ssc.start()
    ssc.awaitTermination()
  }
}

Cloudera Employee
Posts: 1
Registered: ‎02-27-2015

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

Here's a full incantation for spark-shell on CDH 5.3. Remember this is still unsupported, but you may find it helpful:

HADOOP_CONF_DIR=/etc/hive/conf spark-shell --master yarn-client --driver-class-path '/opt/cloudera/parcels/CDH/lib/hive/lib/*' --driver-java-options '-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*'
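The trailing `*` in `--driver-class-path '/opt/cloudera/parcels/CDH/lib/hive/lib/*'` works because the JVM expands a classpath entry ending in `*` to every .jar in that directory (non-recursively), so no find/sed pipeline is needed. A sketch approximating what the JVM effectively sees (the function name is mine):

```python
import glob
import os

def expand_classpath_entry(entry):
    """Approximate the JVM's classpath wildcard rule: an entry ending in
    '*' matches every .jar file in that directory, non-recursively;
    any other entry is used as-is."""
    if entry.endswith("*"):
        return sorted(glob.glob(os.path.join(os.path.dirname(entry), "*.jar")))
    return [entry]
```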

Explorer
Posts: 9
Registered: ‎02-24-2015

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get this error

Hello,

I get the same error. I tried all the suggestions, but none of them seems to work. Has anyone found a solution?

Thanks in advance!
