I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

New Contributor
Exception in thread "Thread-2194" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at py4j.reflection.TypeUtil.getClass(TypeUtil.java:265)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:245)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:153)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:82)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Having read articles online, it seems I need to add the Hive jar path to these Spark config settings:

spark.executor.extraClassPath
spark.executor.extraLibraryPath

What is the Hive jar path, and what should the values of the above configuration settings be?
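
(For reference, a hedged sketch of what those settings could look like in /etc/spark/conf/spark-defaults.conf on the QuickStart VM; the paths here are assumptions, and the replies below give tested alternatives. Note that extraLibraryPath is for native libraries, so the jar path belongs on extraClassPath:)

# Hypothetical values, assuming the packaged (non-parcel) Hive location;
# the JVM expands the trailing * to every jar in the directory.
spark.driver.extraClassPath   /usr/lib/hive/lib/*
spark.executor.extraClassPath /usr/lib/hive/lib/*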

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

Master Collaborator

Coincidentally, at virtually the same moment in another conversation, Juliet here solved a nearly identical problem. I'm posting her reply:

I ran into a similar problem last week when running a Spark program on the data science cluster. In order to use the Hive context in Spark, I had to add the Hive jars to the classpath of my driver program. For the spark-shell, I think this amounts to using the `--jars` flag when you launch the shell. The following command generates a comma-separated list of Hive jars (excluding guava, which I was warned was a conflicting version that would cause trouble) for me on my cluster. The path in the VM may vary, but this is a way to parse it out:
# Need to explicitly set Hive jars for the driver, and seemingly also the executors.
# Grab everything in the Hive lib directory *except guava*:
HIVE_CLASSPATH=$(find /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hive/lib/ -name '*.jar' \
  -not -name 'guava*' -print0 | sed 's/\x0/,/g')
Then your launch command would be `spark-shell --master yarn-client --jars $HIVE_CLASSPATH`.
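
Since the original question is about pyspark rather than spark-shell, the same flag should presumably work there too; a minimal sketch, assuming $HIVE_CLASSPATH was built with the find command above:

# pyspark accepts the same launcher flags as spark-shell
pyspark --master yarn-client --jars $HIVE_CLASSPATH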

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

New Contributor

I don't have any parcels folder, as shown by the path in the reply:


[root@quickstart cloudera]# pwd
/opt/cloudera
[root@quickstart cloudera]# ls
csd parcel-repo

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

New Contributor

I tried the following but I still get the same error. Am I doing something incorrectly?

HIVE_CLASSPATH=$(find /usr/lib/hive/lib/ -name '*.jar' \
-not -name 'guava*' -print0 | sed 's/\x0/,/g')

echo $HIVE_CLASSPATH
/usr/lib/hive/lib/hive-shims-common.jar,/usr/lib/hive/lib/snappy-java-1.0.4.1.jar,/usr/lib/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/lib/hive/lib/bonecp-0.8.0.RELEASE.jar,/usr/lib/hive/lib/log4j-1.2.16.jar,/usr/lib/hive/lib/hive-exec.jar,/usr/lib/hive/lib/hive-ant.jar,/usr/lib/hive/lib/htrace-core.jar,/usr/lib/hive/lib/jta-1.1.jar,/usr/lib/hive/lib/hive-testutils-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-jdbc-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/oro-2.0.8.jar,/usr/lib/hive/lib/hive-shims-scheduler-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-beeline-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/asm-commons-3.1.jar,/usr/lib/hive/lib/avro.jar,/usr/lib/hive/lib/jetty-util-6.1.26.jar,/usr/lib/hive/lib/hive-hwi.jar,/usr/lib/hive/lib/hbase-hadoop-compat.jar,/usr/lib/hive/lib/commons-compress-1.4.1.jar,/usr/lib/hive/lib/datanucleus-rdbms-3.2.9.jar,/usr/lib/hive/lib/jetty-all-7.6.0.v20120127.jar,/usr/lib/hive/lib/hive-service.jar,/usr/lib/hive/lib/hive-testutils.jar,/usr/lib/hive/lib/zookeeper.jar,/usr/lib/hive/lib/hive-jdbc.jar,/usr/lib/hive/lib/servlet-api-2.5.jar,/usr/lib/hive/lib/mysql-connector-java.jar,/usr/lib/hive/lib/jersey-server-1.14.jar,/usr/lib/hive/lib/httpclient-4.2.5.jar,/usr/lib/hive/lib/hive-cli.jar,/usr/lib/hive/lib/servlet-api-2.5-20081211.jar,/usr/lib/hive/lib/junit-4.10.jar,/usr/lib/hive/lib/hive-shims-common-secure-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-hwi-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/geronimo-annotation_1.0_spec-1.1.1.jar,/usr/lib/hive/lib/hive-metastore.jar,/usr/lib/hive/lib/hbase-client.jar,/usr/lib/hive/lib/hive-serde.jar,/usr/lib/hive/lib/hbase-hadoop2-compat.jar,/usr/lib/hive/lib/tempus-fugit-1.1.jar,/usr/lib/hive/lib/hive-shims-common-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/stringtemplate-3.2.1.jar,/usr/lib/hive/lib/hive-ant-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/datanucleus-core-3.2.10.jar,/usr/lib/hive/lib/commons-httpclient-3.0.1.jar,/usr/lib/hive/lib/hive-cli-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/commons-codec-1.4.jar,/usr/lib/hive/lib/hive-shims-common-secure.jar,/usr/lib/hive/lib/hive-shims-0.23-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-contrib.jar,/usr/lib/hive/lib/groovy-all-2.1.6.jar,/usr/lib/hive/lib/hive-common.jar,/usr/lib/hive/lib/ant-1.9.1.jar,/usr/lib/hive/lib/httpcore-4.2.5.jar,/usr/lib/hive/lib/hive-beeline.jar,/usr/lib/hive/lib/jline-0.9.94.jar,/usr/lib/hive/lib/jersey-servlet-1.14.jar,/usr/lib/hive/lib/asm-3.2.jar,/usr/lib/hive/lib/asm-tree-3.1.jar,/usr/lib/hive/lib/hive-service-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/activation-1.1.jar,/usr/lib/hive/lib/hive-shims-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hamcrest-core-1.1.jar,/usr/lib/hive/lib/geronimo-jaspic_1.0_spec-1.0.jar,/usr/lib/hive/lib/paranamer-2.3.jar,/usr/lib/hive/lib/hive-serde-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-metastore-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-hbase-handler-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hbase-protocol.jar,/usr/lib/hive/lib/commons-cli-1.2.jar,/usr/lib/hive/lib/jpam-1.1.jar,/usr/lib/hive/lib/derby-10.10.1.1.jar,/usr/lib/hive/lib/velocity-1.5.jar,/usr/lib/hive/lib/super-csv-2.2.0.jar,/usr/lib/hive/lib/hive-shims-scheduler.jar,/usr/lib/hive/lib/hive-exec-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/hive-shims.jar,/usr/lib/hive/lib/commons-logging-1.1.3.jar,/usr/lib/hive/lib/hive-shims-0.23.jar,/usr/lib/hive/lib/hbase-server.jar,/usr/lib/hive/lib/ant-launcher-1.9.1.jar,/usr/lib/hive/lib/commons-lang3-3.1.jar,/usr/lib/hive/lib/opencsv-2.3.jar,/usr/lib/hive/lib/geronimo-jta_1.1_spec-1.1.1.jar,/usr/lib/hive/lib/jsr305-1.3.9.jar,/usr/lib/hive/lib/libfb303-0.9.0.jar
,/usr/lib/hive/lib/antlr-2.7.7.jar,/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/xz-1.0.jar,/usr/lib/hive/lib/mail-1.4.1.jar,/usr/lib/hive/lib/jdo-api-3.0.1.jar,/usr/lib/hive/lib/stax-api-1.0.1.jar,/usr/lib/hive/lib/hive-contrib-0.13.1-cdh5.3.0.jar,/usr/lib/hive/lib/ST4-4.0.4.jar,/usr/lib/hive/lib/hbase-common.jar,/usr/lib/hive/lib/antlr-runtime-3.4.jar,/usr/lib/hive/lib/commons-io-2.4.jar,/usr/lib/hive/lib/jetty-6.1.26.jar,/usr/lib/hive/lib/libthrift-0.9.0-cdh5-2.jar,/usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hive/lib/commons-lang-2.6.jar,


spark-submit xyz.py --jars $HIVE_CLASSPATH

I get the same error.
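
(One thing worth checking in the command above: spark-submit treats everything after the application script as arguments to the script itself, so a trailing --jars is silently passed to xyz.py rather than to Spark. The flags need to come before the script:)

# Flags must precede the application file; anything after xyz.py
# becomes an argument to xyz.py, not to spark-submit.
spark-submit --jars $HIVE_CLASSPATH xyz.py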

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

Master Collaborator

Try this instead, courtesy of Diana, who just said this should work:

export SPARK_CLASSPATH=$(find /usr/lib/hive/lib/ -name '*.jar' -print0 | sed 's/\x0/:/g')
spark-shell
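
For pyspark the same environment variable should work, since SPARK_CLASSPATH is picked up by the launcher scripts regardless of which shell you start, e.g.:

# Same colon-separated classpath, swapping spark-shell for pyspark
export SPARK_CLASSPATH=$(find /usr/lib/hive/lib/ -name '*.jar' -print0 | sed 's/\x0/:/g')
pyspark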

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

New Contributor

It seems the error is gone, but I still don't see the result of the Hive context execution:

/usr/lib/spark/python/pyspark/sql.py:1691: DeprecationWarning: hql() is deprecated as the sql function now parses using HiveQL bydefault. The SQL dialect for parsing can be set using 'spark.sql.dialect'
DeprecationWarning)
/usr/lib/spark/python/pyspark/sql.py:1682: DeprecationWarning: hiveql() is deprecated as the sql function now parses using HiveQL bydefault. The SQL dialect for parsing can be set using 'spark.sql.dialect'
DeprecationWarning)
2015-02-06 11:51:36,971 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)
2015-02-06 11:51:37,394 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(206)) - Parse Completed
2015-02-06 11:51:37,877 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:newRawStore(502)) - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2015-02-06 11:51:37,924 INFO [Thread-2] metastore.ObjectStore (ObjectStore.java:initialize(247)) - ObjectStore, initialize called
2015-02-06 11:51:38,425 INFO [Thread-2] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property datanucleus.cache.level2 unknown - will be ignored
2015-02-06 11:51:38,425 INFO [Thread-2] DataNucleus.Persistence (Log4JLogger.java:info(77)) - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2015-02-06 11:51:40,847 INFO [Thread-2] metastore.ObjectStore (ObjectStore.java:getPMF(318)) - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2015-02-06 11:51:41,057 INFO [Thread-2] metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:<init>(110)) - MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
2015-02-06 11:51:44,281 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:44,282 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:45,039 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:45,040 INFO [Thread-2] DataNucleus.Datastore (Log4JLogger.java:info(77)) - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2015-02-06 11:51:45,326 INFO [Thread-2] DataNucleus.Query (Log4JLogger.java:info(77)) - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
2015-02-06 11:51:45,329 INFO [Thread-2] metastore.ObjectStore (ObjectStore.java:setConf(230)) - Initialized ObjectStore
2015-02-06 11:51:45,902 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added admin role in metastore
2015-02-06 11:51:45,904 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(569)) - Added public role in metastore
2015-02-06 11:51:46,174 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(597)) - No user is added in admin role, since config is empty
2015-02-06 11:51:46,670 INFO [Thread-2] session.SessionState (SessionState.java:start(381)) - No Tez session required at this point. hive.execution.engine=mr.
2015-02-06 11:51:48,819 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,820 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,820 INFO [Thread-2] ql.Driver (Driver.java:checkConcurrency(165)) - Concurrency mode is disabled, not creating a lock manager
2015-02-06 11:51:48,824 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,875 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,876 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)
2015-02-06 11:51:48,876 INFO [Thread-2] parse.ParseDriver (ParseDriver.java:parse(206)) - Parse Completed
2015-02-06 11:51:48,877 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=parse start=1423252308875 end=1423252308877 duration=2 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,877 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:48,952 INFO [Thread-2] parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeInternal(9333)) - Starting Semantic Analysis
2015-02-06 11:51:48,967 INFO [Thread-2] parse.SemanticAnalyzer (SemanticAnalyzer.java:analyzeCreateTable(9991)) - Creating table src24 position=27
2015-02-06 11:51:49,013 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 0: get_table : db=default tbl=src24
2015-02-06 11:51:49,014 INFO [Thread-2] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=src24
2015-02-06 11:51:49,114 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 0: get_database: default
2015-02-06 11:51:49,115 INFO [Thread-2] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=root ip=unknown-ip-addr cmd=get_database: default
2015-02-06 11:51:49,153 INFO [Thread-2] ql.Driver (Driver.java:compile(446)) - Semantic Analysis Completed
2015-02-06 11:51:49,200 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=semanticAnalyze start=1423252308877 end=1423252309200 duration=323 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,228 INFO [Thread-2] ql.Driver (Driver.java:getSchema(245)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2015-02-06 11:51:49,231 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=compile start=1423252308824 end=1423252309231 duration=407 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,234 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,234 INFO [Thread-2] ql.Driver (Driver.java:execute(1243)) - Starting command: CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)
2015-02-06 11:51:49,250 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=TimeToSubmit start=1423252308820 end=1423252309250 duration=430 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,251 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,251 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:49,644 INFO [Thread-2] metastore.HiveMetaStore (HiveMetaStore.java:logInfo(632)) - 0: create_table: Table(tableName:src24, dbName:default, owner:root, createTime:1423252309, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
2015-02-06 11:51:49,650 INFO [Thread-2] HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(314)) - ugi=root ip=unknown-ip-addr cmd=create_table: Table(tableName:src24, dbName:default, owner:root, createTime:1423252309, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
2015-02-06 11:51:49,679 INFO [Thread-2] common.FileUtils (FileUtils.java:mkdir(532)) - Creating directory if it doesn't exist: file:/user/hive/warehouse/src24
2015-02-06 11:51:50,644 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=runTasks start=1423252309251 end=1423252310644 duration=1393 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,645 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=Driver.execute start=1423252309234 end=1423252310645 duration=1411 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,658 INFO [Thread-2] ql.Driver (SessionState.java:printInfo(556)) - OK
2015-02-06 11:51:50,659 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,659 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=releaseLocks start=1423252310659 end=1423252310659 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-02-06 11:51:50,661 INFO [Thread-2] log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=Driver.run start=1423252308819 end=1423252310661 duration=1842 from=org.apache.hadoop.hive.ql.Driver>
MapPartitionsRDD[5] at mapPartitions at SerDeUtil.scala:143

hiveCtx = HiveContext(sc)
hiveCtx.hql("CREATE TABLE IF NOT EXISTS src24 (key INT, value STRING)")

hive> show tables;
OK
src
src22

Is there anything else I am missing?
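
(A hedged observation: the log line "Creating directory if it doesn't exist: file:/user/hive/warehouse/src24" shows the table going to a local file: path, which suggests Spark spun up its own embedded metastore instead of talking to the cluster's. Making Hive's client config visible to Spark usually reconciles the two; the paths below are the VM defaults and may vary:)

# Copy the Hive client config where Spark will pick it up, so
# HiveContext uses the same metastore as the hive shell.
sudo cp /etc/hive/conf/hive-site.xml /etc/spark/conf/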

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

Cloudera Employee

Here's a full incantation for spark-shell on CDH 5.3. Remember this is still unsupported, but you may find it helpful:

HADOOP_CONF_DIR=/etc/hive/conf spark-shell --master yarn-client --driver-class-path '/opt/cloudera/parcels/CDH/lib/hive/lib/*' --driver-java-options '-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*'
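
For pyspark, presumably the same incantation applies with pyspark swapped in for spark-shell:

# Hedged sketch: identical flags, different shell
HADOOP_CONF_DIR=/etc/hive/conf pyspark --master yarn-client --driver-class-path '/opt/cloudera/parcels/CDH/lib/hive/lib/*' --driver-java-options '-Dspark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*'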

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

Expert Contributor

I guess this might help, as I am having the same exact problem. I am going to try this fix now, so I hope it fixes our problem:

http://stackoverflow.com/questions/13333519/error-starting-hive-java-lang-noclassdeffounderror-org-a...

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

Expert Contributor

I tried this possible solution but it failed.

Here's the command I ran (I am using a Flume spooling directory as the streaming source, with Spark processing the data and saving it to HDFS):

sudo spark-submit --class "WordCount" --master local[*] \
  --jars /usr/local/WordCount/target/scala-2.10/spark-streaming-flume_2.11-1.2.0.jar,/usr/lib/avro/avro-ipc-1.7.6-cdh5.3.0.jar,/usr/lib/flume-ng/lib/flume-ng-sdk-1.5.0-cdh5.3.0.jar,/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/local/WordCount/target/scala-2.10/spark-hive_2.10-1.2.0-cdh5.3.0.jar \
  /usr/local/WordCount/target/scala-2.10/wordcount_2.10-1.0.jar 127.0.0.1 9999

And this is the error I get:


Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at WordCount$.main(WordCount.scala:68)
at WordCount.main(WordCount.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more

My Scala code is:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume._
import org.apache.spark.util.IntParam
import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.spark.sql.hive.HiveContext

object WordCount {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: WordCount <host> <port>")
      System.exit(1)
    }

    val Array(host, port) = args
    val batchInterval = Milliseconds(2000)

    // Create the context and set the batch size
    val sparkConf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, batchInterval)

    // Create a Flume stream
    val stream = FlumeUtils.createStream(ssc, host, port.toInt)

    // Print out the count of events received from this server in each batch
    stream.count().map(cnt => "Received !!!:::::" + cnt + " flume events.").print()
    val body = stream.map(e => new String(e.event.getBody.array))
    val counts = body.flatMap(line => line.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)  // This line gives the above error
    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

    ssc.start()
    ssc.awaitTermination()
  }
}
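
(Two hedged observations on the command above, building on the earlier replies: the --jars list mixes a Scala 2.11 artifact, spark-streaming-flume_2.11-1.2.0.jar, into a Scala 2.10 build, which is a likely source of further trouble, and listing individual Hive jars can leave out dependencies of hive-common. Putting the whole Hive lib directory on the driver's classpath, as in the incantation earlier in the thread, sidesteps the second problem. A sketch, keeping the same jar list:)

# Sketch only: same command, plus the whole Hive lib directory on the
# driver's classpath; the --jars list is unchanged from the post above.
sudo spark-submit --class "WordCount" --master local[*] \
  --driver-class-path '/usr/lib/hive/lib/*' \
  --jars /usr/local/WordCount/target/scala-2.10/spark-streaming-flume_2.11-1.2.0.jar,/usr/lib/avro/avro-ipc-1.7.6-cdh5.3.0.jar,/usr/lib/flume-ng/lib/flume-ng-sdk-1.5.0-cdh5.3.0.jar,/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/local/WordCount/target/scala-2.10/spark-hive_2.10-1.2.0-cdh5.3.0.jar \
  /usr/local/WordCount/target/scala-2.10/wordcount_2.10-1.0.jar 127.0.0.1 9999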

Re: I am using a Hive context in PySpark on the CDH 5.3 VirtualBox VM and I get an error

Explorer

Hello,

I get the same error. I tried all the suggestions, but none of them seems to work. Did anyone find a solution?

Thanks in advance!
