Support Questions
Find answers, ask questions, and share your expertise

Error running Hivecontext in pyspark


New Contributor

Using Cloudera Quickstart VM 5.10
Spark version 1.6.0
Copied hive-site.xml to spark directory


>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> cnt = sqlContext.sql("select count(1) from customers")


When I try to query Hive data from the PySpark HiveContext, I get the warnings below.

17/05/05 15:05:01 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
17/05/05 15:05:01 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/05/05 15:05:03 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.


Re: Error running Hivecontext in pyspark

You have two options:

  1. Disable the short-circuit read feature by setting it to false - this will have a performance hit.
  2. Enable the native library by following the link.
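For option 1, the relevant client-side HDFS property is `dfs.client.read.shortcircuit`. A sketch of the change, assuming the usual hdfs-site.xml layout on the Quickstart VM:

```xml
<!-- hdfs-site.xml (client side): disable short-circuit local reads -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
</property>
```

Restart the affected services (or re-deploy client configuration via Cloudera Manager) after changing it.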

In the meantime, you can check whether the native library is loaded by running the command below:


hadoop checknative -a


But this is only a WARN. You should be able to bypass it and still get your results.
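If you want to check this from a script rather than by eye, the `hadoop checknative -a` output can be parsed into a simple map. A minimal sketch - the sample output string below is illustrative of the usual format, not captured from this VM:

```python
def parse_checknative(output):
    """Parse `hadoop checknative -a` output into {library: loaded?}."""
    libs = {}
    for line in output.splitlines():
        # Library lines look like "hadoop:  true /path/to/libhadoop.so"
        if ":" in line and not line.startswith("Native library"):
            name, _, rest = line.partition(":")
            fields = rest.split()
            if fields and fields[0] in ("true", "false"):
                libs[name.strip()] = fields[0] == "true"
    return libs

# Illustrative sample of checknative output (hypothetical values):
sample = """Native library checking:
hadoop:  false
zlib:    true /lib64/libz.so.1
snappy:  false
"""
print(parse_checknative(sample)["hadoop"])  # → False, i.e. libhadoop not loaded
```

You could feed it the real output via `subprocess.check_output(["hadoop", "checknative", "-a"])` on the VM.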