Support Questions
Find answers, ask questions, and share your expertise

Error running HiveContext in pyspark


New Contributor

Using Cloudera Quickstart VM 5.10
Spark version 1.6.0
Copied hive-site.xml to the Spark conf directory

 

>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> cnt = sqlContext.sql("select count(1) from customers")

 

When I try to get Hive DB data from the PySpark context, I am getting the error below.

17/05/05 15:05:01 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
17/05/05 15:05:01 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/05/05 15:05:03 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.

1 REPLY

Re: Error running HiveContext in pyspark

Champion
You can either:

1. Disable the short-circuit local read feature (this will have a performance hit) by setting it to false in hdfs-site.xml (see the check sketched after the link below):

<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>false</value>
</property>

or

2. Enable the native library by following the link below.

In the meantime you can also check whether the native library is loaded or not by running the command below:

hadoop checknative -a

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html
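
If it helps, here is a rough way to check both things from the same pyspark shell. This goes through PySpark's internal _jsc/_jvm gateway objects, so treat it as a convenience sketch rather than a supported API:

# sc is the SparkContext that the pyspark shell already provides.
# _jsc and _jvm are PySpark internals, not a public API.

# Effective client-side short-circuit setting (None means the cluster default applies)
print(sc._jsc.hadoopConfiguration().get("dfs.client.read.shortcircuit"))

# Whether libhadoop was actually loaded in the driver JVM
print(sc._jvm.org.apache.hadoop.util.NativeCodeLoader.isNativeCodeLoaded())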

 

But this is only a WARN. You should be able to bypass it and still get the results.
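
For example, continuing from the code in the question (this assumes the customers table really exists in Hive's default database), the count should still come back even with those warnings printed:

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)   # sc is the SparkContext from the pyspark shell
cnt = sqlContext.sql("select count(1) from customers")

# sql() is lazy; an action such as show() or collect() actually runs the query
cnt.show()
print(cnt.collect()[0][0])     # the count as a plain Python value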