
Error running HiveContext in PySpark

Using Cloudera Quickstart VM 5.10
Spark version 1.6.0
Copied hive-site.xml to the Spark directory

>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> cnt = sqlContext.sql("select count(1) from customers")

 

When I try to get Hive DB data from the PySpark context, I get the error below.

17/05/05 15:05:01 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
17/05/05 15:05:01 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
17/05/05 15:05:03 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.


Re: Error running HiveContext in PySpark

1. You can turn off the short-circuit read feature (which will have a performance hit) by setting it to false in hdfs-site.xml:

<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>false</value>
</property>

or

2. Enable the native library by following the link below. In the meantime, you can also check whether it is loaded or not by running the command below:

hadoop checknative -a

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/NativeLibraries.html

 

But it is more of a WARN; you should be able to bypass it and still get the results.
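For example, continuing from the snippet in your post (just a rough sketch; it assumes the customers table from your query exists in the default Hive database), the count should still come back once an action is triggered:

>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)  # sc is the SparkContext created by the pyspark shell
>>> cnt = sqlContext.sql("select count(1) from customers")  # lazy: nothing runs yet
>>> cnt.show()                    # triggers the job, even with the WARN messages above
>>> cnt.collect()[0][0]           # or pull the count out as a plain number

Note that sqlContext.sql only builds a DataFrame, so you will not see any result until an action like show() or collect() is called.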
