Spark Hive Warehouse Connector not loading data when using executeQuery

New Contributor
spark-shell --master yarn \
  --jars /usr/hdp/3.0.1.0-183/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-183.jar \
  --conf spark.security.credentials.hiveserver2.enabled=false

import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

hive.setDatabase("my_test_db")

hive.executeQuery("SELECT * FROM my_table where code is null and loaddt='mmddyyyy'").show()

This statement never completes; it just hangs.

I also tried using execute instead:

hive.execute("SELECT * FROM my_table where code is null and loaddt='mmddyyyy'").show()

This returns only 1000 rows, even though the table has more records.

2 Replies

New Contributor

From my reading of their documentation and examples, if the table is an external table, then only --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://abc.comt:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" needs to be passed, since LLAP is not required for external tables. In that case hive.execute("select * from tableA") should return all rows, but in my case it also returns only 1000. This might be a bug on Hortonworks' end. I am also waiting for a response.

New Contributor

Infuriatingly, the connector returns only 1000 rows by default. This doesn't seem to be documented anywhere I've found. The relevant setting is exec.results.max, which you can pass in from spark-shell by setting spark.datasource.hive.warehouse.exec.results.max.

Add the following config to increase the maximum to 20000:

--conf "spark.datasource.hive.warehouse.exec.results.max=20000"
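If you run a compiled Spark application instead of spark-shell, the same key can presumably be set on the SparkSession builder rather than with --conf. This is a minimal sketch under a few assumptions: the HWC assembly jar is on the application classpath, and HWC reads the key from the Spark session configuration (this thread only confirms the --conf route). The object name HwcRowLimitExample is hypothetical. The database, table, and query are reused from the question, with 'mmddyyyy' still a placeholder.

```scala
import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

// Hypothetical application object; assumes the HWC assembly jar is on the classpath.
object HwcRowLimitExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hwc-row-limit")
      // Intended equivalent of --conf "spark.datasource.hive.warehouse.exec.results.max=20000"
      // (assumption: HWC picks this key up from the session configuration).
      .config("spark.datasource.hive.warehouse.exec.results.max", "20000")
      .getOrCreate()

    // Confirm the value actually reached the session configuration.
    println(spark.conf.get("spark.datasource.hive.warehouse.exec.results.max"))

    val hive = HiveWarehouseSession.session(spark).build()
    hive.setDatabase("my_test_db")

    // execute() is the call that was capped at 1000 rows in the question.
    val df = hive.execute(
      "SELECT * FROM my_table WHERE code IS NULL AND loaddt='mmddyyyy'")

    // A count of exactly 1000 (or exactly the new cap) suggests the result is still truncated.
    println(s"Rows returned: ${df.count()}")

    spark.stop()
  }
}
```

In spark-shell, the spark.conf.get line works on its own as a quick check that the --conf flag was picked up before you call hive.execute.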