Created on 12-09-2019 04:16 AM - last edited on 12-09-2019 05:49 AM by cjervis
I am migrating Spark jobs running on HDP 2.6 to HDP 3.1. When executing the Spark jobs on HDP 3.1 I am getting the following error.
java.util.NoSuchElementException: spark.sql.hive.hiveserver2.jdbc.url
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1571)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrlFromConf(HWConf.java:143)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrl(HWConf.java:107)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.build(HiveWarehouseBuilder.java:97)
at com.wunderman.hdp.Hdp3MigrationMain.main(Hdp3MigrationMain.java:16)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This is how I am creating the Spark session:
private static SparkSession getSparkSession() {
    /**
     * Create an instance of SparkSession to connect to the cluster
     */
    SparkSession sparkSession = SparkSession.builder().appName("Hdp3 Migration").master("yarn").getOrCreate();
    return sparkSession;
}
But the HiveServer2 JDBC URL is configured in the Spark config. I have added the following dependency in the pom.xml:
<dependency>
    <groupId>com.hortonworks.hive</groupId>
    <artifactId>hive-warehouse-connector_2.11</artifactId>
    <version>1.0.0.3.1.0.0-78</version>
</dependency>
And I am trying to execute the code below:
String hdp3Enabled = args[0];
Dataset<Row> dataset;
String query = "SELECT * FROM schema.tablename WHERE col1 = 'abc'"; // Sample query
try {
    if ("Y".equalsIgnoreCase(hdp3Enabled)) {
        HiveWarehouseSession hive = HiveWarehouseSession.session(sparkSession).build();
        dataset = hive.executeQuery(query);
    } else {
        dataset = sparkSession.sql(query);
    }
    dataset.show();
} catch (Exception e) {
    e.printStackTrace();
}
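For reference, this is roughly how I could pass the HiveServer2 JDBC URL on the builder itself instead of relying on spark-defaults (sketch only; the URL below is a placeholder, not our actual cluster value):
private static SparkSession getSparkSession() {
    // Sketch: the JDBC URL value is a placeholder, not our actual cluster setting.
    return SparkSession.builder()
            .appName("Hdp3 Migration")
            .master("yarn")
            // The property that HiveWarehouseBuilder.build() looks up from the session conf
            .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://hiveserver2-host:10000/")
            .getOrCreate();
}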
Please share your suggestions on how to fix this issue.
Created 12-09-2019 10:14 AM
From HDP 3.x onwards, to work with Hive databases you should use the HiveWarehouseConnector library /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar, as shown in the example below:
spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://FQDN or IP:10000/" --conf spark.datasource.hive.warehouse.load.staging.dir="/staging_dir" --conf spark.hadoop.hive.zookeeper.quorum="zk_quorum_ips:2181" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)
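If you are submitting a Java application rather than using spark-shell, the equivalent smoke test would look roughly like this (a sketch; the class name is made up, and the same --conf values and --jars from the spark-shell command above still need to be passed to spark-submit):
import org.apache.spark.sql.SparkSession;
import com.hortonworks.hwc.HiveWarehouseSession;
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder;

public class HwcSmokeTest {
    public static void main(String[] args) {
        // Expects spark.sql.hive.hiveserver2.jdbc.url (and the other HWC properties)
        // to be supplied via --conf on spark-submit, as in the spark-shell example above.
        SparkSession spark = SparkSession.builder().appName("HWC smoke test").getOrCreate();
        HiveWarehouseSession hive = HiveWarehouseBuilder.session(spark).build();
        // Lists databases through HiveServer2 Interactive / LLAP
        hive.showDatabases().show(100, false);
    }
}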
Could you try that and revert?
Created on 12-10-2019 12:19 AM - edited 12-10-2019 12:21 AM
@Shelton I have tried as you suggested, but I am still getting the same error.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.hortonworks.hwc.HiveWarehouseSession;
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder;
public class Hdp3MigrationMain extends CommonUtilities {
    public static void main(String[] args) {
        String hdp3Enabled = args[0];
        Dataset<Row> dataset;
        String query = "select * from hive_schema.table1";
        try {
            if ("Y".equalsIgnoreCase(hdp3Enabled)) {
                HiveWarehouseSession hive = HiveWarehouseBuilder.session(sparkSession).build();
                dataset = hive.executeQuery(query);
            } else {
                dataset = sparkSession.sql(query);
            }
            dataset.show();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
And the same error occurs.
java.util.NoSuchElementException: spark.sql.hive.hiveserver2.jdbc.url
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1571)
at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrlFromConf(HWConf.java:143)
at com.hortonworks.spark.sql.hive.llap.HWConf.getConnectionUrl(HWConf.java:107)
at com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.build(HiveWarehouseBuilder.java:97)
at com.wunderman.hdp.Hdp3MigrationMain.main(Hdp3MigrationMain.java:18)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Created on 12-10-2019 01:06 AM - edited 12-10-2019 01:08 AM
In one of your previous updates you mentioned that "the hiveserver2 jdbc url is configured in the spark config."
However, it looks like the error you are getting is because the mentioned properties are not found in the spark2-defaults config that is on your classpath; you can confirm this from inside the application, as in the sketch after the steps below.
So can you please make sure that your classpath points to the correct spark-defaults, with the following properties added as mentioned in the "Required properties" section of the following doc: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_spar...
You must add several Spark properties through spark2-defaults in Ambari to use the Hive Warehouse Connector for accessing data in Hive. Alternatively, configuration can be provided for each job using --conf.
The URL for HiveServer2 Interactive
The URI for the metastore
The HDFS temp directory for batch writes to Hive, /tmp for example
The application name for LLAP service
The ZooKeeper hosts used by LLAP
Set the values of these properties as follows:
In Ambari, copy the value from Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL.
Copy the value from hive.metastore.uris. In Hive, at the hive> prompt, enter set hive.metastore.uris and copy the output. For example, thrift://mycluster-1.com:9083.
Copy the value from Advanced hive-interactive-site > hive.llap.daemon.service.hosts.
Copy the value from Advanced hive-site > hive.zookeeper.quorum.
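As a quick check from inside the application, you could print what the SparkSession actually sees for these keys before calling build(); any key that prints the fallback value is not reaching the job. This is only a sketch (the class name is made up; the property names are the ones from the "Required properties" section, so confirm them against your spark2-defaults):
import org.apache.spark.sql.SparkSession;

public class HwcConfCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("HWC conf check").getOrCreate();
        String[] requiredKeys = {
                "spark.sql.hive.hiveserver2.jdbc.url",
                "spark.datasource.hive.warehouse.metastoreUri",
                "spark.datasource.hive.warehouse.load.staging.dir",
                "spark.hadoop.hive.llap.daemon.service.hosts",
                "spark.hadoop.hive.zookeeper.quorum"
        };
        for (String key : requiredKeys) {
            // get(key, default) returns the fallback instead of throwing the
            // NoSuchElementException seen in the stack trace above.
            System.out.println(key + " = " + spark.conf().get(key, "<NOT SET>"));
        }
    }
}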
Created on 12-10-2019 03:45 AM - edited 12-10-2019 05:11 AM
@jsensharma All the configs are set properly in our cluster. I am trying to access external Hive tables using a HiveWarehouseSession.
Could the error be because of that, since the documentation says HiveWarehouseSession is not needed for external tables?
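For the external tables, the path I expect to use is plain Spark SQL without a HiveWarehouseSession, roughly as below (a sketch; the class and table names are made up, and it assumes the external Hive tables are visible to the Spark session's catalog on our cluster):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ExternalTableRead {
    public static void main(String[] args) {
        // Assumes Hive support for the session is enabled here or through spark-defaults,
        // and that the external table is visible in the catalog Spark uses.
        SparkSession spark = SparkSession.builder()
                .appName("External table read")
                .enableHiveSupport()
                .getOrCreate();
        Dataset<Row> dataset = spark.sql("select * from hive_schema.external_table1"); // made-up table name
        dataset.show();
    }
}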