Support Questions
Find answers, ask questions, and share your expertise

Unable to connect Spark 2.0 via jdbc to Teradata 15.0


New Contributor

We are trying to read a Teradata table from Spark 2.0 over JDBC using the following code:

import os
import sys

# Make the PySpark libraries importable from the local Spark installation
spark_home = os.environ.get('SPARK_HOME', None)
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.1-src.zip'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/pyspark.zip'))

# Run the PySpark shell bootstrap, which creates the SparkContext (sc)
filename = os.path.join(spark_home, 'python/pyspark/shell.py')
print(os.environ.get('SPARK_HOME', None))
exec(compile(open(filename, "rb").read(), filename, 'exec'))

spark_release_file = spark_home + "/RELEASE"
if os.path.exists(spark_release_file) and "Spark 2" in open(spark_release_file).read():
    print("Spark is there.")
    # Submit arguments intended to ship the Teradata JDBC driver jars with the job
    argsstr = ("--master yarn-client --deploy-mode cluster pyspark-shell"
               " --driver-class-path /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar"
               " --driver-library-path /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar"
               " --jars /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar")
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", argsstr)
    if "pyspark-shell" not in pyspark_submit_args:
        pyspark_submit_args += " pyspark-shell"
    print(pyspark_submit_args)
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args
    os.environ["SPARK_SUBMIT_ARGS"] = pyspark_submit_args

from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext

# JDBC connection details for the Teradata source table
url = 'jdbc:teradata://teradata.server.com'
user = 'username'
password = ''
driver = 'com.teradata.jdbc.TeraDriver'
dbtable_read = 'mi_temp.bd_test_spark_read'

sqlContext = SQLContext(sc)  # sc is created by shell.py above
df = sqlContext.read.format("jdbc").options(url=url, user=user, password=password, driver=driver, dbtable=dbtable_read).load()

We get the following error:

Py4JJavaError: An error occurred while calling o48.load.
: java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:49)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:211)
    at java.lang.Thread.run(Thread.java:745)

However, if we run the same code from the command line, it works.

Can you please give us some pointers?

2 REPLIES

Re: Unable to connect Spark 2.0 via jdbc to Teradata 15.0

New Contributor

Hi Anilkumar,

When submitting the Spark application, you need to pass your Teradata JDBC driver jar files with the --jars option.
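
For example, here is a minimal sketch (the host, credentials, table name, and jar paths are the placeholders from your post; substitute your own) that sets PYSPARK_SUBMIT_ARGS, including --jars, before pyspark is imported, so the Teradata driver is on the classpath when the SparkContext is created:

import os

# Placeholder jar paths, host, and credentials carried over from the question.
# PYSPARK_SUBMIT_ARGS must be set before pyspark starts so spark-submit sees --jars.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /path/to/teradata/terajdbc4.jar,/path/to/teradata/tdgssconfig.jar "
    "--driver-class-path /path/to/teradata/terajdbc4.jar:/path/to/teradata/tdgssconfig.jar "
    "pyspark-shell"
)

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(
    url='jdbc:teradata://teradata.server.com',
    user='username',
    password='',
    driver='com.teradata.jdbc.TeraDriver',
    dbtable='mi_temp.bd_test_spark_read').load()

Note that --jars takes a comma-separated list, while --driver-class-path is a regular classpath (colon-separated on Linux).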

Thanks

Vinod

Re: Unable to connect Spark 2.0 via jdbc to Teradata 15.0

Cloudera Employee

Hi,

Have you tried passing the Teradata JDBC driver jar files with the --jars option, and did that resolve the issue?

Please share the error messages if you are still facing the issue.

Thanks

AKR
