Support Questions

Find answers, ask questions, and share your expertise

Hi, is there any connector for Teradata to Spark? We have scenarios where we need to get data from Teradata using Spark SQL. I am using Spark 1.6.0. Please let me know if anyone has tried connecting Teradata and Spark. Thanks!

avatar
Contributor
 
1 ACCEPTED SOLUTION

avatar
Super Collaborator

Make sure you add the jar to your class path and include it when you run the application.

sc.addJar("yourDriver.jar")

val jdbcDF = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:teradata://<server_name>, TMODE=TERA, user=my_user, password=*****",
  "dbtable" -> "schema.table_name",
  "driver" -> "com.teradata.jdbc.TeraDriver"))

View solution in original post
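As a side note, `sqlContext.load("jdbc", Map(...))` is deprecated in Spark 1.6 in favor of the `DataFrameReader` API. A minimal sketch of the equivalent call, keeping the placeholders from the post above (`<server_name>`, credentials, and table name are stand-ins, not real values):

```scala
// Same connection options as the accepted solution, as a plain Map.
// <server_name>, my_user, and schema.table_name are placeholders.
val options = Map(
  "url"     -> "jdbc:teradata://<server_name>,TMODE=TERA,user=my_user,password=*****",
  "dbtable" -> "schema.table_name",
  "driver"  -> "com.teradata.jdbc.TeraDriver")

// With a live SQLContext the non-deprecated form would be:
// val jdbcDF = sqlContext.read.format("jdbc").options(options).load()
```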

6 REPLIES

avatar
Super Collaborator

Sounds like a JDBC connection is in order. There is an API for creating a DataFrame from a JDBC connection:

jdbc(url: String, table: String, predicates: Array[String], connectionProperties: Properties): DataFrame

The issue with JDBC is that reading data from Teradata will be much slower than reading from HDFS. Is it possible to run a Sqoop job to move the data to HDFS before starting your Spark application?
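For what it's worth, the `predicates` overload above lets Spark issue one partitioned query per predicate, which can offset some of the JDBC slowness. A minimal sketch, where the table, column, and server names are made up for illustration:

```scala
import java.util.Properties

// One entry per partition; Spark would run one Teradata query per predicate.
// order_id and the modulus of 4 are hypothetical choices for this sketch.
val predicates = (0 until 4).map(i => s"MOD(order_id, 4) = $i").toArray

val props = new Properties()
props.setProperty("user", "my_user")
props.setProperty("password", "*****")
props.setProperty("driver", "com.teradata.jdbc.TeraDriver")

// With a live SQLContext this would read four partitions in parallel:
// val df = sqlContext.read.jdbc(
//   "jdbc:teradata://<server_name>", "schema.orders", predicates, props)
```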

avatar
Contributor

Hi Joe,

We want to use Spark SQL instead of Sqoop. I tried the Teradata JDBC driver but am unable to download the dependencies.

Thanks

<dependency>
  <groupId>com.teradata.jdbc</groupId>
  <artifactId>terajdbc4</artifactId>
  <version>15.10.00.22</version>
</dependency>
<dependency>
  <groupId>com.teradata.jdbc</groupId>
  <artifactId>tdgssconfig</artifactId>
  <version>15.00.00.22</version>
</dependency>

avatar
Super Collaborator

Make sure you add the jar to your class path and include it when you run the application.

sc.addJar("yourDriver.jar")

val jdbcDF = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:teradata://<server_name>, TMODE=TERA, user=my_user, password=*****",
  "dbtable" -> "schema.table_name",
  "driver" -> "com.teradata.jdbc.TeraDriver"))

avatar
Contributor

Yes, I'm able to get the tables from Teradata. It's working fine. Thanks 🙂

avatar
New Contributor

Add jars in the spark-defaults.conf:

spark.driver.extraClassPath /opt/spark/jars/terajdbc4.jar:/opt/spark/jars/tdgssconfig.jar

spark.executor.extraClassPath /opt/spark/jars/terajdbc4.jar:/opt/spark/jars/tdgssconfig.jar

But I get an invalid IP error for the following command:


scala> val jdbcDF = sqlContext.load("jdbc", Map("url" -> "jdbc:teradata://****my**:1025/john, TMODE=TERA, user=john, password=pass", "dbtable" -> "john.abc", "driver" -> "com.teradata.jdbc.TeraDriver"))

warning: there was one deprecation warning; re-run with -deprecation for details

2017-03-20.08:14:29.170 TERAJDBC4 ERROR [main] com.teradata.jdbc.jdk6.JDK6_SQL_Connection@1504b493 Connection to 9.26.74.151:1025 Mon Mar 20 08:14:29 EDT 2017 invalid IPv6 address
  at java.net.InetAddress.getAllByName(InetAddress.java:1169)
  at java.net.InetAddress.getAllByName(InetAddress.java:1126)
  at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$Lookup.doLookup(TDNetworkIOIF.java:222)
  at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$Lookup.isLiteralIpAddress(TDNetworkIOIF.java:248)
  at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF.connectToHost(TDNetworkIOIF.java:335)
  at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF.createSocketConnection(TDNetworkIOIF.java:155)
  at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF.<init>(TDNetworkIOIF.java:141)
  at com.terada
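If I'm reading the stack trace right (`isLiteralIpAddress` failing on the host string), the driver is treating the `host:1025` portion as an IPv6 literal because of the colon. My understanding is that the Teradata JDBC driver takes the port as a `DBS_PORT` parameter after `host/` rather than `host:port` syntax, though this is worth verifying against the driver documentation for your version. A sketch of the corrected URL, with a hypothetical host name standing in for the masked one:

```scala
// Hypothetical host; the real value is masked in the post above.
val host = "my.teradata.host"

// Teradata JDBC parameters go after "host/", comma-separated;
// the port goes in DBS_PORT instead of a host:port colon.
val url = s"jdbc:teradata://$host/DBS_PORT=1025,DATABASE=john,TMODE=TERA"

// With a live SQLContext:
// val jdbcDF = sqlContext.read.format("jdbc").options(Map(
//   "url" -> url, "user" -> "john", "password" -> "pass",
//   "dbtable" -> "john.abc",
//   "driver" -> "com.teradata.jdbc.TeraDriver")).load()
```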


avatar
Cloudera Employee

Hi,

 

We don't provide any connectors for Teradata to Spark, but if you want to get data from Teradata into Spark, you can use a JDBC driver that Teradata provides.

 

Thanks

AKR