Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (1)
avatar
Expert Contributor

Overview

There are a couple of different options for importing data from Teradata into Hadoop:

  1. Sqoop and the Teradata JDBC driver (documentation)
  2. Hortonworks Connector for Teradata (documentation)
  3. Teradata Connector for Hadoop (TDCH) (download README here or get the PDF doc)

This article will explore examples of each of the three above along with some of the pros/cons. The aim of this document is help understand when to use each option while keeping the following in mind:

  • Goal is to create a schema which is as close to Teradata as possible
  • Hive tables will usually end up as ORC but may need intermediate tables stored as TEXTFILE. The examples below use textfile.
  • I have not tested all possible column types. If you have more information, please let me know.
  • Versions tested with: HDP 2.3.2, Sqoop 1.4.6, TDCH 1.4.1 or 1.4.4

All of the examples will use the following common environment variables which you would define at the top of a bash script. The variables are plugged in to the various commands below:

HADOOP_CLASSPATH=$(hcat -classpath) export HADOOP_CLASSPATH
SQOOP_HOME=/usr/hdp/current/sqoop-client
TDUSER=myuser
#TDPASS=****** <-- use an env var instead?
TDHOST=mytdhost
TDDB=METRICS
TDTABLE=ERD
HIVEDB=thirty_day_tables
HIVETABLE=SQOOP_ERD
JDBCURL=jdbc:teradata://$TDHOST/DATABASE=$TDDB
TDCHJAR=/usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar
TDCHJARS=$TDCHJAR,/usr/lib/tdch/1.4/lib/tdgssconfig.jar,/usr/lib/tdch/1.4/lib/terajdbc4.jar
TDWHERE="STATUS IN ('A','B','C') and GROUP = 'USA'"

Sqoop and the Teradata JDBC driver

Import schema and data from Teradata to Hive using plain sqoop and JDBC.

Required components:

  • Sqoop (comes with HDP out of the box)
  • Teradata JDBC driver (download from Teradata)

Example command:

sqoop import \
-libjars ${LIB_JARS},${TDCHJARS} \
--driver com.teradata.jdbc.TeraDriver \
--connect $JDBCURL \
--table $TDTABLE \
--hive-table ${HIVEDB}.${HIVETABLE} \
--where "${TDWHERE}" \
--username $TDUSER \
--password $TDPASS \
--map-column-java EFF_TSP=String \
--num-mappers 1 \
--map-column-hive EFF_TSP=STRING \
--hive-import

Pros

  • This only requires the Teradata JDBC driver to be installed, so it is easy to get started with.
  • Schema can be created for you automatically (although with issues... see cons)

Cons

  • Plain JDBC doesn't have some of the smarts that are built in to TDCH.
  • Some fields are not supported and will have to be mapped. For example, TIMESTAMP and DATE fields in Teradata will get mapped to Hive STRING.

Hortonworks Connector for Teradata

Import data from Teradata to Hive using sqoop and the Hortonworks Connector for Teradata. The Hortonworks Connector for Teradata wraps TDCH.

Required components:

  • Sqoop (comes with HDP out of the box)
  • Hortonworks Connector for Teradata (download addon)
    • Comes with version 1.4.1 of TDCH and the Teradata JDBC drivers installed into SQOOP_HOME/lib

If you want the schema created, it is a separate command:

sqoop create-hive-table \
-libjars ${LIB_JARS} \
--connect $JDBCURL \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $TDUSER \
--password $TDPASS \
--table $TDTABLE \
--map-column-hive EFF_TSP=STRING \
--hive-table ${HIVEDB}.${HIVETABLE}

Example command (requires the schema to have been created in hive already):

sqoop import \
-libjars ${LIB_JARS} \
-Dtdch.input.teradata.conditions="${TDWHERE}" \
--connect $JDBCURL \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $TDUSER \
--password $TDPASS \
--table $TDTABLE \
--hive-import \
--hive-table ${HIVEDB}.${HIVETABLE} \
--num-mappers 1

Pros

  • Use sqoop but get some of the smarts that come with TDCH
  • Can create the hive schema using sqoop create-hive-table

Cons

  • Can't create the hive schema in one step but it is possible in two using create-hive-table and then import
    • Dates and timestamps are converted to STRING automatically or throw an error which requires using the map-column-hive
  • Some Sqoop commands are not supported in this mode like --where and --hive-overwrite.
    • If supported in TDCH then you need to specify the java property which gets messy sometimes.
  • Version of TDCH included is out of date (newest available is TDCH 1.4.4 and version 1.4.1 is included right now)

Teradata Connector for Hadoop (TDCH)

Import data from Teradata to Hive using TDCH.

Required components:

  • Teradata Connector for Hadoop (TDCH)

Example command (requires the schema to have been created in hive already):

hadoop jar $TDCHJAR com.teradata.connector.common.tool.ConnectorImportTool \
-libjars $LIB_JARS \
-url ${JDBCURL} \
-username $TDUSER \
-password $TDPASS \
-jobtype hive \
-fileformat textfile \
-nummappers 1 \
-sourcetable ${TDTABLE} \
-sourceconditions "${TDWHERE}" \
-targetdatabase ${HIVEDB} \
-targettable ${HIVETABLE}

Pros

  • TDCH supports a bunch of different methods for getting in/out of teradata

Cons

  • Can't use TDCH to create the Hive schema automatically
44,735 Views
Comments
avatar
New Contributor

Hi Kit,

Very useful tips!! However I've encountered some problems when using the second way to pull my Teradata table into hive. The first step 'create hive table' successfully created a table in hive with correct schema while the second step failed and got an error message "Can not create a path from an empty string.". Do you have any idea on this? p.s. I've also used the same configuration to import the table into hdfs which succeeded and the only change I made is "--hive-import --hive-table hivetablename". Thank you!

avatar
Expert Contributor

@Pengqing Bao I haven't run into that error yet. Maybe something is up with the schema? If you import the table to HDFS are you able to load it into your Hive table and query it without issues?

avatar
New Contributor

@Kit Menke I've tried importing from hdfs to hive using 'load data inpath' or just create an external table, both work well and could be queried with no issues. Below is the code, do you see any problems? Thanks!

sqoop import \

-libjars ${lib_jars} \

-D mapreduce.job.queuename=default \

-verbose \

--connect jdbc:teradata://td/DATABASE=$1 \

--connection-manager org.apache.sqoop.teradata.TeradataConnManager \

--username user \

--password pw \

--table $2 \

--hive-import \

--hive-overwrite \

--hive-table $1.$2 \

avatar
Contributor

Hi Kit,

I have a teradata table having near to 100 columns. As its not feasible for us create the schema manually. We wanted to create schema using create-hive-table flag. But it doesnt seems to be working

I followed your syntax:

sqoop create-hive-table \ -libjars ${LIB_JARS} \ --connect $JDBCURL \ --connection-manager org.apache.sqoop.teradata.TeradataConnManager \ --username $TDUSER \ --password $TDPASS \ --table $TDTABLE \ --map-column-hive EFF_TSP=STRING \ --hive-table ${HIVEDB}.${HIVETABLE}

I see the below error

"Error parsing arguments for create-hive-table"

17/03/03 10:35:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Error parsing arguments for create-hive-table: 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: -p 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: --table 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: pdcrdata.DBQLogTbl_Hst_1 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: --map-column-hive 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: EFF_TSP=STRING 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table 17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: pi_talend_test.teradata_import Try --help for usage instructions.

Thanks

Suresh

avatar
Explorer

Thanks for posting this - your source file had no field delimiters. It was a text file with one field. Does this strike you as real world? You should post your source data with the definition of your target table. And the example should have a variety of data types. Your example doesn't provide any help the community who has a silly requirement that a HDFS file should load into a table that would have a number or date/timestamp table.