Created on 08-29-2016 06:14 AM
Overview
There are three different options for importing data from Teradata into Hadoop:
Sqoop and the Teradata JDBC driver
Hortonworks Connector for Teradata
Teradata Connector for Hadoop (TDCH)
This article will explore examples of each of the three options above along with some of their pros and cons. The aim is to help you understand when to use each option.
All of the examples use the following common environment variables, which you would define at the top of a bash script. The variables are plugged into the various commands below:
HADOOP_CLASSPATH=$(hcat -classpath)
export HADOOP_CLASSPATH
SQOOP_HOME=/usr/hdp/current/sqoop-client
TDUSER=myuser
#TDPASS=******   <-- use an env var instead?
TDHOST=mytdhost
TDDB=METRICS
TDTABLE=ERD
HIVEDB=thirty_day_tables
HIVETABLE=SQOOP_ERD
JDBCURL=jdbc:teradata://$TDHOST/DATABASE=$TDDB
TDCHJAR=/usr/lib/tdch/1.4/lib/teradata-connector-1.4.4.jar
TDCHJARS=$TDCHJAR,/usr/lib/tdch/1.4/lib/tdgssconfig.jar,/usr/lib/tdch/1.4/lib/terajdbc4.jar
TDWHERE="STATUS IN ('A','B','C') and GROUP = 'USA'"
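The commands below also reference ${LIB_JARS}, which is not defined in the block above. A minimal sketch of how it could be built, assuming the HCatalog/Hive client jars should be passed to -libjars as a comma-separated list, and assuming the Teradata password is supplied by the caller via an exported environment variable rather than hard-coded in the script:

# Sketch only: -libjars wants a comma-separated list, so convert the
# colon-separated HCatalog classpath into that form.
LIB_JARS=$(echo "$HADOOP_CLASSPATH" | tr ':' ',')

# Sketch only: require TDPASS to be exported by the caller (or a protected
# profile script) instead of storing the password in this file.
TDPASS=${TDPASS:?"export TDPASS before running this script"}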
Sqoop and the Teradata JDBC driver
Import schema and data from Teradata to Hive using plain sqoop and JDBC.
Required components: Sqoop and the Teradata JDBC driver jars (terajdbc4.jar and tdgssconfig.jar, referenced above via TDCHJARS).
Example command:
sqoop import \
-libjars ${LIB_JARS},${TDCHJARS} \
--driver com.teradata.jdbc.TeraDriver \
--connect $JDBCURL \
--table $TDTABLE \
--hive-table ${HIVEDB}.${HIVETABLE} \
--where "${TDWHERE}" \
--username $TDUSER \
--password $TDPASS \
--map-column-java EFF_TSP=String \
--num-mappers 1 \
--map-column-hive EFF_TSP=STRING \
--hive-import
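With --num-mappers 1 the import runs as a single map task. For larger tables you can increase parallelism by giving Sqoop a column to split on. The variant below is a sketch of that idea; ERD_ID is a hypothetical, evenly distributed numeric column in the source table, not something from the original example:

sqoop import \
-libjars ${LIB_JARS},${TDCHJARS} \
--driver com.teradata.jdbc.TeraDriver \
--connect $JDBCURL \
--table $TDTABLE \
--hive-table ${HIVEDB}.${HIVETABLE} \
--where "${TDWHERE}" \
--username $TDUSER \
--password $TDPASS \
--map-column-java EFF_TSP=String \
--map-column-hive EFF_TSP=STRING \
--split-by ERD_ID \
--num-mappers 4 \
--hive-import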
Pros
Cons
Hortonworks Connector for Teradata
Import data from Teradata to Hive using sqoop and the Hortonworks Connector for Teradata. The Hortonworks Connector for Teradata wraps TDCH.
Required components: Sqoop and the Hortonworks Connector for Teradata (which bundles TDCH).
If you want the schema created, it is a separate command:
sqoop create-hive-table \
-libjars ${LIB_JARS} \
--connect $JDBCURL \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $TDUSER \
--password $TDPASS \
--table $TDTABLE \
--map-column-hive EFF_TSP=STRING \
--hive-table ${HIVEDB}.${HIVETABLE}
Example command (requires the schema to have been created in hive already):
sqoop import \
-libjars ${LIB_JARS} \
-Dtdch.input.teradata.conditions="${TDWHERE}" \
--connect $JDBCURL \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $TDUSER \
--password $TDPASS \
--table $TDTABLE \
--hive-import \
--hive-table ${HIVEDB}.${HIVETABLE} \
--num-mappers 1
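After the import finishes, it can be worth sanity-checking the row count in Hive. A minimal check using Beeline is sketched below; the HiveServer2 connection string is a placeholder for your cluster's URL, not part of the original example:

# Placeholder HiveServer2 URL; replace with your cluster's connection string.
HS2URL="jdbc:hive2://localhost:10000/default"

# Count the rows that landed in the Hive table after the sqoop import.
beeline -u "$HS2URL" --silent=true -e "SELECT COUNT(*) FROM ${HIVEDB}.${HIVETABLE};"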
Pros
Cons
Teradata Connector for Hadoop (TDCH)
Import data from Teradata to Hive using TDCH.
Required components: the Teradata Connector for Hadoop jars (teradata-connector, terajdbc4, and tdgssconfig, referenced above via TDCHJAR and TDCHJARS).
Example command (requires the schema to have been created in hive already):
hadoop jar $TDCHJAR com.teradata.connector.common.tool.ConnectorImportTool \
-libjars $LIB_JARS \
-url ${JDBCURL} \
-username $TDUSER \
-password $TDPASS \
-jobtype hive \
-fileformat textfile \
-nummappers 1 \
-sourcetable ${TDTABLE} \
-sourceconditions "${TDWHERE}" \
-targetdatabase ${HIVEDB} \
-targettable ${HIVETABLE}
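TDCH can also land the data as plain files in HDFS instead of going through Hive. The variant below is a sketch based on the hdfs job type described in the TDCH documentation; the target path and separator are assumptions, so confirm the option names against the documentation bundled with your TDCH version:

# Sketch only: same source table and conditions, but written as delimited
# text files under an assumed staging directory in HDFS.
hadoop jar $TDCHJAR com.teradata.connector.common.tool.ConnectorImportTool \
-libjars $LIB_JARS \
-url ${JDBCURL} \
-username $TDUSER \
-password $TDPASS \
-jobtype hdfs \
-fileformat textfile \
-separator "," \
-nummappers 1 \
-sourcetable ${TDTABLE} \
-sourceconditions "${TDWHERE}" \
-targetpaths /tmp/${TDTABLE}_tdch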
Pros
Cons
Created on 08-31-2016 08:37 AM
Hi Kit,
Very useful tips! However, I've encountered some problems when using the second option to pull my Teradata table into Hive. The first step, 'create hive table', successfully created a table in Hive with the correct schema, while the second step failed with the error message "Can not create a path from an empty string." Do you have any idea about this? P.S. I've also used the same configuration to import the table into HDFS, which succeeded; the only change I made is "--hive-import --hive-table hivetablename". Thank you!
Created on 08-31-2016 10:32 PM
@Pengqing Bao I haven't run into that error yet. Maybe something is up with the schema? If you import the table to HDFS are you able to load it into your Hive table and query it without issues?
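For example, a minimal way to test that route might look like the sketch below; the --target-dir path and HiveServer2 URL are placeholders rather than anything from this thread:

# Import to a plain HDFS staging directory first (placeholder path).
sqoop import \
-libjars ${LIB_JARS} \
--connect $JDBCURL \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $TDUSER \
--password $TDPASS \
--table $TDTABLE \
--target-dir /tmp/${TDTABLE}_staging \
--num-mappers 1

# Then load the staged files into the existing Hive table and spot-check it.
beeline -u "jdbc:hive2://localhost:10000/default" -e "
LOAD DATA INPATH '/tmp/${TDTABLE}_staging' INTO TABLE ${HIVEDB}.${HIVETABLE};
SELECT * FROM ${HIVEDB}.${HIVETABLE} LIMIT 10;"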
Created on 09-01-2016 01:36 AM
@Kit Menke I've tried importing from HDFS to Hive using 'load data inpath' and also by just creating an external table; both work well and can be queried with no issues. Below is the code, do you see any problems? Thanks!
sqoop import \
-libjars ${lib_jars} \
-D mapreduce.job.queuename=default \
-verbose \
--connect jdbc:teradata://td/DATABASE=$1 \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username user \
--password pw \
--table $2 \
--hive-import \
--hive-overwrite \
--hive-table $1.$2 \
Created on 03-03-2017 03:44 PM
Hi Kit,
I have a Teradata table with close to 100 columns. As it's not feasible for us to create the schema manually, we wanted to create the schema using the create-hive-table flag, but it doesn't seem to be working.
I followed your syntax:
sqoop create-hive-table \
-libjars ${LIB_JARS} \
--connect $JDBCURL \
--connection-manager org.apache.sqoop.teradata.TeradataConnManager \
--username $TDUSER \
--password $TDPASS \
--table $TDTABLE \
--map-column-hive EFF_TSP=STRING \
--hive-table ${HIVEDB}.${HIVETABLE}
I see the below error
"Error parsing arguments for create-hive-table"
17/03/03 10:35:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Error parsing arguments for create-hive-table:
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: -p
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: pdcrdata.DBQLogTbl_Hst_1
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: --map-column-hive
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: EFF_TSP=STRING
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
17/03/03 10:35:48 ERROR tool.BaseSqoopTool: Unrecognized argument: pi_talend_test.teradata_import
Try --help for usage instructions.
Thanks
Suresh
Created on 12-10-2017 03:49 AM
Thanks for posting this - your source file had no field delimiters; it was a text file with one field. Does this strike you as real world? You should post your source data along with the definition of your target table, and the example should include a variety of data types. As it stands, the example doesn't provide much help to anyone in the community with the simple requirement of loading an HDFS file into a table that has numeric or date/timestamp columns.