
What is the best tool for extracting data from a Teradata database into Hortonworks HDFS?

Expert Contributor

Our Hortonworks cluster team is looking for an ETL tool to extract data from Teradata to HDFS.

1 ACCEPTED SOLUTION


There is no one best tool.

For an open-source option, use the Hortonworks Connector for Teradata. This is the Sqoop implementation for moving data from Teradata into HDP. Documentation for the connector is here.

Hortonworks documentation is here.

The Teradata JDBC driver can be found here.

Save the files from the download to the Sqoop library folder, /usr/hdp/current/sqoop-client/lib.
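
A minimal sketch of that step (the archive names below are assumptions; match them to the versions you actually downloaded):

# Unpack the connector and driver downloads (hypothetical file names)
unzip hdp-connector-for-teradata-<version>.zip
unzip teradata-jdbc-driver-<version>.zip

# Copy the connector jar and the Teradata JDBC driver jars into the Sqoop lib folder
cp hdp-connector-for-teradata-<version>/lib/*.jar /usr/hdp/current/sqoop-client/lib/
cp terajdbc4.jar tdgssconfig.jar /usr/hdp/current/sqoop-client/lib/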

# Set the classpath

export HIVE_HOME=/usr/hdp/current/hiveserver2
export HADOOP_HOME=/usr/hdp/current/
export SQOOP_HOME=/usr/hdp/current/sqoop-client/lib
export HADOOP_CLASSPATH=$(hcat -classpath)
export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')
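
The sed call turns the colon-separated classpath into the comma-separated list that -libjars expects. An optional quick check:

# Should print one jar path per line
echo ${LIB_JARS} | tr ',' '\n' | head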

Some command examples:

# Hive import (the <...> placeholders stand for your own connection and table values):
sqoop import --connect <jdbc-url> --table <source-table> \
  --hive-import --hive-overwrite --create-hive-table \
  --hive-table <table-name> --null-string '\\N' --null-non-string '\\N'

# Define a table based on one in an existing database (e.g. MySQL).
# This matters because the Teradata connector needs the Hive table to exist before importing data:
sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp --table employees --hive-table emps
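
Alternatively, the target table can be created directly in Hive as an ORC-backed table before the import. A sketch, assuming HiveServer2 at the default local address and the financial.accts schema used below:

beeline -u jdbc:hive2://localhost:10000 -e "CREATE TABLE financial.accts (cust_id INT, acct_type STRING, acct_nbr STRING, acct_start_date DATE, acct_end_date DATE) STORED AS ORC;"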
# Store as an ORC file:

sqoop import -libjars ${LIB_JARS} \
  -Dteradata.db.input.target.table.schema="cust_id int, acct_type string, acct_nbr string, acct_start_date date, acct_end_date date" \
  -Dteradata.db.input.file.format=orcfile \
  --connect jdbc:teradata://<teradata host ip address>/Database=financial \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc --table accts \
  --hive-import --hive-table financial.accts

sqoop import --connect jdbc:teradata://192.168.1.13/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc --table accts \
  --hive-import --hive-table financial.accts
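
To confirm the import landed, a quick row count from the Hive side (a sketch; assumes the hive CLI is on the path):

hive -e "SELECT COUNT(*) FROM financial.accts;"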

Info on Teradata offerings beyond Sqoop can be found at these links:

https://community.hortonworks.com/questions/4418/access-modes-for-teradata-beyond-sqoop-ingestion.ht...

https://community.hortonworks.com/questions/8411/sqoop-job-too-slow-importing-data-from-teradata-to....

An example going from HDFS to Teradata can be found here:

https://community.hortonworks.com/articles/6161/hdfs-to-teradata-example.html

Other ETL tools available are Talend Big Data Edition, Pentaho, and Oracle Data Integrator.


8 REPLIES

Master Mentor

@Robin Dong

Take a look at our add-on for Teradata and HDP (link), and read the docs on the connector. In general, you would use Sqoop to ingest into and out of an EDW or RDBMS.
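
As a minimal sketch of that pattern (every connection value below is a placeholder, not from this thread):

# Ingest from the warehouse into HDFS
sqoop import --connect jdbc:teradata://<host>/Database=<db> \
  --username <user> --password <pass> --table <table> --target-dir /data/<table>

# And push results back out
sqoop export --connect jdbc:teradata://<host>/Database=<db> \
  --username <user> --password <pass> --table <table> --export-dir /data/<table>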


Master Mentor

@Ancil McBarnett Impressive response!!!

Expert Contributor

@Ancil McBarnett

Great, appreciate the quick response. We will try this tool and see what comes out.

Thanks again.

Robin

Master Mentor

@Robin Dong Has this been resolved? Can you post your solution or accept the best answer?

New Contributor

Hi,

I am getting the following ERROR:

16/02/25 04:06:39 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dteradata.db.input.file.format

The command I am running:

sqoop import -libjars ${LIB_JARS} --connect jdbc:teradata://<host_address>/TMODE=ANSI,CHARSET=UTF8,database=datatypes --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password <password> --table test_datatype_numeric --split-by col_num --target-dir /sqoop/teratest/orc_new -Dteradata.db.input.file.format orcfile

Basically I am using the Hortonworks Connector for Teradata, and I want to store the result in ORC format.

Can someone help?

New Contributor

@Ancil McBarnett :

Can you please help me out?

Master Mentor

You're saying the format you need coming out of Teradata is ORC; what you need to do is import into a Hive ORC table. Look at the Sqoop user guide section on importing data into Hive: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive

And see our documentation for Sqoop.
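
One likely cause of the error above is argument placement: Sqoop's generic -D options must come immediately after the tool name and use key=value form, not a space-separated value. A sketch of the corrected command (unverified; all values are from the original post or placeholders):

sqoop import -Dteradata.db.input.file.format=orcfile -libjars ${LIB_JARS} \
  --connect jdbc:teradata://<host_address>/TMODE=ANSI,CHARSET=UTF8,database=datatypes \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password <password> --table test_datatype_numeric \
  --split-by col_num --target-dir /sqoop/teratest/orc_new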