
What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

Solved

What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

Contributor

Our Hortonworks cluster team is looking for an ETL tool to extract data from Teradata to HDFS.

1 ACCEPTED SOLUTION


Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

There is no one best tool.

For an open-source offering, use the Hortonworks Connector for Teradata. This is the Sqoop implementation for moving data from Teradata to HDP. Documentation for the connector is here.

Hortonworks documentation is here.

The Teradata driver can be found here.

Save the files from the download to the Sqoop library folder /usr/hdp/current/sqoop-client/lib.
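As a sketch of that step (terajdbc4.jar and tdgssconfig.jar are the jar names usually shipped in the Teradata JDBC download; verify against your own download, and use the real lib folder instead of the temp directory used here for illustration):

```shell
# Stand-in for /usr/hdp/current/sqoop-client/lib on a real node
lib_dir=$(mktemp -d)
# Stand-ins for the jars unpacked from the Teradata driver download
touch terajdbc4.jar tdgssconfig.jar
# Copy the driver jars into the Sqoop library folder
cp terajdbc4.jar tdgssconfig.jar "${lib_dir}"/
ls "${lib_dir}"
```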

#Set Classpath

export HIVE_HOME=/usr/hdp/current/hiveserver2
export HADOOP_HOME=/usr/hdp/current/
export SQOOP_HOME=/usr/hdp/current/sqoop-client/lib
export HADOOP_CLASSPATH=$(hcat -classpath)
export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')
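The last export converts the colon-separated classpath from `hcat -classpath` into the comma-separated list that `-libjars` expects. A quick sanity check of that sed expression (the jar paths here are made up):

```shell
# Fake colon-separated classpath standing in for $(hcat -classpath)
classpath="/usr/lib/a.jar:/usr/lib/b.jar::/usr/lib/c.jar"
# Collapse each run of colons into a single comma, as the export above does
lib_jars=$(echo "${classpath}" | sed -e 's/::*/,/g')
echo "${lib_jars}"   # /usr/lib/a.jar,/usr/lib/b.jar,/usr/lib/c.jar
```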

Some command examples.

# Hive import:
sqoop import --connect <jdbc-url> --table <source-table> --hive-import --hive-overwrite --create-hive-table --hive-table <table-name> --null-string '\\N' --null-non-string '\\N'

#Define a table based on one in a database (e.g. MySQL).
#Important because the Teradata Connector needs the table to exist before importing data:
sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp --table employees --hive-table emps

# Store as ORC file:

sqoop import -libjars ${LIB_JARS} \
  -Dteradata.db.input.target.table.schema="cust_id int, acct_type string, acct_nbr string, acct_start_date date, acct_end_date date" \
  -Dteradata.db.input.file.format=orcfile \
  --connect jdbc:teradata://<teradata host ip address>/Database=financial \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc --table accts --hive-import --hive-table financial.accts

sqoop import --connect jdbc:teradata://192.168.1.13/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc --table accts --hive-import --hive-table financial.accts

Info on Teradata offerings beyond Sqoop can be found at these links:

https://community.hortonworks.com/questions/4418/access-modes-for-teradata-beyond-sqoop-ingestion.ht...

https://community.hortonworks.com/questions/8411/sqoop-job-too-slow-importing-data-from-teradata-to....

An example of going from HDFS to Teradata can be found here:

https://community.hortonworks.com/articles/6161/hdfs-to-teradata-example.html

Other ETL tools available are Talend Big Data Edition, Pentaho, and Oracle Data Integrator.


8 REPLIES

Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

Mentor

@Robin Dong

Take a look at our add-on for Teradata and HDP. Link. Read the docs on the connector. In general, you would use Sqoop to ingest into and out of an EDW or RDBMS.



Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

@Ancil McBarnett Impressive response!!!


Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

Contributor

@Ancil McBarnett

Great, appreciate your quick response. We will try this tool and see what comes of it.

Thanks again.

Robin


Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

Mentor

@Robin Dong has this been resolved? Can you post your solution or accept best answer?


Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

New Contributor

Hi,

I am getting the following ERROR:

16/02/25 04:06:39 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dteradata.db.input.file.format

The command I am running:

sqoop import -libjars ${LIB_JARS} --connect jdbc:teradata://<host_address>/TMODE=ANSI,CHARSET=UTF8,database=datatypes --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password <password> --table test_datatype_numeric --split-by col_num --target-dir /sqoop/teratest/orc_new -Dteradata.db.input.file.format orcfile

Basically I am using the Hortonworks Connector for Teradata. I want to store the result in ORC format.

Can someone help?


Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

New Contributor

@Ancil McBarnett :

Can you please help me out?


Re: What is the best tool to extract data from a Teradata database to Hortonworks HDFS?

Mentor

You're saying the format coming from Teradata is ORC; what you need to do is import into a Hive ORC table. Look at the Sqoop user guide for Hive: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive

And our documentation for Sqoop.
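Regarding the "Unrecognized argument" error itself: in Sqoop, generic Hadoop options such as -D and -libjars must appear immediately after the tool name, before any tool-specific options, and -D takes key=value form. A possible rewrite of the failing command along those lines (host and password placeholders kept as in the original; this is a sketch, not a verified invocation, since it needs a live Teradata and HDP cluster to run):

```shell
# Generic options (-libjars, -D) come right after "import"; note the "=" in -D
sqoop import -libjars ${LIB_JARS} \
  -Dteradata.db.input.file.format=orcfile \
  --connect "jdbc:teradata://<host_address>/TMODE=ANSI,CHARSET=UTF8,database=datatypes" \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password <password> \
  --table test_datatype_numeric --split-by col_num \
  --target-dir /sqoop/teratest/orc_new
```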
