Created 01-23-2016 07:11 PM
We have a Hortonworks cluster and are looking for an ETL tool to extract data from Teradata to HDFS.
Created 01-23-2016 07:32 PM
There is no one best tool.
For an open-source offering, use the Hortonworks Connector for Teradata. This is the Sqoop implementation for moving data from Teradata to HDP. Documentation for the connector is here.
Hortonworks documentation is here.
The Teradata driver can be found here.
Save the files from the download to the Sqoop library folder /usr/hdp/current/sqoop-client/lib.
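The install step might look like this (a rough sketch; the archive and jar names vary by version, and the Teradata JDBC driver ships as terajdbc4.jar and tdgssconfig.jar):

# Unpack the connector and copy its jars plus the Teradata JDBC driver
# into the Sqoop lib folder (file names are illustrative):
tar -xzf hdp-connector-for-teradata-*.tar.gz
cp hdp-connector-for-teradata-*/lib/*.jar /usr/hdp/current/sqoop-client/lib/
cp terajdbc4.jar tdgssconfig.jar /usr/hdp/current/sqoop-client/lib/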
# Set classpath
export HIVE_HOME=/usr/hdp/current/hiveserver2
export HADOOP_HOME=/usr/hdp/current/
export SQOOP_HOME=/usr/hdp/current/sqoop-client
export HADOOP_CLASSPATH=$(hcat -classpath)
export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')
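To sanity-check that LIB_JARS came out as a comma-separated list of jars (the form -libjars expects):

echo ${LIB_JARS} | tr ',' '\n' | head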
Some command examples:

# Hive import:
sqoop import --hive-import --hive-overwrite --create-hive-table --hive-table <table-name> --null-string '\\N' --null-non-string '\\N'

# Define a table based on one in a database (e.g. MySQL).
# Important because the Teradata connector needs the table to exist before importing data:
sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp --table employees --hive-table emps
# Store as an ORC file:
sqoop import -libjars ${LIB_JARS} \
  -Dteradata.db.input.target.table.schema="cust_id int, acct_type string, acct_nbr string, acct_start_date date, acct_end_date date" \
  -Dteradata.db.input.file.format=orcfile \
  --connect jdbc:teradata://<teradata host ip address>/Database=financial \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc --table accts \
  --hive-import --hive-table financial.accts

# Plain Hive import from Teradata:
sqoop import --connect jdbc:teradata://192.168.1.13/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc --table accts \
  --hive-import --hive-table financial.accts
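As a quick sanity check after the import, count the rows in the new Hive table (the HiveServer2 URL below is hypothetical and depends on your cluster):

beeline -u jdbc:hive2://localhost:10000 -e "SELECT COUNT(*) FROM financial.accts;"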
Info on Teradata offerings beyond Sqoop can be found at this link.
An example going from HDFS to Teradata can be found here (see also the export sketch after the link):
https://community.hortonworks.com/articles/6161/hdfs-to-teradata-example.html
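For reference, a minimal sketch of that reverse direction using the same connector (the target table and export directory are hypothetical):

sqoop export -libjars ${LIB_JARS} \
  --connect jdbc:teradata://<teradata host ip address>/Database=financial \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc \
  --table accts_export \
  --export-dir /apps/hive/warehouse/financial.db/accts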
Other ETL tools available include Talend Big Data Edition, Pentaho, and Oracle Data Integrator.
Created 01-23-2016 07:51 PM
@Ancil McBarnett Impressive response!!!
Created 02-02-2016 03:19 PM
@Robin Dong has this been resolved? Can you post your solution or accept best answer?
Created 02-25-2016 11:08 AM
Hi,
I am getting the following ERROR:
16/02/25 04:06:39 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dteradata.db.input.file.format
The command I am running:
sqoop import -libjars ${LIB_JARS} --connect jdbc:teradata://<host_address>/TMODE=ANSI,CHARSET=UTF8,database=datatypes --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password <password> --table test_datatype_numeric --split-by col_num --target-dir /sqoop/teratest/orc_new -Dteradata.db.input.file.format orcfile
Basically I am using the Hortonworks Connector for Teradata, and I want to store the result in ORC format.
Can someone help?
Created 02-25-2016 12:27 PM
If you want the data coming from Teradata stored as ORC, what you need to do is import it into a Hive ORC table. Look at the Sqoop user guide section on importing into Hive: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive
And our documentation for Sqoop.
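Also, the "Unrecognized argument" error itself looks like a syntax issue: -D properties are generic Hadoop options, so they must use property=value form and come immediately after the tool name, before any Sqoop-specific arguments. A sketch of your command rewritten accordingly (host and password kept as your placeholders):

sqoop import \
  -Dteradata.db.input.file.format=orcfile \
  -libjars ${LIB_JARS} \
  --connect jdbc:teradata://<host_address>/TMODE=ANSI,CHARSET=UTF8,database=datatypes \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password <password> \
  --table test_datatype_numeric \
  --split-by col_num \
  --target-dir /sqoop/teratest/orc_new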