Support Questions


We are going to extract data from Teradata to Hortonworks for an initial load. I was wondering, what would be the fastest and most efficient way to do this load?

Expert Contributor

I need to move hundreds of tables from Teradata. Do I need to create all of these as Hive tables first, or can the data be imported directly into Hive tables? Is there anything I should be aware of for troubleshooting?

1 ACCEPTED SOLUTION


@Robin Dong

There are many ways.

Use the Teradata Connector: download it from http://hortonworks.com/hdp/addons/

Documentation

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_HortonworksConnectorForTeradata/content/i...

You would then need to script the imports to fetch data for hundreds of tables (see the sketch below). Note that there is a limit to how many imports you can run in parallel.
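As a rough illustration (not from the original post), here is a minimal driver script; the tables.txt file listing one Teradata table per line, the "staging" Hive database, and the connection details are all assumptions to adapt to your environment.

#!/bin/bash
# Minimal sketch: run one Sqoop Hive import per table listed in tables.txt.
# tables.txt, the JDBC URL, the credentials and the "staging" Hive database
# are assumptions, not values from the original post.
TD_JDBC="jdbc:teradata://<teradata-host>/Database=financial"

while read -r TABLE; do
  echo "Importing ${TABLE} ..."
  sqoop import \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --connect "${TD_JDBC}" \
    --username dbc --password dbc \
    --table "${TABLE}" \
    --hive-import --create-hive-table \
    --hive-table "staging.${TABLE}" \
    || echo "${TABLE}" >> failed_tables.txt
done < tables.txt

Tables whose import fails are recorded in failed_tables.txt so they can be re-run individually.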

Or use an ETL tool like Talend.

SUPPORTING FILES:  Copy the following files (attached) to the SQOOP library folder

/usr/lib/sqoop/lib/hortonworks-teradata-connector-xxxxxx.jar
/usr/lib/sqoop/lib/teradata-connector-xxxxhadoopxxxx.jar
/usr/lib/sqoop/lib/terajdbc4.jar
/usr/lib/sqoop/lib/tdgssconfig.jar

#Note this may already be installed in the TDH


#Place the JDBC Drivers in /usr/lib/sqoop/lib

#Set Classpath

export HIVE_HOME=/usr/lib/hive
export HADOOP_HOME=/usr/lib/hadoop
export SQOOP_HOME=/usr/lib/sqoop
# Build a comma-separated jar list for -libjars from the HCatalog classpath
export HADOOP_CLASSPATH=$(hcat -classpath)
export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')


# Hive Import (placeholders in angle brackets):

sqoop import --connect <jdbc-url> --table <source-table> \
  --hive-import --hive-overwrite --create-hive-table \
  --hive-table <hive-table-name> \
  --null-string '\\N' --null-non-string '\\N'

#Define a Table based on one in a database:
sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
  --table employees --hive-table emps

#Other Examples
sqoop import -libjars ${LIB_JARS} \
  -Dteradata.db.input.target.table.schema="cust_id int, acct_type string, acct_nbr string, acct_start_date date, acct_end_date date" \
  -Dteradata.db.input.file.format=orcfile \
  --connect jdbc:teradata://<teradata host ip address>/Database=financial \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc \
  --table accts --hive-import --hive-table financial.accts


sqoop import --connect jdbc:teradata://192.168.1.13/Database=retail \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username dbc --password dbc \
  --table accts --hive-import --hive-table financial.accts


4 REPLIES


@Robin Dong has anyone been in contact with you from HW? From the myriad of questions it does seem you need some assistance in your deployment. While it is admirable you are doing so much on your own, and we want to continue to provide assistance in this forum, perhaps the most efficient way we can provide assistance is to understand your use case offline and see how we can support you.



Super Collaborator

@Robin Dong

As mentioned by Ancil, you might want a script to run the Sqoop downloads in parallel, and you need to control the degree of parallelism carefully, above all if you want to avoid the typical "No more spool space in..." error on the Teradata side.

Here's a script to do that: https://community.hortonworks.com/articles/23602/sqoop-fetching-lot-of-tables-in-parallel.html
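If you want a quick starting point before adapting that article's script, one simple way to cap concurrency is to fan the table list out with xargs -P. The tables.txt file, the import_one_table.sh wrapper (for example, the per-table sqoop import shown in the accepted answer), and the limit of 4 concurrent imports are assumptions, not part of the linked article.

# Minimal sketch: run at most 4 imports at a time, passing one table name per call.
< tables.txt xargs -n 1 -P 4 ./import_one_table.sh

Keeping the -P value small helps you stay under Teradata's session and spool limits.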

Another problem I saw with Teradata is that some data types are not supported when you try to insert the data directly into Hive from Sqoop.

So the solution I took was the traditional one (see the sketch after this list):

1) Sqoop the data to HDFS.

2) Build external Hive tables on top of the HDFS files.

3) Create ORC tables and insert the data from the external tables.
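As a rough sketch of those three steps (the HDFS path, database names, field delimiter, and column list below are assumptions; the DDL must match your actual exported schema):

# Step 1: Sqoop the Teradata table to plain files on HDFS
# (hypothetical paths, credentials, and table names).
sqoop import \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --connect jdbc:teradata://<teradata-host>/Database=financial \
  --username dbc --password dbc \
  --table accts \
  --target-dir /staging/teradata/accts \
  --fields-terminated-by '\001'

# Step 2: build an external Hive table over the staged files.
# Step 3: create an ORC table and copy the data into it.
hive -e "
CREATE EXTERNAL TABLE staging.accts_ext (
  cust_id INT, acct_type STRING, acct_nbr STRING,
  acct_start_date DATE, acct_end_date DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
LOCATION '/staging/teradata/accts';

CREATE TABLE financial.accts STORED AS ORC
AS SELECT * FROM staging.accts_ext;
"

One reason this staging approach sidesteps the type issues is that you define the external table's schema yourself, so awkward Teradata types can be declared as STRING and cast during the insert into the ORC table.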