Created on 02-19-2016 01:44 AM - edited 09-16-2022 03:04 AM
I need to import hundreds of tables from Teradata. Do I need to create all of these tables in Hive first, or can the data be imported directly into Hive tables? Is there any troubleshooting I should be aware of?
Created 02-19-2016 01:59 AM
There are many ways.
Use the Teradata Connector: Download connector http://hortonworks.com/hdp/addons/
Documentation
You would then need to script the import to pull data for hundreds of tables; there is a limit to how many you can run in parallel (see the sketch after the examples below).
Or use an ETL tool like Talend.
SUPPORTING FILES: Copy the following files (attached) to the Sqoop library folder:

  /usr/lib/sqoop/lib/hortonworks-teradata-connector-xxxxxx.jar
  /usr/lib/sqoop/lib/teradata-connector-xxxxhadoopxxxx.jar
  /usr/lib/sqoop/lib/terajdbc4.jar
  /usr/lib/sqoop/lib/tdgssconfig.jar

  # Note: this may already be installed in the TDH
  # Place the JDBC drivers in /usr/lib/sqoop/lib

  # Set the classpath
  export HIVE_HOME=/usr/lib/hive
  export HADOOP_HOME=/usr/lib/hadoop
  export SQOOP_HOME=/usr/lib/sqoop
  export HADOOP_CLASSPATH=$(hcat -classpath)
  export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')

  # Hive import (flags to add to an import command):
  sqoop import --hive-import --hive-overwrite --create-hive-table \
    --hive-table <table-name> --null-string '\\N' --null-non-string '\\N'

  # Define a table based on one in a database:
  sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
    --table employees --hive-table emps

  # Other examples:
  sqoop import -libjars ${LIB_JARS} \
    -Dteradata.db.input.target.table.schema="cust_id int, acct_type string, acct_nbr string, acct_start_date date, acct_end_date date" \
    -Dteradata.db.input.file.format=orcfile \
    --connect jdbc:teradata://<teradata host ip address>/Database=financial \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --username dbc --password dbc \
    --table accts --hive-import --hive-table financial.accts

  sqoop import --connect jdbc:teradata://192.168.1.13/Database=retail \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --username dbc --password dbc \
    --table accts --hive-import --hive-table financial.accts
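To script this for many tables, here is a minimal sketch of a wrapper that reads table names from a (hypothetical) tables.txt file, one per line, and caps the number of concurrent imports; the host, credentials, and database names are placeholders you must adapt:

  #!/bin/bash
  # Sketch: bulk-import Teradata tables into Hive with bounded parallelism.
  # Assumptions: tables.txt exists; connection details below are placeholders.
  TD_HOST=192.168.1.13      # placeholder Teradata host
  TD_DB=retail              # placeholder source database
  HIVE_DB=financial         # placeholder target Hive database
  MAX_PARALLEL=4            # cap concurrent imports to limit Teradata sessions

  export HADOOP_CLASSPATH=$(hcat -classpath)
  export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')

  mkdir -p logs
  while read -r TABLE; do
    sqoop import -libjars ${LIB_JARS} \
      --connect jdbc:teradata://${TD_HOST}/Database=${TD_DB} \
      --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
      --username dbc --password dbc \
      --table ${TABLE} \
      --hive-import --hive-table ${HIVE_DB}.${TABLE} \
      > logs/${TABLE}.log 2>&1 &
    # Throttle: wait for a free slot before launching the next import
    while [ "$(jobs -rp | wc -l)" -ge "${MAX_PARALLEL}" ]; do
      sleep 5
    done
  done < tables.txt
  wait   # block until the last imports complete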
Created 02-19-2016 01:53 AM
@Robin Dong, has anyone from HW been in contact with you? Judging from the myriad of questions, it does seem you need some assistance with your deployment. While it is admirable that you are doing so much on your own, and we want to continue to provide assistance in this forum, perhaps the most efficient way we can help is to understand your use case offline and see how we can support you.
Created 03-18-2016 11:06 AM
As mentioned by Ancil, you might want to have a script to do the Sqoop download in parallel. You also need to control your degree of parallelism quite carefully, above all if you want to avoid the typical "No more spool space in..." error on the Teradata side.
Here's a script to do that: https://community.hortonworks.com/articles/23602/sqoop-fetching-lot-of-tables-in-parallel.html
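Within each Sqoop job you can also lower the per-table parallelism with --num-mappers, since each mapper typically opens its own session against Teradata and consumes spool. A hedged example, reusing the placeholder host, credentials, and table from the earlier post:

  sqoop import \
    --connect jdbc:teradata://192.168.1.13/Database=retail \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --username dbc --password dbc \
    --table accts \
    --num-mappers 2 \
    --hive-import --hive-table financial.accts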
Another problem I saw with Teradata is that some data types are not supported when you try to insert the data directly into Hive from Sqoop.
So the solution I took was the traditional one (a sketch follows the steps below):
1) Sqoop to HDFS.
2) Build external tables on top of the files.
3) Create an ORC table and then insert the data from the external tables.
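Here is a minimal sketch of those three steps, assuming the placeholder connection details and the accts schema from the earlier post; the unsupported types are landed as strings and cast during the final insert (the paths, table names, and CAST targets are illustrative assumptions):

  # 1) Sqoop the table to HDFS as delimited text
  sqoop import \
    --connect jdbc:teradata://192.168.1.13/Database=retail \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --username dbc --password dbc \
    --table accts \
    --target-dir /staging/accts \
    --fields-terminated-by '\t'

  # 2) Build an external table on top of the files (dates landed as strings)
  hive -e "CREATE EXTERNAL TABLE staging_accts (
             cust_id INT, acct_type STRING, acct_nbr STRING,
             acct_start_date STRING, acct_end_date STRING)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
           LOCATION '/staging/accts';"

  # 3) Create the ORC table and insert the data from the external table
  hive -e "CREATE TABLE accts_orc (
             cust_id INT, acct_type STRING, acct_nbr STRING,
             acct_start_date DATE, acct_end_date DATE)
           STORED AS ORC;
           INSERT OVERWRITE TABLE accts_orc
           SELECT cust_id, acct_type, acct_nbr,
                  CAST(acct_start_date AS DATE), CAST(acct_end_date AS DATE)
           FROM staging_accts;"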