Support Questions

Find answers, ask questions, and share your expertise

Sqoop Export to teradata random results

Hello All

We are trying to export from Hive to teradata using this command

sqoop export -Dorg.apache.sqoop.export.text.dump_data_on_error=true -Dhadoop.security.logger=DEBUG,NullAppender -Dhadoop.root.logger=DEBUG,console -Dsqoop.export.records.per.statement=20 -Dsqoop.export.statements.per.transaction=20 \ --driver com.teradata.jdbc.TeraDriver \ --connect jdbc:teradata://xxx.xxx.xxx.xxx/teradata_database \ --username dwh_tbda \ --password Teradata_2017 \ --table teratada_table \ --export-dir "hdfs_file" \ --input-null-non-string '\\N' \ --input-null-string '\\N' \ --input-fields-terminated-by '|' \ --num-mappers 20 \ --verbose \ --direct \ -- --output-method internal.fastload

It works ok some time, the results are random and some times export all the records, some times 0 records, or any value beetwen 0 and total. The yarn log is like :

0b1f030 sess=0 java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$ConnectThread.run(TDNetworkIOIF.java:1229) at org.apache.sqoop.mapreduce.ExportOutputFormat.getRecordWriter(ExportOutputFormat.java:79) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.sql.SQLException: [Teradata JDBC Driver] [TeraJDBC 16.00.00.23] [Error 1277] [SQLState 08S01] Login timeout for Connection to 10.80.4.20 Wed Apr 11 16:39:10 COT 2018 socket orig=10.80.4.20 cid=50b1f030 sess=0 java.net.SocketTimeoutException: connect timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at com.teradata.jdbc.jdbc_4.io.TDNetworkIOIF$ConnectThread.run(TDNetworkIOIF.java:1229)

Some one can suggets a reason to explain whats is happening?

Thank you

2 REPLIES 2

Super Collaborator

hi @alvaro andres tovar martinez,

Apparently the issue is with tenacity in teradata, which controls the retry duration for a fastload/mlod operation in teradata.

when you use the fastlad, that plays a major role in busy cluster, the best thing you can do is to ensure that your teradata workload is capable enough to accept the connection when you requested it, no throttles controls your connection at that time or turn off the fastload and do it in normal mode (only if your data is small ) or just keep retry till it get connected in your logic.

more on the teradata jdbc dirver can be found here, but not much help on implementing the tenacity/sleep options as that was not possible when you use JDBC mode.

http://developer.teradata.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABFGFAF

Hope this helps !!

Thank you for your help.

Y have changed to --output-method batch.insert and run ok, but it takes too many time (all the day).

I am looking for alternatives to load 20 millons of rows. I am trying to create a spark/scala/sqoop script, so that using jdbc and sqoop make a parallel load in multiple tables, and exporting all the partitions of the table. At the side of Teradata i will make a View to consult all the exported tables.

I will try also the fast load for csv files using de jdbc driver instance of sqoop.

Thank you