I'm trying to build a small proof of concept that uses Hadoop to store older data that currently lives in a Teradata DB. I want to copy the contents of a few tables to find out whether this is possible and feasible. I would like to use Sqoop with HUE to do it. Is that possible?
I'm new to Linux and Hadoop.
I've downloaded the VMware version of the Cloudera QuickStart VM, CDH 4 (CDH 5 runs too slowly on my PC), from
It is up and running.
I understand that the next step is to install the Teradata connector in that VM so that Sqoop can communicate with our Teradata DB. I've found this tutorial in Cloudera's documentation
I'm following the "Installation without Cloudera Manager" section, as I have neither CDH 5 nor Internet access from the VM. I will explain what I did at each step (the numbered text is quoted from the documentation; my own notes follow each step).
1) Install the Sqoop connector by opening the distribution archive in a convenient location such as /usr/lib. Opening the distribution creates a directory that contains the jar file of the compiled version of the connector. Note the path to this jar file. The directory that is created when the file is expanded varies according to which connector you are using. Examples of typical resulting paths include:
Cloudera Connector Powered by Teradata 1.2cX: /usr/lib/sqoop-connector-teradata-1.2cX/sqoop-connector-teradata-1.2cX.jar
Cloudera Connector for Teradata 1.2cX: /usr/lib/sqoop-td-connector-1.2cX/sqoop-td-connector-1.2cX.jar
I've chosen the Cloudera Connector because it's free and this is only a small demo. I've downloaded it from
and extracted its contents to
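For reference, step 1 could be sketched from a shell roughly as follows. This assumes the connector is delivered as a gzip-compressed tarball; the function name `unpack_connector` and its two arguments are my own placeholders, not something taken from the Cloudera documentation.

```shell
# Hypothetical helper: unpack a connector archive into a target directory
# and print the resulting path of every jar contained in the archive.
# Assumption: the archive is a .tar.gz file.
unpack_connector() {
    archive="$1"   # path to the downloaded archive
    dest="$2"      # target directory, e.g. /usr/lib as the documentation suggests
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    # List only the jars that came from this archive; the jar path is
    # needed later for the managers.d file.
    tar -tzf "$archive" | grep '\.jar$' | while IFS= read -r entry; do
        printf '%s/%s\n' "$dest" "$entry"
    done
}
```

Calling it with the archive and `/usr/lib` (as root) should print the jar path that step 1 asks you to note.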
2) Copy the Teradata JDBC drivers (terajdbc4.jar and tdgssconfig.jar) to the lib directory of Sqoop installation. You can obtain these drivers from the Teradata download website: http://downloads.teradata.com/download/connectivity/jdbc-driver. Without these drivers, the connector will not function correctly.
I've downloaded the drivers and copied the files into
Is that path correct?
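One way to double-check step 2 is a small script that looks for both jars named in the documentation. This is only a sketch: `check_jdbc_jars` is a hypothetical name, and the directory you pass in is whatever your Sqoop installation uses as its lib directory.

```shell
# Hypothetical helper: report whether both Teradata JDBC jars named in the
# documentation are present in the given Sqoop lib directory.
# Returns 0 if both are found, 1 otherwise.
check_jdbc_jars() {
    lib_dir="$1"
    missing=0
    for jar in terajdbc4.jar tdgssconfig.jar; do
        if [ -f "$lib_dir/$jar" ]; then
            echo "found:   $lib_dir/$jar"
        else
            echo "missing: $lib_dir/$jar"
            missing=1
        fi
    done
    return "$missing"
}
```

On a package-based CDH install the lib directory is often `/usr/lib/sqoop/lib`, but that is an assumption on my part; it depends on how Sqoop was installed.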
3) Confirm that the managers.d directory exists in the Sqoop configuration directory.
Note: Depending on how Sqoop is installed, its configuration directory may be in /etc/sqoop/conf, /usr/lib/sqoop/conf, or elsewhere if Sqoop was installed using the tar-ball distribution.
If the managers.d directory does not exist, create it and ensure the directory permissions are set to 755.
There was no managers.d directory in /etc/sqoop/conf, so I created it and set the appropriate permissions with chmod:
drwxr-xr-x 2 root root 4096 Dec 16 07:58 managers.d
4) Create a text file in the managers.d directory with a descriptive name such as cldra_td_connector. Ensure the file permissions are set to 644.
I've created the file with the required permissions:
-rw-r--r-- 1 root root 113 Dec 16 08:02 cldra_td_connector
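Steps 3 and 4 together amount to something like the sketch below. The function name `prepare_managers_dir` is hypothetical, and `stat -c` assumes GNU coreutils, as found on typical Linux systems.

```shell
# Hypothetical helper: create <conf_dir>/managers.d with mode 755 and an
# (initially empty) connector file inside it with mode 644, then print the
# resulting modes. An existing file is kept; only its mode is adjusted.
# Assumption: GNU stat (supports -c).
prepare_managers_dir() {
    conf_dir="$1"    # e.g. /etc/sqoop/conf
    file_name="$2"   # e.g. cldra_td_connector
    dir="$conf_dir/managers.d"
    mkdir -p "$dir" || return 1
    chmod 755 "$dir" || return 1
    [ -e "$dir/$file_name" ] || : > "$dir/$file_name" || return 1
    chmod 644 "$dir/$file_name" || return 1
    # Print "<octal mode> <path>" for the directory and the file.
    stat -c '%a %n' "$dir" "$dir/$file_name"
}
```

Run as root against `/etc/sqoop/conf` with the file name `cldra_td_connector`, it should report 755 for the directory and 644 for the file, matching the listings above.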
5) The cldra_td_connector file must have the connector class name followed by the complete path to the directory where the connector jar is located.
For example, for the Cloudera Connector Powered by Teradata 1.2cX:
For example, for the Cloudera Connector for Teradata 1.2cX:
Note: The preceding command is shown on two lines, but this must be entered in a single line.
The TeradataManagerFactory acts as a single point of delegation for invoking the connector bundled with this distribution. An alternate way to specify TeradataManagerFactory is to add the following inside a sqoop-site.xml file, which must be inside a classpath directory:
This is the way to configure a Sqoop action to use the Teradata connector inside Oozie.
I've written the following in the /etc/sqoop/conf/managers.d/cldra_td_connector file (on a single line)
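Since the documentation insists on a single line, a quick sanity check of the file might look like this. It is a sketch under my own assumptions: the name `check_manager_file` is hypothetical, and the check assumes the usual Sqoop managers.d layout of `<class name>=<path>` on one line.

```shell
# Hypothetical helper: verify that a managers.d file contains exactly one
# non-empty line of the form <class name>=<path>, and that the path exists.
# Assumption: entries use '=' to separate the class name from the path.
check_manager_file() {
    file="$1"
    lines=$(grep -c . "$file") || lines=0
    if [ "$lines" -ne 1 ]; then
        echo "expected 1 non-empty line, found $lines"
        return 1
    fi
    entry=$(grep . "$file")
    case "$entry" in
        *=*) ;;
        *) echo "no '=' separator in: $entry"; return 1 ;;
    esac
    class=${entry%%=*}   # text before the first '='
    path=${entry#*=}     # text after the first '='
    if [ ! -e "$path" ]; then
        echo "path not found: $path"
        return 1
    fi
    echo "class=$class path=$path"
}
```

Pointing it at `/etc/sqoop/conf/managers.d/cldra_td_connector` would flag an entry that was accidentally split across two lines, or a jar path that does not exist.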
After having completed all the steps, I start the sqoop service in my VM.
Next, I go to HUE and select Sqoop from the top menu. I click the "new job" button and then the "add a new connection" button.
There, the "connection" combo box lists only "generic jdbc connector".
I think I've done all the steps correctly, so I would expect a "teradata" connector or something similar to appear there. Is that right?
Am I doing this correctly?