
Sqoop import overhead using Teradata Hortonworks driver

New Contributor

Hi Everyone,

I am using Sqoop to pull data from Teradata into HDP 2.5 using the Teradata driver (terajdbc4.jar). Before it starts pulling the data, there seems to be a big overhead: it searches all the cache directories and creates symlinks for each jar, e.g.:

ln -sf "/usr/local/middleware/hadoop/yarn/local/usercache/root/filecache/185/jackson-databind-2.3.1.jar" "jackson-databind-2.3.1.jar" hadoop_shell_errorcode=$?

I believe it is copying launch_container.sh, and all of these pre-steps come from launch_container.sh.

Is there any way to skip these unnecessary steps?

Please find the Sqoop command below, along with the YARN container log file: application-1481156388370-0012.txt

sqoop import -Dhdp.version=2.5.3.0-37 \
  --connect jdbc:teradata://xxxxxxx.org/DATABASE=xxxx \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username xxxxxxxx \
  --password xxxxxxx \
  --table STATUS \
  --target-dir /tmp/tt/status \
  --as-textfile

@arjun.3066

2 REPLIES

Re: Sqoop import overhead using Teradata Hortonworks driver

Contributor

The creation of files under /yarn/local/usercache is a normal operation for any YARN job (including the MapReduce job launched by Sqoop). It is not recommended that you attempt to change this.
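For background: each container is started by a generated launch_container.sh, and the ln -sf lines you quoted are how the localized dependencies get symlinked into the container's working directory. The jars themselves are downloaded once per node into the filecache and re-used across containers until evicted. As a rough check on a NodeManager host (the path below is taken from your log; the usercache root depends on yarn.nodemanager.local-dirs):

# List the jars YARN has already localized for user 'root' on this node;
# later containers symlink these cached copies rather than re-downloading them.
ls -l /usr/local/middleware/hadoop/yarn/local/usercache/root/filecache/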

Could you describe why you are concerned about this part of Sqoop's operation?


Re: Sqoop import overhead using Teradata Hortonworks driver

New Contributor

Hi Wes Floyd,

My concern is that Sqoop jobs hang at this step for 4 to 5 minutes. So even if I only have to import a few records from Teradata, it takes 5 to 10 minutes to get the data into Hadoop, and this step is repeated each and every time I submit an import job. For example, if I need to import 200 tables from Teradata, a considerable amount of time is spent before any data is read from Teradata.
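For what it's worth, the gap is visible in the container logs themselves. A rough way to confirm it (assuming the application id matches the attached log file name, in YARN's usual underscore format):

# Pull the aggregated logs for the finished job and compare the timestamps
# of the launch_container.sh setup lines against the first map-task output.
yarn logs -applicationId application_1481156388370_0012 > app.log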

Is there any source code available for TDCH, the Teradata Connector for Hadoop?
