I am using Sqoop to pull data from Teradata into HDP 2.5 with the Teradata driver (terajdbc4.jar). Before it starts pulling the data, there seems to be a big overhead: it goes through all the cache directories and creates symlinks for each file, e.g.:
ln -sf "/usr/local/middleware/hadoop/yarn/local/usercache/root/filecache/185/jackson-databind-2.3.1.jar" "jackson-databind-2.3.1.jar" hadoop_shell_errorcode=$?
I believe it is trying to copy launch_container.sh, and all of these preliminary steps are related to launch_container.sh.
Is there any way to skip these unnecessary steps?
Please find the Sqoop command and the YARN container log file (application-1481156388370-0012.txt) below:
sqoop import -Dhdp.version=220.127.116.11-37 \
  --connect jdbc:teradata://xxxxxxx.org/DATABASE=xxxx \
  --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
  --username xxxxxxxx \
  --password xxxxxxx \
  --table STATUS \
  --target-dir /tmp/tt/status \
  --as-textfile
The creation of files under /yarn/local/usercache is a normal part of launching any YARN job (including the MapReduce job started by Sqoop). It is not recommended that you attempt to change this.
Could you describe why you are concerned about this part of Sqoop operation?
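For background, what you are seeing is YARN resource localization: the NodeManager downloads each job's jars into the filecache/ directory and symlinks them into the container's working directory, and files already in the cache are reused across applications as long as the source copies on HDFS are unchanged. The cache size and cleanup interval are tunable in yarn-site.xml; the property names below are standard YARN settings, but the values shown are illustrative assumptions, not recommendations:

```xml
<!-- yarn-site.xml sketch: keep localized jars cached longer so repeated
     Sqoop jobs can reuse them instead of re-localizing every time.
     Example values only. -->
<property>
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>20480</value> <!-- default is 10240 -->
</property>
<property>
  <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
  <value>1200000</value> <!-- default is 600000 (10 minutes) -->
</property>
```

Note that these settings only affect how long already-localized files are retained on each NodeManager; they do not remove the localization step itself.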
Hi Wes Floyd,
My concern is that Sqoop jobs hang at this step for 4 to 5 minutes. So even if I only have to import a few records from Teradata, it takes 5 to 10 minutes to get the data into Hadoop, and these steps are repeated every time I submit an import job. For example, if I need to import 200 tables from Teradata, a considerable amount of time is spent before any data is read from Teradata at all.
Is there any source code available for TDCH, the Teradata Connector for Hadoop?
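One way to confirm how much of the delay is localization is to count the `ln -sf` symlink commands recorded in the container log: each one corresponds to a file YARN localized for the container. A rough sketch, using an inline stand-in for the downloaded container log (in practice you would run the grep against the real application-1481156388370-0012.txt):

```shell
# Stand-in for a downloaded YARN container log; the real file would come
# from `yarn logs -applicationId <appId>` or the ResourceManager UI.
cat > container.log <<'EOF'
ln -sf "/yarn/local/usercache/root/filecache/185/jackson-databind-2.3.1.jar" "jackson-databind-2.3.1.jar"
ln -sf "/yarn/local/usercache/root/filecache/186/sqoop.jar" "sqoop.jar"
echo "Launching container"
EOF

# Count the symlink commands = number of localized files for this container.
grep -c '^ln -sf' container.log
```

A high count here (often dozens of jars for a Sqoop job) is what makes the pre-launch phase visibly slow on each submission.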