
Changing the temp directory in a Sqoop import

Contributor

A ".temp" directory is getting created in the target directory while importing data using sqoop. I want to change the directory location where the .temp/job_1569164528775_11264/mr/* directories are getting created.

 

command:

 

sqoop import -D yarn.app.mapreduce.am.staging-dir=/user/test/ --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" --connect "jdbc:sqlserver://ip:port;database=database;" --connection-manager "org.apache.sqoop.manager.SQLServerManager" --username <username> -password <password> --table 'tablename' --as-parquetfile --delete-target-dir --target-dir /user/database/test/ --m 1

 

Current result:

 

hadoop fs -ls -R /user/database/test/
/user/database/test/.temp
/user/database/test/.temp/job_1569164528775_11264/mr/.....*

 

Expected result:

 

hadoop fs -ls -R /user/test/
/user/test/.temp/job_1569164528775_11264/mr/.....*

 

I also tried setting the following properties, without success (see the sketch after this list for how each was passed):

mapred.system.dir
mapreduce.task.output.dir
mapreduce.task.tmp.dir
mapred.work.output.dir
mapreduce.cluster.temp.dir
mapreduce.cluster.local.dir
mapred.temp.dir
mapreduce.jobtracker.system.dir
hadoop.tmp.dir
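A minimal sketch of how each candidate property was passed (shown here with hadoop.tmp.dir as one example from the list above; the generic -D option must come directly after "import", before the tool-specific arguments, and the paths are illustrative):

sqoop import \
  -D hadoop.tmp.dir=/user/test/tmp \
  --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
  --connect "jdbc:sqlserver://ip:port;database=database;" \
  ... \
  --target-dir /user/database/test/ \
  --m 1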

 

 

9 Replies

Master Mentor

@sow 

Have you tried changing your --target-dir from /user/database/test/ to /user/test/?

$ sqoop import \
    -D yarn.app.mapreduce.am.staging-dir=/user/test/ \
    --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
    --connect "jdbc:sqlserver://ip:port;database=database;" \
    --connection-manager "org.apache.sqoop.manager.SQLServerManager" \
    --username <username> --password <password> \
    --table 'tablename' \
    --as-parquetfile \
    --delete-target-dir \
    --target-dir /user/test/ \
    --m 1

 

When running a Hive import, the --target-dir argument controls where the data is staged temporarily before being loaded into the Hive table; --target-dir does not create the Hive table in that location.

If you want to import to a specific directory, use --target-dir without the --hive-import argument and create a Hive table on top of the HDFS directory, e.g. as sketched below.
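A rough sketch only (the database name and the column definitions are placeholders and must match the imported schema):

hive -e "
  CREATE EXTERNAL TABLE db_name.tablename (
    user_id INT,       -- placeholder columns; use the real schema of the import
    user_name STRING
  )
  STORED AS PARQUET
  LOCATION '/user/test/';
"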


HTH

Contributor

Hi @Shelton 

 

I am not using --hive-import or trying to create a Hive table.

 

The issue here is that a .temp directory gets created in the target-dir I specify.

The .temp directory is deleted once execution completes, but my use case requires changing the directory where it is created at run time.

 

 

Master Mentor

@sow 

 

You say you are not using hive-import or trying to create a Hive table, but in your command below I can see --target-dir:

[screenshot: the command with --target-dir highlighted]

 Thanks

Contributor

@Shelton 

 

The target-dir is specified so that the generated files are placed in that directory.

The problem I am facing is with the temporary directory (.temp) that gets created at the target-dir during run time, i.e., when the MapReduce job is initiated.

The solution I am looking for is a way to change the location of that temporary directory (.temp).

Expert Contributor

Hi,

 

You can use the --temporary-rootdir argument to make sure that the temporary data goes into that folder.

 

Example:

sqoop import --target-dir /<hdfs path>/<import_data_dir> --temporary-rootdir /<hdfs path>/<tmp_dir> ...

 

Regards

Nitish

Contributor

@Khanna 

 

I tried it, but the .temp directory is still getting created at the target directory that I specify.

 

 

Expert Contributor

Hi,

 

Can you please share the Sqoop command that you are running?

 

Regards

Nitish

Contributor

@Khanna 

 

 

Command used:

sqoop import \
  --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
  --connect "jdbc:sqlserver://server:port;database=db_name;" \
  --connection-manager "org.apache.sqoop.manager.SQLServerManager" \
  --username <> --password <> \
  --table 'table_name' \
  --as-parquetfile \
  --delete-target-dir \
  --target-dir /user/test/axe/ \
  --temporary-rootdir /user/test2/ \
  --m 4 --split-by user_id

 

Before MapReduce starts:
hadoop fs -ls -R /user/test/
/user/test/axe
/user/test/axe/.metadata
/user/test/axe/.metadata/descriptor.properties
/user/test/axe/.metadata/schema.avsc
/user/test/axe/.metadata/schemas
/user/test/axe/.metadata/schemas/1.avsc

 

When the MapReduce job starts:
hadoop fs -ls -R /user/test/
/user/test/.temp
/user/test/.temp/job_1571067970221_0156
/user/test/.temp/job_1571067970221_0156/mr
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/descriptor.properties
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/schema.avsc
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/schemas
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/schemas/1.avsc
/user/test/axe
/user/test/axe/.metadata
/user/test/axe/.metadata/descriptor.properties
/user/test/axe/.metadata/schema.avsc
/user/test/axe/.metadata/schemas
/user/test/axe/.metadata/schemas/1.avsc

 

Once the import is complete:
hadoop fs -ls -R /user/test/
/user/test/axe
/user/test/axe/.metadata
/user/test/axe/.metadata/descriptor.properties
/user/test/axe/.metadata/schema.avsc
/user/test/axe/.metadata/schemas
/user/test/axe/.metadata/schemas/1.avsc
/user/test/axe/.signals
/user/test/axe/.signals/unbounded
/user/test/axe/679dadfc-3657-410b-a805-6c98b8d1720b.parquet
/user/test/axe/923bbf06-35d6-4156-8df3-a6e53ebf00f7.parquet
/user/test/axe/aef24eac-15dd-4ebc-a61a-68265d53a320.parquet
/user/test/axe/f903a079-87b3-48a0-bea6-aa02d92c5aac.parquet

 

Expert Contributor

Hi,

 

Note: the Parquet import is hard-coded to write its temporary data in /tmp, even though the target directory is different.

 

Kindly check /tmp for the intermediate data while the job is running; you should see it there (e.g. with the command below).
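A quick illustrative check (the exact path under /tmp depends on the user and setup, and the grep filter is just a guess at the directory name):

hadoop fs -ls -R /tmp | grep -i temp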

 

Regards

Nitish