
Changing the temp directory in a Sqoop import

Contributor

A ".temp" directory is getting created in the target directory while importing data using sqoop. I want to change the directory location where the .temp/job_1569164528775_11264/mr/* directories are getting created.

 

command:

 

sqoop import -D yarn.app.mapreduce.am.staging-dir=/user/test/ --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" --connect "jdbc:sqlserver://ip:port;database=database;" --connection-manager "org.apache.sqoop.manager.SQLServerManager" --username <username> -password <password> --table 'tablename' --as-parquetfile --delete-target-dir --target-dir /user/database/test/ --m 1

 

Current result:

 

hadoop fs -ls -R /user/database/test/
/user/database/test/.temp
/user/database/test/.temp/job_1569164528775_11264/mr/.....*

 

Expected result:

 

hadoop fs -ls -R /user/test/
/user/test/.temp/job_1569164528775_11264/mr/.....*

 

I also tried setting the following properties, without success (see the sketch after this list for how each was passed):

mapred.system.dir
mapreduce.task.output.dir
mapreduce.task.tmp.dir
mapred.work.output.dir
mapreduce.cluster.temp.dir
mapreduce.cluster.local.dir
mapred.temp.dir
mapreduce.jobtracker.system.dir
hadoop.tmp.dir
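A minimal sketch of how each candidate property was passed (shown here with hadoop.tmp.dir as one example from the list above; the generic -D option must come directly after "import", before the tool-specific arguments, and the paths are illustrative):

sqoop import \
  -D hadoop.tmp.dir=/user/test/tmp \
  --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
  --connect "jdbc:sqlserver://ip:port;database=database;" \
  ... \
  --target-dir /user/database/test/ \
  --m 1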

 

 

9 Replies

Master Mentor

@sow 

Have you tried changing your --target-dir from /user/database/test/ to /user/test/?

$ sqoop import \
    -D yarn.app.mapreduce.am.staging-dir=/user/test/ \
    --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
    --connect "jdbc:sqlserver://ip:port;database=database;" \
    --connection-manager "org.apache.sqoop.manager.SQLServerManager" \
    --username <username> --password <password> \
    --table 'tablename' \
    --as-parquetfile \
    --delete-target-dir \
    --target-dir /user/test/ \
    --m 1

 

When running a Hive import, the --target-dir argument controls where the data is staged temporarily before being loaded into the Hive table; --target-dir does not create the Hive table in that location.

If you want to import to a specific directory, use --target-dir without the --hive-import argument and create a Hive table on top of the HDFS directory, e.g. as sketched below.
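A rough sketch only (the database name and the column definitions are placeholders and must match the imported schema):

hive -e "
  CREATE EXTERNAL TABLE db_name.tablename (
    user_id INT,       -- placeholder columns; use the real schema of the import
    user_name STRING
  )
  STORED AS PARQUET
  LOCATION '/user/test/';
"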


HTH

Contributor

Hi @Shelton 

 

I am not using --hive-import or trying to create a Hive table.

 

The issue here is that a .temp directory gets created in the target-dir I specify.

The .temp directory is deleted once execution completes, but my use case requires changing the directory where it is created at run time.

 

 

Master Mentor

@sow 

 

You say you are not using hive-import or trying to create a Hive table, but in your command below I can see --target-dir:

[screenshot: the command with --target-dir highlighted]

 Thanks

Contributor

@Shelton 

 

The target-dir is specified so that the generated files are placed in that directory.

The problem I am facing is with the temporary directory (.temp) that gets created at the target-dir during run time, i.e., when the MapReduce job is initiated.

The solution I am looking for is a way to change the location of that temporary directory (.temp).

Expert Contributor

Hi,

 

You can use the --temporary-rootdir argument to make sure that the temporary data goes into that folder.

 

Example:

sqoop import --target-dir /<hdfs path>/<import_data_dir> --temporary-rootdir /<hdfs path>/<tmp_dir> ...

 

Regards

Nitish

Contributor

@Khanna 

 

I tried it, but the .temp directory is still getting created at the target directory that I specify.

 

 

Expert Contributor

Hi,

 

Can you please share the Sqoop command that you are running?

 

Regards

Nitish

Contributor

@Khanna 

 

 

Command used:

sqoop import \
  --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
  --connect "jdbc:sqlserver://server:port;database=db_name;" \
  --connection-manager "org.apache.sqoop.manager.SQLServerManager" \
  --username <> --password <> \
  --table 'table_name' \
  --as-parquetfile \
  --delete-target-dir \
  --target-dir /user/test/axe/ \
  --temporary-rootdir /user/test2/ \
  --m 4 --split-by user_id

 

Before MapReduce starts:
hadoop fs -ls -R /user/test/
/user/test/axe
/user/test/axe/.metadata
/user/test/axe/.metadata/descriptor.properties
/user/test/axe/.metadata/schema.avsc
/user/test/axe/.metadata/schemas
/user/test/axe/.metadata/schemas/1.avsc

 

When the MapReduce job starts:
hadoop fs -ls -R /user/test/
/user/test/.temp
/user/test/.temp/job_1571067970221_0156
/user/test/.temp/job_1571067970221_0156/mr
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/descriptor.properties
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/schema.avsc
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/schemas
/user/test/.temp/job_1571067970221_0156/mr/job_1571067970221_0156/.metadata/schemas/1.avsc
/user/test/axe
/user/test/axe/.metadata
/user/test/axe/.metadata/descriptor.properties
/user/test/axe/.metadata/schema.avsc
/user/test/axe/.metadata/schemas
/user/test/axe/.metadata/schemas/1.avsc

 

Once the import is complete:
hadoop fs -ls -R /user/test/
/user/test/axe
/user/test/axe/.metadata
/user/test/axe/.metadata/descriptor.properties
/user/test/axe/.metadata/schema.avsc
/user/test/axe/.metadata/schemas
/user/test/axe/.metadata/schemas/1.avsc
/user/test/axe/.signals
/user/test/axe/.signals/unbounded
/user/test/axe/679dadfc-3657-410b-a805-6c98b8d1720b.parquet
/user/test/axe/923bbf06-35d6-4156-8df3-a6e53ebf00f7.parquet
/user/test/axe/aef24eac-15dd-4ebc-a61a-68265d53a320.parquet
/user/test/axe/f903a079-87b3-48a0-bea6-aa02d92c5aac.parquet

 

Expert Contributor

Hi,

 

Note: the Parquet import is hard-coded to write its temporary data in /tmp, even though the target directory is different.

 

Kindly check /tmp for the intermediate data while the job is running; you should see it there (e.g. with the command below).
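A quick illustrative check (the exact path under /tmp depends on the user and setup, and the grep filter is just a guess at the directory name):

hadoop fs -ls -R /tmp | grep -i temp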

 

Regards

Nitish