
Sqoop job failing with DSQuotaExceededException

Contributor

My Sqoop job into Hive from Oracle is failing with this error:

Error: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/xxxxxxx is exceeded: quota = 1649267441664 B = 1.50 TB but diskspace consumed = 1649581016912 B = 1.50 TB
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211)
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:912)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:741)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:700)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:525)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3624)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.storeAllocatedBlock(FSNamesystem.java:3208)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3089)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:822)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
    at ...

I think I may need to set the Hive configuration parameter hive.exec.scratchdir to a location with more space to fix this, but I do not know how to set it and pass it to the Sqoop job. Am I on the right track in diagnosing the problem? Can anyone help with this?

7 REPLIES

Master Mentor

HDFS has a mechanism called quotas. It is possible that your admin team has set storage quotas on individual user directories; a superuser can set a larger quota on your directory to avoid the situation. Here is a quick demonstration:

# requires superuser privileges
# set a space quota of 1 KB on a directory; size suffixes can be k, m, g, etc.
sudo -u hdfs hdfs dfsadmin -setSpaceQuota 1k /quotasdir

# add a file
sudo -u hdfs hdfs dfs -touchz /quotasdir/1
# notice file is 0 bytes
sudo -u hdfs hdfs dfs -ls /quotasdir/

# for demo purposes, upload a file larger than 1 KB into the directory and watch the error that follows
sudo -u hdfs hdfs dfs -chown -R root:hdfs /quotasdir
hdfs dfs -put /root/install.log /quotasdir/

15/11/25 15:10:47 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /quotasdir is exceeded: quota = 1024 B = 1 KB but diskspace consumed = 402653184 B = 384 MB
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211)
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:907)

# remove space quota
sudo -u hdfs hdfs dfsadmin -clrSpaceQuota /quotasdir
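To see the quota currently applied to a directory and how much of it is consumed, hdfs dfs -count with the -q flag works (the -h flag for human-readable sizes is available on reasonably recent Hadoop releases):

# columns are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA,
# DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
hdfs dfs -count -q -h /user/xxxxxxx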

Master Mentor

As an alternative, you can change the scratch directory in one of the following ways (from the Hive AdminManual):

https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration

Use the set command in the Hive CLI or Beeline to set a session-level value for the configuration variable; it applies to all statements after the set command. For example, the following command sets the scratch directory (which Hive uses to store temporary output and plans) to /tmp/mydir for all subsequent statements:
  set hive.exec.scratchdir=/tmp/mydir;


Use the --hiveconf option of the hive command (in the CLI) or the beeline command to set the value for the entire session. For example:
  bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir


Set it in hive-site.xml, which applies to the entire Hive configuration (see hive-site.xml and hive-default.xml.template). For example:
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/mydir</value>
    <description>Scratch space for Hive jobs</description>
  </property>
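For example, a Beeline session picking up the override could be started like this (the JDBC URL is a placeholder; substitute your own HiveServer2 host, port, and database):

  # placeholder HiveServer2 address; adjust to your cluster
  beeline -u jdbc:hive2://hs2-host:10000/default --hiveconf hive.exec.scratchdir=/tmp/mydir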


Contributor

Artem,

If I use set hive.exec.scratchdir on the Beeline command line, how do I launch a Sqoop script that will pick up that option? I do not know how to launch a Sqoop script from the Beeline command line. I launch my Sqoop script from the Linux command line like this:

sqoop --options-file <filename>
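In case it helps, the options file follows the standard Sqoop layout of one option or value per line, roughly like this (connection details replaced with placeholders):

# sketch of an options file; all values below are placeholders
import
--connect
jdbc:oracle:thin:@//dbhost:1521/ORCL
--username
myuser
--table
MYTABLE
--hive-import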

Contributor
@Artem Ervits

When I tried to issue the command to set the scratchdir I got this error:

set hive.exec.scratchdir=/data/groups/hdp_ground;
Error: Error while processing statement: Cannot modify hive.exec.scratchdir at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1)

Is it possible the sysadmins have set this up so it can't be modified?

Master Mentor

@Carol Elliott I haven't tried this, but can you try the following:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/location_of_your_hive_site.xml/*

Edit your own copy of hive-site.xml so that hive.exec.scratchdir points at your own scratch directory.

Then, if it works, you can add the export to your ~/.bash_profile and source it:

source ~/.bash_profile
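Putting that together, the steps might look roughly like this (a sketch only; /etc/hive/conf is an assumption for where your cluster keeps its Hive configuration):

# copy the cluster's hive-site.xml somewhere you can edit it
mkdir -p $HOME/hive-conf
cp /etc/hive/conf/hive-site.xml $HOME/hive-conf/

# edit $HOME/hive-conf/hive-site.xml to set hive.exec.scratchdir, then put
# the directory itself on the classpath; a bare directory (rather than a /*
# wildcard) is typically what lets configuration files be found
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HOME/hive-conf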

Master Mentor

@Carol Elliott Actually, see if you can use the -D or -conf options:

-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property

The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> can be used to set the name of the MR job that Sqoop launches; if not specified, the name defaults to the jar name for the job, which is derived from the table name used.

So, in the same fashion, try:

sqoop import -D hive.exec.scratchdir=...
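Combined with an options file, the whole command might look like this (a sketch, assuming your options file does not itself repeat the import tool name; per the user guide, generic arguments must come before any tool-specific ones):

# generic -D argument placed right after the tool name
sqoop import -D hive.exec.scratchdir=/data/groups/hdp_ground --options-file <filename>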

https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_using_generic_and_specific_arguments

Rising Star

hive.exec.scratchdir on this cluster is /tmp/hive, so I don't know why the user appears to be exceeding the quota on a personal directory.
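One way to confirm the value actually in effect for a session is to echo it from Beeline; set with no value prints the current setting (the JDBC URL is a placeholder):

# prints the effective value of the parameter for this session
beeline -u jdbc:hive2://hs2-host:10000 -e "set hive.exec.scratchdir;"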