Created 03-17-2017 04:21 AM
My Sqoop job into Hive from Oracle is failing with:

Error: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /user/xxxxxxx is exceeded: quota = 1649267441664 B = 1.50 TB but diskspace consumed = 1649581016912 B = 1.50 TB
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211)
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:912)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:741)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:700)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:525)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveAllocatedBlock(FSNamesystem.java:3624)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.storeAllocatedBlock(FSNamesystem.java:3208)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3089)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:822)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
    at ...
I think I may need to set the Hive configuration parameter hive.exec.scratchdir to a location with more space to fix this, but I do not know how to do that and pass it to the Sqoop job. Am I on the right track in diagnosing the problem? Can anyone help with this?
Created 03-17-2017 11:48 AM
HDFS has a mechanism called quotas. It is possible that your admin team has set storage quotas on the individual user directories; if so, a larger quota can be set on your directory to avoid the situation.
# requires superuser privileges
# set a space quota of 1kb on a directory; the suffix can be k, m, g, etc.
sudo -u hdfs hdfs dfsadmin -setSpaceQuota 1k /quotasdir

# add a file
sudo -u hdfs hdfs dfs -touchz /quotasdir/1

# notice the file is 0 bytes
sudo -u hdfs hdfs dfs -ls /quotasdir/

# for demo purposes, upload a file larger than 1kb into the directory and watch the prompt
sudo -u hdfs hdfs dfs -chown -R root:hdfs /quotasdir
hdfs dfs -put /root/install.log /quotasdir/
15/11/25 15:10:47 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /quotasdir is exceeded: quota = 1024 B = 1 KB but diskspace consumed = 402653184 B = 384 MB
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211)
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:907)

# remove the space quota
sudo -u hdfs hdfs dfsadmin -clrSpaceQuota /quotasdir
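Before asking for a bigger quota, you can check how much of the current one is used. A minimal sketch; the /user/xxxxxxx path and the 2t value are placeholders for your own directory and target size:

# show name quota, space quota, and remaining space for the directory (-h for human-readable)
hdfs dfs -count -q -h /user/xxxxxxx

# as a superuser, raise the space quota, e.g. to 2 TB
sudo -u hdfs hdfs dfsadmin -setSpaceQuota 2t /user/xxxxxxx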
Created 03-17-2017 11:52 AM
As an alternative, you can change the scratch directory, as described here:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
There are three ways to set it:

Using the set command in the CLI or Beeline for setting session-level values for the configuration variable for all statements subsequent to the set command. For example, the following command sets the scratch directory (which is used by Hive to store temporary output and plans) to /tmp/mydir for all subsequent statements:

set hive.exec.scratchdir=/tmp/mydir;

Using the --hiveconf option of the hive command (in the CLI) or the beeline command for the entire session. For example:

bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir

In hive-site.xml. This is used for setting values for the entire Hive configuration (see hive-site.xml and hive-default.xml.template below). For example:

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/mydir</value>
  <description>Scratch space for Hive jobs</description>
</property>
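Since you connect with Beeline, the same --hiveconf flag works there too. A minimal sketch; the host, port, and database are placeholders for your own HiveServer2 connection details:

beeline -u jdbc:hive2://hs2host:10000/default --hiveconf hive.exec.scratchdir=/tmp/mydir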
Created 03-17-2017 01:20 PM
Artem,
If I use set hive.exec.scratchdir at the Beeline command line, how do I launch a Sqoop script that will pick up that option? I do not know how to launch a Sqoop script from the Beeline command line. I launch my Sqoop script from the Linux command line like this:
sqoop --options-file <filename>
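(For reference, an options file just lists the tool name and one option or value per line, with # comments allowed. A sketch of what such a file might contain; the connection string, user, and table below are made up:)

# Options file for a Sqoop import; one option or value per line
import
--connect
jdbc:oracle:thin:@//dbhost:1521/ORCL
--username
myuser
--table
MYTABLE
--hive-import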
Created 03-17-2017 07:25 PM
When I tried to issue the command for the scratchdir, I got this error:
set hive.exec.scratchdir=/data/groups/hdp_ground;

Error: Error while processing statement: Cannot modify hive.exec.scratchdir at runtime. It is not in list of params that are allowed to be modified at runtime (state=42000,code=1)
Is it possible the sysadmins have set this up so it can't be modified?
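(For what it's worth, that error usually means HiveServer2 is running with SQL Standard authorization, which only allows runtime changes to parameters matching the hive.security.authorization.sqlstd.confwhitelist regex. An admin could permit this parameter by appending to the whitelist in hive-site.xml; a sketch, assuming that is the setup here:)

<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>hive\.exec\.scratchdir</value>
</property>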
Created 03-17-2017 08:09 PM
@Carol Elliott I didn't try this but can you try the following
# note: the directory itself, not dir/*; classpath wildcards only match jar files
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/directory_containing_your_hive_site.xml
copy hive-site.xml into that directory and edit it to point at your own scratchdir, so you have your own copy of hive-site.xml.

then, if it works, you can add the export to your bash_profile and source it (a sketch of the full sequence follows below).
source ~/.bash_profile
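Putting it together, a minimal sketch; the paths are assumptions (adjust /etc/hive/conf to wherever your cluster keeps its Hive config):

# make a private conf directory and copy the cluster's hive-site.xml into it
mkdir -p ~/hive-conf
cp /etc/hive/conf/hive-site.xml ~/hive-conf/

# edit ~/hive-conf/hive-site.xml and point hive.exec.scratchdir at a directory with free quota

# put the directory (not dir/*) on the classpath so hive-site.xml can be found
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:~/hive-conf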
Created 03-17-2017 08:14 PM
@Carol Elliott actually, see if you can use the -D or -conf options:
-conf <configuration file>    specify an application configuration file
-D <property=value>           use value for given property
The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> can be used to set the name of the MR job that Sqoop launches; if not specified, the name defaults to the jar name for the job, which is derived from the used table name.
so, in the same fashion, try:
sqoop import -D hive.exec.scratchdir=...
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_using_generic_and_specific_arguments
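Note that Sqoop requires generic -D arguments to come immediately after the tool name, before any tool-specific arguments. A fuller sketch of what the invocation might look like; the connection string, user, table, and the exact scratchdir path are placeholders:

sqoop import \
  -D hive.exec.scratchdir=/data/groups/hdp_ground/tmp \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser \
  --table MYTABLE \
  --hive-import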
Created 03-17-2017 08:49 PM
hive.exec.scratchdir on this cluster is /tmp/hive. I don't know why the user appears to be exceeding the quota on a personal directory.