After truncating a Hive table in Spark2, the permissions of the table's directory in HDFS change from the default drwxrwxrwx to drwxr-xr-x. Because the table was created by another user, its HDFS directory is owned by that user rather than by hive, so the hive system user can no longer write data to the table.
This problem occurs only when I truncate the table in Spark2. If I recreate the table and then truncate it from the Hive CLI, the permissions of its HDFS directory remain the default drwxrwxrwx.
Here is the sample PySpark script that I use for testing the truncate operation:

from pyspark.sql import SparkSession

session = SparkSession.builder.enableHiveSupport().getOrCreate()
session.sql('TRUNCATE TABLE testtable')
session.stop()
A short (and maybe not satisfying) answer: you shouldn't use the hive system user to write data, only for administrative operations.
If you need to use the hive user for some reason: you are probably using Apache Ranger for authorization, so why not grant the hive user access to the data at the HDFS level if it *really* needs to write there?
If you are not managing permissions with Ranger (which is the recommended approach), you should look into:
- "umask", to change the default permissions of newly created directories and files, or
- HDFS ACLs, to grant users and groups access to HDFS without Ranger.
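As an aside on the umask option: the drwxr-xr-x (755) mode the question observes is exactly what a 022 umask produces from a requested 777. HDFS's fs.permissions.umask-mode setting behaves like the POSIX umask, so a minimal local Python sketch (POSIX only; the local filesystem stands in for HDFS here) shows the mechanism:

```python
import os
import stat
import tempfile

# A 022 umask strips the group- and other-write bits from every
# newly created file or directory (777 -> 755, i.e. drwxr-xr-x).
old_umask = os.umask(0o022)
try:
    with tempfile.TemporaryDirectory() as tmp:
        child = os.path.join(tmp, 'child')
        os.mkdir(child, 0o777)  # request 777; the umask masks it down
        mode = stat.S_IMODE(os.stat(child).st_mode)
        print(oct(mode))  # 0o755
finally:
    os.umask(old_umask)  # restore the previous umask
```

Setting a more permissive umask (e.g. 000) in the cluster configuration would leave new warehouse directories world-writable, which is why ACLs or Ranger policies are usually the cleaner fix.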