
Truncating a Hive table in Spark2 changes permissions of the table's directory in HDFS

After truncating a Hive table in Spark2, the permissions of the table's directory in HDFS change from the default drwxrwxrwx to drwxr-xr-x. As a result, the hive system user can no longer write data to the table: the table was created by another user, so its HDFS directory is owned by that user and not by hive.

This problem occurs only when I truncate the table in Spark2. If I recreate the table and then truncate it from the Hive CLI, the permissions of its HDFS directory remain the default drwxrwxrwx.
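Why drwxr-xr-x locks hive out can be sketched with the owner/group/other check described in the HDFS permissions guide: only the first matching class (owner, then group, then other) is consulted. The helper below is an illustrative sketch, not an HDFS API; the user and group names are hypothetical, and superuser and ACL handling are ignored.

```python
from typing import Iterable

# Illustrative sketch of the POSIX-style owner/group/other permission check.
# Superuser bypass and HDFS ACLs are deliberately left out.

_OFFSETS = {"owner": 1, "group": 4, "other": 7}


def class_bits(perm: str, cls: str) -> str:
    """Return the rwx triplet for 'owner', 'group' or 'other'.

    `perm` is a 10-character listing string such as 'drwxr-xr-x'.
    """
    start = _OFFSETS[cls]
    return perm[start:start + 3]


def can_write(perm: str, owner: str, group: str, user: str,
              user_groups: Iterable[str]) -> bool:
    """Check write access using only the first class that matches the user."""
    if user == owner:
        bits = class_bits(perm, "owner")
    elif group in user_groups:
        bits = class_bits(perm, "group")
    else:
        bits = class_bits(perm, "other")
    return bits[1] == "w"


if __name__ == "__main__":
    # Hypothetical names: the table was created by 'alice' with group 'hadoop',
    # and the hive user is only a member of its own 'hive' group here.
    for perm in ("drwxrwxrwx", "drwxr-xr-x"):
        print(perm, can_write(perm, "alice", "hadoop", "hive", {"hive"}))
```

With drwxrwxrwx the "other" class grants write access, so hive can write; with drwxr-xr-x only the owner can, and even group membership would not help because the group triplet is r-x.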

Here is the sample PySpark script I use to test the truncate operation:

from pyspark.sql import SparkSession

# Hive support lets session.sql() resolve testtable through the Hive metastore
session = SparkSession.builder.enableHiveSupport().getOrCreate()
session.sql('TRUNCATE TABLE testtable')
session.stop()

Re: Truncating a Hive table in Spark2 changes permissions of the table's directory in HDFS

Hi Dobromir,

Just a short (maybe not satisfying) answer: you shouldn't use the hive user to write data; it should only be used for operations.

If you need to use the hive user for some reason: you are probably using Apache Ranger for authorization, so why not give the hive user access to the data at the HDFS level if it *really* needs to write there?
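Granting hive access at the HDFS level without changing the directory's owner can be done with an HDFS ACL entry. The sketch below only builds the `hdfs dfs -setfacl -m` command line so it can be inspected or passed to `subprocess.run`; the helper name and table path are hypothetical. It assumes ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled`). Note that an ACL set on a directory would not survive if that directory is deleted and recreated, so whether to apply it to the table directory or to its parent (via a `default:` entry) is an assumption to verify on your cluster.

```python
import shlex
from typing import List

# Each position of the permission triplet must be the letter or '-'.
_ALLOWED = ("r-", "w-", "x-")


def setfacl_command(path: str, user: str, perms: str = "rwx",
                    include_default: bool = True) -> List[str]:
    """Build a hypothetical `hdfs dfs -setfacl -m` call granting `user` access.

    The access entry applies to `path` itself; the optional default entry is
    inherited by files and directories created beneath `path` afterwards.
    """
    if len(perms) != 3 or any(c not in ok for c, ok in zip(perms, _ALLOWED)):
        raise ValueError(f"perms must look like 'rwx' or 'r-x', got {perms!r}")
    entries = [f"user:{user}:{perms}"]
    if include_default:
        entries.append(f"default:user:{user}:{perms}")
    # -m merges these entries into the existing ACL; entries are comma-separated.
    return ["hdfs", "dfs", "-setfacl", "-m", ",".join(entries), path]


if __name__ == "__main__":
    # Hypothetical warehouse path; substitute the table's real location.
    cmd = setfacl_command("/apps/hive/warehouse/testtable", "hive")
    print(shlex.join(cmd))
    # To apply it (requires the hdfs CLI and sufficient rights):
    # import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list rather than a shell string avoids quoting problems if the command is later run with `subprocess.run`.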

If you are not managing permissions with Ranger (which is recommended), you should look into “umask” to change the default permissions of newly created directories and files, or into “HDFS ACLs” to give users and groups access to HDFS without Ranger.
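The umask clears permission bits from whatever mode is requested when a directory or file is created. The plain-Python arithmetic below (not an HDFS API) shows that a directory requested as 777 under a umask of 022 ends up as 755, which is exactly the drwxr-xr-x reported above. That would be consistent with the table directory being recreated during the Spark2 truncate, but that is an assumption, not something confirmed in this thread. 022 is the shipped default for Hadoop's `fs.permissions.umask-mode`; check your cluster's actual value.

```python
def apply_umask(requested: int, umask: int) -> int:
    """Clear every bit that is set in the umask from the requested mode."""
    return requested & ~umask & 0o777


def to_symbolic(mode: int) -> str:
    """Render the nine permission bits as a string such as 'rwxr-xr-x'."""
    out = []
    for shift in (6, 3, 0):  # owner, group, other
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)


if __name__ == "__main__":
    for umask in (0o000, 0o022):
        mode = apply_umask(0o777, umask)
        print(f"umask {umask:03o} -> {mode:03o} ({to_symbolic(mode)})")
```

Running it prints `umask 000 -> 777 (rwxrwxrwx)` and `umask 022 -> 755 (rwxr-xr-x)`.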

Best,

Stefan