Securing Parquet Files Column-wise

New Contributor

I have been looking for a way to secure Parquet files, column-wise, for Spark access. Ideally, that would work the same way Ranger works for Hive, i.e., a Sysadmin defines the access policies for different groups and columns.

I have been trying Ranger through HDP; however, it seems that plug-ins for Spark and Parquet are not available yet.

I have also been able to devise a solution using Apache Drill and its views capability; however, it is not acceptable right now, mainly because community support is still scarce.

Has anyone faced the same requirement, or does anyone have pointers toward a solution?

6 REPLIES

Re: Securing Parquet Files Column-wise

Expert Contributor

Could you try SPARK-LLAP, described in the article linked below? It uses Hive LLAP and Ranger inside Spark.

Row/Column-level Security in SQL for Apache Spark

Re: Securing Parquet Files Column-wise

New Contributor

Hi @Dongjoon Hyun, thanks for the reply.

The tutorial is great and very clear, but how could I apply that to Parquet files? (Sorry if this is a newbie question, but I am indeed a newbie :) )

Re: Securing Parquet Files Column-wise

Expert Contributor

Please create a Hive table on those Parquet files. If Hive can access them securely with Ranger, Spark can too, via SPARK-LLAP.
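For illustration, the DDL for such a table might look like the statement built below. This is only a sketch: the table name, the column names and types, and the HDFS path are hypothetical placeholders, and the helper function is not part of Hive or Spark. It simply assembles a HiveQL string that you would then run in Hive.

```python
def build_external_parquet_ddl(table, columns, location):
    """Build a HiveQL statement that maps an external table onto existing Parquet files.

    table, columns and location are hypothetical placeholders; adjust them
    to match the real schema and HDFS directory.
    """
    # One "`name` TYPE" entry per column, each on its own line.
    column_defs = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_defs}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Assumed example schema and path, for illustration only.
ddl = build_external_parquet_ddl(
    "customers",
    [("id", "BIGINT"), ("name", "STRING"), ("ssn", "STRING")],
    "hdfs:///data/customers_parquet",
)
print(ddl)
```

Because the table is declared EXTERNAL, Hive only records metadata and the Parquet files stay where they are. Column policies would then be defined in Ranger against the Hive table rather than against the files.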

Re: Securing Parquet Files Column-wise

Expert Contributor

@Felipe Melo Does it solve your problem?

Re: Securing Parquet Files Column-wise

New Contributor

Hi @Dongjoon Hyun. That definitely works; however, it does not address the requirements I had. I was looking for a solution that works on Parquet files the same way Ranger works with Hive, for instance. I'd like to go to Ranger and set specific permissions directly on Parquet columns without having to first load the files into Hive.

After better understanding how Ranger works, I realized that this is not possible, since Ranger works through hooks (plug-ins) in the tools it secures (HDFS, HBase, Hive, etc.), and Parquet is simply a file format. A solution I started to investigate is an extension to the HDFS plug-in that could act on Parquet files, filtering access as specified through Ranger. With that solution, Parquet files could be secured at the column level directly from Ranger, as long as they are stored in HDFS.
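To make the column-filtering idea concrete, here is a minimal sketch of the check such an extension would need to perform. It does not use Ranger's API or policy format; the policy dictionary, group names, and column names are all hypothetical.

```python
# Hypothetical policy: group name -> set of columns that group may read.
COLUMN_POLICY = {
    "analysts": {"id", "name"},
    "auditors": {"id", "name", "ssn"},
}


def allowed_columns(groups, requested, policy=COLUMN_POLICY):
    """Return the requested columns that at least one of the user's groups may read.

    The order of the requested columns is preserved.
    """
    permitted = set()
    for group in groups:
        # Groups without a policy entry contribute no columns.
        permitted |= policy.get(group, set())
    return [col for col in requested if col in permitted]


visible = allowed_columns(["analysts"], ["id", "name", "ssn"])
print(visible)  # ['id', 'name']
```

The resulting list could then be used as the column projection when the file is read, for example through the `columns` argument of `pyarrow.parquet.read_table`. A check like this only secures anything if it runs in a trusted layer that users cannot bypass to reach the raw files.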

Anyway, thank you very much for the replies and also for checking on the result.

Re: Securing Parquet Files Column-wise

Expert Contributor

I see. Yes, that is how Ranger and Parquet work. I believe you will find a way to meet your requirements!