Member since: 11-08-2017
Posts: 7
Kudos Received: 0
Solutions: 0
09-13-2018
05:23 PM
@Raghavendra Gupta, have you tried Cobrix?
08-22-2018
07:05 PM
@Karthik Narayanan, you can use Cobrix to parse the EBCDIC files through Spark and store them on HDFS in whatever format you want. It is open-source. DISCLAIMER: I work for ABSA and I am one of the developers behind this library. Our focus has been: 1) ease of use, 2) performance.
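To give an idea of what parsing EBCDIC involves, here is a minimal standard-library Python sketch. It is not Cobrix's API, only an illustration. It decodes one fixed-width record containing a text field and a packed-decimal (COMP-3) field. The record layout, the field names, and the choice of the `cp037` code page are all assumptions made for this example; for a real file the layout comes from its COBOL copybook.

```python
from decimal import Decimal

# Hypothetical layout (assumed for illustration, not taken from any real copybook):
#   NAME   PIC X(5)             -> 5 bytes of EBCDIC text
#   AMOUNT PIC S9(5)V99 COMP-3  -> 4 bytes of packed decimal, 2 implied decimals
NAME_LEN = 5
AMOUNT_LEN = 4
AMOUNT_SCALE = 2


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a packed-decimal field: one digit per nibble, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)        # high nibble of the last byte is a digit
    sign_nibble = raw[-1] & 0x0F       # 0xD marks a negative value
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes) -> dict:
    """Split one fixed-width record into its two assumed fields."""
    name = record[:NAME_LEN].decode("cp037").rstrip()
    amount = unpack_comp3(record[NAME_LEN:NAME_LEN + AMOUNT_LEN], AMOUNT_SCALE)
    return {"name": name, "amount": amount}


# Build a sample record: "ALICE" in EBCDIC followed by +1234.56 as COMP-3.
sample = "ALICE".encode("cp037") + bytes([0x01, 0x23, 0x45, 0x6C])
parsed = parse_record(sample)
```

A copybook-aware reader does this kind of decoding for every field of every record, which is why the copybook has to be supplied alongside the data.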
08-22-2018
06:58 PM
Very good survey. I would, however, like to introduce one more alternative for your consideration. At ABSA we have been working on a COBOL data source for Spark, which we call Cobrix. The performance results we have found so far are very encouraging, and in our experience it is considerably easier to use than Informatica and similar tools.
12-04-2017
11:58 PM
Hi @Dongjoon Hyun. That definitely works; however, the requirements I had could not be addressed with that course of action. I was looking for a solution that works on Parquet files the same way Ranger works with Hive, for instance. I'd like to go to Ranger and set specific permissions directly on Parquet columns without having to first load the files into Hive. After better understanding how Ranger works, I realized that this is not possible, as Ranger works through hooks (plug-ins) into the tools it secures (HDFS, HBase, Hive, etc.), and Parquet is simply a file format. A solution I started to investigate is an extension to the HDFS plug-in which could act on Parquet files, filtering access as specified through Ranger. With that solution, Parquet files could be secured at column level directly from Ranger as long as they are stored in HDFS. Anyway, thank you very much for the replies and also for checking on the result.
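The core of the column-level idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `POLICIES` table, the group names, and the `allowed_columns` helper are hypothetical and do not reflect Ranger's data model or plug-in API.

```python
# Hypothetical policy table: group name -> columns that group may read.
# This mirrors the idea of a column-level policy; it is not Ranger's data model.
POLICIES = {
    "analysts": {"id", "country"},
    "finance": {"id", "salary"},
}


def allowed_columns(schema, user_groups, policies=POLICIES):
    """Return the schema columns, in schema order, readable by any of the user's groups."""
    granted = set()
    for group in user_groups:
        # Unknown groups grant nothing.
        granted |= policies.get(group, set())
    return [col for col in schema if col in granted]


# Example Parquet schema (column names assumed for illustration).
schema = ["id", "name", "country", "salary"]
analyst_view = allowed_columns(schema, ["analysts"])
```

The resulting list could serve as a column projection when reading the file, for example through the `columns` argument of `pyarrow.parquet.read_table`. Filtering on the reader side is not enforcement by itself, though. The check would have to live in a trusted component, which is why I was looking at the HDFS plug-in.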
11-09-2017
04:35 PM
Hi @Dongjoon Hyun, thanks for the reply. The tutorial is great and very clear. However, how could I apply that to Parquet files? (Sorry if this is a newbie question, but I am indeed a newbie 🙂)
11-08-2017
01:01 AM
I have been looking for a way to secure Parquet files, column-wise, for Spark access. Ideally, that would work the same way Ranger works for Hive, i.e., a sysadmin defines the access policies for different groups and columns. I have been trying Ranger through HDP; however, it seems that plug-ins for Spark and Parquet are not there yet. I have also been able to devise a solution using Apache Drill and its views capability, but it is not acceptable right now, mainly because community support is still scarce. Has anyone faced the same requirement, or does anyone have some directions for a solution?
Labels:
- Apache Ranger
- Apache Spark