Member since: 02-10-2016
Posts: 34
Kudos Received: 16
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1689 | 06-20-2016 12:48 PM
 | 1143 | 06-11-2016 07:45 AM
05-17-2018
08:54 AM
Thanks @Felix Albani. I am aware that we can turn auditing on or off at the policy level. However, I am looking for granular control at the operation level, so that we can disable logging for operations that are perceived as unimportant.
05-16-2018
10:35 AM
Hi, Is there a mechanism in Ranger to audit only particular operations / audit types while ignoring others? We are exploring the possibility of turning off logging for a few categories, in anticipation of better search performance in RangerSolr. Regards Vijay
Labels: Apache Ranger
01-30-2017
10:54 PM
Thanks @Sriharsha Chintalapani for bringing out this article. It is much needed, given the growing importance of Kafka in every data-centric organisation, and it covers a lot of ground from the MirrorMaker perspective. Thanks
01-23-2017
06:12 PM
1 Kudo
Hi, I am trying to distcp data between two encryption zones located on two different clusters. The data was copied successfully. However, when I read the data on the target cluster, I see gibberish printed on the terminal. The encryption zone on the source was created with a key (test-key). As it's a DR requirement, I created a key on the target cluster with the same key name, i.e. test-key. However, the two clusters are fundamentally independent. I presume that when DistCp reads the data from the source cluster, it should read and transfer the data transparently using the source-side key and material, and then write to the target using the target's key and material. I am wondering where this has gone wrong. Any pointers?
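For context, a minimal sketch of the two copy modes that, as far as I understand the HDFS transparent encryption documentation, behave differently here. Copying through normal paths decrypts on read and re-encrypts on write, and since the block data then differs, checksum comparison is usually skipped with -update -skipcrccheck. Copying through /.reserved/raw moves the encrypted bytes as-is, which only reads back correctly if the target holds the same key material. The helper name, cluster URIs and path below are hypothetical, and the command is only built, not run.

```python
from typing import List


def ez_distcp_cmd(src_ns: str, dst_ns: str, path: str, raw: bool) -> List[str]:
    """Build (but do not run) a distcp command between two encryption zones.

    raw=False: copy via normal paths, so data is decrypted on read and
               re-encrypted on write; checksums will not match, hence
               -update -skipcrccheck.
    raw=True:  copy via /.reserved/raw, so encrypted bytes are copied
               unchanged; -px preserves extended attributes.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    if raw:
        return ["hadoop", "distcp", "-px",
                f"{src_ns}/.reserved/raw{path}",
                f"{dst_ns}/.reserved/raw{path}"]
    return ["hadoop", "distcp", "-update", "-skipcrccheck",
            f"{src_ns}{path}", f"{dst_ns}{path}"]


# Hypothetical cluster URIs and zone path, for illustration only.
print(" ".join(ez_distcp_cmd("hdfs://cluster01", "hdfs://cluster02", "/data/zone1", raw=False)))
print(" ".join(ez_distcp_cmd("hdfs://cluster01", "hdfs://cluster02", "/data/zone1", raw=True)))
```

Which mode was used for the copy described above is not stated, so this is only meant to show where the two approaches differ.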
Labels: Apache Hadoop
12-01-2016
05:09 PM
2 Kudos
Hi, I am trying to understand the benefits of using distcp -update with HDFS snapshot differences over plain distcp -update. As I understand it, -update without any snapshot options replicates only the data modified at the source and doesn't touch files that already exist at the destination. What additional benefits would one realise by using HDFS snapshots in distcp-based replication? Thanks Vijay
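To make the comparison concrete, here is a small sketch of the two invocations in question. It assumes the -diff option, which is used together with -update and takes a from-snapshot and a to-snapshot name. The helper name, URIs and snapshot names (s1, s2) are hypothetical, and the command is only assembled, not executed.

```python
from typing import List, Optional


def distcp_update_cmd(src: str, dst: str,
                      from_snap: Optional[str] = None,
                      to_snap: Optional[str] = None) -> List[str]:
    """Build a `hadoop distcp -update` command, optionally driven by a snapshot diff."""
    cmd = ["hadoop", "distcp", "-update"]
    if from_snap is not None or to_snap is not None:
        # -diff needs both snapshot names and is only valid alongside -update.
        if from_snap is None or to_snap is None:
            raise ValueError("-diff needs both a from and a to snapshot")
        cmd += ["-diff", from_snap, to_snap]
    return cmd + [src, dst]


# Plain update: distcp compares source and target listings itself.
print(" ".join(distcp_update_cmd("hdfs://cluster01/data", "hdfs://cluster02/data")))
# Snapshot-based update: distcp is told which two source snapshots to diff.
print(" ".join(distcp_update_cmd("hdfs://cluster01/data", "hdfs://cluster02/data", "s1", "s2")))
```

As far as I understand, the snapshot variant expects the source directory to be snapshottable with both snapshots present, and the target to be unchanged since the from-snapshot state.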
Labels: Apache Hadoop
11-08-2016
04:22 PM
@Mike Garris Hi, LUKS is disk-level encryption and hence is independent of the encryption supported by HDFS. Please see the link below for an overview of the various levels of encryption and where TDE sits. https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html Hope that answers your query.
09-30-2016
10:19 AM
1 Kudo
Hi, We are running distcp / Falcon based replication between clusters. As depicted in the diagram above, we have edge nodes configured on both clusters, and a dedicated private network link has been established between them. I presume normal cluster traffic uses the regular firewall-channelled network. However, my understanding is that distcp works through name node to name node communication and hence would probably use the firewall route rather than the private link. Can anyone please guide me on how to make use of the private link, so that all the replication traffic (which is expected to be huge and must also adhere to SLAs) is directed through it? I am also looking for alternative suggestions and ideas to make this more performant. Thanks Vijay
Labels: Apache Falcon, Apache Hadoop
08-03-2016
04:35 PM
Thanks @Sunile Manjee. This is the approach I think I need to follow; I was just trying to understand whether there is any other alternative. To answer your question about Falcon: we are not using it because we are on HDP2.4.2 and need to leverage HDFS snapshots, which Falcon doesn't support until 2.5. So we are going with this approach for now.
08-03-2016
02:56 PM
Hi, I am working on a distcp solution between two clusters. On cluster01 HDFS there are multiple directories, each owned by a different application team. The requirement is to distcp these directories onto cluster02 while preserving the access privileges. Both clusters are secured. I was thinking of having a service user, something like "distcp-user", with its own Kerberos principal, who can manage the distcp process; auditing would be easy as well.
Would it be possible for distcp-user to complete the distcp process without having read access on cluster01 and write access on cluster02? Is this something impersonation can help with? For example, if dir1 on cluster01 is owned by appuser1 and dir2 is owned by appuser2, can distcp-user impersonate both appuser1 and appuser2 and perform the distcp jobs on their behalf without being able to look into the actual underlying data? Or is it only possible if distcp-user has appropriate read access on cluster01 and write access on cluster02, managed through Ranger / HDFS ACLs? Thanks Vijay
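On the "preserving the access privileges" part, a minimal sketch of how the distcp preserve flag could be assembled per directory. It assumes the -p option letters u (user), g (group), p (permission) and a (ACL). The helper names, URIs and directory are hypothetical, and the command is only built, not run. It does not address the impersonation question itself.

```python
from typing import List


def preserve_flag(user: bool = True, group: bool = True,
                  permission: bool = True, acl: bool = False) -> str:
    """Build a distcp -p flag from the attributes that should be preserved."""
    letters = ""
    if user:
        letters += "u"
    if group:
        letters += "g"
    if permission:
        letters += "p"
    if acl:
        letters += "a"
    if not letters:
        raise ValueError("select at least one attribute to preserve")
    return "-p" + letters


def distcp_preserve_cmd(src: str, dst: str, acl: bool = False) -> List[str]:
    """Build (but do not run) a distcp command that keeps owner, group and mode."""
    return ["hadoop", "distcp", preserve_flag(acl=acl), src, dst]


# Hypothetical URIs for dir1, owned by appuser1 on cluster01.
print(" ".join(distcp_preserve_cmd("hdfs://cluster01/apps/dir1",
                                   "hdfs://cluster02/apps/dir1", acl=True)))
```

Note that setting a file's owner to another user on HDFS is a superuser operation, so whichever account runs the job would likely need elevated rights on cluster02 for the ownership part to take effect.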
Labels: Apache Hadoop
07-28-2016
01:36 PM
Thanks @mqureshi for your response. To explain my case better, I have created another question with more detail. Could you please have a look at it?