Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3225 | 10-18-2017 10:19 PM |
| | 3594 | 10-18-2017 09:51 PM |
| | 13167 | 09-21-2017 01:35 PM |
| | 1310 | 08-04-2017 02:00 PM |
| | 1661 | 07-31-2017 03:02 PM |
07-24-2016 10:06 PM
@sankar rao Can you please check the value of dfs.permissions.enabled in your hdfs-site.xml? If it is set to false, HDFS permissions are not enforced.
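If you want a quick way to check the effective value from the command line (assuming the HDFS client is on your path):

```bash
# Print the effective value of dfs.permissions.enabled
# as seen by the HDFS client configuration.
hdfs getconf -confKey dfs.permissions.enabled
```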
07-24-2016 05:08 AM
Sure. Change swappiness to 1, not 0, which is what you have. Also disable transparent_hugepage (both the main setting and defrag). When you are using Hadoop, your files live in the Hadoop file system, so you don't really need to track last access time for your files. Disable that for your mount points using "noatime"; this stops Linux from keeping track of last access times, which are not used anyway. If you can, in your BIOS settings, change the CPU and CPU frequency governor to performance mode. This is a tradeoff between power and performance, so make sure you know what you are doing; your current loads may not be CPU bound, in which case there is no point in doing this. Same with the power settings: you change those to performance at the cost of higher power draw. These steps are meant to squeeze the last bit of juice out of your hardware after you have done everything at the OS level, so ideally you don't want to have to go this far.
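A minimal sketch of the OS-level pieces, assuming a RHEL/CentOS-style box; the THP path varies by kernel version, so check which one exists on your system:

```bash
# Set swappiness to 1 for the running kernel...
sysctl -w vm.swappiness=1
# ...and persist it across reboots.
echo "vm.swappiness = 1" >> /etc/sysctl.conf

# Disable transparent huge pages and THP defrag
# (older RHEL kernels use /sys/kernel/mm/redhat_transparent_hugepage/* instead).
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Example /etc/fstab entry with noatime for a data mount
# (device, mount point, and fs type below are placeholders):
# /dev/sdb1  /grid/0  ext4  defaults,noatime  0 0
```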
07-24-2016 04:08 AM
@SBandaru I think this link is what you are looking for: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_Installing_HDP_AMB/content/_prepare_the_environment.html Also, I am pretty sure that on newer versions of Red Hat you need to set swappiness to 1 instead of 0. I would also disable transparent_hugepage: do a cat on the following file on your OS and see whether it's set to never or always. If it's at always, change it to never. /sys/kernel/mm/redhat_transparent_hugepage/defrag
Use the following command to change this value to never: echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag I am not a hundred percent sure, but I think this requires a restart.
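A check-then-set sketch; the redhat_transparent_hugepage path is the RHEL 6 naming, and in my experience the echo takes effect immediately but is lost on reboot, hence the rc.local line as one common way to re-apply it at boot:

```bash
# Check the current defrag setting; the active value is in [brackets].
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

# Switch it off for the running kernel (root required).
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

# Re-apply at boot time so the setting survives a reboot.
echo "echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag" >> /etc/rc.local
```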
07-23-2016 07:50 PM
@Saurabh Kuma @Madhavi Amirneni Let me try to explain at a high level how security works and why Ranger without Kerberos is useless. Kerberos is what many applications use to authenticate a user, that is, to verify that the user is exactly who he says he is. Ranger is the next step in security: once you know that a user named Alex actually is Alex, does Alex have permission to view a particular set of data? That is the job of Ranger. It enforces policies for users who have already been authenticated; the authentication of whether a user is actually who he says he is, is done by Kerberos. Without Kerberos you don't even know whether Alex is actually Alex. That's why there is no point in using Ranger to enforce a policy that Alex cannot access certain datasets, or in auditing what Alex did when he logged in: without Kerberos you are not even sure it was really Alex when someone named Alex logged in. Authorization and audit are pretty much useless at that point, and that's why you need a Kerberized cluster before you enable authorization/auditing using Ranger.
07-23-2016 06:01 PM
@jestin ma I wonder if doing a filter would help rather than a join and achieve the same results. So instead of a join, is it possible to do something like this (pseudocode)? df1.filter(<keys present in df2>).groupBy(key).count()
07-23-2016 05:57 PM
I think this is a permission issue for the principal smanjee for the Phoenix service. Can you try the following from the same node where you have SQuirreL, and after the kinit try accessing your Phoenix service from the command line? I think it would still fail, and once you resolve that with proper permissions for this user, your SQuirreL issue would be resolved too. Hope this helps. kinit -kt <your keytab file> smanjee@CLOUD.HORTONWORKS.COM
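Something like the following, as a sketch; the keytab path, the sqlline.py location, and the ZooKeeper quorum are placeholders for your environment:

```bash
# Obtain a ticket from the keytab for the smanjee principal.
kinit -kt /etc/security/keytabs/smanjee.keytab smanjee@CLOUD.HORTONWORKS.COM

# Confirm the ticket was issued.
klist

# Try hitting Phoenix directly from the command line
# (typical client location on HDP installs).
/usr/hdp/current/phoenix-client/bin/sqlline.py zk-host1,zk-host2,zk-host3:2181:/hbase-secure
```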
07-23-2016 04:43 AM
@Aman Poonia I think what you are asking for is N+2 redundancy for the namenode. This feature will be available in Hadoop 3.0; it would allow 3-5 namenodes. Please see the following Jira. https://issues.apache.org/jira/browse/HDFS-6440
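Once you are on a Hadoop 3.x line, something like this should show the configured namenodes and their HA state; the service IDs nn1/nn2/nn3 are assumptions for illustration:

```bash
# List the namenodes configured for this cluster.
hdfs getconf -namenodes

# Check which one is active and which are standby
# (service IDs come from dfs.ha.namenodes.<nameservice>).
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -getServiceState nn3
```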
07-19-2016 01:22 AM
Hi @sujitha sanku The administration tool is Ambari. You can share as much detail from the Ambari docs as you want. Thanks
07-13-2016 06:22 PM
Those DBs are likely for the Hive metastore as well as for Ambari. These services are often run on master or edge nodes.
07-13-2016 05:57 PM
@Kumar Veerapan It is not true that the namenode performs all admin functions. You need Ambari to manage the cluster; the namenode only stores the metadata for Hadoop files. As for gateways, you need these because in a large cluster you don't want clients connecting directly to the cluster nodes and opening the cluster up to them. You would rather have gateway nodes, so that clients use those to access the cluster.