Member since: 10-06-2015
Posts: 42
Kudos Received: 23
Solutions: 4
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 496 | 11-07-2016 10:11 PM |
 | 196 | 04-19-2016 05:32 PM |
 | 382 | 04-11-2016 06:57 PM |
 | 433 | 11-04-2015 03:06 PM |
03-02-2017
03:05 PM
The hadoop.security.auth_to_local property under Advanced core-site.xml in the HDFS service can be modified to remove spaces from user names. The translation is a modification of the default rule set, which contains a line that strips the domain name:
RULE:[1:$1@$0](.*@MYDOMAIN.COM)s/@.*//
To translate a user name containing a space, such as "John Doe@MYDOMAIN.COM", to "John_Doe", the following rules can be used:
RULE:[1:$1](.* .*)s/ /_/g/L
RULE:[1:$1@$0](.*@MYDOMAIN.COM)s/@.*//
The first rule replaces the space with an underscore while retaining MYDOMAIN.COM, so the second rule is still respected. The result is that user names containing spaces are successfully mapped to names with underscores.
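As a minimal sketch, the combined property in core-site.xml could look like the following (MYDOMAIN.COM is the example realm from the text; adapt the rules to your own realm and keep any other rules you already rely on):
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1](.* .*)s/ /_/g/L
    RULE:[1:$1@$0](.*@MYDOMAIN.COM)s/@.*//
    DEFAULT
  </value>
</property>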
- Find more articles tagged with:
- HDFS
- How-ToTutorial
- Kerberos
- Security
03-01-2017
06:42 PM
3 Kudos
When synchronizing users from AD or LDAP to Ranger using usersync, it is possible that user names contain spaces. These spaces in user names and group names may be replaced with underscores by SSSD (example SSSD config section: [sssd] override_space = _). Ranger needs to synchronize users in the same way to avoid issues in access permission setup. To enable Ranger to perform this conversion, the following two properties can be added under "Ranger --> Configs --> Advanced --> Custom ranger-ugsync-site": ranger.usersync.mapping.groupname.regex and ranger.usersync.mapping.username.regex. Both properties can be set to s/ /_/g. After setting these properties, restart the Ranger service using Ambari and then restart Ranger Usersync. Usersync will then add new user names with underscores in them. Older user names and group names containing spaces will need manual cleanup. Note: this feature is available only from Ranger 0.5.1 onwards. For more details, please refer to https://issues.apache.org/jira/browse/RANGER-684
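A minimal sketch of the two Ranger usersync properties alongside the SSSD setting mentioned above (the property names and the regex value are taken from the text; the sssd.conf path is the usual default and is an assumption):
# Ambari: Ranger --> Configs --> Advanced --> Custom ranger-ugsync-site
ranger.usersync.mapping.username.regex=s/ /_/g
ranger.usersync.mapping.groupname.regex=s/ /_/g

# /etc/sssd/sssd.conf (matching substitution on the OS side)
[sssd]
override_space = _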
- Find more articles tagged with:
- How-ToTutorial
- LDAP
- Ranger
- ranger-usersync
- Security
12-15-2016
03:41 PM
2 Kudos
1. WASB --> https://blogs.msdn.microsoft.com/cindygross/2015/02/04/understanding-wasb-and-hadoop-storage-in-azure/ WASB is a storage model that allows data to be stored as blobs within storage accounts/containers in the Azure cloud.
2. DASH --> http://sequenceiq.com/cloudbreak-deployer/latest/azure_pre_prov/ This link describes a few scale-out related limits of WASB and proposes DASH as the solution. DASH is not supported as a storage option, and there are scalability limitations on the number of storage accounts. To quote: "When WASB is used as a Hadoop filesystem the files are full-value blobs in a storage account. It means better performance compared to the data disks and the WASB filesystem can be configured very easily but Azure storage accounts have their own limitations as well. There is a space limitation for TB per storage account (500 TB) as well but the real bottleneck is the total request rate that is only 20000 IOPS where Azure will start to throw errors when trying to do an I/O operation. To bypass those limits Microsoft created a small service called DASH. DASH itself is a service that imitates the API of the Azure Blob Storage API and it can be deployed as a Microsoft Azure Cloud Service. Because its API is the same as the standard blob storage API it can be used almost in the same way as the default WASB filesystem from a Hadoop deployment. DASH works by sharding the storage access across multiple storage accounts. It can be configured to distribute storage account load to at most 15 scaleout storage accounts. It needs one more namespace storage account where it keeps track of where the data is stored. When configuring a WASB filesystem with Hadoop, the only required config entries are the ones where the access details are described. To access a storage account Azure generates an access key that is displayed on the Azure portal or can be queried through the API while the account name is the name of the storage account itself. A DASH service has a similar account name and key, those can be configured in the configuration file while deploying the cloud service."
3. Cloudbreak's allocation model using multiple storage accounts and local HDFS (high-performance / scale-out option). When allocating HDFS on Azure, Cloudbreak can leverage multiple storage accounts and spread data across them. This allows data to be sharded across several storage accounts and helps overcome the per-account IOPS limits. This option can scale up to 200 storage accounts, whereas DASH is limited to 15 scale-out storage accounts. The disk selection can support both Premium and Standard storage, based on the VM type. DS13 or DS14 VMs are economical for most general-purpose use cases and can support 16 x 1 TB Standard storage disks. (Screenshot attached: storageaccounts.png)
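For reference, a minimal sketch of the WASB access configuration in core-site.xml that point 2 alludes to (the storage account name, container name, and key below are hypothetical placeholders):
<property>
  <name>fs.azure.account.key.mystorageaccount.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_ACCESS_KEY</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>wasb://mycontainer@mystorageaccount.blob.core.windows.net</value>
</property>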
- Find more articles tagged with:
- azure
- Cloud & Operations
- Cloudbreak
- How-ToTutorial
- storagetypes
- wasb
12-15-2016
03:11 PM
Prior to version 1.6.2-rc.8 of Cloudbreak, provisioning of HDP clusters on Azure was by public IP only. This article describes provisioning a cluster based on private IPs. While creating a network within Cloudbreak, there are options to: 1. Create a new network and a new subnet, or 2. Use an existing subnet in an existing virtual network. Under option 2, "Use an existing subnet in an existing virtual network", two options have been added: a) Don't create public IPs, and b) Don't create new firewall rules. Upon selecting these options (and using a private IP for "export PUBLIC_IP=xx.xx.xx.xx" in the Cloudbreak Profile file), Cloudbreak can install clusters using private IPs only in the Azure cloud. A screenshot of this feature is attached: networkoptions.png
- Find more articles tagged with:
- azure
- Cloud & Operations
- Cloudbreak
- How-ToTutorial
12-13-2016
11:02 PM
1. Install the MySQL data directories on a non-root partition (not /var/lib).
2. Create a dedicated, least-privileged account for the MySQL daemon.
3. Disable the MySQL command history; the command history may contain passwords that are viewable by other users.
4. Disable interactive login.
5. Disable login from nodes other than those used by the Hive services.
6. Grant only the hive user permissions on the Hive metadata database within MySQL.
7. During installation, do not specify passwords on the command line.
8. Ensure the MySQL data directories have appropriate permissions and ownership.
9. Ensure only DBA administrators have full database access.
10. Ensure that database logging is enabled for error logs and that log files are maintained on a non-system partition.
11. Ensure that old_passwords is not set to 1.
12. Ensure that secure_auth is set to ON (see the my.cnf sketch after this list).
13. Consider whether your component can work with the MySQL "connect using SSL" option.
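A minimal my.cnf sketch illustrating items 11 and 12 (the data directory path is a hypothetical example of item 1; adjust everything to your environment and MySQL version):
[mysqld]
# item 1: data directory on a non-root partition (hypothetical path)
datadir=/data/mysql
# item 11: do not fall back to the old, weaker password hashing
old_passwords=0
# item 12: reject clients that use the old password format
secure_auth=ON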
12-12-2016
09:17 PM
There are a few unsupported options, such as https://community.hortonworks.com/repos/39432/nifi-atlas-lineage-reporter.html and https://community.hortonworks.com/repos/66014/nifi-atlas-bridge.html. I have not tested these myself, but I am interested in any summaries of experience with these bridges.
12-12-2016
09:10 PM
Is this a Kerberized cluster? If so, are you using Firefox, IE, or Chrome?
12-12-2016
09:03 PM
+ https://community.hortonworks.com/questions/11369/is-there-a-way-to-export-ranger-policies-from-clus.html
12-12-2016
09:02 PM
This is a prior discussion on this topic; I am interested in any new approaches, in case this thread leads to one. https://community.hortonworks.com/questions/37036/how-to-replicate-policies-across-cluster.html
11-16-2016
02:37 PM
After following the PolyBase configuration from the Microsoft documentation on a Kerberized cluster, we still get a message that PolyBase is looking for simple authentication. Please suggest any additional configuration that may be needed. The Microsoft documentation is at https://msdn.microsoft.com/en-ca/library/mt712797.aspx
11-07-2016
10:14 PM
Hi, sometimes there is a delay in picking up group membership; running the hdfs dfsadmin -refreshUserToGroupsMappings command can help. Please also confirm that the two users exist on the server running the NameNode.
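A quick way to check (the user name is a placeholder; the dfsadmin command needs HDFS superuser privileges):
# refresh the user-to-group mappings cached by the NameNode
hdfs dfsadmin -refreshUserToGroupsMappings
# verify that the NameNode host resolves the user and its groups
id someuser
hdfs groups someuser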
11-07-2016
10:11 PM
1 Kudo
Hi, the Hive principal is not a headless principal; it is dedicated to the HiveServer2 server. So the principal name always points to the HiveServer2 host, which in your case is qwang-hdp2. If you are able to log in using beeline -u "jdbc:hive2://qwang-hdp2:10000/default;principal=hive/qwang-hdp2@REALM.NAME", then you are good.
11-07-2016
10:06 PM
NiFi can be downloaded as a standalone version (HDF 2.0) or installed with the help of the NiFi management pack added to Ambari. Within the sandbox, this version of NiFi should work fine. (Note: you may need to remove the LZO codec from a copy of core-site.xml to allow writing to HDFS with HDF 2.0.)
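A minimal sketch of that workaround, assuming the sandbox's core-site.xml lists the LZO classes in io.compression.codecs (your codec list may differ): make a copy of core-site.xml, point the PutHDFS processor's "Hadoop Configuration Resources" at the copy, and drop the LZO entries from the property so NiFi does not try to load a codec jar it does not have.
<!-- in the copied core-site.xml used by NiFi only -->
<property>
  <name>io.compression.codecs</name>
  <!-- com.hadoop.compression.lzo.* entries removed from the original value -->
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>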
11-07-2016
10:03 PM
Hi, the Falcon UI is secured by enabling authorization in startup.properties. https://falcon.apache.org/Security.html
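A minimal sketch of the startup.properties entries involved (the property names follow the Falcon security documentation linked above; verify them against your Falcon version):
# enable authorization checks for the Falcon API/UI
*.falcon.security.authorization.enabled=true
# authentication type (simple or kerberos)
*.falcon.authentication.type=kerberos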
11-07-2016
09:43 PM
Please also check this link if the schema needs to be inferred in the flow: https://community.hortonworks.com/articles/28341/converting-csv-to-avro-with-apache-nifi.html
11-07-2016
09:41 PM
Hi, the ConvertCSVToAvro processor has a Record Schema property. Please try setting this with the double option (the outgoing schema option).
07-07-2016
01:45 PM
The primary design choice to make here is whether we need CPU scheduling (DRF) or not. In clusters with varying CPU capacities, timeout settings may need to be increased because of throughput differences; network socket timeouts do occur in heterogeneous clusters. Another aspect is to ensure that each node has enough memory headroom left after the YARN allocation to prevent CPU hangs on less capable nodes (typically 80% for YARN and 20% for the OS, etc.). Since one of the nodes has only 12 GB of RAM, you may also want to closely monitor the memory usage of processes that YARN is not aware of, especially the Ambari agent and Ambari Metrics, and check whether it grows over time.
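If CPU scheduling is needed, DRF is typically enabled in the Capacity Scheduler by switching the resource calculator. A minimal sketch of the capacity-scheduler.xml entry (this assumes you are on the Capacity Scheduler; apply the change through Ambari rather than editing files by hand):
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>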
07-05-2016
06:29 PM
1 Kudo
Hi, please check whether the sqoop list-tables option works against the same connection details. If it does not work, then there is a connectivity issue from the Sqoop client machine to the database server. The query failing here is not the query Sqoop uses to extract data, but the query used to collect metadata from the remote database; Sqoop then uses this information to create the Hadoop writers. So this appears to be a connectivity issue, and it can be confirmed by running the list-tables variant of Sqoop first.
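For example (the JDBC URL, database, and user below are placeholders):
sqoop list-tables \
  --connect jdbc:mysql://dbhost:3306/sourcedb \
  --username dbuser -P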
05-17-2016
08:42 PM
After the Ranger plugin is enabled, HiveServer2 fails to start with an error indicating ClassNotFoundException: org.apache.hadoop.hive.thrift.ZooKeeperTokenStore. Switching to DBTokenStore etc. does not resolve the issue. This seems to be an Ambari hadoop-env or hive-env issue. Attempts to add HIVE_AUX_JARS etc. pointing to hive-shims-common.jar failed to pick up the correct class. If there is a known resolution, please let me know.
Labels:
04-20-2016
07:28 PM
1 Kudo
Please run ANALYZE TABLE ... COMPUTE STATISTICS on the table to confirm whether this changes. Also check the value of hive.stats.autogather to confirm whether it is set to true.
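For example (the table name is a placeholder):
-- recompute table-level and column-level statistics
ANALYZE TABLE mytable COMPUTE STATISTICS;
ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS;
-- check whether statistics are gathered automatically
SET hive.stats.autogather;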
04-19-2016
05:32 PM
1 Kudo
The NiFi documentation seems to indicate transfer rates of around 50 MB/s to 100 MB/s: https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.1.0/bk_Overview/content/performance-expectations-and-characteristics-of-nifi.html NiFi is useful if there are several source databases from which data needs to be extracted very frequently, as it helps with monitoring and workflow maintenance. If some of the data needs to be routed to different tables based on a column's value, for instance, NiFi is a good choice, as Sqoop does not support this by default. If data needs to be moved to multiple destinations, NiFi is also a good choice - for example, landing data in HDFS while moving part of the data to Kafka/Storm or Spark. NiFi can schedule these flows easily, while with Sqoop the scheduling has to be set up via crontab, Control-M, etc. Sqoop can use mappers in Hadoop for fault tolerance and parallelism and may achieve better rates. If deduplication and similar processing is to be performed, NiFi is the choice for smaller data sizes. For large table loads, Sqoop is a good choice.
04-18-2016
09:24 PM
Please check using FQDN in the Knox gateway URL
04-18-2016
09:22 PM
Please check the name of the Oozie server node. It appears that there is a ".out" at the end of the server name. Is this correct?
04-18-2016
05:24 PM
Though it is appealing to attempt a direct conversion of code from Oracle to Hive, you may want to check whether it is better to rewrite the code specifically for Hive. Much of the performance and tuning opportunity is lost when attempting a direct conversion. Having said that, there are also some independent solutions, such as http://www.ispirer.com/
04-18-2016
03:59 PM
In the beeline command, please check that the Hive principal name is set correctly and matches the cluster settings. Also ensure that the Kerberos ticket is still valid. !connect jdbc:hive2://sandbox.hortonworks.com:10000/default;principal=hive/_HOST@REALM.COM
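To verify the ticket (the user and realm are placeholders):
# list the current Kerberos ticket cache; re-obtain a ticket if it has expired
klist
kinit someuser@REALM.COM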
04-18-2016
03:56 PM
1 Kudo
Please check if setting the Sqoop --hive-home parameter explicitly helps.
04-18-2016
03:51 PM
1 Kudo
https://issues.apache.org/jira/browse/HBASE-7544 More details are at https://hbase.apache.org/book.html#hbase.encryption.server
04-13-2016
05:46 PM
2 Kudos
Abstract: Sharing a common issue in Ambari upgrades and its solution for the Hortonworks stack.
Issue: Hive handles hive.aux.jars.path differently after upgrading to Ambari 2.2 (this may impact prior versions as well).
Symptom: Errors related to missing hive-hbase-handler.jar and hbase-common.jar, as they can no longer be found by HiveServer2.
Resolution: Edit hive-env.sh from the Advanced hive-env / hive-env template section of the Ambari configs to add the HBase jars to HIVE_AUX_JARS_PATH. The output of the hbase mapredcp command can be added to the HIVE_AUX_JARS_PATH variable, or saved into a file and sourced within the hive-env.sh template. The output of the hbase mapredcp command is as follows (on an HDP 2.3.2 system): /usr/hdp/2.3.2.0-2950/hbase/lib/hbase-common-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-server-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/netty-all-4.0.23.Final.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/metrics-core-2.2.0.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/guava-12.0.1.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/protobuf-java-2.5.0.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-protocol-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-client-1.1.2.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/zookeeper/zookeeper-3.4.6.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/hbase/lib/hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar
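A minimal sketch of the hive-env template addition (the exact form in which HIVE_AUX_JARS_PATH is already set varies between HDP versions, so treat this as an assumption and verify in your environment):
# Advanced hive-env / hive-env template in Ambari: append the colon-separated
# classpath reported by `hbase mapredcp` to Hive's auxiliary jars path
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:$(hbase mapredcp)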
04-11-2016
06:57 PM
1 Kudo
https://community.hortonworks.com/questions/12663/hdp-install-issues-about-hdp-select.html Please check if this issue is a duplicate of the above link
04-11-2016
04:01 PM
1 Kudo
Hi, if the tables are large (say, multiple terabytes), then managing the table ingest through Sqoop and partitioned Hive tables is the best option from a performance standpoint. Although there are CDC tools such as Oracle GoldenGate, which write to HBase and handle frequent updates in near real time, the number of regions per region server in HBase grows very rapidly when there are large tables. The maximum throughput achievable by the near-real-time CDC replication processes is only around 10,000 transactions per second. In the case of a CDC failure lasting a few days, the accumulated record changes need to be applied and the system needs to catch up. Please check the four-step incremental update strategy for Hive for large table updates, as documented in the following link; this process merges existing table data with the new/changed data from the sources. http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
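As a sketch of the Sqoop side of such an incremental load (connection details, table, and column names are placeholders; the merge/compaction steps of the four-step strategy then follow in Hive):
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username dbuser -P \
  --table SALES \
  --incremental lastmodified \
  --check-column LAST_UPDATED \
  --last-value "2016-04-01 00:00:00" \
  --target-dir /data/staging/sales_delta \
  -m 4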