Our client has a physical Cloudera Enterprise cluster. Our software depends on the cluster to perform work for our client's users. To properly monitor the whole platform and better debug performance issues, we wanted to ask the client to give us read-only access to the Cloudera Manager interface.
Unfortunately there is a catch: the data itself is confidential and we shouldn't be able to see it. The Read-Only user role in CM would do, as all we really want are performance metrics and monitoring. *Unfortunately, a Read-Only user is also able to see the Impala queries being executed*, which can themselves contain sensitive data, e.g. inside literals. That's why we can't be granted it.
Would it be possible to restrict Read-Only users to only a subset of services? In our case we don't care about monitoring Impala at all; we would just want to be able to monitor Hosts, HDFS, YARN and Spark.
Impala has a configuration setting for limiting which queries a non-admin user is able to see: Impala > Configuration > Non-Admin Users Query List Visibility Settings.
Similarly, the YARN configuration contains Non-Admin Users Applications List Visibility Settings.
Does this help?
Got quite excited by this, but the customer's devs pointed out that a read-only user can still see Hive logs (redacting them is on their roadmap) and possibly Hive queries, so it's still a no-go :(
Sigh, looks like my project for next week is implementing my own metrics monitor for Hadoop...
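A minimal sketch of what such a home-grown monitor could start from, assuming the YARN ResourceManager web service is reachable without Kerberos/SPNEGO authentication on its default port 8088. The host name and the chosen field list are placeholders, not anything from the client's cluster. It polls the ResourceManager's cluster metrics REST endpoint, which returns only aggregate counters (no query text or application data):

```python
import json
from urllib.request import Request, urlopen

# Hypothetical ResourceManager address; replace with the real host and port.
RM_URL = "http://resourcemanager.example.com:8088"

# Aggregate counters only; none of these fields carry query text or user data.
FIELDS = [
    "appsRunning",
    "appsPending",
    "allocatedMB",
    "availableMB",
    "activeNodes",
    "unhealthyNodes",
]


def extract_cluster_metrics(payload, fields=FIELDS):
    """Pick the requested counters out of a /ws/v1/cluster/metrics response.

    Fields missing from the response are returned as None.
    """
    metrics = payload.get("clusterMetrics", {})
    return {name: metrics.get(name) for name in fields}


def fetch_cluster_metrics(base_url=RM_URL, timeout=10):
    """Fetch the cluster metrics from the ResourceManager and filter them."""
    request = Request(
        f"{base_url}/ws/v1/cluster/metrics",
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return extract_cluster_metrics(json.load(response))
```

Calling `fetch_cluster_metrics()` on a schedule and writing the returned dictionary to a time-series store would cover the YARN side; on a secured cluster the request would additionally need SPNEGO authentication, which this sketch does not handle.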
For now, I'll say that Cloudera is targeting new granular access control features for upcoming releases of Cloudera Manager. The specifics could change, so details of what those new controls will be are not available at this time.
What I'll do is follow up with the team working on access control and ask whether a way to prevent users from viewing YARN Applications and Impala Queries will be added. If not, I'll open a feature request internally.
Thanks for the info. From my point of view there would be two ways of approaching this problem:
1) Provide "read-only" access on a per-service basis
2) Create some sort of "monitoring" user that can only see non-data-related stuff, i.e. metrics and statuses
As noted above, it looks like the current version can hide Impala queries, but now we are blocked by the fact that Hive queries are visible in the logs (I know they could be redacted, but that's beside the point).
I'll be looking forward to the granular access features then, which will hopefully solve all of the above. Is there a predicted release date?
There are plans to introduce granular access control in Cloudera Manager 6 (still months away from release).
For now, I don't think there is any way to achieve the level of access control you desire with CM.