Member since 09-25-2015 · 82 Posts · 93 Kudos Received · 17 Solutions
11-29-2019
10:36 AM
2 Kudos
What is Cloudera Data Warehouse?

Cloudera Data Warehouse is an auto-scaling, highly concurrent and cost-effective analytics service that ingests data at any scale, from structured, unstructured and edge sources. It supports hybrid and multi-cloud infrastructure models by seamlessly moving workloads between on-premises and any cloud for reports, dashboards, ad-hoc and advanced analytics, including AI, with consistent security and governance. Cloudera Data Warehouse offers zero query wait times, reduced IT costs and agile delivery. See more information here.

Key Concepts:

In the Cloudera Data Warehouse service, your data is stored in an object store in a data lake that resides in your specific cloud environment. The service is composed of:

Database Catalogs: a metadata service associated with a CDP Data Lake which provides the data context for your defined tables and databases within the CDP Enterprise Data Cloud.

Virtual Warehouses: compute resources running Hive or Impala on Kubernetes, which allow you to query data stored in the cloud object store via the Database Catalog.

Please see the Cloudera Documentation for further information.

How do I monitor Virtual Warehouse usage?

Cloudera Data Warehouse environments come with a pre-built Grafana dashboard that lets you monitor usage of all Virtual Warehouses within that environment. To access the Grafana dashboard, you will need to connect to the Environment's Kubernetes cluster and extract the Grafana password.

Pre-requisites:

This article assumes you have already configured Cloudera Data Platform and the Cloudera Data Warehouse service with at least one Environment, at least one Database Catalog and at least one Virtual Warehouse. Please see Cloudera's Getting Started Instructions. Install the kubectl command line interface or your favourite Kubernetes UI or CLI.

How To:

1. On the Cloudera Data Platform Home Page, open the Data Warehouse service.

2. Expand the Environments menu.

3. Click the hamburger menu on your desired Environment.

4. Click Show Kubeconfig and copy the text to your clipboard.

5. Paste the kubeconfig into a file and run the following command to access the Kubernetes cluster for that Environment. This command retrieves the Grafana password (stored base64-encoded in a Kubernetes secret), decodes it and copies it to your clipboard:

vi dwx.config
kubectl --kubeconfig ~/dwx.config get secret grafana -n istio-system -o json | jq -r .data.passphrase | base64 -D | pbcopy

6. Go back to your Cloudera Data Warehouse environment and click Open Grafana.

7. You should see the Grafana login screen open in a new tab:
Username = admin
Password = the password that is now on your clipboard

8. Once logged in, expand the istio menu and choose the Compute Autoscaling dashboard.

The Compute Autoscaling dashboard will show you total node usage for your environment, as well as individual node counts for each of your Virtual Warehouses.
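Note that base64 -D and pbcopy in step 5 are macOS-specific. On Linux, a roughly equivalent sketch (assuming jq is installed, and simply printing the password instead of copying it to the clipboard) would be:

kubectl --kubeconfig ~/dwx.config get secret grafana -n istio-system -o json | jq -r .data.passphrase | base64 -d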
03-15-2017
04:51 PM
At this time, column-level security is only possible when accessing data through Hive.
03-07-2017
04:44 PM
8 Kudos
Picture the scene... You've run the HDFS Balancer on your cluster and have your data balanced nicely across your DataNodes on HDFS. Your cluster is humming along nicely, but your system administrator runs across the office to you with alerts about full disks on one of your DataNode machines! What now?

The Low-Down

Uneven data distribution amongst disks isn't dangerous as such, though in some rare cases you may start to notice the fuller disks becoming bottlenecks for I/O. As of Apache Hadoop 2.7.3, it is not possible to balance disks within a single node (aka intra-node balancing) - the HDFS Balancer only balances across DataNodes, not within them. HDFS-1312 is tracking work to introduce this functionality into Apache HDFS, but it will not be available before Hadoop 3.0.

The conservative approach:

Set the following property in your HDFS configuration, or add it if it isn't already there: dfs.datanode.du.reserved (reserved space in bytes per volume). This will always leave that much space free on every DataNode disk. Set it to a value that will make your sysadmin happy and continue to use the HDFS Balancer as before until HDFS-1312 is complete.

The brute force method (careful!):

Run fsck and MAKE SURE there are no under-replicated blocks (IMPORTANT!!). Then just wipe the contents of the offending disk. HDFS will re-replicate those blocks elsewhere automatically! NOTE: Do not wipe more than one disk across the cluster at a time!!
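To make that concrete, here is a small sketch of both approaches (the 10 GB reservation value is only an example, and the fsck/grep pattern assumes the standard summary output; adjust both for your cluster):

# Conservative: reserve space per volume via dfs.datanode.du.reserved in hdfs-site.xml
# <property>
#   <name>dfs.datanode.du.reserved</name>
#   <value>10737418240</value>   <!-- ~10 GB per volume kept free for non-HDFS use -->
# </property>

# Brute force: before wiping any single data disk, confirm the cluster has no
# under-replicated, missing or corrupt blocks
hdfs fsck / | grep -iE "under-replicated|missing blocks|corrupt"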
11-09-2016
10:02 AM
@vamsi valiveti the jar file is from the XML SerDe created by the community and is available on github: https://github.com/dvasilen/Hive-XML-SerDe
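In case it is useful, one possible way to build the jar from that repository is sketched below (assumes Maven is installed; the exact artifact name and version depend on the repo's pom.xml):

git clone https://github.com/dvasilen/Hive-XML-SerDe.git
cd Hive-XML-SerDe
mvn clean package -DskipTests
# the resulting jar under target/ can then be registered in Hive with: ADD JAR /path/to/the/jar;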
10-20-2016
04:50 PM
Hi @Jasper sorry just spotted this comment. That's interesting - I used the same technique for all of them. Did you get it working since then? Since you will have downloaded it in the first curl I'm guessing the URL is right. Silly question, but is the hive plugin enabled?
10-12-2016
04:59 PM
13 Kudos
In HDP 2.5, the addition of RANGER-606 introduced the ability to explicitly deny access to a Hadoop resource via a Ranger Policy.
RANGER-876 disables these policy types by default for all service types except tag-based policies. To enable them, you must set enableDenyAndExceptionsInPolicies to true in the Service Definition for each of the Ranger Repository types, as below, via the REST API:

{
"name": "hdfs",
"description": "HDFS Repository",
"options": {
"enableDenyAndExceptionsInPolicies": "true"
}
}

How To

If deny policies are not enabled, the Ranger “Create Policy” UI will look like this:

1. Get the current service definition of the desired repository via a curl command and output it to a file:

curl -u admin:admin ranger-admin-host.hortonworks.com:6080/service/public/v2/api/servicedef/1 > hdfs.json

It should look something like this:

{"id":1,"guid":"0d047247-bafe-4cf8-8e9b-d5d377284b2d","isEnabled":true,"createTime":1476173228000,"updateTime":1476173228000,"version":1,"name":"hdfs","implClass":"org.apache.ranger.services.hdfs.RangerServiceHdfs","label":"HDFS Repository","description":"HDFS Repository","options":{},"configs":[{"itemId":1,"name":"username","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Username"},{"itemId":2,"name":"password","type":"password","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Password"},{"itemId":3,"name":"fs.default.name","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Namenode URL"},{"itemId":4,"name":"hadoop.security.authorization","type":"bool","subType":"YesTrue:NoFalse","mandatory":true,"defaultValue":"false","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authorization Enabled"},{"itemId":5,"name":"hadoop.security.authentication","type":"enum","subType":"authnType","mandatory":true,"defaultValue":"simple","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authentication Type"},{"itemId":6,"name":"hadoop.security.auth_to_local","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":7,"name":"dfs.datanode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":8,"name":"dfs.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":9,"name":"dfs.secondary.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":10,"name":"hadoop.rpc.protection","type":"enum","subType":"rpcProtection","mandatory":false,"defaultValue":"authentication","validationRegEx":"","validationMessage":"","uiHint":"","label":"RPC Protection Type"},{"itemId":11,"name":"commonNameForCertificate","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Common Name for Certificate"}],"resources":[{"itemId":1,"name":"path","type":"path","level":10,"mandatory":true,"lookupSupported":true,"recursiveSupported":true,"excludesSupported":false,"matcher":"org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher","matcherOptions":{"wildCard":"true","ignoreCase":"false"},"validationRegEx":"","validationMessage":"","uiHint":"","label":"Resource Path","description":"HDFS file or directory path"}],"accessTypes":[{"itemId":1,"name":"read","label":"Read","impliedGrants":[]},{"itemId":2,"name":"write","label":"Write","impliedGrants":[]},{"itemId":3,"name":"execute","label":"Execute","impliedGrants":[]}],"policyConditions":[],"contextEnrichers":[],"enums":[{"itemId":1,"name":"authnType","elements":[{"itemId":1,"name":"simple","label":"Simple"},{"itemId":2,"name":"kerberos","label":"Kerberos"}],"defaultIndex":0},{"itemId":2,"name":"rpcProtection","elements":[{"itemId":1,"name":"authentication","label":"Authentication"},{"itemId":2,"name":"integrity","label":"Integrity"},{"itemId":3,"name":"privacy","label":"Privacy"}],"defaultIndex":0}],"dataMaskDef":{"maskTypes":[],"accessTypes":[],"resources":[]},"rowFilterDef":{"accessTypes":[],"resources":[]}}

2. Update the file to add "options":{"enableDenyAndExceptionsInPolicies":"true"}:

{"id":1,"guid":"0d047247-bafe-4cf8-8e9b-d5d377284b2d","isEnabled":true,"createdBy":"Admin","updatedBy":"Admin","createTime":1476173228000,"updateTime":1476287031622,"version":2,"name":"hdfs","implClass":"org.apache.ranger.services.hdfs.RangerServiceHdfs","label":"HDFS Repository","description":"HDFS Repository","options":{"enableDenyAndExceptionsInPolicies":"true"},"configs":[{"itemId":1,"name":"username","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Username"},{"itemId":2,"name":"password","type":"password","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Password"},{"itemId":3,"name":"fs.default.name","type":"string","subType":"","mandatory":true,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Namenode URL"},{"itemId":4,"name":"hadoop.security.authorization","type":"bool","subType":"YesTrue:NoFalse","mandatory":true,"defaultValue":"false","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authorization Enabled"},{"itemId":5,"name":"hadoop.security.authentication","type":"enum","subType":"authnType","mandatory":true,"defaultValue":"simple","validationRegEx":"","validationMessage":"","uiHint":"","label":"Authentication Type"},{"itemId":6,"name":"hadoop.security.auth_to_local","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":7,"name":"dfs.datanode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":8,"name":"dfs.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":9,"name":"dfs.secondary.namenode.kerberos.principal","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":""},{"itemId":10,"name":"hadoop.rpc.protection","type":"enum","subType":"rpcProtection","mandatory":false,"defaultValue":"authentication","validationRegEx":"","validationMessage":"","uiHint":"","label":"RPC Protection Type"},{"itemId":11,"name":"commonNameForCertificate","type":"string","subType":"","mandatory":false,"validationRegEx":"","validationMessage":"","uiHint":"","label":"Common Name for Certificate"}],"resources":[{"itemId":1,"name":"path","type":"path","level":10,"mandatory":true,"lookupSupported":true,"recursiveSupported":true,"excludesSupported":false,"matcher":"org.apache.ranger.plugin.resourcematcher.RangerPathResourceMatcher","matcherOptions":{"wildCard":"true","ignoreCase":"false"},"validationRegEx":"","validationMessage":"","uiHint":"","label":"Resource Path","description":"HDFS file or directory path"}],"accessTypes":[{"itemId":1,"name":"read","label":"Read","impliedGrants":[]},{"itemId":2,"name":"write","label":"Write","impliedGrants":[]},{"itemId":3,"name":"execute","label":"Execute","impliedGrants":[]}],"policyConditions":[],"contextEnrichers":[],"enums":[{"itemId":1,"name":"authnType","elements":[{"itemId":1,"name":"simple","label":"Simple"},{"itemId":2,"name":"kerberos","label":"Kerberos"}],"defaultIndex":0},{"itemId":2,"name":"rpcProtection","elements":[{"itemId":1,"name":"authentication","label":"Authentication"},{"itemId":2,"name":"integrity","label":"Integrity"},{"itemId":3,"name":"privacy","label":"Privacy"}],"defaultIndex":0}],"dataMaskDef":{"maskTypes":[],"accessTypes":[],"resources":[]},"rowFilterDef":{"accessTypes":[],"resources":[]}}

3. Put the updated file back into the Service Definition:

curl -iv -u admin:admin -X PUT -H "Accept: application/json" -H "Content-Type: application/json" -d @hdfs.json ranger-admin-host.hortonworks.com:6080/service/public/v2/api/servicedef/1
If successful, the Ranger “Create Policy” UI will look like this:

4. Repeat for any other desired repository.
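To double-check that the option was picked up, you could fetch the service definition again and inspect the options block, for example (a quick sketch; assumes jq is available, and adjust host and credentials to your environment):

curl -s -u admin:admin ranger-admin-host.hortonworks.com:6080/service/public/v2/api/servicedef/1 | jq .options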
References

Apache Ranger Wiki: Deny Conditions and Excludes in Ranger Policies
Apache Ranger Wiki: REST APIs for Service Definition, Service and Policy Management