Created on 03-30-2017 08:13 PM - edited 08-17-2019 01:30 PM
Sometimes data in a system expires because it is no longer correct or because it was rented for a specific time period. One way to implement data expiration requirements is to delete the data once it is no longer valid. However, you may also have a policy that requires retaining the data to track how decisions were made or to comply with regulations. In addition, deleting the data is more error prone because an administrator must remember to run a future task that deletes the data after it expires. If the task is missed and the data is not deleted, expired or illegal data could lead to incorrect decisions or lapses in compliance. This article shows an example of specifying the expiration date for a Hive table in Atlas and creating a tag based policy that prevents access to the table after the expiration date.
Enabling Atlas in the Sandbox
1. Create a Hortonworks HDP 2.5 Sandbox. You can use either a Virtual Machine or a host in the cloud.
2. In the browser, enter the Ambari URL (http://<sandbox host name>:8080)
3. Log in as user name raj_ops with password raj_ops
4. Atlas and its related services are stopped by default in the Sandbox. Follow the instructions in section 1 of the Tag Based Policies Tutorial to start the following services and turn off maintenance mode: Kafka, Ranger Tag Sync, HBase, Ambari Infrastructure, and Atlas. Wait for the operations to complete and all services to show green. Be sure to start Atlas last. For example, if HBase is not running, Atlas will not start properly and will remain red in Ambari after it is started.
Creating a Hive finance Database and tax_2015 Table
5. First we will create a new Hive database and a new table. Then we will apply a Ranger policy to the table that causes it to expire and demonstrate that only specific users can access the table.
6. Click on the grid icon at the top right side of the window near the user name.
7. Select the Hive View menu option. The Hive View, a graphical interface for executing queries, appears.
8. In the Worksheet in the Query Editor, enter the following Hive statements:
CREATE DATABASE finance;
DROP TABLE IF EXISTS finance.tax_2015;
CREATE TABLE finance.tax_2015(ssn string, fed_tax double, state_tax double, local_tax double) STORED AS ORC;
INSERT INTO TABLE finance.tax_2015 VALUES ('123-45-6789',22575,5750,2375);
INSERT INTO TABLE finance.tax_2015 VALUES ('234-56-7890',31114,8765,2346);
INSERT INTO TABLE finance.tax_2015 VALUES ('345-67-8901',35609,10123,3421);
9. Click the Execute button. The Execute button will turn orange with the label Stop Execution and will return to green with the label Execute when the statements complete.
Verifying Both maria_dev and raj_ops Users can Access tax_2015 Table
10. Once the Execute button is green you should see the finance database appear in the Database Explorer on the left side of the screen.
11. Click finance database. The tax_2015 table will appear.
13. We will now verify that maria_dev can also access the table. In the upper right corner pull down the menu with the user name (raj_ops).
14. Select Sign Out. The login screen will appear. Log in using user maria_dev and password maria_dev.
15. Select the grid icon and open the Hive View. Repeat the sample query for the tax_2015 table in the finance database. Verify that the query completes and maria_dev has access to the tax_2015 table.
Creating Tag Service and Expires on Tag Based Policy
16. Sign out of Ambari and log in again using user raj_ops and password raj_ops.
17. We will now create a tag based policy in Ranger to deny access to expired data. First we need to add a Tag service.
18. Click Dashboard at the top of the window.
19. Click on Ranger in the list of services.
20. Select Quick Links -> Ranger Admin UI.
21. Enter the user name raj_ops with password raj_ops. Pull down the Access Manager menu and select Tag Based Policies.
22. If you don’t have a Sandbox_tag service already, select the + button to add a new Service.
23. Enter Sandbox_tag in the Service Name field and click Add.
24. We will now associate the new Tag Service with the resource service for Hive. Even if you already had a Sandbox_tag service, complete the next steps to verify that the Sandbox_tag service is associated with the Sandbox_hive service. If the tag service is not associated, tag based policies will not function properly.
25. Pull down the Access Manager menu and select Resource Based Policies.
26. Click on the pencil button to the right of the Sandbox_hive service. The Edit Service form appears.
27. Select Sandbox_tag from the Select Tag Service drop down.
28. Click Save to save the changes to the hive service.
29. Pull down the Access Manager menu and select Tag Based Policies.
30. Click on the Sandbox_tag link.
31. An EXPIRES_ON policy is created by default.
32. Click on the Policy ID column for the EXPIRES_ON policy. By default all users are denied access to data after it expires.
33. We will now add a policy that allows raj_ops to access the expired data. Scroll down to the Deny Conditions and click show to expand the Exclude from Deny Conditions region.
34. Select raj_ops in Select User.
35. Click the + icon in the Policy Conditions column.
36. Enter yes in the Accessed after expiry_date field. Click the green check icon to save the condition.
37. Click the plus button in the Component Permissions column.
38. Select hive from components and check hive to permit all hive operations. Click the green check button to save the Component Permissions.
39. Review the Deny Conditions and Exclude from Deny Conditions. With raj_ops listed under Exclude from Deny Conditions, everyone except raj_ops is denied access to all expired tables.
40. Click the green Save button at the bottom of the policy to save the policy.
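The combined effect of the Deny Conditions and the Exclude from Deny Conditions configured above can be summarized as a small decision function. The Python sketch below is only an illustrative model, not Ranger's implementation: the function name is_denied_by_expiry and its parameters are hypothetical, and it assumes data counts as expired once the current time is later than its expiry date.

```python
from datetime import datetime

def is_denied_by_expiry(user, expiry, now, exempt_users):
    """Illustrative model of the EXPIRES_ON policy (not Ranger code).

    Returns True when access should be denied because the data has expired.
    """
    # Data without an expiry date is never denied by this policy.
    if expiry is None:
        return False
    # Assumption: the data is expired once 'now' is later than the expiry date.
    expired = now > expiry
    # Users listed under Exclude from Deny Conditions keep their access.
    return expired and user not in exempt_users

# Hypothetical example values for illustration only.
expiry = datetime(2015, 12, 31)
now = datetime(2017, 3, 30)
exempt = {"raj_ops"}

print(is_denied_by_expiry("maria_dev", expiry, now, exempt))  # True
print(is_denied_by_expiry("raj_ops", expiry, now, exempt))    # False
```

In this model the exclusion only matters once the data has expired. Before the expiry date, both users pass this check and any other policies decide the outcome.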
Setting the Expiration Date for the tax_2015 Table by applying the EXPIRES_ON Tag
41. Return to Ambari. Log in with user name raj_ops and password raj_ops. Click on Dashboard at the top. Then select Atlas from the left. Then select Quick Links -> Atlas Dashboard.
42. The Atlas login appears. Enter the user holger_gov and the password holger_gov.
43. Click on the Tags tab on the left side of the screen.
44. Click on Create Tag. The Create a new tag dialog opens.
45. In the Name field, enter EXPIRES_ON. Click the Create button.
46. Click on the ADD Attribute+ button for the EXPIRES_ON tag.
47. In the Attribute name field enter expiry_date. Click the green Add button.
48. Click on the Search tab.
49. Toggle right to select DSL. Select hive_table from Search For drop down. Click the green Search button.
50. Locate the tax_2015 table. Click on the + in the Tags column. The Add Tag dialog appears.
51. Select EXPIRES_ON from the drop down.
52. Set the expiry_date attribute to 2015/12/31. Then click the green Add button.
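The expiry_date value entered above is written as year/month/day. The following is a minimal sketch, assuming that layout, of how such a value can be parsed and compared with a given date. The helper names parse_expiry_date and has_expired are hypothetical and are not part of Atlas or Ranger, and treating the expiry date itself as still valid is an assumption of this sketch.

```python
from datetime import date, datetime

EXPIRY_FORMAT = "%Y/%m/%d"  # assumed layout, matching the value 2015/12/31

def parse_expiry_date(value):
    """Parse an expiry_date attribute string; raises ValueError if malformed."""
    return datetime.strptime(value.strip(), EXPIRY_FORMAT).date()

def has_expired(value, today):
    """Assumption: the data is still valid on the expiry date itself."""
    return today > parse_expiry_date(value)

print(parse_expiry_date("2015/12/31"))                # 2015-12-31
print(has_expired("2015/12/31", date(2015, 12, 31)))  # False
print(has_expired("2015/12/31", date(2017, 3, 30)))   # True
```

A value in another layout, such as 2015-12-31, raises ValueError in this sketch, which gives a cheap way to catch typos before tagging a table.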
Verifying raj_ops can Access tax_2015 but maria_dev can't
53. Return to the Ambari Hive View and log in as raj_ops.
54. Enter the query below in the Query Editor:
select * from finance.tax_2015;
55. Click the green Execute button.
56. The query succeeds without error and the results appear in the bottom of the window.
57. Sign out of Ambari and log back in as maria_dev.
58. Enter the same query in the Query Editor:
select * from finance.tax_2015;
59. The following error is reported:
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [maria_dev] does not have [SELECT] privilege on [finance/tax_2015/fed_tax,local_tax,ssn,state_tax]
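When several queries fail, it can help to pull the user, privilege, and resource out of the message programmatically. The following is a minimal sketch, assuming the message keeps the bracketed layout shown above. The regular expression and the function name parse_permission_denied are illustrative and not part of Hive.

```python
import re

# Pattern derived only from the single error message shown in this article.
DENIED_RE = re.compile(
    r"Permission denied: user \[(?P<user>[^\]]+)\] does not have "
    r"\[(?P<privilege>[^\]]+)\] privilege on \[(?P<resource>[^\]]+)\]"
)

def parse_permission_denied(message):
    """Return a dict with user, privilege, database, table, columns, or None."""
    match = DENIED_RE.search(message)
    if match is None:
        return None
    # The resource is assumed to be laid out as database/table/col1,col2,...
    parts = match.group("resource").split("/")
    return {
        "user": match.group("user"),
        "privilege": match.group("privilege"),
        "database": parts[0],
        "table": parts[1] if len(parts) > 1 else None,
        "columns": parts[2].split(",") if len(parts) > 2 else [],
    }

message = (
    "org.apache.hive.service.cli.HiveSQLException: Error while compiling "
    "statement: FAILED: HiveAccessControlException Permission denied: user "
    "[maria_dev] does not have [SELECT] privilege on "
    "[finance/tax_2015/fed_tax,local_tax,ssn,state_tax]"
)
info = parse_permission_denied(message)
print(info["user"], info["privilege"], info["database"], info["table"])
# maria_dev SELECT finance tax_2015
```

Note that the message identifies the user and the resource but not the policy that caused the denial.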
Inspecting the Audit Log to See Denial by EXPIRES_ON Rule
60. To see which policy caused the error, return to Ranger and log in as raj_ops.
61. Click on Audit. Then click on the Access tab.
62. Click in the filter to select SERVICE TYPE Hive and RESULT Denied.
63. If you click on the Policy ID link, you will see the policy that caused the denial is the EXPIRES_ON policy.
This article shows how to create a tag based policy using Atlas and Ranger that prevents access to data after a specified date. Data expiration policies make it easier to comply with regulations and prevent errors caused by using out of date tables.