Member since: 08-02-2019
Posts: 131
Kudos Received: 92
Solutions: 13
My Accepted Solutions
Views | Posted
---|---
1718 | 12-03-2018 09:33 PM
2297 | 04-11-2018 02:26 PM
1488 | 05-09-2017 09:35 PM
537 | 03-31-2017 12:59 PM
1009 | 11-21-2016 08:58 PM
12-21-2018
09:28 PM
Try restarting the Ambari server and the Ambari agent:
1. Log into the Ambari host and run: sudo ambari-server restart
2. Log into the Metron host and run: sudo ambari-agent restart
You can check whether the indexing topologies are running by opening the Storm UI; the random access and batch indexing topologies will be listed there as running (or not).
12-04-2018
07:25 PM
@Michael Bronson The page Designing a ZooKeeper Deployment explains that a 3 node cluster can survive 1 node failure. A 5 node cluster can survive 2 node failures.
12-04-2018
07:21 PM
@Michael Bronson A 3 node ZK cluster can survive the loss of one ZK node.
12-04-2018
07:21 PM
@Michael Bronson Physical machines are better. ZooKeeper is very sensitive to disk latency, especially for writes to its transaction log. Virtual machines are typically connected to a networked file system, and writes can be delayed if the network connecting the VM to the storage appliance is busy. Hypervisors can also do other fancy tricks, such as live migration, which can cause ZooKeeper to fail. If you do use VMs, you will need to increase some of the connection timeouts and turn off VM migration, and the VMs should be hosted on different hypervisors.
12-04-2018
07:16 PM
1 Kudo
@Michael Bronson ZooKeeper needs an odd number of hosts so it can build a quorum. A 3 node cluster can survive the loss of 1 node, but it will fail if 2 nodes are lost at the same time (for example, a node fails while another is down for an upgrade). If ZooKeeper goes down, the Kafka brokers will not operate. Designing a ZooKeeper Deployment explains: "For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. To create a deployment that can tolerate the failure of F machines, you should count on deploying 2xF+1 machines. Thus, a deployment that consists of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines."
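As a quick illustration of the 2xF+1 rule (my own sketch, not from the ZooKeeper documentation), the number of failures an ensemble of N servers can tolerate is floor((N-1)/2), which is why even ensemble sizes add cost without adding tolerance:
// Sketch: failures an ensemble of N ZooKeeper servers can tolerate while a
// strict majority (quorum) is still alive.
object QuorumMath {
  def tolerableFailures(ensembleSize: Int): Int = (ensembleSize - 1) / 2

  def main(args: Array[String]): Unit = {
    Seq(3, 4, 5, 6, 7).foreach { n =>
      println(s"$n servers -> tolerates ${tolerableFailures(n)} failure(s)")
    }
    // Prints 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, 7 -> 3.
  }
}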
12-03-2018
09:33 PM
Depending on how many tenants you have using ZooKeeper, you may be just fine with 3 nodes. In ZooKeeper, more nodes don't always yield better performance because of the communication overhead: adding nodes decreases write performance because of the node to node communication required to synchronize across the cluster. A few things to consider:
1. ZooKeeper will be more fault tolerant with 5 nodes vs 3 nodes. A 3 node cluster can only tolerate one down node before it loses its quorum.
2. Get the best performance out of your current 3 node deployment by following best practices.
3. Look at the current 3 node cluster's performance under existing load and see how much capacity you have (a rough way to probe this is sketched below). Check out this article. Add new Kafka nodes and see how performance is affected.
4. ZooKeeper needs an odd number of nodes, and you will most likely not need more than 7.
5. Later versions of Kafka do not rely on ZooKeeper for consumer offsets. How Kafka uses ZooKeeper describes the 0.10 release and later.
6. Consider upgrading to HDP 3.0 and using Streams Messaging Manager. It makes managing Kafka a lot easier, but it only works on HDP 3.0 and above.
Best of luck on your Kafka journey!
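As a rough illustration of point 3 (my own sketch, not from the linked article), you can poll each ZooKeeper server with the "mntr" four-letter command and watch the latency and outstanding-request counters while you add Kafka load. This assumes the mntr command is enabled on your ZooKeeper version; the hostnames are placeholders.
import java.net.Socket
import scala.io.Source

// Sketch: send the "mntr" four-letter command to each ZooKeeper server and
// print the counters that indicate remaining headroom under current load.
object ZkHeadroomCheck {
  def mntr(host: String, port: Int = 2181): String = {
    val socket = new Socket(host, port)
    try {
      socket.getOutputStream.write("mntr".getBytes("UTF-8"))
      socket.getOutputStream.flush()
      Source.fromInputStream(socket.getInputStream, "UTF-8").mkString
    } finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    val interesting = Seq("zk_server_state", "zk_avg_latency", "zk_max_latency",
      "zk_outstanding_requests", "zk_num_alive_connections")
    Seq("zk1.example.com", "zk2.example.com", "zk3.example.com").foreach { host =>
      println(s"--- $host ---")
      mntr(host).split("\n")
        .filter(line => interesting.exists(prefix => line.startsWith(prefix)))
        .foreach(println)
    }
  }
}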
07-15-2018
02:53 AM
@Michael Dennis "MD" Danang I think there may have been an issue generating the test data. It looks like some of the benchmark tables did not get generated correctly. Try the generation again.
04-11-2018
02:26 PM
The version of Internet Explorer I was using did not work. I resolved the issue by allowing port 8080 through the Windows firewall and then connecting remotely using Chrome.
04-10-2018
06:18 PM
I tried on Windows Server 2012 and had the same issue. I am creating my Windows instance in AWS using the Windows Server with SQL Server.
04-10-2018
03:52 PM
@Wynner I see the same problem with both of those URLs.
04-10-2018
03:43 PM
2018-04-10 00:49:44,720 INFO [main] org.apache.nifi.web.server.JettyServer NiFi has started. The UI is available at the following URLs:
2018-04-10 00:49:44,720 INFO [main] org.apache.nifi.web.server.JettyServer http://127.0.0.1:8080/nifi
2018-04-10 00:49:44,720 INFO [main] org.apache.nifi.web.server.JettyServer http://172.31.56.60:8080/nifi
2018-04-10 00:49:44,722 INFO [main] org.apache.nifi.BootstrapListener Successfully initiated communication with Bootstrap
04-10-2018
02:57 PM
NiFi installed on Windows Server 2008 R2 using the Windows MSI without any errors. The service starts up, but when I attempt to open NiFi in the web browser using http://localhost:8080/nifi, I get the NiFi water drop but the canvas never appears. The errors below appear in the NiFi user log. Do I need to enable HTTPS?
2018-04-10 14:52:07,061 INFO [NiFi Web Server-119] org.apache.nifi.web.filter.RequestLogger Attempting request for (<no user found>) POST http://localhost:8080/nifi-api/access/kerberos (source ip: 127.0.0.1)
2018-04-10 14:52:07,062 INFO [NiFi Web Server-119] o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException: Access tokens are only issued over HTTPS.. Returning Conflict response.
2018-04-10 14:52:07,235 INFO [NiFi Web Server-128] org.apache.nifi.web.filter.RequestLogger Attempting request for (<no user found>) POST http://localhost:8080/nifi-api/access/oidc/exchange (source ip: 127.0.0.1)
2018-04-10 14:52:07,236 INFO [NiFi Web Server-128] o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException: User authentication/authorization is only supported when running over HTTPS.. Returning Conflict response.
2018-04-10 14:52:07,842 INFO [NiFi Web Server-129] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/current-user (source ip: 127.0.0.1)
2018-04-10 14:52:08,418 INFO [NiFi Web Server-128] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/client-id (source ip: 127.0.0.1)
2018-04-10 14:52:08,419 INFO [NiFi Web Server-114] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/config (source ip: 127.0.0.1)
2018-04-10 14:52:08,422 INFO [NiFi Web Server-131] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/cluster/summary (source ip: 127.0.0.1)
2018-04-10 14:52:09,634 INFO [NiFi Web Server-45] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/banners (source ip: 127.0.0.1)
2018-04-10 14:52:11,046 INFO [NiFi Web Server-133] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/processor-types (source ip: 127.0.0.1)
2018-04-10 14:52:11,076 INFO [NiFi Web Server-128] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/about (source ip: 127.0.0.1)
2018-04-10 14:52:11,262 INFO [NiFi Web Server-130] org.apache.nifi.web.filter.RequestLogger Attempting request for (<no user found>) GET http://localhost:8080/nifi-api/access/config (source ip: 127.0.0.1)
2018-04-10 14:52:12,613 INFO [NiFi Web Server-131] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/controller-service-types (source ip: 127.0.0.1)
2018-04-10 14:52:15,440 INFO [NiFi Web Server-119] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/reporting-task-types (source ip: 127.0.0.1)
2018-04-10 14:52:26,157 INFO [NiFi Web Server-113] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:8080/nifi-api/flow/prioritizers (source ip: 127.0.0.1)
Labels: Apache NiFi
09-21-2017
03:05 PM
@ketan kunde The GitHub repository for Cloudbreak is https://github.com/hortonworks/cloudbreak. HCP is based on Apache Metron and the Apache projects in HDP: https://github.com/apache/metron. As @Graham Martin says, SmartSense and HDCloud are services.
07-21-2017
11:06 AM
@Dominika Bialek Thanks for checking in. I ended up switching regions.
07-17-2017
11:27 PM
I am having the same issue. Were you able to find a resolution?
05-24-2017
06:50 PM
@jzhang @Palanivelrajan Chellakutty Can you please post the version of VirtualBox and the name of the sandbox file that you downloaded? Thanks!
05-24-2017
06:47 PM
@Palanivelrajan Chellakutty @jzhang Thanks for the heads up. I will check into it. In the meantime, please use the article.
05-23-2017
06:50 PM
@jzhang The sandbox has been updated since this issue was detected. It should be fine now. Are you having issues?
05-09-2017
09:35 PM
This was a problem with the 2.6 tech preview components. It is resolved in the latest version.
05-08-2017
11:26 PM
The source for the spark-csv library is on GitHub if you want to see how the real library works: https://github.com/databricks/spark-csv Here is a simple solution that will be good enough for the test:
// define a case class to specify the schema for the columns in the CSV
case class Product(sku: String, description: String, countPerPack: Int)
// read the lines of the file as text
val rawTextRDD = sc.textFile("/user/zeppelin/test/products.csv")
// map each line from the raw text to a Product
val productsRDD = rawTextRDD.map { raw_line =>
  val columns = raw_line.split(",")
  Product(columns(0), columns(1), columns(2).toInt)
}
// optional: convert the RDD to a data frame
val productDF = productsRDD.toDF
productDF.show
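If you do want to use the spark-csv library itself rather than hand-rolling the parsing, something like the following should work (a sketch assuming Spark 1.x with the com.databricks:spark-csv package on the classpath; adjust the path and the header/inferSchema options to match your data):
// Sketch: read the same file through the spark-csv data source.
val csvDF = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // treat the first line as column names
  .option("inferSchema", "true") // let the reader guess the column types
  .load("/user/zeppelin/test/products.csv")
csvDF.printSchema()
csvDF.show()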
05-03-2017
04:45 PM
@Marton Sereg I was not using spot instances but perhaps the setting is not read properly?
05-02-2017
10:01 PM
@dbalasundaran HD Cloud 1.14. I am provisioning an HDP 2.6 EDW Analytics cluster with Hive LLAP and Zeppelin.
04-29-2017
07:17 PM
I provisioned an HDP 2.6 cluster using HD Cloud. When I first start up the cluster I get a critical Ambari alert for HiveServer2 Interactive. I tried restarting ZooKeeper and Hive and restarting the master host, but I still get the error. Ambari is unable to make a connection to the interactive port of HiveServer2:
ExecutionFailed: Execution of '! beeline -u 'jdbc:hive2://ip-10-0-68-150.us-west-1.compute.internal:10500/;transportMode=binary' -e '' 2>&1| awk '{print}'|grep -i -e 'Connection refused' -e 'Invalid URL'' returned 1. Error: Could not open client transport with JDBC Uri: jdbc:hive2://ip-10-0-68-150.us-west-1.compute.internal:10500/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Error: Could not open client transport with JDBC Uri: jdbc:hive2://ip-10-0-68-150.us-west-1.compute.internal:10500/;transportMode=binary: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
)
The HiveServer2 Interactive log has the following error indicating an issue with ZooKeeper:
2017-04-29 18:38:05,175 INFO [main-SendThread(ip-10-0-68-150.us-west-1.compute.internal:2181)]: zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server ip-10-0-68-150.us-west-1.compute.internal/10.0.68.150:2181. Will not attempt to authenticate using SASL (unknown error)
The ZooKeeper log contains the following error messages:
2017-04-29 18:40:05,624 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbae5b75a000a type:delete cxid:0x338 zxid:0x20e txntype:-1 reqpath:n/a Error Path:/registry/users/hive/services/org-apache-slider/llap0 Error:KeeperErrorCode = Directory not empty for /registry/users/hive/services/org-apache-slider/llap0
2017-04-29 18:40:05,992 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d00008 type:create cxid:0x1 zxid:0x213 txntype:-1 reqpath:n/a Error Path:/services Error:KeeperErrorCode = NodeExists for /services
2017-04-29 18:40:05,998 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d00008 type:create cxid:0x2 zxid:0x214 txntype:-1 reqpath:n/a Error Path:/services/slider Error:KeeperErrorCode = NodeExists for /services/slider
2017-04-29 18:40:05,999 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d00008 type:create cxid:0x3 zxid:0x215 txntype:-1 reqpath:n/a Error Path:/services/slider/users Error:KeeperErrorCode = NodeExists for /services/slider/users
2017-04-29 18:40:06,000 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d00008 type:create cxid:0x4 zxid:0x216 txntype:-1 reqpath:n/a Error Path:/services/slider/users/hive Error:KeeperErrorCode = NodeExists for /services/slider/users/hive
2017-04-29 18:40:06,043 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d00009 type:delete cxid:0x1 zxid:0x218 txntype:-1 reqpath:n/a Error Path:/registry/users/hive/services/org-apache-slider/llap0 Error:KeeperErrorCode = NoNode for /registry/users/hive/services/org-apache-slider/llap0
2017-04-29 18:40:21,698 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d0000a type:delete cxid:0x1 zxid:0x21d txntype:-1 reqpath:n/a Error Path:/registry/users/hive/services/org-apache-slider/llap0 Error:KeeperErrorCode = NoNode for /registry/users/hive/services/org-apache-slider/llap0
2017-04-29 18:40:21,728 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d0000a type:create cxid:0x2 zxid:0x21e txntype:-1 reqpath:n/a Error Path:/registry/users/hive/services/org-apache-slider Error:KeeperErrorCode = NodeExists for /registry/users/hive/services/org-apache-slider
2017-04-29 18:41:00,709 - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x15bbb0169d0000f type:create cxid:0x1 zxid:0x22e txntype:-1 reqpath:n/a Error Path:/hiveserver2-hive2 Error:KeeperErrorCode = NodeExists for /hiveserver2-hive2
Any help would be much appreciated.
Tags: Data Processing, hiveserver2
Labels: Apache Hive
04-29-2017
04:06 PM
2 Kudos
I am provisioning a cluster using version 1.14.0 of Hortonworks Data Cloud. The Autoscaling status slider is in the off position, but cluster provisioning fails with the following error on the auto scaling token:
Infrastructure creation failed. Reason: com.amazonaws.AmazonServiceException: The security token included in the request is expired (Service: AmazonAutoScaling; Status Code: 403; Error Code: ExpiredToken; Request ID: cf2aebb0-2cec-11e7-8985-4100dda4669c)
The autoscaling tab on the failed cluster in the cloud controller shows the cluster trying to autoscale to 3 nodes.
Tags: aws, Cloudbreak, hdcloud
Labels: Hortonworks Cloudbreak
04-25-2017
05:50 PM
6 Kudos
NOTE: This issue was found in the HDF Sandbox. A new Sandbox will be posted soon. Until then, please use the instructions below.
After importing the HDP 2.6 Sandbox hosted in Oracle VirtualBox and starting the VM, I saw the message "Connectivity issues detected!" in the console, and when I tried to connect to Ambari in the web browser I was not able to connect. To correct the connectivity:
1. Go to the Oracle VM VirtualBox Manager and select the sandbox virtual machine. Right-click and select Close > Power Off to shut down the VM.
2. Right-click on the sandbox virtual machine and select Settings. The VM Settings dialog displays.
3. Click on Network in the list on the left side of the dialog.
4. Click on Advanced to unfold the advanced network settings.
5. Check Cable Connected.
6. Click OK to save the setting.
7. Restart the VM.
8. You should now be able to connect to Ambari by entering the URL http://127.0.0.1:8080 in your browser.
Tags: hdp2.6, Issue Resolution, issue-resolution, Sandbox, Sandbox & Learning
03-31-2017
12:59 PM
Metron supports 3 types of parsers: Grok, CSV, and Java. For XML data, Java is the best choice. You can see example parsers in the Metron GitHub repository: https://github.com/apache/incubator-metron/tree/master/metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers You could also use NiFi to convert the XML to JSON and enqueue the events to the enrichment topic. Here are some articles about parsing XML logs with NiFi: https://community.hortonworks.com/articles/25720/parsing-xml-logs-with-nifi-part-1-of-3.html
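If you take the NiFi route, the transformation itself is just XML in, flat JSON out. Here is a rough sketch of that idea in Scala (my own illustration, not Metron or NiFi code; it assumes the scala-xml module is on the classpath and uses made-up element and field names):
import scala.xml.XML

// Sketch: flatten a simple XML event into the kind of flat JSON message
// Metron parsers emit. Element and field names are placeholders, and the
// JSON is built naively (no escaping) just to show the shape.
object XmlEventToJson {
  def toJson(rawXml: String): String = {
    val event = XML.loadString(rawXml)
    val fields = Map(
      "timestamp"   -> (event \ "timestamp").text,
      "ip_src_addr" -> (event \ "source" \ "ip").text,
      "ip_dst_addr" -> (event \ "destination" \ "ip").text,
      "action"      -> (event \ "action").text
    )
    fields.map { case (k, v) => "\"" + k + "\":\"" + v + "\"" }.mkString("{", ",", "}")
  }

  def main(args: Array[String]): Unit = {
    val sample =
      """<event><timestamp>1490912345</timestamp>
        |<source><ip>10.0.0.1</ip></source>
        |<destination><ip>10.0.0.2</ip></destination>
        |<action>ALLOW</action></event>""".stripMargin
    println(toJson(sample))
  }
}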
03-31-2017
01:20 AM
5 Kudos
Many of us in Hortonworks Community Connection feel most at home when we are talking about technologies and tools and the "animals in the zoo". However, if we want to grow the data lake and gain support from the business, we have to learn to think a little differently and use a new vocabulary to communicate.

Start by meeting with the business to identify possible use cases. Talk to the analysts about the highest priorities and pain points for the business. Before thinking about anything remotely Hadoop-animal-like, summarize "what" needs to be done. This may take several interviews with different business analysts to gain a full understanding of the problem.

Then determine if Big Data can solve the problem. Are data silos preventing the organization from getting a complete view of the customer or of logistics? Is the volume of data required to solve the problem too much or too expensive for existing systems to handle? Is the unstructured or semi-structured data required to solve the problem not working effectively in existing systems? If the answer to any of these questions is yes, then Big Data is likely a good fit.

Next, calculate the return of the solution to the business. Return can come from cost savings from increased efficiency or reduction in loss, increased sales resulting from improved customer satisfaction, or new revenue and growth from new data products. Then estimate the investment required for the solution. What are the costs of the development and infrastructure required for the solution? How much will it cost to operationalize the solution? How much will it cost to maintain the solution in coming years? The value of the solution is the return minus the investment. Project the figures out over several years. In the first year the development, infrastructure, and operationalization costs will most likely be higher, so the value will be lower. However, if the maintenance costs are low, years two and three may have much higher value with lower investment.

Let's look at some example use cases:

1. Customer 360 is bringing everything that the organization knows about the customer into the data lake. The insights gained from Customer 360 can reduce churn, improve customer loyalty, and improve campaign effectiveness. The return is the estimate of increased sales due to reduced churn and better campaign performance. The investment is how much it costs to develop the Customer 360, the costs to obtain the data needed, the infrastructure and personnel required to run the system, and the training required to enable analysts to use it effectively.

2. Fraud detection is preventing loss due to theft. For example, a retailer can flag fraudulent returns of stolen goods or detect theft of merchandise. The return is estimated by measuring the amount of loss that could be prevented, and the investment is the cost to develop the system, the cost of the infrastructure and personnel to run the system, and the costs to deploy the system to stores.

3. Predictive maintenance optimizes downtime and reduces the cost of maintaining machinery in a factory or vehicles in a fleet. Predictive maintenance uses algorithms that look at the historical failure of parts and the operating conditions of the machines and determine what maintenance needs to be done and when. The return of predictive maintenance is calculated from the reductions in downtime or breakdowns and the savings in parts and labor from only doing maintenance when it is indicated by the operating conditions. How much does a breakdown or downtime cost? Will the contents of the vehicle be lost if the vehicle is down for a lengthy period of time? How much is lost in sales when a delivery is not completed? How much is spent on maintenance, and what is the cost of preventable maintenance? The investment is the cost to collect the machinery or vehicle information, the cost to develop the algorithms, and the infrastructure needed to collect and process the machine or fleet data.

Examine the results of the use case discovery and build a roadmap that shows which use cases will be implemented and when the implementation will start and end. Create a map of the use cases on two dimensions: value and difficulty of implementation. Start with the high value use cases that are easy to implement. Save the higher value but more difficult to implement use cases for later in the roadmap; your team will be more experienced and better able to tackle them. Communicate the roadmap to the business in terms of the value and investment required. Don't dive into too many technical details. Keep it high level and focus on the what and the why.

When you start executing on your use cases, don't forget to measure. Tracking your actual return and investment will help you realize the value of the solutions and improve your estimation skills going forward.
03-30-2017
08:13 PM
8 Kudos
Sometimes data in the system expires because it is no longer correct or because the data was rented for a specific time period. One way to implement data expiration requirements is to delete the data after it is no longer valid. However, you may also have another policy that requires retention of the data, to track how decisions were made or to comply with regulations. In addition, deleting the data is more error prone to implement because an administrator must track a future task to delete the data after it expires. If the task is missed and the data is not deleted, expired or illegal data could lead to incorrect decisions or lapses in compliance. This article shows an example of specifying the expiration date for a Hive table in Atlas and creating a tag based policy that prevents access to the table after the expiration date.
Enabling Atlas in the Sandbox
1. Create a Hortonworks HDP 2.5 Sandbox. You can use either a virtual machine or a host in the cloud.
2. In the browser, enter the Ambari URL (http://<sandbox host name>:8080).
3. Log in as user name raj_ops with password raj_ops.
4. Atlas and its related services are stopped by default in the Sandbox. Follow the instructions in section 1 of the Tag Based Policies Tutorial to start the following services and turn off maintenance mode: Kafka, Ranger Tag Sync, HBase, Ambari Infrastructure, and Atlas. Wait for the operations to complete and all services to show green. Be sure to start Atlas last: for example, if HBase is not running, Atlas will not start properly and will remain red in Ambari after it is started.
Creating a Hive finance Database and tax_2015 Table
5. First we will create a new Hive database and a new table. Then we will apply a Ranger policy to the table that causes it to expire and demonstrate that only specific users can access the table.
6. Click on the grid icon at the top right side of the window near the user name.
7. Select the Hive View menu option. The Hive View, a GUI interface for executing queries, appears.
8. In the Worksheet in the Query Editor, enter the following Hive statements:
CREATE DATABASE finance;
DROP TABLE IF EXISTS finance.tax_2015;
CREATE TABLE finance.tax_2015(ssn string,
fed_tax double,
state_tax double,
local_tax double) STORED AS ORC;
INSERT INTO TABLE finance.tax_2015 VALUES ('123-45-6789',22575,5750,2375);
INSERT INTO TABLE finance.tax_2015 VALUES ('234-56-7890',31114,8765,2346);
INSERT INTO TABLE finance.tax_2015 VALUES ('345-67-8901',35609,10123,3421);
9. Click the Execute button. The Execute button will turn orange with the label Stop Execution and will return to green with the label Execute when the statements complete.
Verifying Both maria_dev and raj_ops Users Can Access the tax_2015 Table
10. Once the Execute button is green you should see the finance database appear in the Database Explorer on the left side of the screen.
11. Click the finance database. The tax_2015 table will appear.
12. Enter the sample query select * from finance.tax_2015; in the Query Editor and click Execute to verify that raj_ops can read the table.
13. We will now verify that maria_dev can also access the table. In the upper right corner pull down the menu with the user name (raj_ops).
14. Select Sign Out. The login screen will appear. Log in using user maria_dev and password maria_dev.
15. Select the tile icon and open the Hive View. Repeat the sample query for the tax_2015 table in the finance database. Verify that the query completes and maria_dev has access to the tax_2015 table.
Creating the Tag Service and EXPIRES_ON Tag Based Policy
16. Sign out of Ambari and log in again using user raj_ops and password raj_ops.
17. We will now create a tag based policy in Ranger to deny access to expired data. First we need to add a Tag service.
18. Click Dashboard at the top of the window.
19. Click on Ranger in the list of services.
20. Select Quick Links -> Ranger Admin UI.
21. Enter the user name raj_ops with password raj_ops. Pull down the Access Manager menu and select Tag Based Policies.
22. If you don't have a Sandbox_tag service already, select the + button to add a new service.
23. Enter Sandbox_tag in the Service Name field and click Add.
24. We will now associate the new Tag service with the resource service for Hive. Even if you already had a Sandbox_tag service, complete the next steps to verify that the Sandbox_tag service is associated with the Sandbox_hive service. If the tag service is not associated, tag based policies will not function properly.
25. Pull down the Access Manager menu and select Resource Based Policies.
26. Click on the pencil button to the right of the Sandbox_hive service. The Edit Service form appears.
27. Select Sandbox_tag from the Select Tag Service drop down.
28. Click Save to save the changes to the Hive service.
29. Pull down the Access Manager menu and select Tag Based Policies.
30. Click on the Sandbox_tag link.
31. An EXPIRES_ON policy is created by default.
32. Click on the Policy ID column for the EXPIRES_ON policy. By default all users are denied access to data after it expires.
33. We will now add a policy that allows raj_ops to access the expired data. Scroll down to the Deny Conditions and click show to expand the Exclude from Deny Conditions region.
34. Select raj_ops in Select User.
35. Click the + icon in the Policy Conditions column.
36. Enter yes in Accessed after expiry_date. Click the green check icon to save the condition.
37. Click the plus button in the Component Permissions column.
38. Select hive from the components and check hive to permit all Hive operations. Click the green check button to save the Component Permissions.
39. The Deny and Exclude from Deny Conditions should look like the ones below. Everyone except raj_ops is denied access to all expired tables.
40. Click the green Save button at the bottom of the policy to save the policy.
Setting the Expiration Date for the tax_2015 Table by Applying the EXPIRES_ON Tag
41. Return to Ambari. Log in with user name raj_ops and password raj_ops. Click on Dashboard at the top, then select Atlas from the left, then select Quick Links > Atlas Dashboard.
42. The Atlas login appears. Enter the user holger_gov and the password holger_gov.
43. Click on the Tags tab on the left side of the screen.
44. Click on Create Tag. The Create a new tag dialog opens.
45. In the Name field, enter EXPIRES_ON. Click the Create button.
46. Click on the Add Attribute+ button for the EXPIRES_ON tag.
47. In the Attribute name field enter expiry_date. Click the green Add button.
48. Click on the Search tab.
49. Toggle right to select DSL. Select hive_table from the Search For drop down. Click the green Search button.
50. Locate the tax_2015 table. Click on the + in the Tags column. The Add Tag dialog appears.
51. Select EXPIRES_ON from the drop down.
52. Set the expiry_date attribute to 2015/12/31, then click the green Add button.
Verifying raj_ops Can Access tax_2015 but maria_dev Can't
53. Return to the Ambari Hive View and log in as raj_ops.
54. Enter the query below in the Query Editor: select * from finance.tax_2015;
55. Click the green Execute button.
56. The query succeeds without error and the results appear at the bottom of the window.
57. Sign out of Ambari and log back in as maria_dev.
58. Enter the same query in the Query Editor: select * from finance.tax_2015;
59. The following error is reported: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [maria_dev] does not have [SELECT] privilege on [finance/tax_2015/fed_tax,local_tax,ssn,state_tax]
Inspecting the Audit Log to See the Denial by the EXPIRES_ON Rule
60. To see which policy caused the error, return to Ranger and log in as raj_ops.
61. Click on Audit. Then click on the Access tab.
62. Click in the filter to select SERVICE TYPE Hive and RESULT Denied.
63. If you click on the Policy ID link, you will see that the policy that caused the denial is the EXPIRES_ON policy.
Conclusion
This article shows how to create a tag based policy using Atlas and Ranger that prevents access to data after a specified date. Data expiration policies make it easier to comply with regulations and prevent errors caused by using out of date tables.
Tags: Atlas, Governance & Lifecycle, How-ToTutorial, Ranger, tag-based-security
03-21-2017
07:49 PM
@Raj B It does work eventually. However maybe something about the configuration is making it slow? What should I check?
03-21-2017
01:05 AM
I set up a sandbox in Azure and a separate NiFi installation on a different host, and added a PutHiveQL processor to my flow. The queue on the flow before CreateTableInHive says it has one item in it, but when I right click and list the queue it is empty. The SQL does eventually get executed because I can see my table in Hive. I don't see a red error on the processor indicating an error, but I do see the following warning in the nifi-app.log:
2017-03-21 00:23:21,553 WARN [Timer-Driven Process Thread-9] o.a.thrift.transport.TIOStreamTransport Error closing output stream.
java.net.SocketException: Socket closed
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118) ~[na:1.8.0_121]
at java.net.SocketOutputStream.write(SocketOutputStream.java:155) ~[na:1.8.0_121]
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_121]
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_121]
at java.io.FilterOutputStream.close(FilterOutputStream.java:158) ~[na:1.8.0_121]
at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110) ~[hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.transport.TSocket.close(TSocket.java:235) [hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.transport.TSaslTransport.close(TSaslTransport.java:402) [hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.thrift.transport.TSaslClientTransport.close(TSaslClientTransport.java:37) [hive-exec-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.hive.jdbc.HiveConnection.close(HiveConnection.java:708) [hive-jdbc-1.2.1000.2.5.0.0-1245.jar:1.2.1000.2.5.0.0-1245]
at org.apache.commons.dbcp.DelegatingConnection.close(DelegatingConnection.java:247) [commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.PoolableConnection.reallyClose(PoolableConnection.java:122) [commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.PoolableConnectionFactory.destroyObject(PoolableConnectionFactory.java:628) [commons-dbcp-1.4.jar:1.4]
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1178) [commons-pool-1.5.4.jar:1.5.4]
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106) [commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044) [commons-dbcp-1.4.jar:1.4]
at org.apache.nifi.dbcp.hive.HiveConnectionPool.getConnection(HiveConnectionPool.java:288) [nifi-hive-processors-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
at org.apache.nifi.controller.service.StandardControllerServiceProvider$1.invoke(StandardControllerServiceProvider.java:177) [nifi-framework-core-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at com.sun.proxy.$Proxy83.getConnection(Unknown Source) [na:na]
at org.apache.nifi.processors.hive.PutHiveQL.onTrigger(PutHiveQL.java:152) [nifi-hive-processors-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1099) [nifi-framework-core-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) [nifi-framework-core-1.1.0.2.1.2.0-10.jar:1.1.0.2.1.2.0-10]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_121]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Tags: Data Ingestion & Streaming, NiFi
Labels: Apache NiFi