About aervits

aervits · ‎09-07-2018

This is a short how-to on leveraging Zeppelin and Solr's native SQL capabilities to query the Ranger audit logs in real time. The capability to query Ranger audits has existed for quite a while, and there are multiple articles demonstrating how to apply a Hive external table on top of Ranger audits stored in HDFS. This article demonstrates how to use Zeppelin and Solr SQL to query Solr in real time, without the additional step of creating an external table on top of the HDFS audit.

The first thing you need is access to your Solr instance. I'm using the default instance packaged with Ambari Infra. The Solr admin UI is available at the following address: http://{ambari-infra-ip}:8886. In the UI, you can issue arbitrary queries using standard Solr syntax. I am new to Solr and found the query syntax cumbersome. Instead, I decided to leverage Solr SQL, available as of version 6. HDP 3.0 ships with Solr 7.3.

The next step is to set up a Zeppelin interpreter for Solr via JDBC. Steps for doing that are available on the Solr website; I'm going to summarize the minimum required configuration for HDP 3. Feel free to copy and modify the properties below:

    default.driver : org.apache.solr.client.solrj.io.sql.DriverImpl
    default.url : jdbc:solr://{ambari-infra-ip}:2181/infra-solr?collection=ranger_audits
    default.user : solr

In the artifacts section, add the following entry:

    org.apache.solr:solr-solrj:7.3.1

Be mindful of the port for the ZooKeeper quorum for the Ranger Solr collection. I found the information by browsing the ZK CLI shell:

    /usr/hdp/current/zookeeper-client/bin/zkCli.sh
    [zk: localhost:2181(CONNECTED) 4] ls /infra-solr/collections/ranger_audits

Once you enter that information into the Zeppelin interpreter, you can use the %solr command to browse Ranger audits with SQL; just add a new note with the Solr interpreter selected. Notice I am using all of the standard fields in the Ranger audit; you can find an older version of the schema at the following link. I say older because in HDP 3, Ranger supports multiple clusters and additional fields identifying separate clusters are available, but I digress. The query above will show all current events where the result is that access was denied. This is really convenient because you don't need to apply a schema and the data is available in real time. You can build powerful reporting capabilities on top of what is available in the Ranger Admin UI (in case you were wondering why you would do this when the information is already available via Ranger).

Finally, once you press execute, the data will be shown below. You can now add more plotting libraries and built-in Zeppelin charting capabilities to make very powerful dashboards!
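For readers who want a starting point, here is a minimal sketch of the kind of denied-access query described in this article. It only builds the SQL text and a URL for Solr's /sql request handler (which takes the statement in the stmt parameter); it does not contact a cluster. The field names (evtTime, reqUser, repo, resource, access, result) come from the Ranger audit schema as I understand it, and the assumption that result = 0 means "denied" should be checked against your own data. The host name and the helper function names are hypothetical.

```python
from urllib.parse import urlencode


def denied_events_sql(collection="ranger_audits", limit=20):
    """Build a Solr SQL statement listing the most recent denied requests."""
    # Assumption: result = 0 marks a denied request in the Ranger audit schema.
    return (
        "SELECT evtTime, reqUser, repo, resource, access "
        f"FROM {collection} WHERE result = 0 "
        f"ORDER BY evtTime DESC LIMIT {int(limit)}"
    )


def sql_handler_url(solr_base, collection, stmt):
    """Return a URL that passes stmt to the collection's /sql handler."""
    return f"{solr_base}/solr/{collection}/sql?" + urlencode({"stmt": stmt})


stmt = denied_events_sql()
# Hypothetical host; 8886 is the Ambari Infra Solr port mentioned in the article.
url = sql_handler_url("http://ambari-infra-host:8886", "ranger_audits", stmt)

print("%solr")  # interpreter binding line of the Zeppelin paragraph
print(stmt)
print(url)
```

In a Zeppelin note you would paste only the %solr line and the statement; the URL form is handy for a quick check outside Zeppelin, assuming the /sql handler is reachable from where you run it.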

aervits · ‎08-29-2017

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html

I get a lot of questions about doing distcp and figured I'd write yet another article in the series on WFM. There's a common assumption that the FS action should be able to do a copy within a cluster. Unfortunately, it's not obvious that you can leverage the distcp action to do a copy within a cluster instead. The reason the FS action lacks copy functionality is that such a copy is not distributed and would DoS your Oozie server until the action completes. What you need to do is use the distcp action: it is meant for distributed operations and, being decoupled from the Oozie launcher, it completes without a DoS. The functionality is the same, even if the naming convention is a bit off.

We're going to start by adding a new workflow and naming it distcp-wf. Now we're going to add a distcp node to the flow. I prefer to name the nodes something other than the default, so I'll name it distcp_example and hit the gear button to configure it. In the distcp arguments field, I'm going to use Oozie XML variable replacement to add the full HDFS paths of the source and target, which happen to be in the same cluster. They might as well be in two separate clusters.

If you're familiar with how Oozie and MapReduce work, you'll quickly realize that this workflow will only run once and fail the second time around. The reason is that my destination never changes, and if the output exists, you're going to get a failure on the next run. To handle that, we're going to add a prepare action to delete the destination file/directory. Copy the second argument to the clipboard, paste it into advanced properties and change the mkdir drop-down to delete.

We're almost ready to submit our workflow. I first have to create an HDFS directory (distcp-wf) that will contain my distcp workflow, and the file I'd like copied.

    hdfs dfs -mkdir distcp-wf
    hdfs dfs -touchz file
    hdfs dfs -ls
    Found 4 items
    drwx------   - centos hdfs          0 2017-08-29 14:35 .Trash
    drwx------   - centos hdfs          0 2017-08-29 14:33 .staging
    drwxr-xr-x   - centos hdfs          0 2017-08-29 14:35 distcp-wf
    -rw-r--r--   3 centos hdfs         10 2017-08-29 01:26 file

Now I'm ready to save and submit my workflow. Enter the HDFS path of the workflow directory you just created. Notice the job properties have the fully expanded nameNode and resourceManager addresses; that's what is being used for variable substitution. Now I am going to submit the job and use filtering in the dashboard to find the workflow by name.

Let's switch back to the distcp action, as I'd like to demonstrate a few other things about distcp that you can leverage. If you refer to the distcp user guide, you'll notice that there are many arguments we didn't cover, like -append, -update, etc. What if you would like to use them in your distcp? Well, WFM has got you covered. Eagle-eyed users would have seen the tool-tip the first time we configured the distcp action node, which says that you can pass the arguments in the same field as the source and destination. So in addition to the two arguments, I'm going to add -update and -skipcrccheck in front of the existing ones. My workflow XML now reflects the new arguments, and when I execute with them, everything should still be green.

On a side note, our documentation team has done a phenomenal job adding resources to our WFM section. I encourage everyone interested in WFM to review it.

The caveat with distcp is that in some cases you cannot do distcp via Oozie from a secure to an insecure cluster and vice versa. There are parameters you have to specify to make it work in some cases, but overall it is not supported in heterogeneous clusters. Other issues crop up when you distcp from HA-enabled clusters: you have to specify the nameservices for both clusters. Please leverage HCC to find resources on how to get that working. Hope this was useful!
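Since the generated workflow XML is not reproduced in the text, here is a rough sketch of what a distcp workflow with a prepare/delete step and the extra -update and -skipcrccheck arguments might look like. It is not the exact XML that WFM generates. The source and target paths, the kill-node wording and the schema versions (uri:oozie:workflow:0.5, uri:oozie:distcp-action:0.2) are assumptions, and render_distcp_workflow is a hypothetical helper used only to keep the option arguments ahead of the source and target.

```python
import xml.etree.ElementTree as ET

# Doubled braces survive str.format() as single braces, so ${{nameNode}}
# is emitted as the Oozie variable ${nameNode}.
WORKFLOW_TEMPLATE = """<workflow-app name="distcp-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="distcp_example"/>
  <action name="distcp_example">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <job-tracker>${{resourceManager}}</job-tracker>
      <name-node>${{nameNode}}</name-node>
      <prepare>
        <delete path="{target}"/>
      </prepare>
{args}
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>${{wf:errorMessage(wf:lastErrorNode())}}</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""


def render_distcp_workflow(source, target, options=()):
    """Render the workflow with options first, then source, then target."""
    values = list(options) + [source, target]
    args = "\n".join(f"      <arg>{value}</arg>" for value in values)
    return WORKFLOW_TEMPLATE.format(target=target, args=args)


xml_text = render_distcp_workflow(
    "${nameNode}/user/centos/file",            # assumed source path
    "${nameNode}/user/centos/distcp-wf/file",  # assumed target path
    options=("-update", "-skipcrccheck"),
)
ET.fromstring(xml_text)  # raises ParseError if the XML is not well-formed
print(xml_text)
```

The prepare block deletes the same path that is passed as the last argument, which mirrors the copy-the-second-argument step in the article.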

aervits · ‎06-30-2017

@riyer I'd avoid going against HBase with Hive. Generating a snapshot is so trivial that you should consider going that route first. On average, going against a snapshot should perform about 2.5x better than going against HBase directly.

aervits · ‎03-16-2017

@Sam Pat first of all, thanks for checking out my article. I see you have a company reference in your error message; please edit your comment and remove it. Secondly, can you run your Python script without Oozie? I have a feeling you're trying to execute a Python 2 script with Python 3 as the default interpreter. You should add the interpreter line to your script and try again. Take a look at my scripts; I have a version for Python 2

    #! /usr/bin/env python

and Python 3

    #! /usr/bin/env /usr/local/bin/python3.3

If your cluster has Python 3 installed, make sure it's installed across the whole cluster at the same path. If it's Python 2, also make sure every node is configured correctly with the location of the interpreter.
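One way to check which interpreter a node actually resolves is a tiny script like the sketch below, run by hand on each host and, if you like, through the same Oozie action. It is only an illustration: describe_interpreter is a hypothetical helper, and the script sticks to syntax that works under both Python 2 and Python 3.

```python
#!/usr/bin/env python
import sys


def describe_interpreter():
    """Return the interpreter path and its major.minor version as text."""
    return "%s %d.%d" % (sys.executable, sys.version_info[0], sys.version_info[1])


# Compare this line across nodes; differing paths or versions point to
# an inconsistent Python installation in the cluster.
print(describe_interpreter())
```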

aervits · ‎03-03-2017

@Venkat Ranganathan from my experience, I think the goal was achieved. I love this product, planning to write more parts once blockers are addressed.

aervits · ‎02-28-2017

1. Yes, you have to go to the host and decommission the RS first; that puts the HBase RS into drain mode, and then you can continue with decommissioning the DN.
2. An RS can live without a DN, but for data locality it is best for them to coexist.
3. See #1.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_administration/content/ch_slave_nodes.html

aervits · ‎02-26-2017

@Ali Mohammadi Shanghoshabad it is important that you click the checkbox for capture-output; your XML, when you preview it, should then include the capture-output element. Look for that checkbox in the shell action.
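As a rough illustration of what to look for in the preview, the sketch below holds a shell action with the capture-output element and checks for its presence. The action name, the script name and the schema versions are assumptions, not the XML from the original thread, and captures_output is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical shell action; script.sh stands in for your own script.
SHELL_ACTION = """<action name="shell_node" xmlns="uri:oozie:workflow:0.5">
  <shell xmlns="uri:oozie:shell-action:0.3">
    <job-tracker>${resourceManager}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>script.sh</exec>
    <file>script.sh</file>
    <capture-output/>
  </shell>
  <ok to="end"/>
  <error to="kill"/>
</action>
"""

SHELL_NS = "{uri:oozie:shell-action:0.3}"


def captures_output(action_xml):
    """Return True when the shell action contains a capture-output element."""
    root = ET.fromstring(action_xml)
    return root.find(f".//{SHELL_NS}capture-output") is not None


print(captures_output(SHELL_ACTION))
```

With capture-output enabled, the script is expected to print key=value lines, which a later node can read with an expression such as ${wf:actionData('shell_node')['key']}.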

aervits · ‎02-23-2017

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In the last tutorial I created a coordinator called part-10-coord. I'm going to use it in this tutorial to create a bundle. I'm personally new to bundles and only discovered them while reviewing WFM. You can learn more about bundles here: https://oozie.apache.org/docs/4.2.0/BundleFunctionalSpec.html

Bundles are designed to make working with coordinators easier and to manage coordinators on a more holistic level. A bundle is a higher-level Oozie abstraction that batches a set of coordinator applications. The user can start/stop/suspend/resume/rerun at the bundle level, resulting in better and easier operational control. More specifically, the Oozie bundle system allows the user to define and execute a bunch of coordinator applications, often called a data pipeline. There is no explicit dependency among the coordinator applications in a bundle. However, a user could use the data dependency of coordinator applications to create an implicit data application pipeline.

Let's go to the top right-hand corner and click on create, this time selecting bundle as the choice. You're now prompted to enter coordinator information. Click on the Add Coordinator button and fill it out with the existing coordinator information, giving the full path of the coordinator XML file. If you provide a full path to the coordinator XML, the coordinator name will be populated on its own. If your data pipeline consists of many coordinators, you can chain them here by adding more coordinators and their paths. Since my pipeline consists of only one coordinator (not really useful, though I can see how it would be when you have several), I'm going to click on the green Add button to finish.

The last thing left to do is enter the kick-off time. It expects a date; if none is given, it defaults to NOW, which means the bundle will kick off immediately once submitted.

Bundle Application Definition

A bundle definition is defined in XML by a name, controls and one or more coordinator application specifications:

name: The name for the bundle job.
controls: The control specification for the bundle.
    kick-off-time: It defines when the bundle job should start and submit the coordinator applications. This field is optional and the default is NOW, which means the job should start right away.
coordinator: Coordinator application specification. There should be at least one coordinator application in any bundle.
    name: Name of the coordinator application. It can be used for referring to this application through the bundle for control operations such as kill, suspend, rerun.
    app-path: Path of the coordinator application definition in HDFS. This is a mandatory element.
    configuration: A Hadoop-like configuration to parameterize the corresponding coordinator application. This is optional.

Finally, I'm going to rename the bundle to part-10-bundle and submit it. Notice I saved it to /user/centos/part-10 along with the existing workflow called part-10 and the coordinator called part-10-coord. All three XML files will be in the part-10 directory for organization purposes, though this is not required.

Same as with workflows and coordinators, I can see my bundle runs on the Dashboard. The configuration options change a bit: I no longer see an action tab, I see a coordinator tab. It also shows all my running coordinators that belong to the bundle. The definition tab shows all required properties for the bundle to run. WFM makes it easy to fill out the properties and you're no longer required to maintain a job.properties file. The last thing I want to do is show you the XML generated for this bundle.

This tutorial just goes to show you how easy it is to start learning Oozie nomenclature. Before my experience with WFM, I did not know how to work with bundles, decision nodes, SLA features, etc. WFM makes working with Oozie more approachable. Until next time!
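The generated bundle XML is not reproduced in the text, so here is a minimal sketch of the structure described by the Bundle Application Definition above: a name, an optional kick-off-time under controls, and one coordinator with its app-path. The schema version (uri:oozie:bundle:0.2), the coordinator file path and the build_bundle helper are assumptions for illustration, not the exact XML WFM emitted.

```python
import xml.etree.ElementTree as ET

BUNDLE_NS = "uri:oozie:bundle:0.2"  # assumed schema version


def _tag(name):
    """Qualify an element name with the bundle namespace."""
    return "{%s}%s" % (BUNDLE_NS, name)


def build_bundle(name, coordinators, kick_off_time=None):
    """Serialize a bundle-app with one coordinator entry per (name, path)."""
    ET.register_namespace("", BUNDLE_NS)  # emit xmlns="..." without a prefix
    bundle = ET.Element(_tag("bundle-app"), {"name": name})
    if kick_off_time:  # omitted means Oozie defaults to NOW
        controls = ET.SubElement(bundle, _tag("controls"))
        ET.SubElement(controls, _tag("kick-off-time")).text = kick_off_time
    for coord_name, app_path in coordinators:
        coord = ET.SubElement(bundle, _tag("coordinator"), {"name": coord_name})
        ET.SubElement(coord, _tag("app-path")).text = app_path
    return ET.tostring(bundle, encoding="unicode")


bundle_xml = build_bundle(
    "part-10-bundle",
    [("part-10-coord", "${nameNode}/user/centos/part-10/coordinator.xml")],
)
print(bundle_xml)
```

Adding more (name, path) pairs to the list corresponds to chaining more coordinators in the WFM dialog.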

aervits · ‎02-23-2017

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will cover how to create Oozie coordinators via Workflow Manager View. I'm also going to leverage the Publish/Import Asset functionality demonstrated in my last tutorial, only now using the local database rather than HDFS.

We're going to publish the two action nodes from part 9 (the shell action doing echo and the email action) using the next-to-last icon that appears when you click on the action node. The steps are the same as in part 9, where I published an asset to HDFS, except there's no HDFS path. You're greeted with a dialog to give the asset a name and description. I already published the shell action to the assets database in the same manner.

I'd like to glance at what assets I have in my local Ambari Views instance. To do that, I'm going to click on the manage assets button in the top right-hand corner. You're going to see a list of any saved assets so far. You can delete assets by hitting the red trash icon to the right of them. The asset manager also allows you to search through all saved assets. Keep in mind that the local asset database is exactly that, local; it is not shared across instances of Ambari Views nodes. For that, please use the publish/import from HDFS functionality just like in part 9.

We're ready to tie it all together. We're going to create a new workflow, name it part-10, then begin to add new nodes, though now instead of adding pre-existing nodes, we're going to click on import asset. You'll get a pop-up to select an asset from the asset manager. Click on it and hit import. Now we're ready to import the second asset; select the asset and import it. I gave the action nodes in the resulting workflow more meaningful names. We pretty much built this workflow from the one in part 9 using publish/import assets.

We can now submit the job. (The path in the screenshot mistakenly points to /user/centos/email; I then submitted this workflow and saved the path to /user/centos/part-10.) Great, now that we know it works, we're ready to create a coordinator. On the WFM page, in the top right-hand corner, find the create button and select coordinator. You'll be prompted to fill out the details. This beats working with XML, as all I need to do is fill out 5 fields and I have a working coordinator for an existing workflow. By clicking the button next to browse, you get an option to create a brand new workflow; since we already have one, we're going to enter the path of the existing one.

I'm ready to submit the coordinator. I prefer to save the coordinator and workflow in the same directory. Though my screenshots do not show that, I chose /user/centos/part-10 as the HDFS path for both workflow and coordinator in my recent work. This is what my directory looks like:

    hdfs dfs -ls part-10
    Found 2 items
    -rw-r--r--   3 centos hdfs        364 2017-02-23 17:58 part-10/coordinator.xml
    -rw-r--r--   3 centos hdfs        971 2017-02-23 17:09 part-10/workflow.xml

Let's preview the coordinator XML. Ignore the app-path in my XML; I have two workflows, one in /user/centos/email and another one in /user/centos/part-10. I grabbed the wrong screenshot :).

Let's look at our coordinator running, which will allow me to demonstrate some more cool features of WFM like chaining of search tags. Let's click on the dashboard button and see our workflow in action; notice the clock icon to the left of it, which identifies it as part of a coordinator. This view still shows workflows. If you click on the left of the page where the workflow drop-down is and select coordinator instead, you can see only coordinators. Toggling the drop-down to select the type of job you're looking for makes it easy to separate coordinators from workflows and, as you will see soon, bundles. A coordinator triggered every 5 minutes will quickly fill your dashboard.

Another cool feature in the dashboard is multi-tag search. In the Oozie UI, you can click on name and it will sort ASC/DESC; here we can instead filter using pre-defined tags to narrow down the output to what's relevant. Notice I added a name filter. What if I also want to filter by status:SUCCEEDED and not just the name of the workflow? I can add more tags: for example, to show only running workflows, or to filter by user name as well. Other filter options are available too.

Finally, since my coordinator is configured to execute a workflow called part-10 every 5 minutes, I'm getting a lot of emails every time it runs and succeeds. I want to kill the coordinator, and I can do it directly from the dashboard. To the right of the running coordinator, I have an option to kill, highlighted in red; click that. Once clicked, the coordinator goes into the Killed state.
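Because the coordinator XML preview is only available as a screenshot, here is a hedged sketch of what a coordinator that runs the part-10 workflow every 5 minutes might look like. The start and end times, the UTC timezone, the schema version (uri:oozie:coordinator:0.4) and the helper names are assumptions; only the coordinator name, the 5-minute frequency and the /user/centos/part-10 path come from the article.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Doubled braces survive str.format(), so the frequency attribute is
# emitted as the Oozie EL expression ${coord:minutes(N)}.
COORD_TEMPLATE = """<coordinator-app name="{name}"
    frequency="${{coord:minutes({minutes})}}"
    start="{start}" end="{end}" timezone="UTC"
    xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>{workflow_path}</app-path>
    </workflow>
  </action>
</coordinator-app>
"""


def oozie_time(moment):
    """Format a datetime the way Oozie expects, e.g. 2017-02-23T18:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


def render_coordinator(name, workflow_path, start, end, minutes=5):
    """Render a coordinator that triggers workflow_path every N minutes."""
    return COORD_TEMPLATE.format(
        name=name,
        minutes=int(minutes),
        start=oozie_time(start),
        end=oozie_time(end),
        workflow_path=workflow_path,
    )


start = datetime(2017, 2, 23, 18, 0, tzinfo=timezone.utc)  # assumed start
coord_xml = render_coordinator(
    "part-10-coord",
    "${nameNode}/user/centos/part-10",
    start,
    start + timedelta(hours=1),  # assumed one-hour window
)
ET.fromstring(coord_xml)  # raises ParseError if the XML is not well-formed
print(coord_xml)
```

A one-hour window at a 5-minute frequency already produces a dozen workflow runs, which is why the dashboard fills up and why killing the coordinator is the last step in the article.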

aervits · ‎02-23-2017

@Anders Boje yes, that is a Workflow Manager view, but an old version which is not meant to be production ready. Workflow Manager View, AKA Oozie View, will be available in Ambari 2.5. Ambari 2.5 is not released yet; it will be released in a month or two. As you noted, there are bugs, and we are still working through some issues to make it stable. My tutorials do cover a lot of WFM, but the ideas can be applied to working with Oozie XML. You should just wait for the Ambari 2.5 release if this is more what you want to do.

Last Visited	‎08-15-2019 06:35 AM

Member Since	‎10-01-2015 11:46 AM
Posts	3,933
Kudos received	1067
