06-02-2017
03:12 PM
4 Kudos
With the latest release of Apache NiFi 1.2.0 the JoltTransformJson Processor
became a bit more powerful with an upgrade to the Jolt library (to version 0.1.0)
and the introduction of expression language (EL) support. This now provides users the ability to create
dynamic specifications for JSON transformation and to perform some data
manipulation tasks all within the context of the processor. Internal caching
has also been added to improve overall performance.
Let’s take the example of transforming a Twitter JSON payload, seen below:
{"created_at":"Wed Mar 29 02:53:48 +0000 2017","id":846918283102081024,"id_str":"846918283102081024","text":"CSUB falls to Georgia Tech 76-61 in NIT semifinal game. @Bakersfieldcali @BVarsityLive @CSUBAthletics @CSUB_MBB\u2026 https:\/\/t.co\/9e5dQesIbg","display_text_range":[0,140],"source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2918922812,"id_str":"2918922812","name":"Felix Adamo","screen_name":"tbcpix","location":"Bakersfield Californian","url":null,"description":"Newspaper Photographer","protected":false,"verified":false,"followers_count":677,"friends_count":247,"listed_count":12,"favourites_count":1366,"statuses_count":3576,"created_at":"Thu Dec 04 18:46:27 +0000 2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/570251877397180416\/jL2kuB4f_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/570251877397180416\/jL2kuB4f_normal.png","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2918922812\/1483041284","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"extended_tweet":{"full_text":"
CSUB falls to Georgia Tech 76-61 in NIT semifinal game. @Bakersfieldcali @BVarsityLive @CSUBAthletics @CSUB_MBB @csubnews https:\/\/t.co\/yV2AHFdVLc","display_text_range":[0,121],"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"Bakersfieldcali","name":"The Bakersfield Cali","id":33055408,"id_str":"33055408","indices":[56,72]},{"screen_name":"BVarsityLive","name":"BVarsityLive","id":762418351,"id_str":"762418351","indices":[73,86]},{"screen_name":"CSUBAthletics","name":"CSUB Athletics","id":51115996,"id_str":"51115996","indices":[87,101]},{"screen_name":"CSUB_MBB","name":"\ud83c\udfc0CSUB Men's Hoops\ud83c\udfc0","id":2897931481,"id_str":"2897931481","indices":[102,111]},{"screen_name":"csubnews","name":"CSU Bakersfield","id":209666415,"id_str":"209666415","indices":[112,121]}],"symbols":[],"media":[{"id":846918121248047104,"id_str":"846918121248047104","indices":[122,145],"media_url":"http:\/\/pbs.twimg.com\/media\/C8Dbi0rUwAAiffu.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C8Dbi0rUwAAiffu.jpg","url":"https:\/\/t.co\/yV2AHFdVLc","display_url":"pic.twitter.com\/yV2AHFdVLc","expanded_url":"https:\/\/twitter.com\/tbcpix\/status\/846918283102081024\/photo\/1","type":"photo","sizes":{"medium":{"w":1200,"h":608,"resize":"fit"},"large":{"w":2048,"h":1038,"resize":"fit"},"small":{"w":680,"h":345,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}}},{"id":846918179397906433,"id_str":"846918179397906433","indices":[122,145],"media_url":"http:\/\/pbs.twimg.com\/media\/C8DbmNTVMAEvpd3.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C8DbmNTVMAEvpd3.jpg","url":"https:\/\/t.co\/yV2AHFdVLc","display_url":"pic.twitter.com\/yV2AHFdVLc","expanded_url":"https:\/\/twitter.com\/tbcpix\/status\/846918283102081024\/photo\/1","type":"photo","sizes":{"large":{"w":2048,"h":1213,"resize":"fit"},"medium":{"w":1200,"h":711,"resize":"fit"},"small":{"w":680,"h":403,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}}}]},"extended_entities":{"medi
a":[{"id":846918121248047104,"id_str":"846918121248047104","indices":[122,145],"media_url":"http:\/\/pbs.twimg.com\/media\/C8Dbi0rUwAAiffu.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C8Dbi0rUwAAiffu.jpg","url":"https:\/\/t.co\/yV2AHFdVLc","display_url":"pic.twitter.com\/yV2AHFdVLc","expanded_url":"https:\/\/twitter.com\/tbcpix\/status\/846918283102081024\/photo\/1","type":"photo","sizes":{"medium":{"w":1200,"h":608,"resize":"fit"},"large":{"w":2048,"h":1038,"resize":"fit"},"small":{"w":680,"h":345,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}}},{"id":846918179397906433,"id_str":"846918179397906433","indices":[122,145],"media_url":"http:\/\/pbs.twimg.com\/media\/C8DbmNTVMAEvpd3.jpg","media_url_https":"https:\/\/pbs.twimg.com\/media\/C8DbmNTVMAEvpd3.jpg","url":"https:\/\/t.co\/yV2AHFdVLc","display_url":"pic.twitter.com\/yV2AHFdVLc","expanded_url":"https:\/\/twitter.com\/tbcpix\/status\/846918283102081024\/photo\/1","type":"photo","sizes":{"large":{"w":2048,"h":1213,"resize":"fit"},"medium":{"w":1200,"h":711,"resize":"fit"},"small":{"w":680,"h":403,"resize":"fit"},"thumb":{"w":150,"h":150,"resize":"crop"}}}]}},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/9e5dQesIbg","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/846918283102081024","display_url":"twitter.com\/i\/web\/status\/8\u2026","indices":[113,136]}],"user_mentions":[{"screen_name":"Bakersfieldcali","name":"The Bakersfield Cali","id":33055408,"id_str":"33055408","indices":[56,72]},{"screen_name":"BVarsityLive","name":"BVarsityLive","id":762418351,"id_str":"762418351","indices":[73,86]},{"screen_name":"CSUBAthletics","name":"CSUB Athletics","id":51115996,"id_str":"51115996","indices":[87,101]},{"screen_name":"CSUB_MBB","name":"\ud83c\udfc0CSUB Men's 
Hoops\ud83c\udfc0","id":2897931481,"id_str":"2897931481","indices":[102,111]}],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1490756028329"}
In our case we want to accomplish several things when transforming this data in JoltTransformJson:
- Create a subset of the JSON data that contains the id, tweet text, in_reply_to fields, and a new flow_file_id field
- Match the “id” key in the Twitter payload against a flow file attribute and convert it to a new label (tweet_id)
- Set the tweet text to all lower case
- Set default values for the in_reply_to fields that are null
- Add the flow file's unique id to the JSON data
Once the data has been transformed, it will land on the file system as well as in a MongoDB repository.
Basic Flow of Twitter Data Transformation and Storage
Here's a close up of the specification in use:
[{
  "operation": "shift",
  "spec": {
    "${id.var}": "tweet_id",
    "text": "tweet_text",
    "in_reply_to_*": "&"
  }
}, {
  "operation": "modify-overwrite-beta",
  "spec": {
    "tweet_text": "=toLower"
  }
}, {
  "operation": "modify-default-beta",
  "spec": {
    "~in_reply_to_status_id": 0,
    "~in_reply_to_status_id_str": "",
    "~in_reply_to_user_id": 0,
    "~in_reply_to_user_id_str": "",
    "~in_reply_to_screen_name": ""
  }
}, {
  "operation": "default",
  "spec": {
    "flow_file_id": "${uuid}"
  }
}]
In the above you’ll see we’ve accomplished this with a chain specification containing four operations (shift, modify-overwrite-beta, modify-default-beta, and default). The shift operation defines the fields needed for the final schema and translates those fields into new labels. Note that the shift specification uses expression language on the left-hand side (${id.var}), which will evaluate to a value populated by the UpdateAttribute processor (this value could also be populated from the Variable Registry). The Jolt library will then attempt to match that value to the corresponding key in the incoming JSON data and rename it to the new label on the right (in this case “tweet_id”).
The next operation uses modify-overwrite-beta to ensure that the Jolt lower-case function is applied to all incoming tweet text. We then use a modify-default-beta operation that applies default values to the in_reply_to fields when those values are null. Finally, we use a basic default operation to create the new flow_file_id field, applying expression language on the right-hand side of the field name to dynamically insert the flow file's unique id.
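Assuming the flow file attribute id.var resolves to "id", a payload like the sample above would come out of the processor looking roughly like this (all values shown are illustrative):

```json
{
  "tweet_id": 846918283102081024,
  "tweet_text": "csub falls to georgia tech 76-61 in nit semifinal game. ...",
  "in_reply_to_status_id": 0,
  "in_reply_to_status_id_str": "",
  "in_reply_to_user_id": 0,
  "in_reply_to_user_id_str": "",
  "in_reply_to_screen_name": "",
  "flow_file_id": "0fceccd8-8a4e-40f4-b2a7-0a3898cf3823"
}
```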
JoltTransformJson Advanced UI with Chain Specification
New Test Attributes Modal for testing Expression Language used in Specifications
The Advanced UI (shown above) has also been enhanced to allow testing of specifications that use expression language (specifically, by providing test attributes to be resolved during testing). This gives users greater insight into how a flow will behave without relying on external dependencies such as flow file attributes or variable registry entries.
Example of Transformed JSON (shown in Provenance)
Looking to give this a try? Feel free to download the example template on GitHub Gist here and import it into NiFi. The template includes the specification described above, which you can tweak to test out various scenarios. Also, if you have any questions about transforming JSON in Apache NiFi with Jolt, please comment below or reach out to the community on the Apache NiFi mailing list.
10-11-2016
12:44 PM
15 Kudos
Credits to @mgilman for contributing to this article:
Since the introduction of NiFi 1.0.0, administrators have a greater ability to manage policy through the addition of Ranger integration and a more granular authorization model. This article provides a guide for those looking to define and manage NiFi policies in Ranger. To learn more about configuring NiFi to use Ranger via Ambari, please review the parent article HDF 2.0 - Integrating Secured NiFi with Secured Ranger for Authorization Management.
Behind the scenes NiFi uses a REST-based API for all user interaction; therefore, resource-based policies are used in Ranger to define users' level of permission when executing calls against these REST endpoints via NiFi's UI. This allows administrators to define policies by selecting a NiFi resource/endpoint, choosing whether users have Read or Write (Modify) permissions on that resource, and selecting the users who will be granted the configured permission. For example, the image below shows a policy in Ranger where a user is granted the ability to view flows in NiFi's interface. This was configured by selecting /flow as the authorized resource and granting the selected user the Read permission on that resource.
Example of Global Level Policy Definition with Kerberos User Principal
Policies can be created that will apply authorizations to features at a global level or on a specific component level in NiFi. The following describes the policies that can be defined in Ranger using a combination of the indicated NiFi Resource and Permission (Read or Write).
Global Policies:
- View the User Interface
  Privilege: Allows users to view the user interface
  NiFi Resource: /flow
  Permission(s): Read
- Access the Controller
  Privilege: Allows users to view/modify the controller, including Reporting Tasks, Controller Services, and clustering endpoints. Explicit access to reporting tasks and controller services can be overridden.
  NiFi Resource: /controller
  Permission(s): Read (for View), Write (for Modify)
- Query Provenance
  Privilege: Allows users to submit a provenance search and request event lineage. Access to the actual provenance events or lineage is based on the data policies of the components that generated the events; this simply allows the user to submit the query.
  NiFi Resource: /provenance
  Permission(s): Read
- Access Users/User Groups
  Privilege: Allows users to view/modify users and user groups
  NiFi Resource: /tenants
  Permission(s): Read (View Users/Groups), Write (Modify Users/Groups)
- Retrieve Site-To-Site Details
  Privilege: Should be granted to other NiFi systems (or Site-To-Site clients) so they can retrieve the listing of available ports (and peers when using HTTP as the transport protocol). Explicit access to individual ports is still required to see and initiate Site-To-Site data transfer.
  NiFi Resource: /site-to-site
  Permission(s): Read (allow retrieval of details)
- View System Diagnostics
  Privilege: Should be granted in order to retrieve system diagnostic details, including JVM memory usage, garbage collection, system load, and disk usage.
  NiFi Resource: /system
  Permission(s): Read
- Proxy User Requests
  Privilege: Should be granted to any proxy sitting in front of NiFi, or to any node in the cluster that will issue requests on a user's behalf.
  NiFi Resource: /proxy
  Permission(s): Write (granted to the node users defined in Ranger)
- Access Counters
  Privilege: Should be granted to users to retrieve and reset counters. This policy is separate from the individual components because counters can also be rolled up by type.
  NiFi Resource: /counters
  Permission(s): Read (read counter information), Write (reset counters)
Note: Setting authorizations on the /policy resource is not applicable when using Ranger, since NiFi's policy UI is disabled when Ranger authorization is enabled.
Component Policies
Component-level policies can be set in Ranger on individual components of a flow within a process group, or on an entire process group (with the root process group being the top level for all flows). Most component types (except connections) can have a policy applied directly to them. For example, the image below demonstrates a policy defined for a specific processor instance (noted by the unique identifier included in the resource URL) which grants Read and Write permissions to the selected user.
Example of Component Level Policy for Kerberos User Principal
If no policy is available on the specific component, NiFi will look to the parent process group for policy information. Below are the available resources for components where a specific policy can be applied to an instance of that component. Detailed information on component descriptions can be found in the NiFi documentation.
Component Type / Resource (REST API) / Description (from NiFi documentation):
- Controller Service (/controller-services): Extension point that provides centralized access to data/configuration information for other components in a flow
- Funnel (/funnels): Combines data from several connections into one connection
- Input Port (/input-ports): Used to receive data from other data flow components
- Label (/labels): Documentation for a flow
- Output Port (/output-ports): Used to send data to other data flow components
- Processor (/processors): NiFi component that pulls data from or publishes data to external sources, or routes, transforms, or extracts information from flow files
- Process Group (/process-groups): An abstract context used to group multiple components (processors) into a sub-flow. Paired with input and output ports, process groups can be used to simplify complex flows into logical sub-flows
- Reporting Task (/reporting-tasks): Runs in the background and provides reporting data on the NiFi instance
- Template (/templates): Represents a predefined dataflow available for reuse within NiFi; can be imported and exported
The following table describes the types of policies that can be applied to the previously mentioned components. Note: UUID is the unique identifier of an individual component within the flow.
- Read or Update Component
  Description: This policy should be granted to users for retrieving a component's configuration details and modifying the component.
  REST API: Read/Write on /{resource}/{uuid}, e.g. /processors/{uuid}
- View Component Data or Allow Emptying of Queues and Replaying
  Description: This policy should be granted to users for retrieving or modifying data from a component. Retrieving data encompasses listing downstream queues and provenance events; modifying data encompasses emptying downstream queues and replaying provenance events. Additionally, data-specific endpoints require that every link in the request chain is authorized with this policy; since requests traverse each link, we need to ensure that each proxy is authorized to have the data.
  REST API: Read/Write on /data/{resource}/{uuid}
- Write Receive Data, Write Send Data
  Description: These policies should be granted to other NiFi instances and Site-To-Site clients that will be sending/receiving data from the specified port. Once a client has been added to a port-specific Site-To-Site policy, that client will be able to retrieve details about the port and initiate a data transfer. Additionally, data-specific endpoints require that every link in the request chain is authorized with this policy; since requests traverse each link, we need to ensure that each proxy is authorized to have the data.
  REST API: Write on /data-transfer/input-ports/{uuid} and /data-transfer/output-ports/{uuid}
For more information on authorization configuration with Ranger and NiFi, please review:
http://bryanbende.com/development/2016/08/22/apache-nifi-1.0.0-using-the-apache-ranger-authorizer
https://community.hortonworks.com/articles/57980/hdf-20-apache-nifi-integration-with-apache-ambarir.html
10-05-2016
02:52 PM
15 Kudos
UPDATE: This article has been vetted against HDF 2.0 - HDF 3.2. Minor updates have been made for additional clarity on use of NiFi CA for establishing trust with Ranger.
Prerequisites
- NiFi has been installed and is running with SSL enabled (with certificates installed manually or via the NiFi Certificate Authority). You will need the keystore/truststore names, locations, aliases, identity (DN), and passwords used when enabling SSL for NiFi. Ensure that all nodes have the same keystore/truststore passwords and identically named locations so the Ranger NiFi plugin configuration can be applied consistently via Ambari.
- Ranger has been installed and configured with security enabled. For instructions on setting up SSL for Ranger, please review Configure Ambari Ranger SSL. Note the names, locations, aliases, identities (DN), and passwords used when creating the keystores and truststores for Ranger.
- If Kerberos will be used, it is recommended that it be enabled for the HDF cluster before proceeding.
Part 1 - Establishing trust between NiFi nodes and Ranger
In order for NiFi nodes to communicate over SSL with Ranger, and Ranger to communicate with secured NiFi, certificates should be imported from the Ranger host to NiFi nodes and vice versa. In these instructions we will use the same keystore/truststore used to secure Ranger in order to communicate with NiFi; however it is possible to also generate additional keystore/truststores that are dedicated solely to NiFi communication.
1. Create certificate files from Ranger’s keystore – Use the following command to generate a certificate file:
{java.home}/bin/keytool -export -keystore {ranger.keystore.file} -alias {keystore-alias} -file {cert.filename}
Example:
/usr/jdk64/jdk1.8.0_77/bin/keytool -export -keystore /etc/security/certs/ranger/ranger-admin-keystore.jks -alias rangeradmin -file /etc/security/certs/ranger/ranger-admin-trust.cer
2. Import the generated Ranger certificate file into the trust stores for all nifi nodes in the cluster:
{java.home}/bin/keytool -import -file {ranger.cert.filename} -alias {ranger.keystore.alias} -keystore {nifi.node.ssl.truststore} -storepass {nifi.node.ssl.truststore.password}
Example:
/usr/jdk64/jdk1.8.0_77/bin/keytool -import -file /etc/security/certs/ranger/ranger-admin-trust.cer -alias rangeradmin -keystore /usr/hdf/current/nifi/conf/truststore.jks -storepass {nifi.truststore.password}
3. Create certificate files for import into Ranger’s trust store. There are two ways to approach this:
a) If NiFi Certificate Authority is in use, a certificate from the CA can be generated and imported into Ranger's trust store using the following steps:
i) Create a certificate file from NiFi-CA using command below:
{java.home}/bin/keytool -export -keystore {nifi-ca.keystore.file} -alias {nifi-ca.keystore-alias} -file {nifi-ca-cert.filename}
ii) Import the NiFi CA certificate into Ranger's truststore* using the below command:
{java.home}/bin/keytool -import -file {nifi-ca.cert.filename} -alias {nifi-ca.keystore.alias} -keystore {ranger.ssl.truststore} -storepass {ranger.ssl.truststore.password}
b) If an external CA or self signed certificates are used and manual keystores and truststores were provided for each NiFi node then perform the following:
i) Create certificate files from each nifi node's keystore using the command below:
{java.home}/bin/keytool -export -keystore {nifi.keystore.file} -alias {nifi.keystore-alias} -file {cert.filename}
ii) Import the nifi certificate files into Ranger's truststore* (repeat for each certificate generated; remember that any duplicate aliases may need to be changed using the -changealias command before importing new ones):
{java.home}/bin/keytool -import -file {nifi.cert.filename} -alias {nifi.keystore.alias} -keystore {ranger.ssl.truststore} -storepass {ranger.ssl.truststore.password}
*NOTE: the truststore used by Ranger may be the default cacerts truststore located under {java_home}/jre/lib/security/cacerts
Part 2 – Enabling Ranger NiFi Plugin via Ambari
Enabling the Ranger-NiFi Plugin should lead to Ambari creating a Service Repository entry in Ranger which will store information for Ranger to communicate with NiFi and store the authorized identity of the NiFi node[s] that will communicate with Ranger.
From the Ambari UI perform the following steps:
1. Under the Ranger configuration section go to the “Ranger Plugin” tab and switch the NiFi Ranger Plugin toggle to “ON”. When prompted save the configuration.
2. If Ranger auditing will be used, under the Ranger configuration section go to the “Ranger Audit” tab and, if not already enabled, switch the “Audit to Solr” toggle to “ON”. This will produce options to enter connection properties for a Solr instance. To use Ambari Infra (internal SolrCloud), switch the “SolrCloud” toggle to “ON” as well; Ambari will pre-populate the ZooKeeper connection string values and credentials. If an external Solr is used, the connection values will need to be provided. When prompted, save the configuration.
3. Under the NiFi configuration screen go to the ranger-nifi-plugin-properties section. This section stores all the information needed to support Ranger communication with NiFi (to retrieve NiFi REST endpoint data).
Complete the following in the ranger-nifi-plugin-properties section:
a) Confirm that “Ranger repository config password” and “Ranger repository config user” are pre-populated. These values are set by default by Ambari and refer to Ranger’s admin user and password
b) Authentication - Enter “SSL” if not already detected and pre-populated by Ambari. This will indicate to Ranger that NiFi is running with SSL
c) Keystore for Ranger Service Accessing NiFi - Provide the keystore filename with location path that Ranger will use for SSL communications with NiFi. This should correspond to the keystore used to generate a certificate in Part 1, Step 1.
d) Keystore password - Enter the password for the above keystore
e) Keystore Type – Enter the keystore type for the provided keystore (e.g. JKS)
f) Truststore for Ranger Service Accessing NiFi – Enter the filename with location path of the truststore for the Ranger service
g) Truststore password – Enter the password for the above truststore
h) Truststore type – Enter the truststore type for the provided truststore (e.g. JKS)
i) Owner for Certificate – Enter the identity (Distinguished Name, or DN) of the certificate used by Ranger
j) Policy user for NiFi – This should be set by default to the value “nifi”
k) Enable Ranger for NiFi – This should be checked (enabled to true)
4. Next go to the ranger-nifi-policymgr-ssl section. This section stores the information NiFi will use to communicate with the secured Ranger service.
Complete the following in the ranger-nifi-policymgr-ssl section:
a) owner.for.certificate – Enter the identity (Distinguished Name, or DN) of the NiFi node(s) that will communicate with Ranger. To refer to multiple node identities, this value can use a regular expression by adding a regex prefix along with the expression (e.g. CN=regex:ydavis-kb-ranger-nifi-demo-[1-9]\.openstacklocal, OU=NIFI to match multiple DNs numbered 1 through 9). This value is not required if Kerberos is enabled on HDF. Update: this regular-expression feature is available as of HDF 2.0.1.
b) xasecure.policymgr.clientssl.keystore – Enter the keystore location and filename that NiFi will use to communicate with Ranger. This keystore should be the same file used to create and import a certificate into Ranger. (For a multi-node cluster, ensure this keystore location is consistent across all NiFi node hosts.)
c) xasecure.policymgr.clientssl.keystore.credential.file – This value is populated by default and is used by the plugin to generate a file to store credential information. No change to this value is required.
d) xasecure.policymgr.clientssl.truststore – Enter the truststore location and filename that NiFi will use to communicate with Ranger.
e) xasecure.policymgr.clientssl.truststore.credential.file - This value is populated by default and is used by the plugin to generate a file to store credential information. No change to this value is required.
f) xasecure.policymgr.clientssl.truststore.password – Enter the password for the provided truststore file.
5. The other two sections for the Ranger NiFi plugin (ranger-nifi-security and ranger-nifi-audit) do not require additional configuration; however, they can be reviewed for the following:
Confirm the following in ranger-nifi-audit section:
a) Audit to SOLR is enabled (if Ranger Audit was enabled in Part 2, Step 2)
b) xasecure.audit.destination.solr.urls is completed (if an external Solr instance was referenced in Step 2)
c) xasecure.audit.destination.solr.zookeepers is completed and matches the connection string (if Ambari Infra or external SolrCloud was enabled in Step 2)
d) xasecure.audit.is.enabled is set to true
Confirm the following in the ranger-nifi-security section:
a) ranger.plugin.nifi.policy.rest.ssl.config.file is set to ranger-policymgr-ssl.xml
b) ranger.plugin.nifi.policy.rest.url refers to the ambari variable for Ranger service {{policy_mgr_url}} (any replacement here means that a Ranger service external to the HDF installation is the target)
6. Save all NiFi configuration changes
7. Restart all required services and ensure that Ambari indicates services have restarted successfully
Part 3 – Confirm Ranger Configuration and Setting up Policies
1. Go to the Ranger Admin UI and using the Access Manager menu select “Resource Based Policies”. Confirm that an entry for NiFi exists in the NiFi Service Manager. The entry name is dynamically created based on the Ambari cluster name (see example below).
2. Select the edit button next to the service repository entry and confirm that the properties from the ranger-nifi-plugin-properties are accurately populated. Also confirm the NiFi URL provided (usually 1 node is used)
3. Confirm that the commonNameForCertificate value is the CN value from the Owner for Certificate property from ranger-nifi-plugin-properties.
4. Using the Ranger Menu go to the “Audit” screen and select the plugin tab. You should see one or more entries from each node in the cluster showing NiFi syncing with ranger policies.
5. If not using Usersync in Ranger, manually create new users in Ranger which correspond to the authentication method used to secure NiFi. For example when using Kerberos Authentication in NiFi ensure that the users created match with the Kerberos principal.
To create a user perform the following tasks:
a) In the Ranger Admin go to Settings menu and select “User/Groups”
b) Click the “Add New User” button
c) Complete the User Detail screen, providing the User Name as the identity for the appropriate NiFi authentication method (e.g. client DN, LDAP DN, or Kerberos principal). Password and First Name are required by Ranger but are not used by NiFi. The Role selected should be User (groups are not used by the plugin at this time)
d) Save the new user and repeat for any other users who need access to NiFi
6. User entries must also be created for each node in the NiFi cluster. Repeat the “Add New User” step; however, for User Name provide the Distinguished Name of each node, as shown below:
7. In order for NiFi nodes to be authorized to communicate within the cluster a Proxy policy should be created. In the Ranger Access Manager menu select “Resource Based Policies” and select the NiFi service repository entry link. On the policy screen select the “Add New Policy” button
8. Provide the following for Policy Details:
a) Policy Name – provide a name for the policy (e.g. proxy)
b) NiFi Resource Identifier – Enter “/proxy” for NiFi’s proxy endpoint
c) Audit Logging – should be set to yes if logging was previously enabled
d) Allow conditions section – In the “Select User” field choose each NiFi node user that was previously created. For the Permissions field select “Write”.
e) Add the new policy
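For scripted setups, the same proxy policy can also be created through Ranger's public REST API (POST to service/public/v2/api/policy on the Ranger Admin host). The body below is a sketch only: the service name and node DNs are made-up examples, and the resource key (nifi-resource) assumes the default NiFi service definition in Ranger.

```json
{
  "service": "hdfcluster_nifi",
  "name": "proxy",
  "resources": {
    "nifi-resource": {
      "values": ["/proxy"],
      "isExcludes": false,
      "isRecursive": false
    }
  },
  "policyItems": [
    {
      "users": [
        "CN=nifi-node-1.example.com, OU=NIFI",
        "CN=nifi-node-2.example.com, OU=NIFI"
      ],
      "accesses": [{"type": "WRITE", "isAllowed": true}]
    }
  ]
}
```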
9. Once these authorizations have been created, it is now possible to confirm that Ranger can communicate with NiFi (attempting to do so before nodes were authorized would result in a communication error). Go to the Access Manager menu, select “Resource Based Policies” and select the edit button next to the NiFi service repository entry link.
10. Click on the Test Connection button located just below the Config Properties entries. Ranger should be able to Connect to NiFi successfully.
11. Now configure other user policies for accessing NiFi. To configure a NiFi admin/super user (or admin rights), a user can be added to the all – nifi-resource policy that was created by default. In the Ranger Access Manager menu select “Resource Based Policies” and select the NiFi service repository entry link. Then select the edit button next to the “all – nifi-resource” entry.
12. In the Allow Conditions section select the user(s) which will be applied to this policy. Also add both Read and Write permissions.
13. Save the policy with the new settings and confirm that the configured user can access NiFi with given rights by logging into NiFi on a node. Repeat login on each node in cluster to confirm policy is applied throughout.
14. Confirm that login access as well as proxy communication of nodes were audited in Ranger using the Audit screen and navigating to the “Access” tab
At this point Ranger can be used to administer policy for NiFi.
Troubleshooting
If there are problems with NiFi's communication with Ranger, review the xa_secure.log (on the Ranger installation) as well as NiFi's nifi-app.log to determine the source of the issue. Often this is due to certificates not having been imported into Ranger's truststore or, if Kerberos was not enabled, an inaccurate commonNameForCertificate value (in the NiFi service repository entry in Ranger).
If there are problems with Ranger's communication with NiFi, this could also be due to certificates not having been imported on the NiFi nodes, or the Ranger certificate not being appropriately identified. In addition to the previously mentioned logs, the nifi-user.log will be useful to review as well.
09-19-2016
11:15 PM
10 Kudos
NiFi has previously supported the ability to refer to flow file attributes, system properties, and environment properties within expression language (EL); however, the community requested an enhancement to also support custom properties. This gives users even more flexibility in processing, handling flow content, or even flow configuration (e.g. referring to a custom property in EL for connection, server, or service properties).
In NiFi versions 0.7 and 1.0.0 an enhancement was added that allows administrators to define custom property files on nodes within their cluster and point NiFi at their location so those properties can be loaded and made available within EL. A new field in the nifi.properties file, nifi.variable.registry.properties, lets an administrator set the paths of one or more custom properties files for use by NiFi.
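As a sketch, the nifi.properties entry is a comma-separated list of file paths (the paths below are illustrative, not from the article):

```properties
# nifi.properties — point NiFi at one or more custom properties files
# (file locations here are examples only)
nifi.variable.registry.properties=/opt/nifi/conf/custom.properties,/opt/nifi/conf/env.properties
```

After restarting NiFi, any properties defined in those files become resolvable through EL.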
Figure 1
- Custom Properties reference in nifi.properties
Once the nifi.properties file is updated, custom properties can be referenced as needed. NOTE: custom properties should use distinct property names to ensure they won't be overridden by other property files or by existing environment, system, or flow file attributes.
For demonstration, I have a flow that uses custom properties in EL with the UpdateAttribute and PutFile processors.
Figure 2
- Test Flow Writing Custom Attribute Data
Figure 3
- UpdateAttribute Advanced Configuration
Figure 4
- PutFile Config Screen with Directory using Custom Property in Expression
The output of this flow saves attributes created from custom
property values to a folder location that is also defined by a custom property.
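To illustrate the pattern (the property names below are hypothetical, not taken from the article's template), a custom properties file and the EL references that consume it might look like:

```properties
# custom.properties — loaded via nifi.variable.registry.properties
favorite.food=pizza
output.directory=/tmp/nifi-output

# UpdateAttribute processor — a dynamic property referencing a custom property:
#   food      ->  ${favorite.food}
# PutFile processor — Directory property driven by a custom property:
#   Directory ->  ${output.directory}
```

Because these resolve through the same EL mechanism as flow file attributes, they can be used anywhere a processor property supports EL.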
This custom properties enhancement sets the stage for a richer Variable Registry that will provide even more flexibility in custom property management, including UI-driven administration, variable scope management, and more.
For testing the flow in the above example, a template and referenced properties are
available here: https://gist.github.com/YolandaMDavis/364307c1ab5fe89b2edcef5647180873
07-13-2016
03:53 AM
9 Kudos
One of the great things about Apache NiFi is its ability to consume and transform many formats of data; however, one particular area of complexity has been the transformation of received JSON data (i.e., JSON-to-JSON transformation). Take the case of the GetTwitter processor, which provides access to Twitter data streams (from the filtered search, sample, or firehose endpoints). Anyone who has worked with Twitter's JSON schema knows it is very rich, with detailed information for each individual tweet. In many cases much of this data isn't required for analytics, and data flow managers or analysts want to pare it down to the necessities. There are also instances where incoming JSON data simply needs to be reformatted or re-labeled for use in another system or repository, such as Hive, HBase, or MongoDB.
Outside of NiFi, JSON-to-JSON transformation has been simplified by the Jolt Java library, which offers a declarative approach to defining JSON output. Jolt provides a set of transformation types, each with its own DSL (called specifications), that define the new structure for outgoing JSON data. Prior to NiFi 0.6.1, @Matt Burgess wrote a great article on how to incorporate Jolt in NiFi via ExecuteScript, which by itself is a cool processor to use when you need to extend the capabilities of NiFi. The Apache community saw an opportunity to pair the ease of use of Jolt with the power of NiFi by introducing JoltTransformJSON as a standard processor in the upcoming version 0.7.
JoltTransformJSON will be included as part of the standard set of processors allowing NiFi users to easily add, validate, and test Jolt specifications for JSON data flow content. A simple configuration option is found under the properties tab on the processor, which provides Jolt’s existing options for transformation types and a field to enter the JSON specification for the selected transformation.
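For example, a Chain-type specification pared down to a few fields of a tweet might look like the following (the output field names on the right are illustrative choices, not from the article's example):

```json
[
  {
    "operation": "shift",
    "spec": {
      "created_at": "tweet_date",
      "text": "tweet_text",
      "user": {
        "screen_name": "user_name",
        "followers_count": "followers"
      }
    }
  }
]
```

Each key on the left matches a field in the incoming JSON, and the value on the right names where that field lands in the output.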
For those looking for more options to validate and test specifications, the Advanced button provides access to a rich configuration UI that allows users to perform JSON and Jolt validation (against the selected transformation) as well as transformation testing with example input. This UI gives users more assurance about the outcome of a JSON transformation before actually applying it to the flow. Whether using the simple or advanced configuration, if invalid specifications are saved NiFi will do its usual work of notifying users of any errors associated with the processor's configuration.
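To make the shift semantics concrete, here is a plain-Python sketch (a stand-in for illustration, not NiFi or Jolt code) that applies a simple rename mapping of the kind a shift spec expresses; the field names and dotted output paths are assumptions for the example:

```python
def shift(payload, spec):
    """Tiny illustrative analogue of a Jolt 'shift': wherever the spec maps
    an input key to a dotted output path, copy that value into the output."""
    out = {}

    def walk(node, spec_node):
        for key, target in spec_node.items():
            if key not in node:
                continue
            if isinstance(target, dict):
                # Descend in parallel through input and spec
                walk(node[key], target)
            else:
                # Place the value at the dotted output path
                cur = out
                parts = target.split(".")
                for p in parts[:-1]:
                    cur = cur.setdefault(p, {})
                cur[parts[-1]] = node[key]

    walk(payload, spec)
    return out

# A trimmed, Twitter-style payload (values shortened for the example)
tweet = {
    "created_at": "Wed Mar 29 02:53:48 +0000 2017",
    "text": "CSUB falls to Georgia Tech 76-61 in NIT semifinal game.",
    "user": {"screen_name": "tbcpix", "followers_count": 677},
}

spec = {
    "created_at": "tweet.date",
    "text": "tweet.text",
    "user": {"screen_name": "tweet.author"},
}

print(shift(tweet, spec))
# → {'tweet': {'date': 'Wed Mar 29 02:53:48 +0000 2017',
#              'text': 'CSUB falls to Georgia Tech 76-61 in NIT semifinal game.',
#              'author': 'tbcpix'}}
```

The real Jolt library handles far more (arrays, wildcards, defaults, chained operations), but this captures the core idea of declaratively relocating fields.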
Keep a lookout for the JoltTransformJSON processor in the next release, NiFi 0.7. Or, if you're looking to get your hands dirty and try it out now, you can download and compile the NiFi source via the github mirror. To test out the above data flow you can get a template on GitHub Gist here and import it into NiFi; here is also a Gist with example specifications to try. Remember that you'll need to configure the GetTwitter processor with your own keys/access tokens first, and make sure the PutFile processors are set with a destination. For more insight on using this processor (or working with the example flow) check out the video below:
This processor also has a community-driven roadmap for growth, with work in progress on custom transformation support and potential extensions for expression language support.
Have any questions about transforming JSON in NiFi with Jolt? Please feel free to comment below or reach out to the community on the
Apache NiFi mailing list.