Member since: 06-16-2020
Posts: 53
Kudos Received: 14
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 735 | 10-23-2024 11:21 AM |
| | 680 | 10-22-2024 07:59 AM |
| | 661 | 10-22-2024 07:37 AM |
| | 385 | 10-21-2024 09:25 AM |
| | 2176 | 06-16-2023 07:23 AM |
05-22-2024
04:58 PM
I have a Cloudera cluster up and running. Knox forwards requests to WebHBase, and HBase uses Ranger for authorization. Ranger is connected to FreeIPA LDAP, and we use Kerberos internally for authentication. In Ranger, I have a policy that grants read access to a table, and in that same policy a group from my FreeIPA instance is given access to the table. Here is the issue: when I remove a member from that group in FreeIPA and rerun Ranger Usersync, a curl call to read from the table still succeeds on the first attempt. Only when I run it a second time do I get the expected denial. This has been a consistent pattern for all changes to user membership in LDAP groups. The Ranger HBase policy sync and Usersync are both working as expected; after Usersync runs, I can confirm the user has been removed from the group in Ranger, yet access is still allowed once. There are similar Kafka and HDFS policies where I access those resources using the same group, and they behave correctly on the first call, but for HBase it takes two calls to work correctly. Any help would be greatly appreciated!
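For reference, the read check I'm describing is an ordinary WebHBase call routed through Knox. The gateway host, topology name ("default"), table, and row key below are placeholders, not my real values:

```shell
# Hypothetical Knox gateway, topology, table, and row key -- substitute your own.
KNOX_URL="https://knox.example.com:8443/gateway/default"
TABLE="test_table"
ROW="row1"
READ_URL="${KNOX_URL}/hbase/${TABLE}/${ROW}"
echo "$READ_URL"
# The actual call (commented out; requires valid Kerberos/Knox credentials):
# curl -iku "$USER" -H "Accept: application/json" "$READ_URL"
```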
10-11-2023
10:52 AM
Maybe you could reset the state via the NiFi REST API, either at the beginning of your flow or separately on a cron schedule every morning. It could be: POST "https://[ip:port]/nifi-api/processors/${processor-id}/state/clear-requests". This is the request NiFi itself uses when you open the processor in the UI and choose the menu option "View state" -> "Clear state".
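A minimal sketch of that call, assuming a hypothetical host and processor id (on a secured cluster you would also need a bearer token, which I've left out):

```shell
# Hypothetical host and processor id -- substitute your own values.
NIFI_URL="https://nifi.example.com:8443"
PROC_ID="017f1000-abcd-1234-ffff-0123456789ab"
CLEAR_URL="${NIFI_URL}/nifi-api/processors/${PROC_ID}/state/clear-requests"
echo "$CLEAR_URL"
# The actual request (commented out; needs a valid token on secured clusters):
# curl -X POST -H "Authorization: Bearer $TOKEN" "$CLEAR_URL"
```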
09-26-2023
11:24 AM
@Abhiram-4455 What does the input look like?
06-28-2023
06:19 AM
I am looking at the Kafka policies in my current Ranger instance. There is a policy called "service_all - cluster". In the UI, it shows two allow conditions. However, when I run the API call to get all the policies for Kafka and search for "service_all - cluster", this is the result: <policies>
<id>11</id>
<guid>dbbd8ed1-2bc6-452d-991e-28082727e3cf</guid>
<isEnabled>true</isEnabled>
<version>1</version>
<service>cm_kafka</service>
<name>service_all - cluster</name>
<policyType>0</policyType>
<policyPriority>0</policyPriority>
<description>Service Policy for all - cluster</description>
<isAuditEnabled>true</isAuditEnabled>
<resources>
<entry>
<key>cluster</key>
<value>
<values>*</values>
<isExcludes>false</isExcludes>
<isRecursive>false</isRecursive>
</value>
</entry>
</resources>
<policyItems>
<accesses>
<type>configure</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>describe</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>kafka_admin</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>create</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>idempotent_write</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>describe_configs</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>alter_configs</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>cluster_action</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>alter</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>publish</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>consume</type>
<isAllowed>true</isAllowed>
</accesses>
<accesses>
<type>delete</type>
<isAllowed>true</isAllowed>
</accesses>
<users>cruisecontrol</users>
<users>streamsmsgmgr</users>
<users>kafka</users>
<delegateAdmin>true</delegateAdmin>
</policyItems>
<policyItems>
<accesses>
<type>describe</type>
<isAllowed>true</isAllowed>
</accesses>
<users>rangerlookup</users>
<delegateAdmin>false</delegateAdmin>
</policyItems>
<serviceType>kafka</serviceType>
<options/>
<zoneName/>
<isDenyAllElse>false</isDenyAllElse>
</policies>

Here you can see there are three extra accesses (publish, consume, delete) that aren't showing up in the user interface. Yesterday I did a full reimport of all the Kafka policies and that fixed the issue, but after a restart of Ranger it happened again. I checked the underlying database and it's consistent with the user interface, yet the API call is again adding those three extra accesses. Does anyone know what happens after a restart that causes the API response to differ from the user interface?
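For context, this is roughly the lookup I'm running against Ranger's public v2 API (the admin host below is a placeholder, not my real one):

```shell
# Hypothetical Ranger admin host -- substitute your own.
RANGER_URL="https://ranger.example.com:6182"
SERVICE="cm_kafka"
POLICY_URL="${RANGER_URL}/service/public/v2/api/service/${SERVICE}/policy"
echo "$POLICY_URL"
# The actual call (commented out; requires valid Ranger admin credentials):
# curl -u admin -H "Accept: application/json" "$POLICY_URL"
```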
Labels: Apache Ranger
06-21-2023
07:14 AM
@cotopaul - It's taking in JSON, writing to Parquet, and only doing literal value replacements (i.e., adding 5 fields to each record). Three of those fields just add attribute values and literal values to each record, and the other two do minor date manipulation (i.e., converting dates to epoch).
06-21-2023
05:19 AM
@steven-matison - Thanks for the response. If we scope it to just the UpdateRecord processor, for example, are there any things from an infrastructure or configuration standpoint you know of to make it more efficient, assuming I can't scale up or tune processor concurrency?
06-20-2023
05:54 AM
I have been using multiple record-oriented processors (ConvertRecord, UpdateRecord, etc.) in various parts of my flow. For example, my UpdateRecord processor takes about 16 seconds to read a 30MB flowfile, add some fields to each record, and convert the data to Parquet. I want this to take much less time. My current infrastructure is a 2-node e2-standard-4 cluster in GCP running CentOS 7. The two instances have 4 vCPUs and 16 GB RAM each, and every repository (content, provenance, flowfile) sits on its own SSD persistent disk. A lot of the NiFi configs are defaults; what would anyone recommend, either from an infrastructure or NiFi standpoint, to improve performance on these processors?
Labels: Apache NiFi
06-16-2023
07:23 AM
@bhadraka What version of NiFi are you using? In NiFi 1.20.0, you can use a ReplaceText processor after reading in the file. Using the Line-by-Line evaluation mode, there is a drop-down option "Except-Last-Line". You could then configure it to replace all previous lines with empty strings. Here's a screenshot of my ReplaceText processor properties.
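In case the screenshot doesn't come through, this is a sketch of the property values I mean (the search regex is illustrative; adjust it to your data):

```
Replacement Strategy          : Regex Replace
Search Value                  : (.*)
Replacement Value             : (set to empty string)
Evaluation Mode               : Line-by-Line
Line-by-Line Evaluation Mode  : Except-Last-Line
```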
06-16-2023
07:05 AM
@Ray82 - I tested @SAMSAL's solution quickly and it worked for me. Make sure that in QueryRecord you are referencing the right attribute names.