About Shelton

Shelton · ‎06-05-2025

@sydney- The SSL handshake error you're encountering is a common issue when connecting NiFi instances to NiFi Registry in secure environments it indicates that your NiFi instances cannot verify the SSL certificate presented by the NiFi Registry server. javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider. certpath.SunCertPathBuilder Exception: unable to find valid certification path to requested target Based on your description, there are several areas to address. The certificate used by NiFi Registry is self-signed or not issued by a trusted Certificate Authority (CA) The certificate chain is incomplete The truststore configuration is incorrect 1. Certificate Trust Configuration Verify Certificate Chain: # Check if certificate is in NiFi truststore (repeat for each instance) keytool -list -v -keystore /path/to/nifi/truststore.jks -storepass [password] # Check if certificate is in Registry truststore keytool -list -v -keystore /path/to/registry/truststore.jks -storepass [password] # Verify the Registry's certificate chain openssl s_client -connect nifi-registry.example.com:443 -showcerts Ensure Complete Certificate Chain: Add the Registry's complete certificate chain (including intermediate CAs) to NiFi's truststore Add NiFi's complete certificate chain to the Registry's truststore # Add Registry certificate to NiFi truststore keytool -import -alias nifi-registry -file registry-cert.pem -keystore /path/to/nifi/conf/truststore.jks -storepass [password] # Add NiFi certificate to Registry truststore keytool -import -alias nifi-prod -file nifi-cert.pem -keystore /path/to/registry/conf/truststore.jks -storepass [password] 2. Proper Certificate Exchange Ensure you've exchanged certificates correctly export NiFi Registry's public certificate keytool -exportcert -alias nifi-registry -keystore /path/to/registry/keystore.jks -file registry.crt -storepass [password] Import this certificate into each NiFi instance's truststore keytool -importcert -alias nifi-registry -keystore /path/to/nifi/truststore.jks -file registry.crt -storepass [password] -noprompt 3. NiFi Registry Connection Configuration In your NiFi instance (nifi.properties), verify # Registry client properties nifi.registry.client.name=NiFi Registry nifi.registry.client.url=https://nifi-registry.example.com/nifi-registry nifi.registry.client.timeout.connect=30 secs nifi.registry.client.timeout.read=30 secs Verify these configuration files in NiFi (production/development) # nifi.properties: nifi.registry.client.ssl.protocol=TLS nifi.registry.client.truststore.path=/path/to/truststore.jks nifi.registry.client.truststore.password=[password] nifi.registry.client.truststore.type=JKS In NiFi Registry # nifi-registry.properties: nifi.registry.security.truststore.path=/path/to/truststore.jks nifi.registry.security.truststore.password=[password] nifi.registry.security.truststore.type=JKS 4. LDAP Configuration For your LDAP integration issues in authorizers.xml ensure you have <accessPolicyProvider> <identifier>file-access-policy-provider</identifier> <class>org.apache.nifi.registry.security.authorization.FileAccessPolicyProvider</class> <property name="User Group Provider">ldap-user-group-provider</property> <property name="Authorizations File">./conf/authorizations.xml</property> <property name="Initial Admin Identity">cn=admin-user,ou=users,dc=example,dc=com</property> <property name="NiFi Identity 1">cn=dev-nifi,ou=servers,dc=example,dc=com</property> </accessPolicyProvider> In the authorizations.xml add appropriate policies for the dev-nifi identity <policy identifier="some-uuid" resource="/buckets" action="READ"> <user identifier="dev-nifi-uuid"/> </policy> 5. Proxy Configuration For proxy user requests, add in nifi.properties nifi.registry.client.proxy.identity=cn=dev-nifi,ou=servers,dc=example,dc=com 6. Restart Order After making changes restart the Nifi instance in the below order NiFi Registry first Then restart all NiFi instances Happy hadoping

Shelton · ‎06-04-2025

@hegdemahendra This is a classic case of off-heap memory consumption in NiFi. The 3G you see in the GUI only represents JVM heap + non-heap memory, but NiFi uses significant additional memory outside the JVM that doesn't appear in those metrics. Next time could you share your deployment YAML files that would help with solutioning Root Causes of Off-Heap Memory Usage: Content Repository (Primary Culprit) NiFi uses memory-mapped files for the content repository Large FlowFiles are mapped directly into memory This memory appears as process memory but not JVM memory Provenance Repository Uses Lucene indexes that consume off-heap memory Memory-mapped files for provenance data storage Native Libraries Compression libraries (gzip, snappy) Cryptographic libraries Network I/O libraries Direct Memory Buffers NIO operations use direct ByteBuffers Network and file I/O operations Possible Solutions: 1. Reduce JVM Heap Size # Instead of 28G JVM heap, try: NIFI_JVM_HEAP_INIT: "16g" NIFI_JVM_HEAP_MAX: "16g" This leaves more room (24G) for off-heap usage. 2. Configure Direct Memory Limit Add JVM arguments: -XX:MaxDirectMemorySize=8g 3. Content Repository Configuration In nifi.properties: # Limit content repository size nifi.content.repository.archive.max.retention.period=1 hour nifi.content.repository.archive.max.usage.percentage=50% # Use file-based instead of memory-mapped (if possible) nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository 4. Provenance Repository Tuning # Reduce provenance retention nifi.provenance.repository.max.storage.time=6 hours nifi.provenance.repository.max.storage.size=10 GB Long-term Solutions: 1. Increase Pod Memory Limit resources: limits: memory: "60Gi" # Increase from 40G requests: memory: "50Gi" 2. Monitor Off-Heap Usage Enable JVM flags for better monitoring: -XX:NativeMemoryTracking=summary -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics 3. Implement Memory-Efficient Flow Design Process smaller batches Avoid keeping large FlowFiles in memory Use streaming processors where possible Implement backpressure properly 4. Consider Multi-Pod Deployment Instead of single large pod, use multiple smaller pods: # 3 pods with 20G each instead of 1 pod with 40G replicas: 3 resources: limits: memory: "20Gi" Monitoring Commands: # Check native memory tracking kubectl exec -it <nifi-pod> -- jcmd <pid> VM.native_memory summary # Monitor process memory kubectl top pod <nifi-pod> # Check memory breakdown kubectl exec -it <nifi-pod> -- cat /proc/<pid>/status | grep -i mem Start with reducing JVM heap to 16G and implementing content repository limits. This should immediately reduce OOM occurrences while you plan for longer-term solutions. Always remember to share your configuration files with the vital data masked or scramble. Happy hadooping

Shelton · ‎04-28-2025

@nifier Thats good to hear. Now the onus is on you to share the provenance setup that helped you resolve your problem. It's priceless to share such information to grow our documentation base. If you do a good detailed write up then the moderators could help integrate that to the official Cloudera knowledgebase. Happy hadooping

Shelton · ‎04-02-2025

@shubham_sharma my bad its hbase.mob.file.expired.period and not hbase.mob.file.expired Happy hadooping

Shelton · ‎04-01-2025

@shubham_sharma The hbase.mob.file.expired property in HBase is found in the hbase-site.xml configuration file. This property is related to HBase's MOB (Medium-sized Objects) feature, which is designed to efficiently store objects that are larger than typical HBase cells but not large enough to warrant HDFS storage. <property> <name>hbase.mob.file.expired</name> <value>30</value> <description>The number of days to keep a mob file before deleting it. Default value is 30 days.</description> </property> Happy hadooping

Shelton · ‎03-28-2025

@allen_chu As you correctly suspected the error indicates that HBase is trying to access a MOB (Medium Object) file in HDFS that no longer exists at the expected location hdfs://ha:8020/apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW/bf92b15900f190730a5482b53d350df0/cf/ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4 Potential Root Causes: The MOB file was manually deleted from HDFS HBase's MOB cleanup process didn't properly update metadata The HDFS path was changed without proper HBase configuration updates Corrupted HBase metadata Incomplete data migration Filesystem inconsistency Incomplete compaction or archiving process 1. Verify HDFS Integrity # Check HDFS file system health hdfs fsck /apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW 2. HBase Data Consistency Check # Start HBase shell hbase shell # Major compact the table to rebuild metadata major_compact 'FPCA_ITEMS_TY_NEW' # Verify table status status 'detailed' 3. Immediate Recovery Options Option A: Repair the table hbase hbck -j <path_to_hbase_classpath> -repair FPCA_ITEMS_TY_NEW 4. Advanced Recovery Options # If previous methods fail, consider: # a) Snapshot and restore hbase shell snapshot 'FPCA_ITEMS_TY_NEW', 'FPCA_ITEMS_TY_NEW_SNAPSHOT' # b) Clone the snapshot clone_snapshot 'FPCA_ITEMS_TY_NEW_SNAPSHOT', 'FPCA_ITEMS_TY_NEW_RECOVERED' Option B: Disable and re-enable the table disable 'FPCA_ITEMS_TY_NEW' enable 'FPCA_ITEMS_TY_NEW' Option C: Run MOB compaction hbase org.apache.hadoop.hbase.mob.mapreduce.Sweeper FPCA_ITEMS_TY_NEW cf Replace the 'cf' with your actual column family name Option 😧 If data is not critical, you can reset MOB references alter 'FPCA_ITEMS_TY_NEW', {NAME => 'cf', MOB_COMPACT_PARTITION_POLICY => 'daily'} major_compact 'FPCA_ITEMS_TY_NEW' 3. Preventive Measures <property> <name>hbase.mob.file.expired.period</name> <value>86400</value>  </property> Additional Checks 1.Verify HDFS permissions for the HBase user 2.Check HDFS health (namenode logs, datanode availability) 3.Review HBase MOB configuration for the table describe 'FPCA_ITEMS_TY_NEW' With the above steps you should be able to resolve your hbase issue Happy hadooping

Shelton · ‎03-23-2025

@shiva239 You've set up GitLabFlowRegistry in NiFi 2.0's controller settings and can import flows from GitLab and commit changes back through the UI your existing CI/CD pipeline uses NiFi toolkit commands. As of NiFi 2.0, there aren't direct toolkit commands specifically designed for the new GitLabFlowRegistry feature someone should correct me here. However, you have a few options to achieve what you need. Option 1: Use the NiFi REST API The GitLabFlowRegistry operations that you perform through the UI are backed by REST API endpoints. You can integrate these REST API calls into your CI/CD pipeline # Example REST API call to import/sync from Git repository curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \ -H "Content-Type: application/json" \ -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \ --key your-key.key --cert your-cert.crt --cacert your-ca.crt Option 2: Hybrid Approach Use a combination of existing toolkit commands and REST API calls: Use toolkit commands for basic operations (deployment to a target NiFi instance) Use REST API calls for GitLabFlowRegistry-specific operations Option 3: Create Custom Scripts You could create wrapper scripts that combine the toolkit functionality with the necessary REST API calls for GitLabFlowRegistry operations: #!/bin/bash # Custom script to sync from GitLabFlowRegistry and deploy # First pull from Git registry via REST API curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \ -H "Content-Type: application/json" \ -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \ --key your-key.key --cert your-cert.crt --cacert your-ca.crt # Then use toolkit commands for additional operations as needed ./bin/cli.sh nifi pg-import ... I recommend adopting Option 3 (custom scripts) for your CI/CD pipeline. This approach: Takes advantage of the new GitLabFlowRegistry feature Maintains compatibility with your existing toolkit-based CI/CD pipeline Provides flexibility to customize the process based on your specific requirements For implementing the REST API calls, consult the NiFi API documentation for the full set of endpoints related to the new GitLabFlowRegistry functionality. The NiFi Admin Guide for version 2.0 should also have details on these new REST endpoints. Happy hadooping

Shelton · ‎03-23-2025

@spserd Looking at your issue with Spark on Kubernetes, I see a clear difference between the client and cluster deployment modes that's causing the "system" authentication problem. the issue is when running in client mode with spark-shell, you're encountering an authorization issue where Spark is trying to create executor pods as "system" instead of using your service account "spark-sa", despite providing the token. Possible Solution For client mode, you need to add a specific configuration to tell Spark to use the token for executor pod creation --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa So your updated command should look like this ./bin/spark-shell \ --master k8s://https://my-k8s-cluster:6443 \ --deploy-mode client \ --name spark-shell-poc \ --conf spark.executor.instances=1 \ --conf spark.kubernetes.container.image=my-docker-hub/spark_poc:v1.4 \ --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \ --conf spark.kubernetes.namespace=dynx-center-resources \ --conf spark.driver.pod.name=dynx-spark-driver \ --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \ --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa \ --conf spark.kubernetes.authenticate.submission.oauthToken=$K8S_TOKEN The key is that in client mode, you need to explicitly configure the executor authentication because the driver is running outside the cluster and needs to delegate this permission. If this still doesn't work, ensure your service account has appropriate ClusterRole bindings that allow it to create and manage pods in the specified namespace. Happy hadooping

Shelton · ‎03-21-2025

@PriyankaMondal Looking at your error message log, I can see you're experiencing authentication timeouts with ConsumeIMAP and ConsumePOP3 processors when connecting to Microsoft Office 365 services. Possible Blockers Timeout Issue The primary error is "Read timed out" during authentication, which suggests the connection to Office 365 is being established but then timing out during the OAUTH2 handshake. Microsoft 365 Specific Considerations Microsoft has specific requirements for modern authentication with mail services and has been deprecating basic authentication methods. Processor Configuration Using OAUTH2 authentication mode, is correct for Office 365, but there may be issues with the token acquisition or timeout settings. Possible solutions 1. Check timeout settings # Add these properties to your processor configuration mail.imap.connectiontimeout=60000 mail.imap.timeout=60000 mail.pop3.connectiontimeout=60000 .pop3.timeout=60000 2. Verify Modern Authentication settings Ensure the account has Modern Authentication enabled in Microsoft 365 Admin Center Verify the application registration in Azure AD has the correct permissions IMAP.AccessAsUser.All for IMAP POP.AccessAsUser.All for POP3 offline_access scope for refresh tokens 3. Update NiFi Mail Libraries NiFi's default JavaMail implementation might have compatibility issues with Office 365. Try: Updating to the latest version of NiFi (if possible) Or add Microsoft's MSAL (Microsoft Authentication Library) JAR to NiFi's lib directory 4. Use a Custom SSL Context Service Microsoft servers might require specific TLS settings # Create a Standard SSL Context Service with: Protocol: TLS # Add to Advanced Client settings for the processor 5. Alternative Approach: Use Microsoft Graph API Since Microsoft is moving away from direct IMAP/POP3 access, consider: Using InvokeHTTP processor to authenticate against Microsoft Graph API Use the Graph API endpoints to retrieve email content 6. Check Proxy Settings If your environment uses proxies # Add these properties mail.imap.proxy.host=your-proxy-host mail.imap.proxy.port=your-proxy-port mail.pop3.proxy.host=your-proxy-host mail.pop3.proxy.port=your-proxy-port 7. Implementation Steps Update the processor configuration with extended timeout values Verify OAuth2 settings in the processor match exactly with your Azure application registration Check Microsoft 365 account settings to ensure IMAP/POP3 is enabled with Modern Authentication Consider implementing a token debugging flow using InvokeHTTP to validate token acquisition separately Happy hadooping

Shelton · ‎03-20-2025

@RaoNEY The error message suggests that there's a JWT token algorithm mismatch: "An error occurred while attempting to decode the Jwt: Signed JWT rejected: Another algorithm exp" This typically happens when: The token you're receiving from Azure in Step 1 uses a signing algorithm that doesn't match what NiFi is expecting NiFi is configured to use RS256 algorithm (as shown in your nifi.properties), but the Azure token might be using a different algorithm Verify token algorithm First, check what algorithm your Azure token is using. You can decode your JWT token using tools like jwt.io to see the header which contains the algorithm (look for the "alg" field). Modify your Azure token request Azure AD OAuth tokens typically use RS256, but you may need to specify this explicitly in your Azure app registration settings. Ensure correct token type For NiFi OAuth/OIDC authentication, you need an ID token, not an access token. In your Step 1, you're requesting a client credentials grant which returns an access token. Instead, you need to: # Modified Step 1 - Use authorization code flow to get ID token curl -X POST https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token \ -H "Content-Type: application/x-www-form-urlencoded" \ -d "grant_type=authorization_code" \ -d "client_id={ClientID}" \ -d "client_secret={ClientSecret}" \ -d "code={AuthorizationCode}" \ -d "redirect_uri={RedirectURI}" \ -d "scope=openid email profile Update NiFi properties: Ensure these settings match your Azure configuration: # Make sure these settings are correct nifi.security.user.oidc.jwt.algorithm=RS256 nifi.security.user.oidc.preferred.jwsalgorithm=RS256 Check Azure app registration In your Azure portal, verify: Redirect URI is properly set to your NiFi callback URL The app has appropriate API permissions Token configuration includes ID tokens Complete Authentication Flow For NiFi OAuth with Azure AD, the proper flow should be: 1. Initiate login via NiFi UI or using GET https://NIFIDnsName:9444/nifi-api/access/oidc/request 2. This redirects to Microsoft login page, where user authenticates 3. After successful authentication, Azure redirects back to NiFi with an authorization code 4. NiFi exchanges this code for tokens automatically 5. If you're doing this programmatically, use the authorization code flow, not client credentials The direct token exchange you're attempting in Step 2 might not be supported or requires specific configuration. NiFi typically handles the OIDC token exchange internally after receiving the authorization code. The direct token exchange you're attempting in Step 2 might not be supported or requires specific configuration. NiFi typically handles the OIDC token exchange internally after receiving the authorization code. happy hadooping

Online	Offline
Last Visited	‎06-05-2025 02:03 PM

Member Since	‎01-19-2017 04:35 AM
Last Visited	‎06-05-2025 02:03 PM
Posts	3,676
Kudos received	627

Cloudera Community

Re: Apache nifi memory consumption in kubernetes

Re: Nifi toolkit command for GitLabFlowRegistry

Re: Not able to delete the NiFi existing flow usin...

Re: Securing Nifi with SSL and using OIDC provider...

Re: External zookeeper and nifi cluster connection...

Re: Unable to connect NiFi to NiFi Registry

Re: Apache nifi memory consumption in kubernetes

Re: NiFi file transfer report

Re: HBase - java.io.FileNotFoundException: File do...

Re: HBase - java.io.FileNotFoundException: File do...

Re: HBase - java.io.FileNotFoundException: File do...

Re: Nifi toolkit command for GitLabFlowRegistry

Re: Unable to run spark-shell command with k8s as ...

Re: Apache Nifi IMAP/POP3 authentocation Error

Re: NiFi API OAuth authentication issue