Member since: 01-19-2017
Posts: 3681
Kudos Received: 633
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1615 | 06-04-2025 11:36 PM |
| | 2073 | 03-23-2025 05:23 AM |
| | 986 | 03-17-2025 10:18 AM |
| | 3747 | 03-05-2025 01:34 PM |
| | 2582 | 03-03-2025 01:09 PM |
05-12-2026
05:34 AM
@AlokKumar You can validate that the Discovery URL is working by opening it in a browser; you should see a JSON document.
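If a browser isn't handy, the same check can be scripted. A minimal sketch (the sample document below is hypothetical; per OpenID Connect Discovery, the document must carry at least the issuer and the core endpoints):

```python
import json

# Fields NiFi needs from the discovery document (per OpenID Connect Discovery).
REQUIRED = ["issuer", "authorization_endpoint", "token_endpoint", "jwks_uri"]

def check_discovery(doc: dict) -> list:
    """Return the list of required OIDC discovery fields that are missing."""
    return [f for f in REQUIRED if f not in doc]

# Example document, trimmed to the essentials (hypothetical values):
sample = json.loads("""{
  "issuer": "https://idp.company.com",
  "authorization_endpoint": "https://idp.company.com/oauth2/authorize",
  "token_endpoint": "https://idp.company.com/oauth2/token",
  "jwks_uri": "https://idp.company.com/oauth2/keys"
}""")

print(check_discovery(sample))           # → []
print(check_discovery({"issuer": "x"}))  # missing endpoints are reported
```

If the function reports missing fields against the real document your IdP serves, the Discovery URL is wrong or the provider is not fully OIDC-compliant.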
05-11-2026
01:22 PM
1 Kudo
@AlokKumar OpenID Connect (OIDC) is a standard login protocol. Instead of NiFi managing its own passwords, it redirects users to your internal Identity Provider (IdP) to log in. Your IdP confirms "yes, this is a valid user" and sends NiFi a token; NiFi trusts that token and lets the user in.

Every OIDC-compatible provider publishes a public JSON document describing itself: its endpoints, what it supports, and so on. This URL always ends with /.well-known/openid-configuration, and NiFi fetches it at startup to learn how to talk to your provider. Example: https://your-internal-idp.company.com/.well-known/openid-configuration

When you register NiFi as an "application" in your internal IdP, the IdP gives you two credentials: a Client ID (like a username for the app) and a Client Secret (like a password for the app). NiFi uses these to prove to the IdP that it is a legitimate, registered application.

Is Kubernetes required? No. The Cloudera/Kubernetes article you read uses kubectl create secret only because it runs NiFi inside Kubernetes, where secrets are managed that way. On a plain machine, you put the Client ID and Secret directly into nifi.properties as plain text, or use NiFi's built-in encrypt-config tool for better security.

Step-by-Step: Configure OIDC on Standalone NiFi (1.23 or 2.8)

Step 1: Register NiFi in your internal Identity Provider

Ask your IdP administrator to register a new OIDC client/application with:
- Name: Apache NiFi (or anything descriptive)
- Redirect URIs (these are mandatory):
  - https://<your-nifi-host>:<port>/nifi-api/access/oidc/callback
  - https://<your-nifi-host>:<port>/nifi-api/access/oidc/logout/callback
- Grant type: Authorization Code

Once registered, your IdP admin will give you:
- A Client ID (e.g., nifi-client-prod)
- A Client Secret (a long random string)
- The Discovery URL (e.g., https://idp.company.com/.well-known/openid-configuration)

Verify the Discovery URL works by opening it in a browser; you should see a JSON document.
Step 2: Ensure NiFi is running with TLS (HTTPS)

OIDC requires NiFi to run over HTTPS; it will not work on plain HTTP. Check your nifi.properties:

```
nifi.web.https.host=0.0.0.0
nifi.web.https.port=8443
nifi.security.keystore=/path/to/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=your_keystore_password
nifi.security.truststore=/path/to/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=your_truststore_password
```

If you don't have a keystore/truststore yet, NiFi ships with tls-toolkit.sh (in the bin/ directory) that can generate them for testing.

Step 3: Edit conf/nifi.properties

Open <nifi-install-dir>/conf/nifi.properties in a text editor and set these properties:

```
# --- OIDC Authentication ---
nifi.security.user.oidc.discovery.url=https://your-internal-idp.company.com/.well-known/openid-configuration
nifi.security.user.oidc.client.id=nifi-client-prod
nifi.security.user.oidc.client.secret=your-client-secret-here
nifi.security.user.oidc.connect.timeout=5 secs
nifi.security.user.oidc.read.timeout=5 secs
# The claim in the OIDC token that identifies the user (usually email or preferred_username)
nifi.security.user.oidc.claim.identifying.user=email
# Scope - 'openid email profile' covers most providers
nifi.security.user.oidc.additional.scopes=email profile
# Leave blank unless your provider requires a specific algorithm
nifi.security.user.oidc.preferred.jwsalgorithm=
```

The nifi.security.user.oidc.discovery.url should be set to your provider's issuer endpoint with /.well-known/openid-configuration appended. The nifi.security.user.oidc.claim.identifying.user value depends on your provider; ask your IdP admin which claim carries the unique username. Common values are email, preferred_username, or sub.

Step 4: Configure the Initial Admin in conf/authorizers.xml

NiFi needs to know which user gets admin rights on first startup.
Open conf/authorizers.xml and find the <property name="Initial Admin Identity"> line inside the FileAccessPolicyProvider block:

```
<property name="Initial Admin Identity">your.email@company.com</property>
```

This value must exactly match the identity claim that NiFi will receive from your OIDC provider after login: if your provider sends email, put your email address here; if it sends preferred_username, put your username.

Step 5: Restart NiFi

```
cd /opt/nifi/nifi-current
./bin/nifi.sh restart
```

Watch the logs for errors:

```
tail -f logs/nifi-app.log
```

Step 6: Test the Login

When a user attempts to access NiFi, NiFi redirects them to your identity provider to log in. After logging in, the provider sends NiFi a response containing the user's identity, and NiFi authenticates the user. Navigate to https://<your-nifi-host>:8443/nifi; you should be redirected to your internal IdP login page instead of the NiFi login form.

Step 7 (Optional): Encrypt the Client Secret

Leaving a plain-text secret in nifi.properties is acceptable for testing but not ideal for production. NiFi ships with an encrypt-config tool:

```
./bin/encrypt-config.sh \
  -n conf/nifi.properties \
  -o conf/nifi.properties \
  -p your_master_password
```

This encrypts sensitive values in the file so they are not readable in plain text.

Share your feedback. Happy Hadooping
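Before restarting, a quick sanity check of the OIDC entries in nifi.properties can catch typos. A rough sketch under the property names used above (this is an illustrative script, not an official NiFi tool):

```python
def load_props(text: str) -> dict:
    """Parse simple key=value lines from a nifi.properties-style file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def oidc_problems(props: dict) -> list:
    """Flag common OIDC misconfigurations before a restart."""
    problems = []
    url = props.get("nifi.security.user.oidc.discovery.url", "")
    if not url.endswith("/.well-known/openid-configuration"):
        problems.append("discovery.url should end with /.well-known/openid-configuration")
    for key in ("nifi.security.user.oidc.client.id",
                "nifi.security.user.oidc.client.secret"):
        if not props.get(key):
            problems.append(f"{key} is empty")
    return problems

sample = """
nifi.security.user.oidc.discovery.url=https://idp.company.com/.well-known/openid-configuration
nifi.security.user.oidc.client.id=nifi-client-prod
nifi.security.user.oidc.client.secret=secret
"""
print(oidc_problems(load_props(sample)))  # → []
```

Point it at your real conf/nifi.properties and fix anything it reports before watching the startup logs.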
12-10-2025
04:20 AM
@Amr5 The NoSuchMethodError means there is a JAR conflict at runtime. You must ensure that only CDH 7.2.18 Hive JARs are on the classpath, with no remnants of 7.1.9. The ParseDriver.parse() method signature changed between Hive versions. In your case the old Hive JARs (from CDH 7.1.9) are still present in /data1/informatica/dei/services/shared/hadoop/CDH_7.218, so Java is loading the old hive-exec.jar instead of the new one, causing method signature mismatches.

Step 1: Identify ALL Old Hive JARs

```
find /data1/informatica/dei/services/shared/hadoop/CDH_7.218 -name "hive*.jar" -exec ls -lh {} \;
```

Step 2: Remove ALL Old Hive JARs

```
cd /data1/informatica/dei/services/shared/hadoop/CDH_7.218
# Create backup directory if not exists
mkdir -p backup_all_old_hive_jars
# Move ALL hive-related JARs to backup
mv hive*.jar backup_all_old_hive_jars/
```

Step 3: Copy ALL Correct Hive JARs from the Cloudera Cluster

```
# Find Cloudera CDH 7.2.18 parcels location
CLOUDERA_PARCEL=$(find /opt/cloudera/parcels -maxdepth 1 -type d -name "CDH-7.2.18*" | head -1)
# Copy ALL Hive JARs
cp $CLOUDERA_PARCEL/lib/hive/lib/hive*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/
# Also copy Hive dependencies
cp $CLOUDERA_PARCEL/jars/hive*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/
```

Step 4: Verify Correct Versions

```
cd /data1/informatica/dei/services/shared/hadoop/CDH_7.218
ls -lh hive*.jar | head -5
# Check the version inside hive-exec.jar
unzip -p hive-exec-*.jar META-INF/MANIFEST.MF | grep -i version
```

Step 5: Clear the Java Classpath Cache

```
# Remove compiled artifacts
rm -rf /data1/informatica/dei/tomcat/bin/disTemp/DOM_IDQ_DEV/DIS_DEI_DEV/node02_DEI_DEV/cloudera_dev/SPARK/*
rm -rf /data1/informatica/dei/tomcat/bin/disTemp/DOM_IDQ_DEV/DIS_DEI_DEV/node02_DEI_DEV/cloudera_dev/HIVE/*
```

Step 6: Restart Informatica Services

```
infaservice.sh dis stop -domain DOM_IDQ_DEV -service DIS_DEI_DEV
infaservice.sh dis start -domain DOM_IDQ_DEV -service DIS_DEI_DEV
```

Step 7: Verify the Hadoop Distribution in the Informatica Admin Console
1. Log in to Informatica Administrator
2. Navigate to DIS_DEI_DEV → Properties → Hadoop Connection
3. Click Test Connection
4. If it fails, click Re-import Hadoop Configuration to refresh

Step 8: Re-run Your Mapping

Happy Hadooping
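To double-check Step 4 across many files at once, the version embedded in each JAR file name can be compared against the expected release. A small sketch (the file names below are illustrative builds, not your actual files):

```python
import re

def jar_version(filename: str):
    """Extract the version string from a JAR name like hive-exec-3.1.3000.7.2.18.0-641.jar."""
    m = re.match(r".*?-(\d.*)\.jar$", filename)
    return m.group(1) if m else None

def stale_jars(filenames, expected="7.2.18"):
    """Return JARs whose embedded version does not mention the expected CDH release."""
    return [f for f in filenames
            if (v := jar_version(f)) is not None and expected not in v]

jars = ["hive-exec-3.1.3000.7.2.18.0-641.jar",   # hypothetical 7.2.18 build
        "hive-common-3.1.3000.7.1.9.0-387.jar"]  # hypothetical 7.1.9 leftover
print(stale_jars(jars))  # → ['hive-common-3.1.3000.7.1.9.0-387.jar']
```

Run it over the output of `ls` in the CDH_7.218 directory; anything it flags still belongs to the old release and should go to the backup folder.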
12-09-2025
07:56 AM
@Amr5 As you also realized, there is an old-path issue. The error indicates a version mismatch between the Hive/Tez libraries used by Informatica and those expected by your Cloudera cluster.

RCA:
1. Informatica is using Hive libraries from an older Cloudera version (7.1.9)
2. Your cluster is running Cloudera 7.2.18 (as shown in the path /data1/informatica/dei/services/shared/hadoop/CDH_7.218)
3. The HiveSplitGenerator class in the old hive-exec.jar is incompatible with the newer Tez runtime

Step 1: Locate the Current Hive Libraries

```
sudo find /data1/informatica -type f -name "hive-exec*.jar" 2>/dev/null
```

Step 2: Back Up the Old Libraries

```
cd /data1/informatica/dei/services/shared/hadoop/CDH_7.218
mkdir -p backup_old_hive_libs
mv hive-exec*.jar backup_old_hive_libs/
```

Step 3: Copy the Correct Hive Libraries from the Cluster

```
# Find the correct hive-exec.jar on your Cloudera cluster
find /opt/cloudera/parcels -name "hive-exec*.jar" 2>/dev/null
# Copy it to Informatica's Hadoop distribution directory
cp /opt/cloudera/parcels/CDH-7.2.18*/lib/hive/lib/hive-exec-*.jar \
   /data1/informatica/dei/services/shared/hadoop/CDH_7.218/
```

Step 4: Update the Informatica Hadoop Distribution

In the Informatica Administrator Console:
1. Navigate to Data Integration Service → Properties
2. Go to Hadoop Connection → Distribution
3. Verify it points to /data1/informatica/dei/services/shared/hadoop/CDH_7.218
4. Click Test Connection to validate
5. If needed, use Re-import Hadoop Configuration to refresh the cluster configs

Step 5: Restart Services

```
infaservice.sh dis restart -domain DOM_IDQ_DEV -service DIS_DEI_DEV
```

Step 6: Clear Cached Compilation Files

```
rm -rf /data1/informatica/dei/tomcat/bin/disTemp/DOM_IDQ_DEV/DIS_DEI_DEV/node02_DEI_DEV/cloudera_dev/SPARK/*
rm -rf /tmp/sqoop-infadpdev/*
```

Step 7: Re-run Your Mapping

If you have multiple nodes in your Informatica cluster, repeat Steps 2-3 on all nodes where the Data Integration Service runs.

Happy hadooping
12-07-2025
11:30 PM
@Amr5 From the logs you shared, the core issue is:

```
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Vertex failed: INIT_FAILURE
Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
Caused by: java.lang.ExceptionInInitializerError
```

This indicates a classpath/library compatibility issue between Informatica and the upgraded Cloudera CDP cluster, specifically with the Tez and Hive components.

Root Causes:
- Version Mismatch: the Informatica integration is pointing to CDH 7.218 libraries (your HADOOP_MAPRED_HOME is /data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib), but your cluster was upgraded to a newer CDP version with incompatible Hive/Tez libraries.
- Class Initialization Failure: the HiveSplitGenerator class cannot be instantiated, likely due to missing or incompatible dependencies.
- Static Initializer Problem: the ExceptionInInitializerError suggests a static block in one of the Hive classes is failing during initialization.

Solution:

1. Update the Informatica Hadoop connection:
   - Go to Administrator → Connections
   - Edit your Hadoop connection
   - Update the Hadoop distribution version to match your new CDP version
   - Update the configuration files (core-site.xml, hdfs-site.xml, hive-site.xml, etc.)

2. Update the Hadoop libraries:

```
# Copy new CDP client libraries to Informatica
cp -r /opt/cloudera/parcels/CDH/lib/* /data1/informatica/dei/services/shared/hadoop/CDP_<version>/lib/
```

3. Restart Informatica services:

```
infaservice.sh stopService
infaservice.sh startService
```

Additionally, identify and copy missing Tez JARs:

```
# From the CDP cluster, copy Tez libraries
cp /opt/cloudera/parcels/CDH/lib/tez/*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib/
# Copy Hive execution libraries
cp /opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec-*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib/
```

Update the classpath in the Informatica domain configuration.

Configure the Hive Execution Engine: if Tez is causing issues, temporarily switch to MapReduce. In your Hive connection properties, add:

```
hive.execution.engine=mr
```

Update this post after the above steps, and always share the logs so we can understand what's happening in your environment.

Happy hadooping
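Because a NoSuchMethodError or ExceptionInInitializerError almost always comes from two versions of the same artifact sitting on one classpath, it can help to scan for duplicates before restarting anything. A hedged sketch (the directory layout and file names are hypothetical):

```python
from collections import defaultdict
import re

def find_conflicts(jar_paths):
    """Group JAR paths by artifact name; an artifact present in more than one
    version on the classpath is the usual source of NoSuchMethodError."""
    by_artifact = defaultdict(set)
    for path in jar_paths:
        name = path.rsplit("/", 1)[-1]
        m = re.match(r"(.+?)-\d", name)  # artifact = text before the first -<digit>
        if m:
            by_artifact[m.group(1)].add(name)
    return {a: sorted(v) for a, v in by_artifact.items() if len(v) > 1}

classpath = [
    "/data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib/hive-exec-3.1.3000.7.1.9.0-387.jar",
    "/data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib/hive-exec-3.1.3000.7.2.18.0-641.jar",
    "/data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib/tez-api-0.9.1.7.2.18.0-641.jar",
]
print(find_conflicts(classpath))
# hive-exec appears twice here, so the older copy must be removed
```

Feed it the output of `find ... -name "*.jar"` from the Informatica distribution directory; any artifact listed twice needs the stale copy moved out.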
06-05-2025
12:37 AM
@sydney- The SSL handshake error you're encountering is a common issue when connecting NiFi instances to NiFi Registry in secure environments: it indicates that your NiFi instances cannot verify the SSL certificate presented by the NiFi Registry server.

```
javax.net.ssl.SSLHandshakeException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target
```

Based on your description, there are several areas to address:
- The certificate used by NiFi Registry is self-signed or not issued by a trusted Certificate Authority (CA)
- The certificate chain is incomplete
- The truststore configuration is incorrect

1. Certificate Trust Configuration

Verify the certificate chain:

```
# Check if certificate is in NiFi truststore (repeat for each instance)
keytool -list -v -keystore /path/to/nifi/truststore.jks -storepass [password]
# Check if certificate is in Registry truststore
keytool -list -v -keystore /path/to/registry/truststore.jks -storepass [password]
# Verify the Registry's certificate chain
openssl s_client -connect nifi-registry.example.com:443 -showcerts
```

Ensure a complete certificate chain:
- Add the Registry's complete certificate chain (including intermediate CAs) to NiFi's truststore
- Add NiFi's complete certificate chain to the Registry's truststore

```
# Add Registry certificate to NiFi truststore
keytool -import -alias nifi-registry -file registry-cert.pem -keystore /path/to/nifi/conf/truststore.jks -storepass [password]
# Add NiFi certificate to Registry truststore
keytool -import -alias nifi-prod -file nifi-cert.pem -keystore /path/to/registry/conf/truststore.jks -storepass [password]
```

2. Proper Certificate Exchange

Ensure you've exchanged certificates correctly. Export NiFi Registry's public certificate:

```
keytool -exportcert -alias nifi-registry -keystore /path/to/registry/keystore.jks -file registry.crt -storepass [password]
```

Import this certificate into each NiFi instance's truststore:

```
keytool -importcert -alias nifi-registry -keystore /path/to/nifi/truststore.jks -file registry.crt -storepass [password] -noprompt
```

3. NiFi Registry Connection Configuration

In your NiFi instance (nifi.properties), verify:

```
# Registry client properties
nifi.registry.client.name=NiFi Registry
nifi.registry.client.url=https://nifi-registry.example.com/nifi-registry
nifi.registry.client.timeout.connect=30 secs
nifi.registry.client.timeout.read=30 secs
```

Verify these configuration entries in NiFi (production and development), in nifi.properties:

```
nifi.registry.client.ssl.protocol=TLS
nifi.registry.client.truststore.path=/path/to/truststore.jks
nifi.registry.client.truststore.password=[password]
nifi.registry.client.truststore.type=JKS
```

And in NiFi Registry, in nifi-registry.properties:

```
nifi.registry.security.truststore.path=/path/to/truststore.jks
nifi.registry.security.truststore.password=[password]
nifi.registry.security.truststore.type=JKS
```

4. LDAP Configuration

For your LDAP integration issues, ensure authorizers.xml contains:

```
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.registry.security.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">ldap-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">cn=admin-user,ou=users,dc=example,dc=com</property>
    <property name="NiFi Identity 1">cn=dev-nifi,ou=servers,dc=example,dc=com</property>
</accessPolicyProvider>
```

In authorizations.xml, add appropriate policies for the dev-nifi identity:

```
<policy identifier="some-uuid" resource="/buckets" action="READ">
    <user identifier="dev-nifi-uuid"/>
</policy>
```

5. Proxy Configuration

For proxy user requests, add in nifi.properties:

```
nifi.registry.client.proxy.identity=cn=dev-nifi,ou=servers,dc=example,dc=com
```

6. Restart Order

After making changes, restart in the following order:
1. NiFi Registry first
2. Then all NiFi instances

Happy hadooping
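The "PKIX path building failed" message means the JVM walked the chain the Registry presented and never reached a certificate in the truststore. The logic can be illustrated with a toy model (all names below are made up; real PKIX validation also verifies signatures, validity dates, and extensions):

```python
def path_builds(chain, trusted_subjects):
    """Toy model of PKIX path building: walk the issuer links from the leaf
    until a subject or issuer is found in the truststore. Real PKIX also
    checks signatures, validity dates, and certificate extensions."""
    for subject, issuer in chain:
        if issuer in trusted_subjects or subject in trusted_subjects:
            return True
    return False

# Chain the Registry presents: leaf signed by an intermediate, signed by a root.
chain = [("CN=nifi-registry.example.com", "CN=Example Intermediate CA"),
         ("CN=Example Intermediate CA", "CN=Example Root CA")]

print(path_builds(chain, {"CN=Example Root CA"}))  # → True: root is trusted
print(path_builds(chain, {"CN=Other Root CA"}))    # → False: handshake fails
```

This is why importing only the leaf certificate sometimes still fails: if the server omits the intermediate and the truststore holds only the root, no link in the presented chain matches, and the handshake aborts with exactly this exception.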
06-04-2025
11:36 PM
@hegdemahendra This is a classic case of off-heap memory consumption in NiFi. The 3G you see in the GUI only represents JVM heap + non-heap memory, but NiFi uses significant additional memory outside the JVM that doesn't appear in those metrics. Next time, could you share your deployment YAML files? That would help with solutioning.

Root Causes of Off-Heap Memory Usage:
- Content Repository (primary culprit): NiFi uses memory-mapped files for the content repository, so large FlowFiles are mapped directly into memory. This memory appears as process memory but not JVM memory.
- Provenance Repository: uses Lucene indexes that consume off-heap memory, plus memory-mapped files for provenance data storage.
- Native Libraries: compression libraries (gzip, snappy), cryptographic libraries, network I/O libraries.
- Direct Memory Buffers: NIO operations use direct ByteBuffers for network and file I/O.

Possible Solutions:

1. Reduce the JVM Heap Size

```
# Instead of 28G JVM heap, try:
NIFI_JVM_HEAP_INIT: "16g"
NIFI_JVM_HEAP_MAX: "16g"
```

This leaves more room (24G) for off-heap usage.

2. Configure a Direct Memory Limit

Add the JVM argument:

```
-XX:MaxDirectMemorySize=8g
```

3. Content Repository Configuration

In nifi.properties:

```
# Limit content repository size
nifi.content.repository.archive.max.retention.period=1 hour
nifi.content.repository.archive.max.usage.percentage=50%
# Use file-based instead of memory-mapped (if possible)
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
```

4. Provenance Repository Tuning

```
# Reduce provenance retention
nifi.provenance.repository.max.storage.time=6 hours
nifi.provenance.repository.max.storage.size=10 GB
```

Long-term Solutions:

1. Increase the Pod Memory Limit

```
resources:
  limits:
    memory: "60Gi"  # Increase from 40G
  requests:
    memory: "50Gi"
```

2. Monitor Off-Heap Usage

Enable JVM flags for better monitoring:

```
-XX:NativeMemoryTracking=summary
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintNMTStatistics
```

3. Implement Memory-Efficient Flow Design
- Process smaller batches
- Avoid keeping large FlowFiles in memory
- Use streaming processors where possible
- Implement backpressure properly

4. Consider a Multi-Pod Deployment

Instead of a single large pod, use multiple smaller pods:

```
# 3 pods with 20G each instead of 1 pod with 40G
replicas: 3
resources:
  limits:
    memory: "20Gi"
```

Monitoring Commands:

```
# Check native memory tracking
kubectl exec -it <nifi-pod> -- jcmd <pid> VM.native_memory summary
# Monitor process memory
kubectl top pod <nifi-pod>
# Check memory breakdown
kubectl exec -it <nifi-pod> -- cat /proc/<pid>/status | grep -i mem
```

Start with reducing the JVM heap to 16G and implementing the content repository limits. This should immediately reduce OOM occurrences while you plan the longer-term solutions. Always remember to share your configuration files with the sensitive data masked or scrambled.

Happy hadooping
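The sizing advice follows a simple budget: the pod limit must cover the heap plus direct memory plus everything else off-heap. A rough calculator (the 8G figures for direct memory and other off-heap usage are illustrative assumptions, not measured values):

```python
def pod_headroom_g(pod_limit_g, heap_g, direct_g, other_offheap_g):
    """Return the headroom (in GB) left after heap, direct buffers, and
    other off-heap usage; negative headroom means the pod risks an OOM kill."""
    return pod_limit_g - (heap_g + direct_g + other_offheap_g)

# Original sizing: 40G pod with a 28G heap leaves little room off-heap.
print(pod_headroom_g(40, 28, 8, 8))  # → -4 (over budget: OOM-kill territory)
# Suggested sizing: same pod, 16G heap.
print(pod_headroom_g(40, 16, 8, 8))  # → 8 (8G of headroom)
```

Plugging in the numbers from `jcmd VM.native_memory summary` instead of the assumed 8G values gives a realistic picture of how close a pod is to its limit.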
04-28-2025
05:43 AM
1 Kudo
@nifier That's good to hear. Now the onus is on you to share the provenance setup that helped you resolve your problem. It's priceless to share such information to grow our documentation base. If you do a good, detailed write-up, the moderators could help integrate it into the official Cloudera knowledge base. Happy hadooping
04-02-2025
07:09 AM
1 Kudo
@shubham_sharma My bad, it's hbase.mob.file.expired.period and not hbase.mob.file.expired. Happy hadooping
04-01-2025
05:17 AM
@shubham_sharma The hbase.mob.file.expired property in HBase is set in the hbase-site.xml configuration file. This property is related to HBase's MOB (Medium-sized Objects) feature, which is designed to efficiently store objects that are larger than typical HBase cells but not large enough to warrant HDFS storage.

```
<property>
  <name>hbase.mob.file.expired</name>
  <value>30</value>
  <description>The number of days to keep a mob file before deleting it. Default value is 30 days.</description>
</property>
```

Happy hadooping
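The retention behaviour described above can be sketched as a date cutoff: the MOB cleaner deletes files older than the configured number of days. An illustrative calculation (30 days, matching the default above; the dates are made up):

```python
from datetime import datetime, timedelta

def is_expired(file_mtime: datetime, now: datetime, retention_days: int = 30) -> bool:
    """A MOB file is eligible for deletion once it is older than the
    configured retention period (in days)."""
    return now - file_mtime > timedelta(days=retention_days)

now = datetime(2025, 4, 1)
print(is_expired(datetime(2025, 2, 1), now))   # → True: 59 days old
print(is_expired(datetime(2025, 3, 20), now))  # → False: 12 days old
```

Raising the property's value keeps expired MOB files on disk longer at the cost of storage; lowering it reclaims space sooner.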