Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 665 | 06-04-2025 11:36 PM |
| | 1245 | 03-23-2025 05:23 AM |
| | 615 | 03-17-2025 10:18 AM |
| | 2263 | 03-05-2025 01:34 PM |
| | 1463 | 03-03-2025 01:09 PM |
12-10-2025
04:20 AM
@Amr5 The NoSuchMethodError points to a JAR conflict at runtime. You must ensure that only CDH 7.2.18 Hive JARs are on the classpath, with no remnants of 7.1.9: the ParseDriver.parse() method signature changed between Hive versions. In your case the old Hive JARs (from CDH 7.1.9) are still present in /data1/informatica/dei/services/shared/hadoop/CDH_7.218, so Java loads the old hive-exec.jar instead of the new one, causing the method-signature mismatch.

Step 1: Identify all old Hive JARs

```
find /data1/informatica/dei/services/shared/hadoop/CDH_7.218 -name "hive*.jar" -exec ls -lh {} \;
```

Step 2: Remove all old Hive JARs

```
cd /data1/informatica/dei/services/shared/hadoop/CDH_7.218

# Create the backup directory if it does not exist
mkdir -p backup_all_old_hive_jars

# Move ALL Hive-related JARs to the backup
mv hive*.jar backup_all_old_hive_jars/
```

Step 3: Copy all correct Hive JARs from the Cloudera cluster

```
# Find the Cloudera CDH 7.2.18 parcel location
CLOUDERA_PARCEL=$(find /opt/cloudera/parcels -maxdepth 1 -type d -name "CDH-7.2.18*" | head -1)

# Copy ALL Hive JARs
cp $CLOUDERA_PARCEL/lib/hive/lib/hive*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/

# Also copy Hive dependencies
cp $CLOUDERA_PARCEL/jars/hive*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/
```

Step 4: Verify the correct versions

```
cd /data1/informatica/dei/services/shared/hadoop/CDH_7.218
ls -lh hive*.jar | head -5

# Check the version inside hive-exec.jar
unzip -p hive-exec-*.jar META-INF/MANIFEST.MF | grep -i version
```

Step 5: Clear the cached compilation artifacts

```
rm -rf /data1/informatica/dei/tomcat/bin/disTemp/DOM_IDQ_DEV/DIS_DEI_DEV/node02_DEI_DEV/cloudera_dev/SPARK/*
rm -rf /data1/informatica/dei/tomcat/bin/disTemp/DOM_IDQ_DEV/DIS_DEI_DEV/node02_DEI_DEV/cloudera_dev/HIVE/*
```

Step 6: Restart the Informatica services

```
infaservice.sh dis stop -domain DOM_IDQ_DEV -service DIS_DEI_DEV
infaservice.sh dis start -domain DOM_IDQ_DEV -service DIS_DEI_DEV
```

Step 7: Verify the Hadoop distribution in the Informatica Admin Console

1. Log in to Informatica Administrator.
2. Navigate to DIS_DEI_DEV → Properties → Hadoop Connection.
3. Click Test Connection.
4. If it fails, click Re-import Hadoop Configuration to refresh.

Step 8: Re-run your mapping.

Happy hadooping
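P.S. If you want to confirm which JARs in the Informatica distribution directory still ship the ParseDriver class before and after the cleanup, a quick scan like the sketch below can help. The directory path is the one from your environment; everything else is illustrative only.

```bash
#!/bin/bash
# Sketch: list every JAR under the Informatica Hadoop distribution directory
# that contains the Hive ParseDriver class (the class whose signature changed).
DIST_DIR=/data1/informatica/dei/services/shared/hadoop/CDH_7.218

for jar in $(find "$DIST_DIR" -name "*.jar"); do
  # unzip -l prints the archive listing; grep for the class of interest
  if unzip -l "$jar" 2>/dev/null | grep -q "org/apache/hadoop/hive/ql/parse/ParseDriver.class"; then
    echo "ParseDriver found in: $jar"
  fi
done
```

If more than one JAR shows up after Step 3, you still have a duplicate on the classpath.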
12-09-2025
07:56 AM
@Amr5 Just as you suspected, there is an old-path issue. The error indicates a version mismatch between the Hive/Tez libraries used by Informatica and those expected by your Cloudera cluster.

Root cause analysis:
1. Informatica is using Hive libraries from an older Cloudera version (7.1.9).
2. Your cluster is running Cloudera 7.2.18 (as shown in the path /data1/informatica/dei/services/shared/hadoop/CDH_7.218).
3. The HiveSplitGenerator class in the old hive-exec.jar is incompatible with the newer Tez runtime.

Step 1: Locate the current Hive libraries

```
sudo find /data1/informatica -type f -name "hive-exec*.jar" 2>/dev/null
```

Step 2: Back up the old libraries

```
cd /data1/informatica/dei/services/shared/hadoop/CDH_7.218
mkdir -p backup_old_hive_libs
mv hive-exec*.jar backup_old_hive_libs/
```

Step 3: Copy the correct Hive libraries from the cluster

```
# Find the correct hive-exec.jar on your Cloudera cluster
find /opt/cloudera/parcels -name "hive-exec*.jar" 2>/dev/null

# Copy it to Informatica's Hadoop distribution directory
cp /opt/cloudera/parcels/CDH-7.2.18*/lib/hive/lib/hive-exec-*.jar \
   /data1/informatica/dei/services/shared/hadoop/CDH_7.218/
```

Step 4: Update the Informatica Hadoop distribution

In the Informatica Administrator Console:
1. Navigate to Data Integration Service → Properties.
2. Go to Hadoop Connection → Distribution.
3. Verify it points to /data1/informatica/dei/services/shared/hadoop/CDH_7.218.
4. Click Test Connection to validate.
5. If needed, use Re-import Hadoop Configuration to refresh the cluster configs.

Step 5: Restart the services

```
infaservice.sh dis restart -domain DOM_IDQ_DEV -service DIS_DEI_DEV
```

Step 6: Clear the cached compilation files

```
rm -rf /data1/informatica/dei/tomcat/bin/disTemp/DOM_IDQ_DEV/DIS_DEI_DEV/node02_DEI_DEV/cloudera_dev/SPARK/*
rm -rf /tmp/sqoop-infadpdev/*
```

Step 7: Re-run your mapping.

If you have multiple nodes in your Informatica cluster, repeat Steps 2-3 on every node where the Data Integration Service runs.

Happy hadooping
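P.S. When the Data Integration Service runs on several nodes, a quick checksum comparison keeps you from chasing a node that still has the old JARs. A minimal sketch, assuming passwordless SSH and placeholder host names:

```bash
#!/bin/bash
# Sketch: compare hive-exec JAR checksums across DIS nodes so mismatched versions stand out.
NODES="node01_DEI_DEV node02_DEI_DEV"   # placeholder host names, replace with your own
DIST_DIR=/data1/informatica/dei/services/shared/hadoop/CDH_7.218

for node in $NODES; do
  echo "== $node =="
  # List each hive-exec JAR with its md5; the glob expands on the remote host
  ssh "$node" "md5sum $DIST_DIR/hive-exec*.jar" 2>/dev/null
done
```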
12-07-2025
11:30 PM
@Amr5 From the logs you shared, the core issue is:

```
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Vertex failed: INIT_FAILURE
Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
Caused by: java.lang.ExceptionInInitializerError
```

This indicates a classpath/library compatibility issue between Informatica and the upgraded Cloudera CDP cluster, specifically with the Tez and Hive components.

Root causes:
- Version mismatch: the Informatica integration points to CDH 7.2.18 libraries (your HADOOP_MAPRED_HOME is /data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib), but your cluster was upgraded to a newer CDP version with incompatible Hive/Tez libraries.
- Class initialization failure: the HiveSplitGenerator class cannot be instantiated, likely because of missing or incompatible dependencies.
- Static initializer problem: the ExceptionInInitializerError suggests a static block in one of the Hive classes is failing during initialization.

Solution

1. Update the Informatica Hadoop connection:
   - Go to Administrator → Connections.
   - Edit your Hadoop connection.
   - Update the Hadoop distribution version to match your new CDP version.
   - Update the configuration files (core-site.xml, hdfs-site.xml, hive-site.xml, etc.).

2. Update the Hadoop libraries:

```
# Copy the new CDP client libraries to Informatica
cp -r /opt/cloudera/parcels/CDH/lib/* /data1/informatica/dei/services/shared/hadoop/CDP_<version>/lib/
```

3. Restart the Informatica services:

```
infaservice.sh stopService
infaservice.sh startService
```

4. Additionally, identify and copy any missing Tez JARs:

```
# From the CDP cluster, copy the Tez libraries
cp /opt/cloudera/parcels/CDH/lib/tez/*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib/

# Copy the Hive execution libraries
cp /opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec-*.jar /data1/informatica/dei/services/shared/hadoop/CDH_7.218/lib/
```

5. Update the classpath in the Informatica domain configuration.

6. Configure the Hive execution engine. If Tez is causing issues, temporarily switch to MapReduce by adding the following to your Hive connection properties (a hive-site.xml form of this override is sketched at the end of this post):

```
hive.execution.engine=mr
```

Update this post after the above steps, and always share the logs so we can understand what is happening in your environment.

Happy hadooping
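P.S. If you prefer to apply the MapReduce fallback at the hive-site.xml level rather than in the connection properties, the override would look like the snippet below; where it actually takes effect depends on how your Informatica Hadoop connection picks up Hive configuration, so treat it as an illustration only.

```xml
<!-- Sketch only: force MapReduce instead of Tez while troubleshooting -->
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
  <description>Temporarily bypass Tez until the library mismatch is resolved.</description>
</property>
```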
06-05-2025
12:37 AM
@sydney- The SSL handshake error you're encountering is a common issue when connecting NiFi instances to NiFi Registry in secure environments. It indicates that your NiFi instances cannot verify the SSL certificate presented by the NiFi Registry server:

```
javax.net.ssl.SSLHandshakeException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target
```

Based on your description, there are several areas to address:
- The certificate used by NiFi Registry is self-signed or not issued by a trusted Certificate Authority (CA).
- The certificate chain is incomplete.
- The truststore configuration is incorrect.

1. Certificate trust configuration

Verify the certificate chain:

```
# Check if the certificate is in the NiFi truststore (repeat for each instance)
keytool -list -v -keystore /path/to/nifi/truststore.jks -storepass [password]

# Check if the certificate is in the Registry truststore
keytool -list -v -keystore /path/to/registry/truststore.jks -storepass [password]

# Verify the Registry's certificate chain
openssl s_client -connect nifi-registry.example.com:443 -showcerts
```

Ensure a complete certificate chain:
- Add the Registry's complete certificate chain (including intermediate CAs) to NiFi's truststore.
- Add NiFi's complete certificate chain to the Registry's truststore.

```
# Add the Registry certificate to the NiFi truststore
keytool -import -alias nifi-registry -file registry-cert.pem -keystore /path/to/nifi/conf/truststore.jks -storepass [password]

# Add the NiFi certificate to the Registry truststore
keytool -import -alias nifi-prod -file nifi-cert.pem -keystore /path/to/registry/conf/truststore.jks -storepass [password]
```

2. Proper certificate exchange

Ensure you've exchanged the certificates correctly. Export NiFi Registry's public certificate:

```
keytool -exportcert -alias nifi-registry -keystore /path/to/registry/keystore.jks -file registry.crt -storepass [password]
```

Import this certificate into each NiFi instance's truststore:

```
keytool -importcert -alias nifi-registry -keystore /path/to/nifi/truststore.jks -file registry.crt -storepass [password] -noprompt
```

3. NiFi Registry connection configuration

In your NiFi instance (nifi.properties), verify:

```
# Registry client properties
nifi.registry.client.name=NiFi Registry
nifi.registry.client.url=https://nifi-registry.example.com/nifi-registry
nifi.registry.client.timeout.connect=30 secs
nifi.registry.client.timeout.read=30 secs
```

Verify these configuration files in NiFi (production/development):

```
# nifi.properties
nifi.registry.client.ssl.protocol=TLS
nifi.registry.client.truststore.path=/path/to/truststore.jks
nifi.registry.client.truststore.password=[password]
nifi.registry.client.truststore.type=JKS
```

In NiFi Registry:

```
# nifi-registry.properties
nifi.registry.security.truststore.path=/path/to/truststore.jks
nifi.registry.security.truststore.password=[password]
nifi.registry.security.truststore.type=JKS
```

4. LDAP configuration

For your LDAP integration issues, ensure authorizers.xml contains:

```
<accessPolicyProvider>
    <identifier>file-access-policy-provider</identifier>
    <class>org.apache.nifi.registry.security.authorization.FileAccessPolicyProvider</class>
    <property name="User Group Provider">ldap-user-group-provider</property>
    <property name="Authorizations File">./conf/authorizations.xml</property>
    <property name="Initial Admin Identity">cn=admin-user,ou=users,dc=example,dc=com</property>
    <property name="NiFi Identity 1">cn=dev-nifi,ou=servers,dc=example,dc=com</property>
</accessPolicyProvider>
```

In authorizations.xml, add the appropriate policies for the dev-nifi identity:

```
<policy identifier="some-uuid" resource="/buckets" action="READ">
    <user identifier="dev-nifi-uuid"/>
</policy>
```

5. Proxy configuration

For proxied user requests, add in nifi.properties:

```
nifi.registry.client.proxy.identity=cn=dev-nifi,ou=servers,dc=example,dc=com
```

6. Restart order

After making the changes, restart the instances in the following order:
1. NiFi Registry first
2. Then all NiFi instances

Happy hadooping
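P.S. If the handshake still fails after the truststores are aligned, enabling JSSE debug logging on the NiFi side will show exactly which certificate in the chain is being rejected. A minimal sketch, assuming argument number 20 is not already used in your conf/bootstrap.conf:

```
# conf/bootstrap.conf (sketch only): enable TLS handshake tracing temporarily,
# then remove it once the trust issue is resolved, as it is very verbose.
java.arg.20=-Djavax.net.debug=ssl,handshake
```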
06-04-2025
11:36 PM
@hegdemahendra This is a classic case of off-heap memory consumption in NiFi. The 3G you see in the GUI only represents JVM heap + non-heap memory, but NiFi uses significant additional memory outside the JVM that doesn't appear in those metrics. Next time, sharing your deployment YAML files would help with solutioning.

Root causes of off-heap memory usage:

Content repository (primary culprit)
- NiFi uses memory-mapped files for the content repository.
- Large FlowFiles are mapped directly into memory.
- This memory appears as process memory but not JVM memory.

Provenance repository
- Uses Lucene indexes that consume off-heap memory.
- Memory-mapped files for provenance data storage.

Native libraries
- Compression libraries (gzip, snappy)
- Cryptographic libraries
- Network I/O libraries

Direct memory buffers
- NIO operations use direct ByteBuffers.
- Network and file I/O operations.

Possible solutions:

1. Reduce the JVM heap size

```
# Instead of a 28G JVM heap, try:
NIFI_JVM_HEAP_INIT: "16g"
NIFI_JVM_HEAP_MAX: "16g"
```

This leaves more room (24G) for off-heap usage.

2. Configure a direct memory limit

Add the JVM argument:

```
-XX:MaxDirectMemorySize=8g
```

3. Content repository configuration

In nifi.properties:

```
# Limit content repository archive usage
nifi.content.repository.archive.max.retention.period=1 hour
nifi.content.repository.archive.max.usage.percentage=50%

# Use the file-based implementation (if possible)
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
```

4. Provenance repository tuning

```
# Reduce provenance retention
nifi.provenance.repository.max.storage.time=6 hours
nifi.provenance.repository.max.storage.size=10 GB
```

Long-term solutions:

1. Increase the pod memory limit

```
resources:
  limits:
    memory: "60Gi"   # Increase from 40G
  requests:
    memory: "50Gi"
```

2. Monitor off-heap usage

Enable JVM flags for better monitoring (a bootstrap.conf sketch follows at the end of this post):

```
-XX:NativeMemoryTracking=summary
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintNMTStatistics
```

3. Implement memory-efficient flow design
- Process smaller batches.
- Avoid keeping large FlowFiles in memory.
- Use streaming processors where possible.
- Implement backpressure properly.

4. Consider a multi-pod deployment

Instead of a single large pod, use multiple smaller pods:

```
# 3 pods with 20G each instead of 1 pod with 40G
replicas: 3
resources:
  limits:
    memory: "20Gi"
```

Monitoring commands:

```
# Check native memory tracking
kubectl exec -it <nifi-pod> -- jcmd <pid> VM.native_memory summary

# Monitor process memory
kubectl top pod <nifi-pod>

# Check the memory breakdown
kubectl exec -it <nifi-pod> -- cat /proc/<pid>/status | grep -i mem
```

Start with reducing the JVM heap to 16G and implementing the content repository limits. This should immediately reduce OOM occurrences while you plan the longer-term changes. Always remember to share your configuration files with the vital data masked or scrambled.

Happy hadooping
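P.S. A minimal sketch of how the direct-memory cap and native memory tracking flags from the steps above could be wired into NiFi's conf/bootstrap.conf; the argument numbers are placeholders, so pick ones not already used in your file.

```
# conf/bootstrap.conf (sketch, assuming args 21-23 are free)
java.arg.21=-XX:MaxDirectMemorySize=8g
java.arg.22=-XX:NativeMemoryTracking=summary
java.arg.23=-XX:+UnlockDiagnosticVMOptions
```

With NMT enabled, the `jcmd <pid> VM.native_memory summary` command shown above will break down exactly where the off-heap memory is going.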
04-28-2025
05:43 AM
1 Kudo
@nifier That's good to hear. Now the onus is on you to share the provenance setup that resolved your problem; sharing such information is priceless for growing our documentation base. If you write a good, detailed write-up, the moderators could help integrate it into the official Cloudera knowledge base. Happy hadooping
04-02-2025
07:09 AM
1 Kudo
@shubham_sharma My bad, the property is hbase.mob.file.expired.period, not hbase.mob.file.expired. Happy hadooping
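P.S. For reference, the corrected hbase-site.xml entry would look like the snippet below. The value is only an example carried over from the related MOB discussion (where it is given in seconds), so verify the expected unit against your HBase release's documentation.

```xml
<!-- Sketch only: corrected property name from the post above -->
<property>
  <name>hbase.mob.file.expired.period</name>
  <value>86400</value>
  <description>Retention period for expired MOB files before the cleaner removes them (example value).</description>
</property>
```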
04-01-2025
05:17 AM
@shubham_sharma The hbase.mob.file.expired property in HBase is set in the hbase-site.xml configuration file. It relates to HBase's MOB (Medium-sized Objects) feature, which is designed to efficiently store objects that are larger than typical HBase cells but not large enough to warrant direct HDFS storage.

```
<property>
  <name>hbase.mob.file.expired</name>
  <value>30</value>
  <description>The number of days to keep a MOB file before deleting it. Default value is 30 days.</description>
</property>
```

Happy hadooping
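P.S. To check whether MOB is actually enabled on the column family this setting would affect, the IS_MOB and MOB_THRESHOLD attributes in the HBase shell are what to look for. A short example with a hypothetical table and column family name:

```
# HBase shell: inspect MOB settings on a column family
describe 'my_table'
# Look for IS_MOB => 'true' and MOB_THRESHOLD => '<bytes>' in the output

# Enabling MOB on a column family (example threshold of 102400 bytes)
alter 'my_table', {NAME => 'cf', IS_MOB => true, MOB_THRESHOLD => 102400}
```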
03-28-2025
04:23 AM
1 Kudo
@allen_chu As you correctly suspected, the error indicates that HBase is trying to access a MOB (Medium Object) file in HDFS that no longer exists at the expected location:

```
hdfs://ha:8020/apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW/bf92b15900f190730a5482b53d350df0/cf/ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4
```

Potential root causes:
- The MOB file was manually deleted from HDFS.
- HBase's MOB cleanup process didn't properly update the metadata.
- The HDFS path was changed without the corresponding HBase configuration updates.
- Corrupted HBase metadata.
- Incomplete data migration.
- Filesystem inconsistency.
- Incomplete compaction or archiving process.

1. Verify HDFS integrity

```
# Check HDFS file system health
hdfs fsck /apps/hbase/data/archive/data/default/FPCA_ITEMS_TY_NEW
```

2. HBase data consistency check

```
# Start the HBase shell
hbase shell

# Major compact the table to rebuild metadata
major_compact 'FPCA_ITEMS_TY_NEW'

# Verify the table status
status 'detailed'
```

3. Immediate recovery options

Option A: Repair the table

```
hbase hbck -j <path_to_hbase_classpath> -repair FPCA_ITEMS_TY_NEW
```

4. Advanced recovery options

```
# If the previous methods fail, consider:
# a) Snapshot and restore
hbase shell
snapshot 'FPCA_ITEMS_TY_NEW', 'FPCA_ITEMS_TY_NEW_SNAPSHOT'

# b) Clone the snapshot
clone_snapshot 'FPCA_ITEMS_TY_NEW_SNAPSHOT', 'FPCA_ITEMS_TY_NEW_RECOVERED'
```

Option B: Disable and re-enable the table

```
disable 'FPCA_ITEMS_TY_NEW'
enable 'FPCA_ITEMS_TY_NEW'
```

Option C: Run MOB compaction (replace 'cf' with your actual column family name)

```
hbase org.apache.hadoop.hbase.mob.mapreduce.Sweeper FPCA_ITEMS_TY_NEW cf
```

Option D: If the data is not critical, reset the MOB references

```
alter 'FPCA_ITEMS_TY_NEW', {NAME => 'cf', MOB_COMPACT_PARTITION_POLICY => 'daily'}
major_compact 'FPCA_ITEMS_TY_NEW'
```

5. Preventive measures

```
<property>
  <name>hbase.mob.file.expired.period</name>
  <value>86400</value> <!-- 1 day in seconds -->
</property>
```

Additional checks:
1. Verify HDFS permissions for the HBase user.
2. Check HDFS health (namenode logs, datanode availability).
3. Review the table's HBase MOB configuration:

```
describe 'FPCA_ITEMS_TY_NEW'
```

With the above steps you should be able to resolve your HBase issue.

Happy hadooping
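P.S. Before attempting any repair, it is worth confirming whether the referenced MOB file (or an archived copy of it) still exists anywhere under the HBase trees. A minimal sketch using the file name from your stack trace; adjust the base paths if your hbase.rootdir differs.

```bash
#!/bin/bash
# Sketch: search both the live and archive HBase trees for the missing MOB file name
MOB_FILE=ab741ac0919480a47353778bda55d142202502239bf346dbbfc6475c8967734c2edfaaf4

for base in /apps/hbase/data/data /apps/hbase/data/archive/data; do
  echo "== searching $base =="
  hdfs dfs -ls -R "$base/default/FPCA_ITEMS_TY_NEW" 2>/dev/null | grep "$MOB_FILE"
done
```

If the file turns up under a different region or table directory, that points to a metadata problem rather than data loss.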
03-23-2025
05:23 AM
2 Kudos
@shiva239 You've set up GitLabFlowRegistry in NiFi 2.0's controller settings and can import flows from GitLab and commit changes back through the UI, while your existing CI/CD pipeline uses NiFi toolkit commands. As of NiFi 2.0, there aren't direct toolkit commands specifically designed for the new GitLabFlowRegistry feature (someone should correct me here if that has changed). However, you have a few options to achieve what you need.

Option 1: Use the NiFi REST API

The GitLabFlowRegistry operations that you perform through the UI are backed by REST API endpoints. You can integrate these REST API calls into your CI/CD pipeline:

```
# Example REST API call to import/sync from the Git repository
curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \
  -H "Content-Type: application/json" \
  -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \
  --key your-key.key --cert your-cert.crt --cacert your-ca.crt
```

Option 2: Hybrid approach

Use a combination of existing toolkit commands and REST API calls:
- Use toolkit commands for basic operations (deployment to a target NiFi instance).
- Use REST API calls for GitLabFlowRegistry-specific operations.

Option 3: Create custom scripts

You could create wrapper scripts that combine the toolkit functionality with the necessary REST API calls for GitLabFlowRegistry operations:

```
#!/bin/bash
# Custom script to sync from GitLabFlowRegistry and deploy

# First pull from the Git registry via the REST API
curl -X POST "https://your-nifi-host:port/nifi-api/versions/process-groups/{processGroupId}/flow-registry-sync" \
  -H "Content-Type: application/json" \
  -d '{"registryId":"your-gitlab-registry-id","bucketId":"your-bucket-id","flowId":"your-flow-id","version":1}' \
  --key your-key.key --cert your-cert.crt --cacert your-ca.crt

# Then use toolkit commands for additional operations as needed
./bin/cli.sh nifi pg-import ...
```

I recommend adopting Option 3 (custom scripts) for your CI/CD pipeline. This approach:
- Takes advantage of the new GitLabFlowRegistry feature.
- Maintains compatibility with your existing toolkit-based CI/CD pipeline.
- Provides flexibility to customize the process based on your specific requirements.

For implementing the REST API calls, consult the NiFi API documentation for the full set of endpoints related to the new GitLabFlowRegistry functionality. The NiFi Admin Guide for version 2.0 should also have details on these new REST endpoints.

Happy hadooping
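P.S. If you wire the Option 3 wrapper into GitLab CI itself, a minimal job definition might look like the sketch below. The script name, variable names, and the endpoint the script calls are placeholders taken from the example above, so adjust them to whatever your pipeline actually uses.

```yaml
# .gitlab-ci.yml (sketch only)
deploy-nifi-flow:
  stage: deploy
  variables:
    NIFI_HOST: "your-nifi-host:port"            # placeholder
    PROCESS_GROUP_ID: "your-process-group-id"   # placeholder
  script:
    # Hypothetical wrapper script from Option 3; client certs are expected as CI secrets
    - ./scripts/sync-and-deploy.sh "$NIFI_HOST" "$PROCESS_GROUP_ID"
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```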