Member since 01-19-2017
3682 Posts
633 Kudos Received
373 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1731 | 06-04-2025 11:36 PM |
| | 2166 | 03-23-2025 05:23 AM |
| | 1035 | 03-17-2025 10:18 AM |
| | 3963 | 03-05-2025 01:34 PM |
| | 2710 | 03-03-2025 01:09 PM |
06-01-2026
09:48 PM
A few things stand out from the numbers you've shared. On a 64-core machine, ingesting ~11 million rows in 17 minutes (around 10–11K rows/sec) is significantly below what I'd expect if the workload were effectively parallelized. Before focusing on CPU count, I'd investigate where the bottleneck actually is.
Some areas worth checking:
- Storage throughput: Is the data being written to local SSDs, network-attached storage, or slower disks? Ingest workloads are often I/O-bound rather than CPU-bound.
- File size and partitioning strategy: Large numbers of small files can severely impact write performance.
- Compression settings: Certain codecs provide better compression but consume more CPU during ingest.
- Thread parallelism: Verify that the ingestion framework is actually utilizing all available cores rather than being limited by a small worker pool.
- Memory pressure and GC activity: If the JVM is spending significant time in garbage collection, additional CPU cores won't help much.
- Network throughput: If data is being pulled from a remote source, the bottleneck may be upstream rather than on the ingest node itself.
I'd also recommend collecting:
- CPU utilization during ingest
- Disk IOPS and throughput metrics
- Memory usage and GC logs
- Number of concurrent ingest tasks
- Average file size being generated
One quick diagnostic is to look at overall CPU utilization. If the machine is only using 10–20% of available CPU during ingest, then the workload is likely blocked on I/O, synchronization, network transfers, or application-level limits rather than raw compute capacity.
Can you share:
- Which ingestion tool/framework you're using?
- The storage type (SSD, NVMe, HDD, cloud volume, etc.)?
- Average CPU utilization during the 17-minute ingest?
- Whether the target table is partitioned and, if so, by what key?
Those details would make it much easier to determine whether the bottleneck is CPU, disk, network, or configuration-related.
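The throughput arithmetic above can be made explicit with a small sketch. It simply divides rows by seconds and by cores; the 20% cut-off in `likely_not_cpu_bound` is a hypothetical threshold taken from the 10–20% range mentioned above, not a tuned rule.

```python
def ingest_rates(rows: int, minutes: float, cores: int) -> tuple:
    """Return (overall rows/sec, rows/sec per core) for a single ingest run."""
    seconds = minutes * 60.0
    overall = rows / seconds
    return overall, overall / cores


def likely_not_cpu_bound(avg_cpu_util_pct: float, threshold_pct: float = 20.0) -> bool:
    """Rough heuristic: low average CPU during ingest suggests the job is waiting
    on I/O, network, locks, or a small worker pool rather than on compute.
    threshold_pct is a hypothetical cut-off, not a measured value."""
    return avg_cpu_util_pct <= threshold_pct


if __name__ == "__main__":
    overall, per_core = ingest_rates(rows=11_000_000, minutes=17, cores=64)
    print(f"{overall:,.0f} rows/sec overall, {per_core:,.1f} rows/sec per core")
```

For the numbers in the question this prints `10,784 rows/sec overall, 168.5 rows/sec per core`. Roughly 170 rows/sec per core is what makes the parallelism and I/O questions worth answering first.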
05-12-2026
05:34 AM
@AlokKumar You can validate that the Discovery URL is working by opening it in a browser; you should see a JSON document.
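The same check can be scripted. This sketch assumes the Discovery URL is an OpenID Connect discovery endpoint (typically ending in `/.well-known/openid-configuration`); the required keys checked here are a subset of the fields OpenID Connect Discovery marks as required. An HTML error or login page fails JSON parsing, which is exactly the "not a JSON document" case.

```python
import json
import urllib.request

# Subset of the keys OpenID Connect Discovery marks as REQUIRED (assumption: OIDC provider).
REQUIRED_KEYS = ("issuer", "authorization_endpoint", "jwks_uri")


def parse_discovery(body: str) -> dict:
    """Parse a discovery response body and check that it looks like OIDC metadata."""
    doc = json.loads(body)  # raises json.JSONDecodeError (a ValueError) for HTML pages etc.
    if not isinstance(doc, dict):
        raise ValueError("discovery document must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if missing:
        raise ValueError(f"discovery document is missing {missing}")
    return doc


def fetch_discovery(url: str, timeout: float = 10.0) -> dict:
    """Fetch the Discovery URL (the same GET a browser makes) and validate the body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_discovery(resp.read().decode("utf-8"))
```

Usage, with a hypothetical identity provider host: `fetch_discovery("https://idp.example.com/.well-known/openid-configuration")["issuer"]`.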
05-11-2026
01:58 AM
@Abhijith_Nayak @Sofiane-CH Try the steps below for default resource pool selection.
1. In Cloudera Manager, navigate to Clusters (left-side blue pane) > Impala Admission Control Configuration.
2. Click "Default Settings" and select "Allow these users and groups to submit to this pool". Add the users or groups separated by commas with no extra spaces, for example: A,B,C,D,E
3. Click "Edit" on each of your resource pools. In the "Submission Access Control" tab, select "Allow these users and groups to submit to this pool" and again add the users or groups separated by commas with no extra spaces. For your pools resource_pool_1 and resource_pool_2, that would be:
   resource_pool_1 --> A,B,C
   resource_pool_2 --> D,E
4. Once the above is done, click "Refresh Dynamic Resource Pools" and restart Impala.
References:
https://docs.cloudera.com/runtime/7.3.1/impala-manage/topics/impala-dynamic-pool-configure.html
https://docs.cloudera.com/runtime/7.3.1/impala-manage/topics/impala-dynamic-pool-settings.html
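Before pasting values into Cloudera Manager, the "comma-separated, no extra spaces" rule from steps 2 and 3 can be sanity-checked with a small sketch. It models only the ACL strings and which pool lists a given name; Impala's actual pool placement rules and group resolution are not modeled, and the names `A`–`E` are the placeholders from the example.

```python
from __future__ import annotations


def parse_acl(value: str) -> set[str]:
    """Split a submission ACL such as "A,B,C" into individual names.

    Rejects empty entries and any whitespace, following the
    "comma-separated with no extra spaces" advice in the steps above.
    """
    names = value.split(",")
    for name in names:
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"malformed entry {name!r} in ACL {value!r}")
    return set(names)


def pools_for(principal: str, pool_acls: dict[str, str]) -> list[str]:
    """Return, sorted, the pools whose ACL lists this user or group name."""
    return sorted(pool for pool, acl in pool_acls.items() if principal in parse_acl(acl))


if __name__ == "__main__":
    acls = {"resource_pool_1": "A,B,C", "resource_pool_2": "D,E"}
    print(pools_for("A", acls))  # ['resource_pool_1']
    print(pools_for("E", acls))  # ['resource_pool_2']
```

A value such as `"A, B"` raises `ValueError`, which is the kind of stray space the steps warn about.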
01-10-2026
12:24 AM
@rizalt FYI
➤ It appears your cluster is experiencing a Quorum Failure: the critical metadata services (NameNode, JournalNodes, and ZooKeeper) lose the ability to maintain a majority when one Data Center (DC) goes offline.
➤ Analyzing Your Failure
In a High Availability (HA) setup, the NameNodes rely on a "Quorum" of JournalNodes (JN) and ZooKeeper (ZK) nodes to stay alive. If you have 5 JNs, you must have at least 3 running for either NameNode to function. Based on your diagram:
- The Problem: If you split your 5 JournalNodes as evenly as possible across two DCs (e.g., 2 in DC1 and 3 in DC2), and the DC with 3 JNs goes down, the remaining 2 JNs cannot form a quorum. This causes both NameNodes to shut down immediately to prevent data corruption ("Split Brain").
- DataNodes vs. NameNodes: In HDFS, the number of DataNodes (DN) that fail has no direct impact on whether the NameNode stays "Up" or "Down." You could lose 15 out of 16 DataNodes, and the NameNode should still stay active. The fact that your NameNodes are crashing when 8 DNs (one full server/DC) go down proves that your Quorum nodes (JN/ZK) are failing, not the DataNodes.
➤ The Solution: Quorum Placement
To survive a full Data Center failure (50% of your physical infrastructure), you cannot rely on an even split of nodes. You need a third location (a "Witness" site) or an asymmetric distribution (which only survives the loss of the smaller DC).
1. The 3-Site Strategy (Recommended)
To handle a 1-DC failure with 5 JournalNodes and 5 ZooKeeper nodes, place them as follows:
- DC 1: 2 JN, 2 ZK
- DC 2: 2 JN, 2 ZK
- Site 3 (Witness): 1 JN, 1 ZK (this can be a very small virtual machine or cloud instance)
Why this works: If DC1 or DC2 fails, the remaining DC plus the Witness site still has 3 of the 5 nodes, which satisfies the quorum ($3 > 5/2$).
2. Maximum DataNode Failure
Theoretically: You can lose all but 1 DataNode ($N-1$), and the NameNode will stay "Active."
Practically: If you have a replication factor of 3 and lose 50% of your nodes, many blocks will become "Under-replicated," and some may become "Missing" if all three copies were in the DC that died.
Solution: Ensure your Rack Awareness is configured so HDFS knows which nodes belong to which DC. With the default block placement policy, the replicas of each block are spread across at least two racks, so mapping each DC to its own rack keeps at least one copy of the data in each DC.
➤ Why 11 JN and 11 ZK didn't work
Increasing the number of nodes to 11 does not help if they are only placed in two locations. With 11 nodes, you need 6 to be alive to form a quorum. If you have 5 in DC1 and 6 in DC2 and DC2 fails, the 5 remaining nodes in DC1 cannot reach the 6-node requirement.
➤ Checklist for Survival
- Reduce to 5 JNs and 5 ZKs: Too many nodes increase network latency and management overhead.
- Add a 3rd Location: Even a single low-power node in a different building or cloud region can act as the "tie-breaker."
- Check dfs.namenode.shared.edits.dir: Ensure the NameNodes are configured to point to all JournalNodes in the qjournal:// URI.
- ZooKeeper Failover Controller: Ensure the DFS ZKFC is running on both NameNode hosts.
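The quorum arithmetic above can be checked for any placement with a short sketch. It assumes the usual strict-majority rule (a quorum of n members needs n // 2 + 1 live members), which applies to both JournalNodes and ZooKeeper; the site names are placeholders.

```python
from __future__ import annotations


def majority(members: int) -> int:
    """Smallest number of live members that still forms a quorum (strict majority)."""
    return members // 2 + 1


def survives_site_loss(placement: dict[str, int]) -> dict[str, bool]:
    """For each site, report whether the ensemble keeps quorum when that site is lost.

    placement maps a site name to how many JournalNodes (or ZooKeeper servers) it hosts.
    """
    total = sum(placement.values())
    needed = majority(total)
    return {site: total - count >= needed for site, count in placement.items()}


if __name__ == "__main__":
    layouts = {
        "5 nodes, 2 DCs (2/3)": {"DC1": 2, "DC2": 3},
        "11 nodes, 2 DCs (5/6)": {"DC1": 5, "DC2": 6},
        "5 nodes, 3 sites (2/2/1)": {"DC1": 2, "DC2": 2, "Witness": 1},
    }
    for name, placement in layouts.items():
        print(name, survives_site_loss(placement))
```

Both two-site layouts report `False` for losing DC2 (the larger DC), while the 2/2/1 layout reports `True` for every site. This is the reason the Witness site matters more than the node count.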
12-25-2025
12:35 AM
@Shelton The same JDK versions are still used in the Hadoop clusters (for both 7.1.9 and 7.3.1):
dev (7.3.1) -> java --version
openjdk 17.0.16 2025-07-15 LTS
OpenJDK Runtime Environment (Red_Hat-17.0.16.0.8-1) (build 17.0.16+8-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-17.0.16.0.8-1) (build 17.0.16+8-LTS, mixed mode, sharing)
prod (7.1.9) -> java -version
openjdk version "1.8.0_462"
OpenJDK Runtime Environment (build 1.8.0_462-b08)
OpenJDK 64-Bit Server VM (build 25.462-b08, mixed mode)
https://supportmatrix.cloudera.com/
https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/cdp-private-cloud-base-installation/topics/cdpdc-java-requirements.html
From the Cloudera docs: If you are using JDK 17 on your cluster, you must add the following JVM options to the service:
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
--add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED
--add-exports=java.base/sun.net.dns=ALL-UNNAMED
--add-exports=java.base/sun.net.util=ALL-UNNAMED
to ensure the jobs run successfully. Could this be the cause of the job failures? Any guidance would be appreciated.
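One way to rule the JVM options in or out is to check a service's Java options string against the five flags quoted above. The following is a string-matching sketch only. It assumes you can copy the effective Java options for the failing service (for example from its Cloudera Manager configuration), and it does not know which roles actually need which flags. It also parses the major version, because JDK 8 (the 7.1.9 prod cluster) does not recognize `--add-opens`/`--add-exports`, which were introduced in JDK 9.

```python
import re

# The five JVM options quoted from the Cloudera docs in the post above.
REQUIRED_JDK17_OPTS = (
    "--add-opens=java.base/java.lang=ALL-UNNAMED",
    "--add-opens=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED",
    "--add-exports=java.management/com.sun.jmx.mbeanserver=ALL-UNNAMED",
    "--add-exports=java.base/sun.net.dns=ALL-UNNAMED",
    "--add-exports=java.base/sun.net.util=ALL-UNNAMED",
)


def java_major_version(version_output: str) -> int:
    """Extract the major version from `java -version` / `java --version` output.

    Handles the legacy "1.8.0_462" scheme (-> 8) and the modern "17.0.16" scheme (-> 17).
    """
    match = re.search(r'(?:openjdk|java)(?: version)?\s+"?(\d+)(?:\.(\d+))?', version_output)
    if not match:
        raise ValueError("unrecognized java version output")
    first, second = int(match.group(1)), match.group(2)
    return int(second) if first == 1 and second is not None else first


def missing_jdk17_opts(java_opts: str) -> list:
    """Return the required options that do not appear as whole tokens in java_opts."""
    present = set(java_opts.split())
    return [opt for opt in REQUIRED_JDK17_OPTS if opt not in present]
```

For the dev cluster, `java_major_version("openjdk 17.0.16 2025-07-15 LTS")` gives `17`, so any non-empty result from `missing_jdk17_opts(...)` on a failing service is worth fixing first.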
09-26-2025
04:06 AM
@Shelton Thank you for the detailed answer, much appreciated!
06-05-2025
12:37 AM
@sydney- The SSL handshake error you're encountering is a common issue when connecting NiFi instances to NiFi Registry in secure environments. It indicates that your NiFi instances cannot verify the SSL certificate presented by the NiFi Registry server:
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Based on your description, there are several likely causes:
- The certificate used by NiFi Registry is self-signed or not issued by a trusted Certificate Authority (CA)
- The certificate chain is incomplete
- The truststore configuration is incorrect
1. Certificate Trust Configuration
Verify Certificate Chain:
# Check if certificate is in NiFi truststore (repeat for each instance)
keytool -list -v -keystore /path/to/nifi/truststore.jks -storepass [password]
# Check if certificate is in Registry truststore
keytool -list -v -keystore /path/to/registry/truststore.jks -storepass [password]
# Verify the Registry's certificate chain
openssl s_client -connect nifi-registry.example.com:443 -showcerts
Ensure Complete Certificate Chain:
- Add the Registry's complete certificate chain (including intermediate CAs) to NiFi's truststore
- Add NiFi's complete certificate chain to the Registry's truststore
# Add Registry certificate to NiFi truststore
keytool -import -alias nifi-registry -file registry-cert.pem -keystore /path/to/nifi/conf/truststore.jks -storepass [password]
# Add NiFi certificate to Registry truststore
keytool -import -alias nifi-prod -file nifi-cert.pem -keystore /path/to/registry/conf/truststore.jks -storepass [password]
2. Proper Certificate Exchange
Ensure you've exchanged certificates correctly. Export NiFi Registry's public certificate:
keytool -exportcert -alias nifi-registry -keystore /path/to/registry/keystore.jks -file registry.crt -storepass [password]
Import this certificate into each NiFi instance's truststore:
keytool -importcert -alias nifi-registry -keystore /path/to/nifi/truststore.jks -file registry.crt -storepass [password] -noprompt
3. NiFi Registry Connection Configuration
The Registry client is not defined in nifi.properties; add it in the NiFi UI under Controller Settings > Registry Clients, with the URL https://nifi-registry.example.com/nifi-registry. For TLS, NiFi uses the truststore defined in nifi.properties, so verify these settings on each NiFi instance (production/development):
# nifi.properties:
nifi.security.truststore=/path/to/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=[password]
In NiFi Registry:
# nifi-registry.properties:
nifi.registry.security.truststore=/path/to/truststore.jks
nifi.registry.security.truststoreType=JKS
nifi.registry.security.truststorePasswd=[password]
4. LDAP Configuration
For your LDAP integration issues, ensure NiFi Registry's authorizers.xml has:
<accessPolicyProvider>
<identifier>file-access-policy-provider</identifier>
<class>org.apache.nifi.registry.security.authorization.file.FileAccessPolicyProvider</class>
<property name="User Group Provider">ldap-user-group-provider</property>
<property name="Authorizations File">./conf/authorizations.xml</property>
<property name="Initial Admin Identity">cn=admin-user,ou=users,dc=example,dc=com</property>
<property name="NiFi Identity 1">cn=dev-nifi,ou=servers,dc=example,dc=com</property>
</accessPolicyProvider>
In authorizations.xml, add the appropriate policies for the dev-nifi identity:
<policy identifier="some-uuid" resource="/buckets" action="R">
<user identifier="dev-nifi-uuid"/>
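</policy>
5. Proxy Configuration
There is no nifi.properties entry for proxying. Instead, NiFi Registry must allow the NiFi node identity (cn=dev-nifi,ou=servers,dc=example,dc=com) to proxy user requests. Listing it as "NiFi Identity 1" in authorizers.xml (as above) grants this; it can also be granted in the Registry UI through the user's "Can proxy user requests" special privilege.
6. Restart Order
After making changes, restart in the following order:
- NiFi Registry first
- Then all NiFi instances
Happy hadooping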
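To reproduce the trust check outside NiFi (similar to the `openssl s_client` step), a TLS handshake can be attempted against the Registry with a chosen set of CA certificates. This is a sketch under assumptions: Python cannot read JKS directly, so the CA/intermediate certificates are assumed to have been exported to a PEM file first (for example with `keytool -exportcert -rfc`), and the host, port, and file path are placeholders. Python also checks the hostname, so a certificate whose SAN does not match the Registry host fails here too.

```python
import socket
import ssl
from typing import Optional, Tuple


def check_tls_trust(host: str, port: int, cafile: Optional[str] = None,
                    timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a verified TLS handshake with host:port.

    cafile: PEM bundle of the CA/intermediate certificates you expect NiFi to trust
    (hypothetical path). When None, the system default CA store is used.
    Returns (True, certificate subject) on success, otherwise (False, reason).
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return True, str(tls.getpeercert().get("subject"))
    except ssl.SSLCertVerificationError as exc:
        # Same class of problem as Java's "PKIX path building failed".
        return False, f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:  # refused/timed-out connections and other TLS errors
        return False, f"connection/TLS error: {exc}"
```

Usage, with placeholder values: `check_tls_trust("nifi-registry.example.com", 443, cafile="registry-ca.pem")`. A verification failure with the PEM exported from NiFi's truststore points at a missing or incomplete chain in that truststore.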
04-29-2025
08:15 AM
@Shelton We just followed Steps 1, 3, 4 and 5 to generate the automated report to Elasticsearch. It was pretty straightforward. The only other things we had to do were enable the firewall in our Docker container and update the Input Port's Access Policies. Thanks
04-28-2025
07:05 AM
@Shelton Please read my previous answer carefully. None of the properties you provided exist in the HBase codebase.