Member since
10-11-2022
133
Posts
20
Kudos Received
11
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 442 | 08-19-2025 01:50 AM |
| | 1893 | 11-07-2024 10:00 PM |
| | 2497 | 05-23-2024 11:44 PM |
| | 2197 | 05-19-2024 11:32 PM |
| | 9634 | 05-18-2024 11:26 PM |
05-05-2026
03:48 AM
1. Cloudera Operational Database backup guide:
https://docs.cloudera.com/operational-database/cloud/managing-database/topics/cod-backing-up-table.html
https://docs.cloudera.com/cdp-public-cloud/cloud/requirements-azure/topics/mc-az-minimal-setup-for-cloud-storage.html
HBase snapshot export is also covered in the Runtime HBase backup docs.
2. Default backup location: it mainly stores cluster-level backups (Data Lake, FreeIPA, logs/telemetry), not HBase table data.
3. Using a different container for manual backups: pass --snapshot-location abfss://othercontainer@account.dfs.core.windows.net/path in the snapshot/export command. No direct Hadoop configuration edit is needed in CDP Public Cloud.
4. Granting HBase access rights:
- Assign the Storage Blob Delegator role to the HBase Managed Identity (at the Storage Account level).
- Grant Storage Blob Data Owner/Contributor on the target container.
- Set POSIX ACLs (Execute + Read/Write) on the root and target path via Storage Explorer.
- Verify the IDBroker mapping for the HBase user.
- Test access with hdfs dfs -ls on the target path.
- Enable "Allow trusted Microsoft services" if the option is available.
@MintberryCrunch
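As a rough sketch of point 3: an HBase snapshot can be exported to a different ADLS container with the standard ExportSnapshot tool. The snapshot name and container path below are placeholders; the command is composed and printed so it can be reviewed before running on a cluster gateway node.

```shell
# All names here are illustrative placeholders.
SNAPSHOT="mytable_snap_2025"
TARGET="abfss://othercontainer@account.dfs.core.windows.net/hbase-backups"

# Compose the export command; -mappers controls copy parallelism.
CMD="hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot $SNAPSHOT -copy-to $TARGET -mappers 4"
echo "$CMD"   # review, then run on a node with HBase client configs
```

The HBase Managed Identity must already have write access on the target container (step 4 above), or the export fails with an authorization error.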
05-05-2026
03:16 AM
Enabling a CMK at the environment level is meant for new encryption use in that environment, not for changing how already-running services are encrypted. It should not disrupt existing CDE, CDF, or CML services that are already deployed: they continue using the encryption setup they already have. The CMK choice applies to new resources and clusters created after it is configured, so in practice the main impact is on future deployments, not on currently installed services. The CMK setting is typically a one-time configuration for the environment. @Lorenzo_F
03-17-2026
02:49 PM
Hello @APentyala Could you please let us know whether the solution provided by @RAGHUY fixed your problem? If you still face the same issue, let us know so we can help further.
02-08-2026
12:10 AM
@MarlinGomez For that CCA175 streaming scenario (inconsistent formats, cleansing/transforming to HDFS), Spark Structured Streaming with schema evolution is the most exam-realistic pick. It handles real-time ingestion efficiently via micro-batches, infers and evolves schemas on the fly (especially with JSON/Avro), and lets you apply transformations like filter/map before writing Parquet to HDFS. Separate ETL pipelines per format add too much complexity and overhead for exam constraints, and pure schema-on-read skips proactive cleansing. Quick start with a Kafka source and schema merging enabled: .option("mergeSchema", "true").writeStream... to HDFS. This fits the "perform ETL on data using Spark API" objective perfectly. Good luck on your prep.
01-09-2026
09:51 PM
@allen_chu FYI

➤ This issue (high CPU usage, a large number of threads stuck in DataXceiver, and a high load average) is a classic symptom of TCP socket leakage or hung connections within the HDFS Data Transfer Protocol. Based on your top output and jstack, here is a breakdown of what is happening and how to resolve it.

➤ Analysis of the Symptoms
1. CPU saturation (99% per thread): your top output shows dozens of DataXceiver threads each consuming nearly 100% CPU. This usually indicates the threads are in a busy-wait or spinning state within the NIO epollWait call.
2. Stuck in epollWait: the jstack shows threads sitting in sun.nio.ch.EPollArrayWrapper.epollWait. While this is a normal state for a thread waiting for I/O, in your case these threads are likely waiting for a packet from a client that has already disconnected or is half-closed, but the DataNode has not timed out the connection.
3. Thread exhaustion: with 792 threads, your DataNode is approaching its dfs.datanode.max.transfer.threads limit (default 4096, but often throttled by the OS ulimit). As these threads accumulate, the DataNode loses the ability to accept new I/O requests and becomes unresponsive.

➤ Recommended Solutions
1. Increase socket timeouts (immediate fix). The most common cause is the DataNode waiting too long for a slow or dead client. Tighten the transfer timeouts to force these zombie threads to close. Update your hdfs-site.xml:
   - dfs.datanode.socket.write.timeout: often 0 (no timeout) or several minutes by default. Set this to 300000 (5 minutes).
   - dfs.datanode.socket.reuse.keepalive: set to true for better connection management.
   - dfs.datanode.transfer.socket.send.buffer.size and recv.buffer.size: ensure these are set to 131072 (128 KB) to optimize throughput and prevent stalls.
2. Increase the max receiver threads. If your cluster handles high-concurrency workloads (like Spark or HBase), the default thread count might be too low:
   <property>
     <name>dfs.datanode.max.transfer.threads</name>
     <value>16384</value>
   </property>
3. Check for half-closed network connections. Since the threads are stuck in read, the OS may be keeping sockets in CLOSE_WAIT or FIN_WAIT2 states.
   a. Check socket status: run netstat -anp | grep 9866 | awk '{print $6}' | sort | uniq -c
   b. OS tuning: adjust the Linux kernel to close dead connections more aggressively. Add these to /etc/sysctl.conf:
      net.ipv4.tcp_keepalive_time = 600
      net.ipv4.tcp_keepalive_intvl = 60
      net.ipv4.tcp_keepalive_probes = 20
4. Address HDFS-14569 (software bug). Hadoop 3.1.1 is susceptible to a known issue where DataXceiver threads can leak during block moves or heavy Balancer activity: a DataXceiver fails to exit if a client stops sending data mid-packet but keeps the TCP connection open. If possible, upgrade to Hadoop 3.2.1+ or 3.3.x; these versions contain significantly improved NIO handling and better logic for terminating idle Xceivers.

➤ Diagnostic Step: Finding the "Bad" Clients
To identify which clients are causing this, run this command on the DataNode:
netstat -atp | grep DataXceiver | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr
This will tell you which IP addresses hold the most DataXceiver connections. If one specific IP (such as a single Spark executor or a specific user's edge node) has hundreds of connections, that client's code is likely not closing DFSClient instances correctly.
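The per-client counting pipeline from the diagnostic step can be sanity-checked on canned netstat-style lines (the IPs below are made up; field 5 is the remote address). The awk/cut/sort/uniq chain is the same one used above:

```shell
# Fake netstat output, just to exercise the counting pipeline.
sample='tcp 0 0 10.0.0.5:9866 10.0.0.21:51000 ESTABLISHED 123/DataXceiver
tcp 0 0 10.0.0.5:9866 10.0.0.21:51002 ESTABLISHED 123/DataXceiver
tcp 0 0 10.0.0.5:9866 10.0.0.30:41000 ESTABLISHED 123/DataXceiver'

# Count connections per client IP, busiest client first.
result=$(printf '%s\n' "$sample" | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -nr)
printf '%s\n' "$result"
```

On the sample above, 10.0.0.21 comes out first with two connections; on a real DataNode, a client holding hundreds of entries here is your likely leaker.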
11-13-2025
03:14 AM
Hello, Please try using the hdfs mover command. Refer: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Mover_-_A_New_Data_Migration_Tool
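For context, a typical Mover workflow is: tag the path with a colder storage policy, then run the Mover so existing replicas migrate to match. The path and policy name below are illustrative, not from this thread; the commands are printed for review rather than executed here.

```shell
# Illustrative commands only; run them on a node with HDFS client configs.
STEP1="hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD"
STEP2="hdfs mover -p /data/archive"
printf '%s\n%s\n' "$STEP1" "$STEP2"
```

Setting the policy alone only affects new writes; the Mover run is what relocates the blocks already on disk.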
09-05-2025
05:39 AM
1 Kudo
However, I was able to resolve this by leveraging ExecuteStreamCommand (ESC). Specifically, I used the Output Destination Attribute property to push the required attributes into it, which I can then process separately.
08-19-2025
10:33 PM
Is the source table a JdbcStorageHandler table? Please provide the DDL of the source table, the query used, and sample data if possible; this will help us understand the problem better. Also, review the output of the set -v command, especially configurations like hive.tez.container.size.
08-19-2025
07:10 AM
@RAGHUY Thank you! I figured that out later, but the Router now fails to start with the error below. I have the jaas.conf in place. Any help on this is appreciated.

ERROR client.ZooKeeperSaslClient - SASL authentication failed using login context 'ZKDelegationTokenSecretManagerClient' with exception: {}
javax.security.sasl.SaslException: Error in authenticating with a Zookeeper Quorum member: the quorum member's saslToken is null.
  at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:312)
  at org.apache.zookeeper.client.ZooKeeperSaslClient.respondToServer(ZooKeeperSaslClient.java:275)
  at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:882)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101)
  at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223)
2025-08-19 20:45:37,097 ERROR curator.ConnectionState - Authentication failed
2025-08-19 20:45:37,098 INFO zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x1088d05c6550015, likely server has closed socket, closing socket connection and attempting reconnect
2025-08-19 20:45:37,098 INFO zookeeper.ClientCnxn - EventThread shut down for session: 0x1088d05c6550015
2025-08-19 20:45:37,212 ERROR imps.CuratorFrameworkImpl - Ensure path threw exception
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hdfs-router-tokens
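One thing worth double-checking: the error names the login context 'ZKDelegationTokenSecretManagerClient', so the jaas.conf must contain a section with exactly that name; a generic Client section alone is not enough. A hedged sketch with placeholder keytab path and principal (adjust to your environment):

```
// Section name must match the login context named in the SASL error.
ZKDelegationTokenSecretManagerClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  useTicketCache=false
  keyTab="/path/to/hdfs.keytab"              // placeholder
  principal="hdfs/router-host@EXAMPLE.COM";  // placeholder
};
```

Also confirm the Router JVM is actually picking the file up (e.g. via -Djava.security.auth.login.config) and that the principal's keytab is readable by the Router process user.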
08-19-2025
01:50 AM
Hi, @quangbilly79 Yes, you can continue to use HDFS normally while the Balancer is running. The Balancer only moves block replicas between DataNodes to even out disk usage; it does not change file contents or metadata. Reads and writes are fully supported in parallel with balancing, and HDFS ensures data integrity through replication and checksums. The process adds some extra network and disk load, so you might see reduced performance during heavy balancing, but there is no risk of data corruption caused by the Balancer. You don't need to wait; it's safe to continue your normal operations.
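If the extra load during balancing becomes a concern, the per-DataNode balancer bandwidth can be capped. A small sketch (the 100 MB/s cap and 10% threshold are illustrative choices, not requirements); the commands are printed for review before running:

```shell
# 100 MB/s expressed in bytes/s, the unit setBalancerBandwidth expects.
BW=$((100 * 1024 * 1024))
echo "hdfs dfsadmin -setBalancerBandwidth $BW"
echo "hdfs balancer -threshold 10"
```

A lower bandwidth cap makes balancing slower but gentler on client workloads; the threshold is the allowed deviation in percent from average DataNode utilization.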