About RAGHUY

Ghilani · ‎09-05-2025

However, I was able to resolve this by leveraging ExecuteStreamCommand (ESC). Specifically, I used the Output Destination Attribute property to push the required attributes into it, which I can then process separately.

ggangadharan · ‎08-19-2025

Is the source table a JdbcStorageHandler table? Please provide the DDL of the source table, the query used, and any sample data if possible. This information will help us understand the problem better. Also, validate the set -v command, especially configurations like hive.tez.container.size.

Hadoop16 · ‎08-19-2025

@RAGHUY Thank you! I figured that later but Router starting to fail with below error.I have the jaas.conf in place. Any help on this is appreciated. ERROR client.ZooKeeperSaslClient - SASL authentication failed using login context 'ZKDelegationTokenSecretManagerClient' with exception: {} javax.security.sasl.SaslException: Error in authenticating with a Zookeeper Quorum member: the quorum member's saslToken is null. at org.apache.zookeeper.client.ZooKeeperSaslClient.createSaslToken(ZooKeeperSaslClient.java:312) at org.apache.zookeeper.client.ZooKeeperSaslClient.respondToServer(ZooKeeperSaslClient.java:275) at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:882) at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:101) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) 2025-08-19 20:45:37,097 ERROR curator.ConnectionState - Authentication failed 2025-08-19 20:45:37,098 INFO zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x1088d05c6550015, likely server has closed socket, closing socket connection and attempting reconnect 2025-08-19 20:45:37,098 INFO zookeeper.ClientCnxn - EventThread shut down for session: 0x1088d05c6550015 2025-08-19 20:45:37,212 ERROR imps.CuratorFrameworkImpl - Ensure path threw exception org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hdfs-router-tokens

RAGHUY · ‎08-19-2025

Hi, @Hz In HDFS 2.7.3, setting a storage policy on a directory does not immediately place new blocks directly into the target storage (e.g., ARCHIVE). New writes still go to default storage (usually DISK), and the Mover process is required to relocate both existing and newly written blocks to comply with the policy. The storage policy only marks the desired storage type, but actual enforcement happens through the Mover. This is expected behavior and you did not miss any configuration. There’s no way in 2.7.3 to bypass the Mover and force blocks to land directly in cold storage on write. Later Hadoop versions introduced improvements, but for your version, running the Mover is required.

RAGHUY · ‎08-19-2025

Hi, @quangbilly79 Yes, you can continue to use HDFS normally while the Balancer is running. The Balancer only moves replicated block copies between DataNodes to even out disk usage; it does not modify the actual data files. Reads and writes are fully supported in parallel with balancing, and HDFS ensures data integrity through replication and checksums. The process may add some extra network and disk load, so you might see reduced performance during heavy balancing. There is no risk of data corruption caused by the Balancer. You don’t need to wait — it’s safe to continue your normal operations.

RAGHUY · ‎08-19-2025

Hi, @allen_chu Your jstack shows many DataXceiver threads stuck in epollWait, meaning the DataNode is waiting on slow or stalled client/network I/O. Over time, this exhausts threads and makes the DataNode unresponsive. Please check network health and identify if certain clients (e.g., 172.18.x.x) are holding connections open. Review these configs in hdfs-site.xml: dfs.datanode.max.transfer.threads, dfs.datanode.socket.read.timeout, and dfs.datanode.socket.write.timeout to ensure proper limits and timeouts. Increasing max threads or lowering timeouts often helps. Also monitor for stuck jobs on the client side.

RAGHUY · ‎07-16-2025

If you're using Conda Create the environment conda create -n pyspark_env python=3.9 numpy Activate it conda activate pyspark_env Tell Spark to use it export PYSPARK_PYTHON=$(which python) export PYSPARK_DRIVER_PYTHON=$(which python)

VidyaSargur · ‎11-17-2024

@Bhavs, Did the response help resolve your query? If it did, kindly mark the relevant reply as the solution, as it will aid others in locating the answer more easily in the future.

flashone · ‎11-11-2024

Thanks ~ Good Answer~

DianaTorres · ‎06-03-2024

@sibin Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.

Online	Offline
Last Visited	‎09-28-2025 10:47 PM

Member Since	‎10-11-2022 11:06 PM
Last Visited	‎09-28-2025 10:47 PM
Posts	128
Kudos received	20

Cloudera Community

Re: Can I still use HDFS like normal when HDFS Bal...

Re: Does the service stop if a disk io error occur...

Re: HBase Authorization Issue

Re: Insert Into Multiple Partitions with one Query

Re: java.lang.ClassNotFoundException: org.apache.n...

Re: How to update FlowFile attributes in Python no...

Re: NiFI Failed to update Hive for FlowFile; java....

Re: HDFS router based federation daemon failing to...

Re: Do new data arriving into a cold storage folde...

Re: Can I still use HDFS like normal when HDFS Bal...

Re: DataXceiver threads stuck with high CPU usage

Re: How to import Numpy and other libraries when u...

Re: How to disable kerberos in hive once it has be...

Re: Does the service stop if a disk io error occur...

Re: Cloudera after setting up custom kerberos for ...