Member since
12-06-2022
32
Posts
2
Kudos Received
1
Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2752 | 06-08-2023 11:41 PM |
08-19-2025
01:50 AM
Hi @quangbilly79, yes, you can continue to use HDFS normally while the Balancer is running. The Balancer only moves block replicas between DataNodes to even out disk usage; it does not modify the contents of your files. Reads and writes are fully supported in parallel with balancing, and HDFS ensures data integrity through replication and checksums. The process adds some extra network and disk load, so you might see reduced performance during heavy balancing. There is no risk of data corruption caused by the Balancer. You don’t need to wait; it’s safe to continue your normal operations.
04-22-2024
12:32 AM
1 Kudo
Hi @quangbilly79, any advanced config is treated as a CM advanced config, and CM overrides it. I tried the same config in my test cluster, and it is stored in the yarn-site.xml file in the process directory. You can SSH to your YARN NodeManager node, check the latest YARN process directory, and find the shuffle property in that file. Example path: /var/run/cloudera-scm-agent/process/415-yarn-NODEMANAGER/yarn-site.xml
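If you would rather script that check than browse the directory by hand, here is a minimal Python sketch. It assumes the newest process directory is the one with the highest numeric prefix (as in `415-yarn-NODEMANAGER`), and the helper names `latest_process_dir` and `find_properties` are hypothetical, not part of any Cloudera tooling.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def latest_process_dir(base, role="yarn-NODEMANAGER"):
    """Return the '<number>-<role>' directory with the highest number, or None.

    Assumption: a higher numeric prefix means a more recent process directory.
    """
    base = Path(base)
    if not base.is_dir():
        return None
    pattern = re.compile(r"^(\d+)-" + re.escape(role) + r"$")
    candidates = []
    for entry in base.iterdir():
        match = pattern.match(entry.name)
        if match and entry.is_dir():
            candidates.append((int(match.group(1)), entry))
    if not candidates:
        return None
    # Compare on the numeric prefix only.
    return max(candidates, key=lambda item: item[0])[1]


def find_properties(site_xml, keyword):
    """Return {name: value} for Hadoop-style properties whose name contains keyword."""
    root = ET.parse(str(site_xml)).getroot()
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        if keyword in name:
            found[name] = prop.findtext("value", default="")
    return found
```

On the NodeManager host you would call `latest_process_dir("/var/run/cloudera-scm-agent/process")` and then `find_properties(result / "yarn-site.xml", "shuffle")`. Reading that directory usually requires root.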
02-26-2024
01:05 AM
1 Kudo
@quangbilly79, Thanks for reaching out to the Community. Your best course of action is to email certification@cloudera.com. The certification team will look into the incident and reply to you directly.
CC: @Dgati
10-19-2023
04:36 PM
1 Kudo
To take part in the processing, the node must have the NodeManager role, plus the Spark Gateway and YARN Gateway roles.
06-08-2023
11:41 PM
I've successfully set up Spark 3.3.0 on CDH 6.2 (we used YARN). Here are the important steps:

1. Back up the current Spark that comes from the Cloudera package (v2.4.0, I think) at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark
2. Download the Spark version you want from the Spark homepage, for example "spark-3.3.0-bin-hadoop3.tgz". Extract it, delete the old spark folder, and replace it with the new spark folder (rename it to "spark") at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark
3. Copy all the config files from the old spark conf folder to the new spark conf folder.
4. Copy the YARN-related config files into the spark conf folder too.
   4.1. Copy the file spark-3.3.0-yarn-shuffle.jar from spark/yarn to the spark/jars folder.
5. Make some modifications to the spark-defaults.conf file, mostly to disable logging and point to the right jar folder.
6. Modify some YARN config (yarn-site.xml).
7. Restart the cluster and run the spark-shell command. Run some queries for testing.

You could also modify the yarn-site.xml file in the spark conf folder directly to make sure.
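For anyone who wants to script the file shuffling in steps 1 to 4.1, here is a rough Python sketch, not the exact procedure I ran. It assumes the new Spark tarball is already extracted. The function name `swap_spark` and the backup folder name `spark.bak` are made up for illustration, and it moves the old folder aside instead of deleting it. Steps 5 to 7 (editing spark-defaults.conf and yarn-site.xml, restarting) are still manual.

```python
import shutil
from pathlib import Path


def swap_spark(cdh_lib, new_spark, yarn_conf_files=()):
    """Replace <cdh_lib>/spark with an extracted Spark distribution.

    cdh_lib:         the parcel's lib folder that contains "spark"
    new_spark:       the extracted folder, e.g. spark-3.3.0-bin-hadoop3
    yarn_conf_files: YARN-related config files to copy into spark/conf
    Returns the path of the backup folder.
    """
    cdh_lib = Path(cdh_lib)
    current = cdh_lib / "spark"
    backup = cdh_lib / "spark.bak"  # hypothetical backup name

    # Step 1: back up the Spark that shipped with the parcel.
    shutil.move(str(current), str(backup))

    # Step 2: put the new distribution in place under the name "spark".
    shutil.copytree(str(new_spark), str(current))

    # Step 3: copy the old config files into the new conf folder.
    conf = current / "conf"
    conf.mkdir(exist_ok=True)
    for old_file in (backup / "conf").iterdir():
        if old_file.is_file():
            shutil.copy2(old_file, conf / old_file.name)

    # Step 4: copy the YARN-related config files as well.
    for yarn_file in yarn_conf_files:
        shutil.copy2(yarn_file, conf / Path(yarn_file).name)

    # Step 4.1: copy the YARN shuffle jar from spark/yarn to spark/jars.
    jars = current / "jars"
    jars.mkdir(exist_ok=True)
    for jar in (current / "yarn").glob("spark-*-yarn-shuffle.jar"):
        shutil.copy2(jar, jars / jar.name)

    return backup
```

Try it on a copy of the folder first, and run it on every node that has the parcel, since each node keeps its own copy.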
05-19-2023
02:21 AM
Yes, the user-to-group mapping should exist on all cluster nodes, not only on the NameNode.
05-17-2023
07:57 PM
It turns out there are two icons related to "redeploy client conf". If the blue icon appears, you have to tick the "redeploy client conf" option when restarting the whole cluster. If only the orange icon appears, you don't need to do that.
05-15-2023
05:04 AM
1 Kudo
@quangbilly79 No, you should not add a gateway to every node. The gateway should only be installed on the edge/utility nodes, where you give access to external systems and users. These gateway nodes are then able to reach the rest of the service nodes.
05-14-2023
07:53 PM
I successfully installed it on 3 nodes. Normally you only need to install everything on 1 node (prerequisites such as Java and Python have to be installed on all 3 nodes first, of course). When you go to the CM UI, you can add another node and Cloudera will automatically install everything for you. If you want to install things manually, install all 3 packages ("cloudera-manager-daemons", "cloudera-manager-agent", "cloudera-manager-server") on your main node; on the other nodes, install only "cloudera-manager-daemons" and "cloudera-manager-agent", then start the agent services. After that, you will see that the two nodes are "managed" in the CM UI, meaning that you can skip the "Install Agent" step (since you've already installed and started "cloudera-manager-agent").
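To make the manual split concrete, here is a small Python sketch that builds the install command for each kind of node. The package names are the ones above. Using `yum install -y` as the installer is an assumption about your OS, and `install_command` is just an illustrative helper.

```python
# Packages for the main (Cloudera Manager server) node, as listed above.
MAIN_NODE_PACKAGES = [
    "cloudera-manager-daemons",
    "cloudera-manager-agent",
    "cloudera-manager-server",
]

# Packages for every other node: no server package.
OTHER_NODE_PACKAGES = [
    "cloudera-manager-daemons",
    "cloudera-manager-agent",
]


def install_command(is_main_node, installer="yum install -y"):
    """Build the package install command for one node.

    installer is an assumption (yum on RHEL/CentOS); swap in your own.
    """
    packages = MAIN_NODE_PACKAGES if is_main_node else OTHER_NODE_PACKAGES
    return installer + " " + " ".join(packages)
```

After installing on the other nodes, you still have to start the agent service on each of them before they show up as "managed".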
03-30-2023
04:31 AM
Hi @quangbilly79, Cloudera supports the YARN and Kubernetes deployment modes and does not support Standalone mode (in Standalone mode you can access the Spark Master on port 7077). To check on which node the driver was launched and on which nodes the executors were launched, go to the Spark UI or the Spark History Server UI for that application. From there, go to the Executors tab, where you can see the list of executors. The second table has an Executor ID column. The row whose executor ID is 'driver' is the driver node, and all the remaining rows are executors.
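The same information is exposed by Spark's monitoring REST API, so the lookup can be scripted. The sketch below is one way to do it under a few assumptions: `base_url` points at your Spark UI or History Server, the application is reachable at `/api/v1/applications/<app-id>/allexecutors` (applications run in YARN cluster mode may need an attempt ID in the path), and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def split_driver_and_executors(executors):
    """Split an executor list into (driver_host, {executor_id: host}).

    executors is the parsed JSON list from the REST API; each entry has an
    "id" and a "hostPort" field, and the driver's id is the string "driver".
    """
    driver_host = None
    executor_hosts = {}
    for entry in executors:
        # hostPort looks like "host:port"; keep only the host part.
        host = entry["hostPort"].rsplit(":", 1)[0]
        if entry["id"] == "driver":
            driver_host = host
        else:
            executor_hosts[entry["id"]] = host
    return driver_host, executor_hosts


def fetch_executors(base_url, app_id):
    """Fetch the executor list for one application (needs network access)."""
    url = f"{base_url}/api/v1/applications/{app_id}/allexecutors"
    with urlopen(url) as response:
        return json.load(response)
```

Pass the result of `fetch_executors(...)` to `split_driver_and_executors(...)` to get the driver host and a mapping from executor ID to host.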