About quangbilly79

quangbilly79 · ‎08-06-2025

We just added a new node to the current cluster, and I'm now running the Balancer command to balance disk usage onto the new node. But the Balancer has been running for a very long time, and we're still waiting for it to complete before doing anything else. We fear that reading from or writing to HDFS while the Balancer is running will corrupt data files. Should we wait, or can we just work as normal?

quangbilly79 · ‎02-25-2024

I'm going to take a Cloudera certification exam (CDP Data Engineer Exam, CDP-3002). I have already read the FAQ section, but I still have some questions about it. Please understand that I'm a fresh graduate from a third-world country, so the $333 cost is very high for me, and I need to make sure I am fully prepared before paying for the exam.
1. After payment, how long does it take until I receive an email with detailed information about the test?
2. After payment, how long does it take until the test starts?
3. The email will guide me in detail through the steps to start the exam, right?
4. From the FAQ section, I understand there is a person called a "proctor". Will they instruct me on what to do before the exam?
5. I just need to install the Questionmark Secure software, right?
6. There are only multiple-choice questions and no hands-on exercises (where I have to write code), right?
Sorry for all the questions; please bear with me, because I'm very confused right now. There isn't much information on the internet about this certification exam.

quangbilly79 · ‎07-24-2023

I added some new nodes to my cluster and they work fine. Then I added Spark Gateway roles to all the new nodes. We're using YARN to manage and distribute Spark work. Is adding Spark Gateway roles to the new nodes enough to make YARN think, "Hey, there are some new nodes here, let's distribute some containers and work to them"? Or do I have to add YARN Gateway roles to these new nodes too? How can I make sure that YARN will use these new nodes when executing jobs, to reduce the overall workload of my cluster?

quangbilly79 · ‎06-08-2023

I've successfully set up Spark 3.3.0 on CDH 6.2 (we use YARN). Here are some important steps:
1. Back up the current Spark that comes with the Cloudera package (v2.4.0, I think) at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark
2. Download the Spark version you want from the Spark homepage, e.g. "spark-3.3.0-bin-hadoop3.tgz". Extract it, delete the old spark folder, and replace it with the new Spark folder (renamed to "spark") at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark
3. Copy all the config files from the old Spark conf folder to the new Spark conf folder.
4. Copy the YARN-related config file into the Spark conf folder too.
4.1. Copy the file spark-3.3.0-yarn-shuffle.jar from spark/yarn to the spark/jars folder.
5. Make some modifications to the spark-defaults.conf file, mostly to disable logging and point to the right jar folder.
6. Modify some YARN config (yarn-site.xml).
7. Restart the cluster and run the spark-shell command. Run some queries for testing.
You could also modify the yarn-site.xml file in the Spark conf folder directly to make sure.
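Steps 1 to 4.1 above can be sketched as a small Python helper. This is only an illustration under stated assumptions, not the exact procedure used in the post: the function name `replace_spark`, the backup folder name `spark.bak`, and the optional `yarn_conf_dir` argument are hypothetical, and the old folder is renamed as the backup rather than deleted.

```python
import shutil
from pathlib import Path
from typing import Optional


def replace_spark(lib_dir: Path, new_spark: Path,
                  yarn_conf_dir: Optional[Path] = None) -> Path:
    """Swap lib_dir/spark for an extracted Spark download, keeping a backup."""
    current = lib_dir / "spark"
    backup = lib_dir / "spark.bak"  # hypothetical backup location

    # Step 1: keep the packaged Spark by renaming it rather than deleting it.
    shutil.move(str(current), str(backup))

    # Step 2: put the extracted download in place under the name "spark".
    shutil.copytree(new_spark, current)

    # Step 3: carry the old conf files over into the new conf folder.
    conf = current / "conf"
    conf.mkdir(exist_ok=True)
    for src in (backup / "conf").iterdir():
        if src.is_file():
            shutil.copy2(src, conf / src.name)

    # Step 4: add the YARN config files, if a source folder is given.
    if yarn_conf_dir is not None:
        for src in yarn_conf_dir.glob("*.xml"):
            shutil.copy2(src, conf / src.name)

    # Step 4.1: copy the YARN shuffle jar from spark/yarn into spark/jars.
    jars = current / "jars"
    jars.mkdir(exist_ok=True)
    for jar in (current / "yarn").glob("spark-*-yarn-shuffle.jar"):
        shutil.copy2(jar, jars / jar.name)

    return backup
```

It would need to run on every node that hosts the parcel, with `lib_dir` pointing at the parcel's lib folder and `new_spark` at the extracted download. Steps 5 to 7 (editing spark-defaults.conf and yarn-site.xml, restarting) are left out because their exact values are not given here.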

quangbilly79 · ‎06-07-2023

I'm using CDH 6.2. I added and changed some YARN config inside the Cloudera Manager web UI (related to the YARN shuffle service), in something like "NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml". I previewed the result, and Cloudera showed that some lines would be added to yarn-site.xml. But after restarting everything, I checked yarn-site.xml at "/etc/hadoop/conf.cloudera.yarn/yarn-site.xml" and couldn't find any of the settings above; even the word "shuffle" doesn't appear anywhere at all. I also checked other files like mapred-site.xml and core-site.xml, with no luck so far. So where the heck do all the settings above get written? I'm confused. I need the proper yarn-site.xml file with all the settings, just like in the Cloudera web UI.
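One way to hunt for the missing settings is to scan several folders for yarn-site.xml files and list the property names that contain a given word. The sketch below is a minimal, hedged example: `find_properties` is a hypothetical helper, and the second folder in the usage comment (the Cloudera Manager agent's process directory) is an assumption about where role-specific configuration may live, to be verified on your own cluster.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Iterable, List


def find_properties(roots: Iterable[Path], needle: str,
                    filename: str = "yarn-site.xml") -> Dict[str, List[str]]:
    """Map each matching config file to the property names containing needle."""
    hits: Dict[str, List[str]] = {}
    for root in roots:
        if not root.is_dir():
            continue
        for path in sorted(root.rglob(filename)):
            try:
                tree = ET.parse(path)
            except (ET.ParseError, OSError):
                continue  # skip unreadable or malformed files
            names = [
                prop.findtext("name", default="")
                for prop in tree.getroot().iter("property")
            ]
            matched = [n for n in names if needle in n]
            if matched:
                hits[str(path)] = matched
    return hits


# Example call (paths are assumptions, adjust to your cluster):
# find_properties(
#     [Path("/etc/hadoop/conf.cloudera.yarn"),
#      Path("/var/run/cloudera-scm-agent/process")],
#     "shuffle",
# )
```

Reading the process directory usually requires root, so the scan may need to run with elevated privileges.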

quangbilly79 · ‎06-05-2023

My manager is forcing me to find a way to install and use Spark 3 on a CDH 6.x cluster. Is there any chance? When I did some research, I found out that only CDP 7 supports Spark 3, and CDH 6.x only supports Spark 2. But my manager said that you don't need to install Spark through Cloudera Manager: you can install Spark 3 separately (by downloading a tarball from the internet or something like that) and then find a way to make that Spark installation connect to Cloudera services like Hive, HDFS, ... (by copying hive-site, hdfs-site, ... to the Spark conf folder, maybe?). So does anyone have any experience with this? My manager is insane!!!!

quangbilly79 · ‎05-18-2023

Thanks. The user didn't exist at the OS level (CentOS). I created the user and now it's fine. Also, the created user must be in "admin.groups" in the Sentry conf to have privileges for the GRANT commands. Also, may I ask how Sentry recognizes users and groups? Does it take the user/group from Hue, from HDFS, or from the local OS (in the case of a Cloudera cluster)? At first I thought it was the OS level, but I have had some problems related to users and groups; it seems like Sentry doesn't properly recognize the user/group settings at the OS level. Or do I have to create the same user/group on all nodes in the cluster, not just on the main NameNode?

quangbilly79 · ‎05-17-2023

I followed this guide on the Cloudera website and have finished the "Installing and Upgrading the Sentry Service" step. Now what do I do next? I tried to start Beeline and execute some queries as normal, but got a privilege error:
kinit -k -t /home/vgdata/vega.keytab [email protected] (get a TGT for user vega)
beeline -u "jdbc:hive2://data-node01:10000/test;principal=hive/[email protected]" (access Beeline with the Kerberos principal)
select * from test; (execute a query)
But I got an error like the one below:
Error: Error while compiling statement: FAILED: SemanticException No valid privileges User vega does not have privileges for SWITCHDATABASE The required privileges: Server=server1->Db=*->Table=+->Column=*...
User "vega" is an HDFS superuser, and also the main user that we use to connect to all Hadoop services (there is a Kerberos principal "vega" too). I tried to execute some commands like create role admin; grant role .... But they all fail with an error like "No groups found for user vega". Now where do I start? Is there a default "admin" user that can do everything (including grants)? I want to grant user "vega" the "admin" role, which can do everything. Something like the MySQL grant privilege commands:
GRANT ALL PRIVILEGES ON database.table TO user;
GRANT ALL PRIVILEGES ON *.* TO vega;
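For comparison with the MySQL commands, here is a hedged sketch of what the equivalent might look like, assuming Sentry's role-based model in which privileges are granted to roles and roles are granted to groups rather than to individual users. The helper `sentry_admin_statements`, the role name `admin_role`, and the group name `vega` are hypothetical; `server1` is taken from the error message above.

```python
from typing import List


def sentry_admin_statements(role: str, group: str,
                            server: str = "server1") -> List[str]:
    """Build the statements that give one group full rights on a server."""
    return [
        f"CREATE ROLE {role}",                          # define the role
        f"GRANT ALL ON SERVER {server} TO ROLE {role}",  # privileges go to the role
        f"GRANT ROLE {role} TO GROUP {group}",           # the role goes to a group
    ]


if __name__ == "__main__":
    # Hypothetical names: role "admin_role", group "vega".
    for statement in sentry_admin_statements("admin_role", "vega"):
        print(statement + ";")
```

The printed statements would be run in Beeline by a user whose group is listed as a Sentry admin group, and the target group has to be resolvable for the user, which is consistent with the "No groups found for user vega" error.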

quangbilly79 · ‎05-17-2023

It turns out there are two possible icons that tell you whether you need to "redeploy client conf". If the blue icon appears, it means you have to tick the "redeploy client conf" option when restarting the whole cluster. If only the orange icon appears, it means you don't need to do that.

quangbilly79 · ‎05-17-2023

When I make changes to some service configurations, Cloudera asks me to "restart stale service", and a pop-up window asks whether I should "Redeploy Client Configuration" too. The default option is No (not ticked), so does this mean it is optional? When should I tick it? If I tick it, it takes a long time to restart everything, even when I only make a small change. I wonder if it is necessary.

Last Visited	‎03-09-2026 07:03 PM

Member Since	‎12-06-2022 05:51 PM
Posts	32
Kudos received	1

Cloudera Community

Re: Is there any chance to use Spark 3 on CDH 6.x ...

Can I still use HDFS like normal when HDFS Balance...

Question about Cloudera Data Engineering Certifica...

How to make Yarn deploy resources to new added nod...

Re: Is there any chance to use Spark 3 on CDH 6.x ...

Change Yarn config on Cloudera Manager Web UI does...

Is there any chance to use Spark 3 on CDH 6.x clus...

Re: I installed Sentry on Cluster, now where to st...

I installed Sentry on Cluster, now where to start?

Re: Should I tick on the "Redeploy Client Configur...

Should I tick on the "Redeploy Client Configuratio...