Member since: 12-06-2022
Posts: 29
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1362 | 06-08-2023 11:41 PM |
02-25-2024 11:07 PM
1 Kudo
I'm going to take a Cloudera certification exam (CDP Data Engineer Exam, CDP-3002). I have already read the FAQ section, but I still have some questions. I'm a fresh graduate from a third-world country, so the $333 fee is very high for me, and I need to make sure I am fully prepared before paying for the exam.
1. After payment, how long does it take until I receive an email with detailed information about the test?
2. After payment, how long does it take until the test starts?
3. The email will guide me in detail through the steps to start the exam, right?
4. According to the FAQ, there is a "proctor". Is that the person who will instruct me on what to do before the exam?
5. I only need to install the Questionmark Secure software, right?
6. There are only multiple-choice questions and no hands-on exercises (where I have to write code), right?
Sorry for all the questions; I'm quite confused right now, and there isn't much information on the internet about this certification exam.
Labels:
- Cloudera Data Engineering (CDE)
07-24-2023 08:58 PM
I added some new nodes to my cluster and they work fine. Then I added Spark Gateway roles to all the new nodes. We use YARN to manage and distribute Spark work. Is adding Spark Gateway roles to the new nodes enough for YARN to notice them and start distributing containers and work to them? Or do I have to add YARN Gateway roles to these new nodes too? How can I make sure that YARN uses these new nodes when executing jobs, to reduce the overall load on my cluster?
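As far as I understand, gateway roles only deploy client configuration to a host, while YARN schedules containers on hosts that run a NodeManager role. One way to check whether the new hosts are registered with the ResourceManager is to look at the output of `yarn node -list -all`. The sketch below is only an illustration: `missing_nodes` is a hypothetical helper (not a YARN command), the host names are made up, and it assumes the usual column layout with the node ID first and the node state second.

```shell
# Hypothetical helper: reads `yarn node -list -all` output on stdin and
# prints every host from the comma-separated list in $1 that is NOT RUNNING.
missing_nodes() {
  awk -v hosts="$1" '
    # Node rows look like "host:port  STATE  http-address  containers".
    $2 == "RUNNING" { split($1, id, ":"); running[id[1]] = 1 }
    END {
      n = split(hosts, wanted, ",")
      for (i = 1; i <= n; i++)
        if (!(wanted[i] in running)) print wanted[i]
    }
  '
}

# Example (host names are placeholders):
#   yarn node -list -all | missing_nodes "new-node01,new-node02"
```

If a new host is printed, it is not registered as a running NodeManager, so YARN cannot place containers on it regardless of which gateway roles it has.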
Labels:
- Apache Spark
- Apache YARN
06-08-2023 11:41 PM
I've successfully set up Spark 3.3.0 on CDH 6.2 (we use YARN). Here are the important steps:
1. Back up the current Spark that comes with the Cloudera parcel (v2.4.0, I think) at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark
2. Download the Spark release from the Spark homepage, for example "spark-3.3.0-bin-hadoop3.tgz". Extract it, delete the old spark folder, and put the new Spark folder in its place (renamed to "spark") at /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark
3. Copy all the config files from the old Spark conf folder to the new Spark conf folder.
4. Copy the YARN-related config files into the Spark conf folder too.
4.1. Copy the file spark-3.3.0-yarn-shuffle.jar from spark/yarn to the spark/jars folder.
5. Make some modifications to the spark-defaults.conf file, mostly to disable logging and point to the right jar folder.
6. Modify some YARN config like below (yarn-site.xml)
7. Restart the cluster and run the spark-shell command. Run some queries for testing.
You could modify the yarn-site.xml file in the Spark conf folder directly to make sure.
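Steps 1 to 3 and 4.1 can be sketched as a small shell function. This is only an illustration under stated assumptions, not the exact commands that were run: `replace_spark` is a hypothetical helper name, the `spark.bak` backup name is made up, and the tarball from step 2 is assumed to be downloaded and extracted already. Steps 4 to 7 are not covered.

```shell
# Hypothetical helper sketching steps 1-3 and 4.1 (not a Cloudera tool).
replace_spark() {
  lib_dir="$1"     # parcel lib folder that contains the "spark" folder
  new_spark="$2"   # extracted release, e.g. ./spark-3.3.0-bin-hadoop3
  backup="${lib_dir}/spark.bak"   # backup name is an assumption

  # Step 1: back up the Spark that ships with the parcel.
  mv "${lib_dir}/spark" "${backup}" || return 1
  # Step 2: put the new build in place under the name "spark".
  cp -R "${new_spark}" "${lib_dir}/spark" || return 1
  # Step 3: copy the old config files into the new conf folder.
  mkdir -p "${lib_dir}/spark/conf"
  cp -R "${backup}/conf/." "${lib_dir}/spark/conf/" || return 1
  # Step 4.1: copy the YARN shuffle jar into the jars folder.
  mkdir -p "${lib_dir}/spark/jars"
  cp "${lib_dir}/spark/yarn/"spark-*-yarn-shuffle.jar "${lib_dir}/spark/jars/"
}

# Example (likely needs root, and must be repeated on every node):
#   replace_spark /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib ./spark-3.3.0-bin-hadoop3
```

Moving the old folder aside instead of deleting it keeps a rollback path if the new Spark does not start.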
06-07-2023 10:00 PM
I'm using CDH 6.2. I added and changed some YARN settings in the Cloudera Manager web UI (related to the YARN shuffle service), under something like "NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml". I previewed the result, and Cloudera Manager showed that some lines would be added to yarn-site.xml. But after restarting everything, I checked "/etc/hadoop/conf.cloudera.yarn/yarn-site.xml" and couldn't find any of those settings; even the word "shuffle" doesn't appear anywhere. I also checked other files such as mapred-site.xml and core-site.xml, with no luck so far. So where the heck do all of these settings end up? I'm confused. I need the proper yarn-site.xml file with all the settings, just as shown in the Cloudera Manager web UI.
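One way to narrow this down is to search every yarn-site.xml under a few candidate folders for the keyword. The sketch below is hedged: `find_setting` is a hypothetical helper, and the second folder in the example rests on my understanding that the Cloudera Manager agent generates per-role configuration under /var/run/cloudera-scm-agent/process on each host, while /etc/hadoop/conf.cloudera.yarn holds the client configuration. Treat that location as an assumption to verify on your own cluster.

```shell
# Hypothetical helper: print every yarn-site.xml under the given folders
# that contains the keyword. Folders that do not exist are skipped.
find_setting() {
  keyword="$1"
  shift
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    # grep -l prints only the names of the files that match.
    find "$dir" -name yarn-site.xml -exec grep -l -- "$keyword" {} + 2>/dev/null || true
  done
}

# Example (run on a NodeManager host; reading the agent folder may need root):
#   find_setting shuffle /etc/hadoop/conf.cloudera.yarn /var/run/cloudera-scm-agent/process
```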
Labels:
- Apache Spark
- Apache YARN
- Cloudera Manager
06-05-2023 07:48 PM
Oh, I don't have a user vega or a group vega on my local OS at all.
06-05-2023 07:28 PM
How do you check the Hive Metastore version on Cloudera?
06-05-2023 02:05 AM
My manager is forcing me to find a way to install and use Spark 3 on a CDH 6.x cluster. Is there any chance of that working? When I did some research, I found that only CDP 7. supports Spark 3, and CDH 6.x only supports Spark 2. But my manager said that you don't need to install Spark through Cloudera Manager: you can install Spark 3 separately (by downloading a tarball from the internet or something like that) and then find a way to connect that Spark installation to Cloudera services like Hive, HDFS, ... (maybe by copying hive-site, hdfs-site, ... to the Spark conf folder?). Does anyone have any experience with this? My manager is insane!!!!
Labels:
- Apache Spark
- Cloudera Manager
05-18-2023 07:56 PM
Thanks. The user didn't exist at the OS level (CentOS). I created the user and now it works. Also, the created user must be in "admin.groups" in the Sentry configuration to have privileges for the GRANT commands. May I also ask how Sentry recognizes users and groups? Does it take them from Hue, HDFS, or the local OS (when using a Cloudera cluster)? At first I thought it was the OS level, but I have had some problems related to users and groups; it seems Sentry doesn't properly recognize the user/group settings at the OS level. Or do I have to create the same user/group on all nodes in the cluster, not just on the main NameNode?
05-17-2023 11:48 PM
Where do you access this kind of UI? I'm stuck on the user/group issue when working with Sentry too.
05-17-2023 09:12 PM
I followed this guide on the Cloudera website and have finished the "Installing and Upgrading the Sentry Service" step. What do I do next? I tried to start Beeline and execute some queries as usual, but I get a privilege error:
kinit -k -t /home/vgdata/vega.keytab vega@BI.VEGA.COM (get a TGT for user vega)
beeline -u "jdbc:hive2://data-node01:10000/test;principal=hive/data-node01.vega.com@BI.VEGA.COM" (connect to Beeline with the Kerberos principal)
select * from test; (execute a query)
The error is:
Error: Error while compiling statement: FAILED: SemanticException No valid privileges User vega does not have privileges for SWITCHDATABASE The required privileges: Server=server1->Db=*->Table=+->Column=*...
User "vega" is the HDFS superuser and also the main user that we use to connect to all Hadoop services (there is a Kerberos principal "vega" too). I tried to execute some commands such as
create role admin; grant role ....
but they all fail with an error like "No groups found for user vega".
Where do I start? Is there a default "admin" user that can do everything (including grants)? I want to grant user "vega" the "admin" role, which can do everything, something like the MySQL grant commands:
GRANT ALL PRIVILEGES ON database.table TO user;
GRANT ALL PRIVILEGES ON *.* TO vega;
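As far as I know, Sentry grants privileges to roles and assigns roles to groups rather than directly to users, so the MySQL-style statements have no direct equivalent. A minimal sketch of the role-based version follows. The helper `sentry_admin_sql` and the role name `admin_role` are hypothetical; `server1` is taken from the error message above; the group `vega` is assumed to exist at the OS level, and the statements are assumed to be run by a user whose group is listed in Sentry's admin groups.

```shell
# Hypothetical helper: print Sentry statements that create a role, give it
# full privileges on the server, and assign the role to a group.
sentry_admin_sql() {
  role="$1"     # role to create, e.g. admin_role (made-up name)
  group="$2"    # OS-level group of the user, e.g. vega
  server="$3"   # Sentry server name, e.g. server1
  cat <<EOF
CREATE ROLE ${role};
GRANT ALL ON SERVER ${server} TO ROLE ${role};
GRANT ROLE ${role} TO GROUP ${group};
EOF
}

# Example: write the statements to a file and run them through Beeline.
#   sentry_admin_sql admin_role vega server1 > grant_admin.sql
#   beeline -u "jdbc:hive2://data-node01:10000/test;principal=hive/data-node01.vega.com@BI.VEGA.COM" -f grant_admin.sql
```

Because the role is attached to a group, the "No groups found for user vega" error has to be resolved first, that is, the user needs a group that the cluster can resolve.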
Labels:
- Apache Sentry