Member since: 12-06-2022
Posts: 10
Kudos Received: 1
Solutions: 0
02-01-2023
05:49 PM
I'm using a tool that requires me to specify the master node (driver node) of the Cloudera Spark cluster, in the form spark://<some-spark-master>:7077. As I understand it, Spark has a "Master Node", a "Driver Node", and "Worker Nodes". So I went to the Cloudera Manager web UI and checked the Configuration tab of the Spark service, but all I found were a "Gateway" instance and a "History Server" instance. Where are the "Driver" and "Worker" instances? I can't add them under "Add Role Instances" either. My guess is that they are in the YARN service configuration, but I can't find anything related to "Master", "Driver", or "Worker" there either. So what is the "Spark Master" URL that ends with 7077, and which node is it? I can't find it anywhere in the Configuration tab.
Labels: Apache Spark
01-31-2023
06:14 PM
I'm using a tool that requires me to specify the master node (driver node) of the Cloudera Spark cluster, in the form spark://<some-spark-master>:7077. As I understand it, Spark has a "Master Node" (Driver Node) and "Worker Nodes". So I went to the Cloudera Manager web UI and checked the Configuration tab of the Spark service, but all I found were a "Gateway" instance and a "History Server" instance. Where are the "Driver" and "Worker" instances? I can't add them under "Add Role Instances" either. My guess is that they are in the YARN service configuration, but I can't find anything related to "Master"/"Driver" or "Worker" there either. So what is the "Spark Master" URL that ends with 7077? I can't find it anywhere in the Configuration tab.
Labels: Apache Spark
12-28-2022
10:55 PM
What are Kafka Gateway and Kafka MirrorMaker when adding Role Instances to Kafka? I created a Kafka service in Cloudera Manager. Now I want to add a new Kafka broker instance to that service. Following a guide I found on the internet, I chose Instances -> Add Role Instances, and a new window came up. But I noticed that I can only add Kafka Broker instances, although the guide said that I could add a Kafka Connect instance too. There are also two other instance types, Gateway and MirrorMaker, and I don't know what they are. I searched Google and found some information about Kafka MirrorMaker, but nothing about Kafka Gateway.
Labels: Apache Kafka
12-28-2022
07:39 PM
Hi. Our company already has Kafka and ZooKeeper instances on Cloudera. But this setup lacks some useful features that Confluent offers for handling streaming data (mostly from Kafka Connect), so we want to use Confluent with the Kafka instance from Cloudera. I don't know where to start. I read some guides on the Confluent homepage, but they cover a local installation with its own ZooKeeper and Kafka. Is there any way to "integrate" Confluent with our Kafka/ZooKeeper from Cloudera? Has anyone done this before who can show me how? We're using Cloudera 6.2.0, which I think comes with Apache Kafka 2.1.0.
Labels: Apache Kafka
12-20-2022
06:08 PM
Hi. Where is the jars folder for Spark in Cloudera? At my previous company, we just put all the needed jars inside the $SPARK_HOME/jars folder (on every node), so that we didn't have to worry much about --jars, --packages, and so on when running a spark-submit job. It also saved a lot of disk space and time, since we didn't need to bundle every package when building a jar. But at my new company, which uses Cloudera, I don't know where this folder is. I found two places (maybe neither is the right one):
- /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/jars
- /opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/jars
Where should I put the needed jar files? It seems that all the jar files in the second folder are links to files in the first one. I also found dozens of lib/jar folders elsewhere in Cloudera. Or is there another way to do this with Cloudera? I have read some guides on the internet about modifying the Spark configuration in Cloudera Manager.
Tags: cdh, jar folder, Spark
Labels: Apache Spark
12-18-2022
10:41 PM
I encountered the same problem. Are there any solutions?
12-08-2022
08:15 PM
Never mind, I figured it out. First, open /etc/spark/conf.cloudera.spark_on_yarn/classpath.txt and delete the last line (the one that contains the path to hbase-class.jar). Next, download hbase-spark-1.0.0.7.2.15.0-147.jar and pass it to spark-shell with --jars pathToYourDownloadedjar. Finally, add option("hbase.spark.pushdown.columnfilter", false) before loading the data, like this:

    // Read the HBase table "person" through the hbase-spark connector
    val sql = spark.sqlContext
    val df = sql.read.format("org.apache.hadoop.hbase.spark")
      .option("hbase.columns.mapping",
        "name STRING :key, email STRING c:email, " +
        "birthDate STRING p:birthDate, height FLOAT p:height")
      .option("hbase.table", "person")
      .option("hbase.spark.use.hbasecontext", false)
      .option("hbase.spark.pushdown.columnfilter", false) // disable column filter pushdown
      .load()
    df.createOrReplaceTempView("personView")
    val results = sql.sql("SELECT * FROM personView where name = 'alice'")
    results.show()
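The first step above (deleting the last line of classpath.txt) can be done a little more safely with a small helper. The following is only a sketch, not something from the original post: the object name ClasspathTrim, the .bak backup file, and the "hbase" marker string are all assumptions made for illustration. It removes the last non-blank line only if that line contains the marker, and keeps a backup copy first. It uses only the JVM standard library, so it should run in a plain Scala REPL as well as in spark-shell.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.io.Source

// Hypothetical helper (name and behaviour are illustrative assumptions).
object ClasspathTrim {
  /** Drops the last non-blank line of `file` if it contains `marker`.
    * A copy of the original is saved next to it as `<name>.bak`.
    * Returns the removed line, or None if nothing was changed. */
  def dropLastLineIfMatches(file: Path, marker: String): Option[String] = {
    val src = Source.fromFile(file.toFile, "UTF-8")
    val lines = try src.getLines().toVector finally src.close()

    // Ignore trailing blank lines so "last line" means the last real entry.
    val kept = lines.reverse.dropWhile(_.trim.isEmpty).reverse

    kept.lastOption.filter(_.contains(marker)).map { removed =>
      // Back up the original file before rewriting it.
      val backup = file.resolveSibling(file.getFileName.toString + ".bak")
      Files.copy(file, backup, StandardCopyOption.REPLACE_EXISTING)

      // Write back every line except the removed one.
      val body = kept.init.map(_ + "\n").mkString
      Files.write(file, body.getBytes(StandardCharsets.UTF_8))
      removed
    }
  }
}
```

Assuming you have write permission on the file, it could be called as ClasspathTrim.dropLastLineIfMatches(java.nio.file.Paths.get("/etc/spark/conf.cloudera.spark_on_yarn/classpath.txt"), "hbase"). Checking the marker first avoids deleting an unrelated entry if the file has a different last line on your cluster.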
12-06-2022
06:02 PM
I read a Cloudera documentation guide at this Schedule Job link. The problem is that I don't have access to the "Cloudera Data Platform (CDP) management console" that the guide uses. I only have access to the Cloudera web server UI on xxx.xxx.xxx.xxx:7180. Please note that we run Cloudera on a host machine running CentOS, and I have to SSH into that machine; there is no UI like the CDP console there, only a terminal window with a command line. I only have the web server. I have the same problem with many other guides on the website: they always require access to the CDP management console UI.
Tags: CDP, UI, web server
Labels: