Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 495 | 06-04-2025 11:36 PM |
| | 1038 | 03-23-2025 05:23 AM |
| | 543 | 03-17-2025 10:18 AM |
| | 2040 | 03-05-2025 01:34 PM |
| | 1268 | 03-03-2025 01:09 PM |
07-18-2020
01:18 AM
@tanishq1197 Could you please copy and paste here exactly the command you ran?
07-18-2020
01:16 AM
@borisgersh Sorry for the misunderstanding, but your earlier posting was neither specific nor clear about being about spark-shell. Increasing memory for Spark interactively is done with the --driver-memory option, which sets the memory for the driver process. Here are simple examples for standalone [1 node] and cluster executions; note that they are version-specific.

Run spark-shell on Spark installed in standalone mode (Spark 1.2.0):

```bash
./spark-shell --driver-memory 2g
```

Run spark-shell on Spark installed on the cluster:

```bash
./bin/spark-shell --executor-memory 4g
```

With Spark 1.2.0 you can set memory and cores by giving the following arguments to spark-shell:

```bash
./spark-shell --driver-memory 10G --executor-memory 15G --executor-cores 8
```

Using Spark 2.4.5 to run the application locally on 8 cores:

```bash
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100
```

Run on a Spark standalone cluster in client deploy mode:

```bash
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
```

Run on a Spark standalone cluster in cluster deploy mode with supervise:

```bash
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000
```

Run on a YARN cluster; you will need to export HADOOP_CONF_DIR=XXX first, then:

```bash
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000
```

(In the YARN example, --deploy-mode can be client for client mode.)

So in the above examples, the parameters you have to adjust are --executor-memory and --total-executor-cores.

To get the list of options for spark-shell:

```bash
spark-shell --help
```

Hope that answers your question on how to interactively increase the available memory; beyond these examples there is no better source than the Spark docs. Happy hadooping
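As a side note, the same limits can also be made the default for every spark-shell or spark-submit invocation by putting them in spark-defaults.conf instead of passing flags each time; this is a minimal sketch with illustrative values:

```bash
# Minimal sketch: persist default memory/core settings in spark-defaults.conf
# so every spark-shell / spark-submit run picks them up. Values are illustrative.
cat >> "$SPARK_HOME/conf/spark-defaults.conf" <<'EOF'
spark.driver.memory     4g
spark.executor.memory   8g
spark.executor.cores    4
EOF
```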
07-17-2020
04:04 PM
@borisgersh I think what is being discussed and misunderstood here is runtime variables, as in Hive, where you can personalize your environment by executing a file at startup. See this example (and many more online). There are three namespaces available for holding variables:

- hiveconf - Hive started with this; all of the Hive configuration is stored as part of this namespace. Initially, variable substitution was not part of Hive, and when it was introduced all user-defined variables were stored here as well, which is definitely not a good idea. So two more namespaces were created.
- hivevar - to store user variables.
- system - to store system variables.

If you do not provide a namespace, as shown below, the variable var is stored in the hiveconf namespace:

```sql
set var="default_namespace";
```

So, to access it you need to specify the hiveconf namespace:

```sql
select ${hiveconf:var};
```

Hope that helps
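To illustrate the hivevar namespace as well, here is a minimal sketch of passing a user variable from the command line; the table name web_logs is purely illustrative:

```bash
# Minimal sketch: define a variable in the hivevar namespace from the shell and
# reference it inside the query (the table name "web_logs" is illustrative).
hive --hivevar tbl=web_logs -e 'SELECT COUNT(*) FROM ${hivevar:tbl};'
```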
07-17-2020
03:24 PM
@Fawze In a Hadoop distribution, ACLs are disabled by default. When ACLs are disabled, the NameNode rejects all attempts to set an ACL, so you will need to enable them manually in CM/Ambari.

Enabling HDFS ACLs using Cloudera Manager:

1. Go to the Cloudera Manager Admin Console and navigate to the HDFS service.
2. Click the Configuration tab.
3. Select Scope > Service_name (Service-Wide).
4. Select Category > Security.
5. Locate the Enable Access Control Lists property and select its checkbox to enable HDFS ACLs.
6. [Enter a Reason for change, and then] click Save Changes to commit the changes.

The above sequence sets the dfs.namenode.acls.enabled property to true in the NameNode's hdfs-site.xml:

```xml
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
```

This is a cluster-wide operation, so only after doing the above can you run the setfacl or getfacl commands against an HDFS file (see the sketch below). The hdfs-site.xml is distributed by CM: when you make a change you are usually warned that there is a stale configuration and asked to restart the service, which triggers the distribution of the new hdfs-site.xml to all nodes in the cluster. That is precisely the reason to use CM rather than editing the file manually. Happy hadooping
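Once ACLs are enabled, setting and checking one looks roughly like the sketch below; the user name analyst and the path /data/sales are illustrative:

```bash
# Add an ACL entry granting read/execute on an HDFS directory to an extra user
# (the user "analyst" and the path "/data/sales" are illustrative).
hdfs dfs -setfacl -m user:analyst:r-x /data/sales

# Verify the ACL entries on the path
hdfs dfs -getfacl /data/sales
```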
07-17-2020
01:37 PM
@Fawze AFAIK, changes made manually by editing the files with vi or nano are not persisted in the CM or Ambari database, which means CDH or HDP is not even aware of the change; at startup it will read the configuration values from the persisted database. When you use CM/Ambari to change values, the Save button triggers an update in the underlying CM/Ambari tables. That said, it is NOT advisable to manually edit config files. Happy hadooping
07-17-2020
11:04 AM
@tanishq1197 I think your syntax is wrong because you have first initiated a connection using the snippet below:

```bash
# sudo -u postgres psql
```

The correct syntax does not require a prior logon:

```bash
$ sudo /opt/cloudera/cm/schema/scm_prepare_database.sh [options] <databaseType> <databaseName> <databaseUser> <password>
```

Substituting the correct values, the command below should run successfully with the default user/password scm/scm:

```bash
$ sudo /opt/cloudera/cm/schema/scm_prepare_database.sh postgresql scm scm scm
```

Assuming you intend to use PostgreSQL as the database, please see Setting up the Cloudera Manager Database. Hope that helps
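A quick way to confirm the script did its job is to check the connection properties it writes before starting the server; this is a minimal sketch assuming the default Cloudera Manager file locations:

```bash
# The script writes the database connection settings here (default location);
# review them before starting Cloudera Manager Server.
cat /etc/cloudera-scm-server/db.properties

# Then start (or restart) the Cloudera Manager Server
sudo systemctl restart cloudera-scm-server
```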
07-17-2020
10:36 AM
@florianc Ambari uses a backend database to store all the configuration and changes after the initial install. It can be Derby, MySQL, Oracle, or MariaDB, as it is in my case. Ambari 3.1.0 has around 111 tables that reference each other through primary and foreign keys, so it is very easy to write an efficient SQL query to output the desired information once you target the right tables. In a nutshell, the tables are intuitively named, like alert_*, blueprint_*, cluster_*, host_*, repo_*, topology_*, etc. Logically, the cluster family should be our focus.

I spun up a single-node cluster to demo this. I changed knox.token.ttl because, if you have ever worked with the Knox Admin UI, it times out so fast it is quite annoying. Below is Ambari UI and database proof confirming that all changes are persisted in the Ambari backend database. Note: it has happened to me, when an upgrade was stuck in an incomplete status, that I had to go and physically change the status in the database to bring my cluster back to life 🙂

```sql
MariaDB [(none)]> use ambari;
MariaDB [ambari]> show tables;
```

I zeroed in on the serviceconfig table:

```sql
MariaDB [ambari]> describe serviceconfig;
```

From the output, I easily chose my columns:

```sql
MariaDB [ambari]> select service_name, user_name, note from serviceconfig;
```

I then added the version column so I could validate the notes against my Ambari UI:

```sql
MariaDB [ambari]> select service_name, version, user_name, note from serviceconfig;
```

So here is the Ambari UI screenshot. Voila, confirmation that changes made in the Ambari UI are persisted to whatever backend database is plugged into Ambari. I have used the same technique to get blueprints. Happy hadooping
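If you prefer to script this rather than use the interactive MariaDB prompt, the same query can be run non-interactively; this is a minimal sketch assuming the common default database and user names (substitute your own credentials):

```bash
# Run the serviceconfig query non-interactively against the Ambari database.
# The database name and user "ambari" are common defaults; -p prompts for the password.
mysql -u ambari -p ambari -e \
  "SELECT service_name, version, user_name, note FROM serviceconfig;"
```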
07-14-2020
09:34 PM
@Sagar1244 Have you copied the Oracle JDBC jar file to /usr/hdp/current/zeppelin-server/interpreter/jdbc/? Then configure the Zeppelin JDBC interpreter as shown, restart the interpreter, and retest.
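For reference, here is a minimal sketch of the copy step and the interpreter properties that typically need to be set; the jar name, host, port, and service name are illustrative, not taken from the original screenshot:

```bash
# Copy the Oracle JDBC driver into Zeppelin's JDBC interpreter directory
# (the jar file name/version is illustrative).
sudo cp /tmp/ojdbc8.jar /usr/hdp/current/zeppelin-server/interpreter/jdbc/

# In the Zeppelin UI, the JDBC interpreter then typically needs (values illustrative):
#   default.driver = oracle.jdbc.driver.OracleDriver
#   default.url    = jdbc:oracle:thin:@dbhost:1521/ORCLPDB1
#   default.user   = zeppelin_user
# Restart the interpreter afterwards and retest.
```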
07-14-2020
03:35 PM
@saur I had previously written a comprehensive note on this issue but unfortunately I can't locate it. I have just completed fresh documentation, but since attachments are disabled, please download the document from my Adobe share: https://documentcloud.adobe.com/link/track?uri=urn:aaid:scds:US:ee72188c-cfb5-48f8-b1cc-e5eae799910b I am sure it will help you; keep me posted. Happy hadooping