Member since: 12-14-2015
Posts: 89
Kudos Received: 7
Solutions: 7
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 592 | 08-20-2019 04:30 AM |
| | 773 | 08-20-2019 12:29 AM |
| | 1239 | 10-18-2018 05:32 AM |
| | 1038 | 12-15-2016 10:52 AM |
| | 244 | 11-10-2016 09:21 AM |
08-20-2019
06:33 AM
1 Kudo
Hi @EranK, can you please double-check the Windows Server version with your Active Directory team? The Windows Server releases include 2012 and 2012 R2, but there is no 2013. Hence, you might be using Windows Server 2012 (or 2012 R2), which matches the referenced documentation page. While I cannot provide you with a support matrix, from personal experience I know that a Windows Server 2012 KDC does work with Cloudera. However, pay close attention to the encryption types and choose ones that are supported / activated in your specific Active Directory. Regards, Benjamin
08-20-2019
06:11 AM
While the log output does not provide any insights (are these the logs from before the server crashed?), the journalctl output could hint at an OOM (out of memory) of the Cloudera Manager Server. You can try to change the heap to 4 GB and test whether the behaviour persists. To do so, edit /etc/default/cloudera-scm-server and change -Xmx2G to -Xmx4G in this line:
export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
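If it helps, this is roughly how the change and restart could look on the shell (a sketch assuming a systemd-based host and the default config path):
# Back up the current defaults file, then raise the CM Server heap from 2 GB to 4 GB
sudo cp /etc/default/cloudera-scm-server /etc/default/cloudera-scm-server.bak
sudo sed -i 's/-Xmx2G/-Xmx4G/' /etc/default/cloudera-scm-server
# Restart the Cloudera Manager Server so the new heap setting takes effect
sudo systemctl restart cloudera-scm-server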
08-20-2019
05:56 AM
Hi @hiveexport, can you please clarify whether you ran this command exactly as written, or which parts were anonymized to hide sensitive values?
08-20-2019
05:19 AM
Hi @R_SHETH, great to hear! Please mark the reply as the accepted answer if it solved your problem 🙂
08-20-2019
04:35 AM
Hi YurriL, can you please provide:
- the latest output from /var/log/cloudera-scm-server/cloudera-scm-server.log
- any errors or warnings from "journalctl -xe"
08-20-2019
04:30 AM
Based on your output (auto mode in alternatives), you can try to add it with a higher priority (the 2.4 spark-submit has priority 10):
/usr/sbin/alternatives --install /usr/bin/spark-submit spark-submit /opt/cloudera/parcels/SPARK2/bin/spark2-submit 100
Be aware that Cloudera Manager might (try to) overwrite this on CDH updates. Again, I would recommend using the packaged Spark that comes with CDH, and I have doubts that using CDS on CDH 6.x is supported (also see [CDS Requirements - CDH Versions] and [Migrating Apache Spark Before Upgrading to CDH 6]). Regards, Benjamin
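PS: To verify that the new alternative actually took precedence, something like this should work (the parcel path used above is only an example and has to match your installation):
# Show all registered alternatives for spark-submit and their priorities
/usr/sbin/alternatives --display spark-submit
# The currently resolved target should now point into the SPARK2 parcel
readlink -f /usr/bin/spark-submit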
08-20-2019
01:56 AM
You can override the "spark-submit" association with alternatives, e.g.:
/usr/sbin/alternatives --set spark-submit <path-to-spark2-submit>
# <path-to-spark2-submit> could be something like "/opt/cloudera/parcels/SPARK2-2.2.0-cloudera1-cdh5.13.3.p0.611179/bin/spark2-submit"
# This is a CDH 5 path into the Spark parcel directory. You need to adjust it to your own path (the one that spark2-shell points to).
# You can find this path with:
/usr/sbin/alternatives --display spark2-submit
However, from CDH 6.x on, it is normal to use spark-submit instead of spark2-submit, because only Spark 2 is included in CDH. It is also normal to use the version packaged with the CDH distribution, which seems to be 2.4.0 for your CDH 6.1.1. I am not sure whether using a different version is supported or recommended by Cloudera. How did you install that Spark 2.2 version in your CDH 6 cluster? Also see https://www.cloudera.com/documentation/enterprise/6/6.1/topics/spark.html
08-20-2019
01:45 AM
To use the system's own commands like "curl", you also need to import the AD certificate (or its root CA's certificate) into the system (non-Java) truststore. For RHEL 7, the procedure is as follows: copy the PEM file to /etc/pki/ca-trust/source/anchors/ and run "update-ca-trust extract".
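A minimal sketch of the procedure (the file name ad-root-ca.pem is just a placeholder for your exported CA certificate in PEM format):
# Copy the CA certificate into the system trust anchors directory
sudo cp ad-root-ca.pem /etc/pki/ca-trust/source/anchors/
# Rebuild the consolidated truststores so tools like curl pick up the new CA
sudo update-ca-trust extract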
08-20-2019
01:09 AM
Hi @lsouvleros, as you already pointed out: this is influenced by a number of factors and depends widely on your use case and existing organizational context. Compared to HDFS in a classic compute/storage-coupled Hadoop cluster, some of the discussion from here also applies: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sdx_vpc.html. This is because Isilon is network-attached storage and - similar to using Cloudera virtual clusters - this has some implications for performance, especially for workloads with high performance requirements. I have also seen environments where using Isilon instead of HDFS had an impact on Impala performance.
In terms of reliability and stability, you can argue either way - depending on your architecture. However, a multi-datacenter deployment is likely to be easier to realize with Isilon, due to its enterprise-grade replication and failover capabilities.
In terms of using storage space efficiently, Isilon has advantages. However, the higher cost compared to JBOD-based HDFS might make this point irrelevant.
For scalability, I guess it depends again on your organizational setup. You can easily scale up Isilon by buying more boxes from EMC; there are certainly really large Isilon deployments out there. On the other hand, scaling HDFS is also not hard and can help you realize huge deployments.
In the end it is a trade-off: higher costs but easier management with Isilon vs. lower costs but higher effort with HDFS. This is my personal opinion, and both EMC and Cloudera might have stronger arguments for their respective storage (e.g. [EMC link]). You can also look for the latest announcements on the blog. Regards, Benjamin
08-20-2019
12:29 AM
Hi @pollard, looking at the Impala configs, you can find the /varz servlet at the debug WebUI of the Impala StateStore, Catalog Server and Daemon. With the default ports these should be:
http://state-store-host:25010/varz
http://catalog-server-host:25020/varz
http://impala-daemon-host:25000/varz
On the Impala Daemon, you also have a servlet for Hadoop vars: http://impala-daemon-host:25000/hadoop-varz
Besides these servlets, Impala also prints its flags (which you were asking for in the second paragraph) during startup in the INFO logs of each service. This may help if your debug WebUIs are disabled for security reasons. For instance, for the Impala Daemon (/var/log/impalad/impalad.INFO):
I0819 17:43:55.279785 18999 logging.cc:156] Flags (see also /varz are on debug webserver):
--catalog_service_port=26000
--catalog_topic_mode=full
[…]
--symbolize_stacktrace=false
--v=1
--vmodule=
For other services, the /conf servlets at the WebUIs or the Cloudera Manager configs (see other reply) mostly apply. If this or the first answer was helpful to you, please set it as the accepted solution. Regards, Benjamin
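PS: As a quick sketch, the servlets can also be queried from the command line (the host names are placeholders and the default ports from above are assumed):
# Dump the effective flags of an Impala Daemon via its debug WebUI
curl -s http://impala-daemon-host:25000/varz
# Same for the Hadoop configuration values the daemon sees
curl -s http://impala-daemon-host:25000/hadoop-varz
# (on a Kerberos-protected WebUI, additionally pass: --negotiate -u :)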
08-19-2019
07:48 AM
From the screenshots you sent, both errors - regarding port binding and the missing directory - seem to originate from the edge server (where, according to your first screenshots, there should not be any DataNode). Can you please double-check that you are not trying to run a DataNode on your edge node? The logs may not be relevant then. From your other screenshot, there might be a problem in the interconnection between DataNodes and NameNode. Can you please check the NameNode log?
08-19-2019
07:32 AM
Hi pollard, most services expose a "/conf" servlet in their WebUI, which gives you the most complete set of actually used parameters. This should be the most promising source of truth. Impala has a similar servlet with the path "/varz". The instance process view of Cloudera Manager shows you the actually distributed config files - which often helps a lot but does not include default values. You can reach it from a service (e.g. Impala) by clicking on "Instances" -> (the instance you want to see, e.g. Impala Catalog Server on a node) -> "Processes". Regards, Benjamin
08-19-2019
05:57 AM
Hi SS, you could try to declare the disks of the additional nodes as SSD tier and flag the temporary data with the One_SSD storage policy. This way, the data should only reside on the declared "SSD" disks and thereby on the "burst nodes". However, keep in mind the performance implications of storing data only on a subset of your cluster: jobs that primarily use that data might create heavier network load and suffer from a lower aggregated IO bandwidth, thus leading to degraded performance. Regards, Benjamin
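PS: A rough sketch of the steps involved (the HDFS path is a placeholder; the SSD tag goes into dfs.datanode.data.dir of the burst nodes):
# In dfs.datanode.data.dir on the burst nodes, tag the directories with the SSD storage type, e.g.:
#   [SSD]/data/1/dfs/dn,[SSD]/data/2/dfs/dn
# Then flag the temporary data with the desired storage policy:
hdfs storagepolicies -setStoragePolicy -path /data/burst -policy ONE_SSD
# Verify which policy is in effect
hdfs storagepolicies -getStoragePolicy -path /data/burst
# Note: ONE_SSD keeps only one replica on SSD storage; ALL_SSD would place all replicas there.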
08-19-2019
05:37 AM
Hi, are you using an environment managed by Cloudera Manager? Setting parameters such as the secure DataNode user should not be required when using the Cloudera-Manager-guided Kerberos path. Please refer to https://www.cloudera.com/documentation/enterprise/5-11-x/topics/cm_sg_intro_kerb.html Please note that using root for DataNodes was required in HDFS to bind to privileged ports (<1024) in order to protect against attackers spinning up rogue DataNodes inside of YARN jobs. With SASL protection and TLS/SSL this is not required anymore and the user is "hdfs" again. Regards, Benjamin
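PS: For background, these are roughly the Hadoop-level properties (hdfs-site.xml) that allow a secure DataNode to run without root / privileged ports - the values are examples, not a complete configuration:
# SASL on the data transfer protocol instead of privileged ports:
#   dfs.data.transfer.protection = privacy     (or: authentication | integrity)
# TLS for the web endpoints:
#   dfs.http.policy = HTTPS_ONLY
# With SASL + TLS in place, a non-privileged DataNode port is fine:
#   dfs.datanode.address = 0.0.0.0:50010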
08-19-2019
02:45 AM
Hi folks, Isilon and CDH have been strongly promoted in recent years. However, for the current CDH releases (5.16.x, 6.x), support seems to be missing (CDH/Isilon support matrix). Can somebody share insights on when there will be support for these versions? Thank you, Benjamin
08-15-2019
10:01 PM
I am using CDH/CM 6.2. Will update the cluster and test again. However, according to the docs, it should already work since 5.14.
08-15-2019
03:11 AM
Hi community,
with security in mind, I am in the process of disabling any interfaces without proper authentication / authorization (or even encryption). I came across the debug web UIs of the Cloudera Management Service.
According to https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_ports_cm.html, the debug WebUIs can be disabled by setting the port property to -1. This works for Reports Manager (8083), Event Server (8084), Navigator Audit Server (8089) and Telemetry Publisher (10111).
This does not work, however, for Service Monitor (8086 / 9086 TLS), Activity Monitor (8087 / 9087 TLS) and Host Monitor (8091 / 9091 TLS). Setting the port to -1 leads to non-starting services without a proper ERROR in the log file.
The Cloudera Manager agent even tries to check whether the server successfully bound to port -1 and runs into errors:
[15/Aug/2019 12:06:03 +0000] 65646 Thread-14 process ERROR [918-cloudera-mgmt-HOSTMONITOR] Failed port check: Command '['ss', '-np', 'state', 'listening', '(', 'sport', '=', '-1', 'or', 'sport', '=', '9995', 'or', 'sport', '=', '9994', ')']' returned non-zero exit status 255
How do you disable the debug web UIs for those management services? Or is there a way to properly secure them with authentication and authorization?
Thanks and best regards
Benjamin
04-30-2019
06:48 AM
No, Sentry is set up and all other grants work. It is only the SELECT grant that causes problems and does not work in Impala.
04-28-2019
10:51 PM
Hi, I am using the same user for access from both Hive and Impala. Besides, both users are in the Sentry admin group list. Best, Benjamin
02-11-2019
06:40 AM
Quite strangely, the same GRANT query seems to work when run through HiveServer2 (e.g. using beeline). Is this an Impala bug?
02-11-2019
06:13 AM
Hi community,
I have CDH 5.16 and a cluster with Impala / Hive / Sentry (database-backed) set up.
In Impala, I have different databases and want to define policies so that a group / role can access all databases read-only.
I tried this grant, but it does not work:
"GRANT SELECT ON DATABASE ALL TO ROLE my_role;"
How can I define SELECT-permissions for all databases in Sentry?
Thanks!
Benjamin
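PS: For reference, the server-scoped grant that is commonly suggested for read access to all databases would look like this (a sketch; "server1" is an assumption and has to match the Sentry server name configured for the cluster):
# Read access on the server object cascades to all databases/tables governed by Sentry
impala-shell -q "GRANT SELECT ON SERVER server1 TO ROLE my_role;"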
11-21-2018
09:54 AM
Hi community, I've got a Spark 2.3 application in which I need to broadcast rather large (1-3 GB) objects. To do so, I am collecting the Datasets and broadcasting them. Performance measurements show that the driver spends a long time serializing the objects, which are indeed fairly complex. During broadcasting / serialization, the driver is busy with only this task. I am wondering how to reduce this waiting time. Is there a way to parallelize tasks such as broadcasting / serialization on the driver? It would for instance be helpful to perform multiple broadcasts in parallel, continue with other driver code during broadcasting, or have a way to parallelize an individual broadcast. Best, Benjamin
10-29-2018
07:47 AM
Yes, there is. In Spark 2.x set the parameter "spark.ui.killEnabled" to false in "Custom spark2-defaults" / spark-defaults.conf.
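A sketch of the resulting entry (in Ambari under "Custom spark2-defaults", or directly in spark-defaults.conf on an unmanaged installation):
# spark-defaults.conf - disables the kill links for jobs/stages in the Spark web UI
spark.ui.killEnabled false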
10-18-2018
05:32 AM
1 Kudo
Well, I have solved it: undeclared / auto-created queues seem not to inherit their preemption threshold / timeout settings from the parent queue but from the global default settings. These can be defined in Cloudera Manager by selecting "Default Settings" in the "Dynamic Resource Pool Configuration". Applications in a queue, however, seem to inherit the preemption settings from the queue they are in. Leaving this here for other users searching for such info.
10-18-2018
01:27 AM
Hi community, I am trying to isolate my users from each other in YARN (CDH 5.15.1). For that, I am using a queue root.users.<username>, to which each user is directed. The queues for the users are not pre-created but are created on submission - they are undeclared. To guarantee resources, I would like to activate Fair Share preemption on all user queues, although they are undeclared. Enabling preemption on the root.users queue did not achieve any preemption. I have set:
- Preemption is activated globally
- Fair Share Preemption Threshold of root.users is set to 0.5
- Fair Share Preemption Timeout of root.users is set to 5
- Preemptable of root.users is set to true
In my test, the cluster resources are fully allocated to root.users.alice. Then bob submits a job to root.users.bob but does not receive any resources. Best, Benjamin
07-31-2018
06:21 AM
Did you authenticate using keytabs or using a password-based kinit? Could you please send the output of "klist" and "klist -kte <keytab-file>"?
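For reference, a sketch of both variants (principal, realm and keytab path are placeholders):
# Password-based ticket
kinit alice@EXAMPLE.COM
# Keytab-based ticket
kinit -kt /path/to/alice.keytab alice@EXAMPLE.COM
# Show the current ticket cache
klist
# List the principals and encryption types stored in the keytab
klist -kte /path/to/alice.keytab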
03-16-2018
10:25 AM
Hi @Jinyu Li, your issue is likely caused by Hive permission inheritance. After creating the tables, the Sqoop app tries to change the owner/mode of the created HDFS files. Ranger permissions (even rwx) do not grant the right to change the POSIX owner/mode, which is why the operation fails. Such a failure is classified as an "EXECUTE" action by Ranger. You can find more details in the HDFS audit log, stored locally on the NameNode. Solution: could you please try to set "hive.warehouse.subdir.inherit.perms" to false and re-run the job? This stops Hive imports from trying to set permissions, which is fine when Ranger is the primary source of authorization. See https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive for more details. Best, Benjamin
11-02-2017
07:46 AM
Hi community, does anybody have experience with the following scenario in a Kerberos setup?
My scenario: run every Hadoop service as the same Linux user, let's call him myhadoopuser, and create the following principals for the cluster:
- service principals for all nodes: myhadoopuser/_HOST and HTTP/_HOST
- a shared user principal for HDFS, Ambari smoke test, Spark, ...
All of the above principals are mapped to myhadoopuser with auth_to_local.
My question is: does it work technically, and are there strong reasons not to do it? Potential issues I see:
- missing isolation among the Hadoop services (has anybody got an example of what would be possible and why this is bad?)
- inability to set different proxyuser privileges for different services
Thank you! Best, Roland
05-09-2017
01:21 PM
Hi guys, I have problems upgrading from HDP 2.5.0 to 2.6.0 on a RHEL 6.7 system. The Ambari upgrade to 2.5.0 is finished. When installing the packages for HDP 2.6, Ambari fails with this error:
Package Manager failed to install packages. Error: Execution of
'/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8-client' returned 1.
Error: Package: hadoop_2_6_0_3_8-hdfs-2.7.3.2.6.0.3-8.x86_64
(HDP-2.6.0.3)
Requires: libtirpc-devel
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Traceback (most recent call last):
And indeed, the package is not available:
# yum install libtirpc-devel
Loaded plugins: product-id, security, subscription-manager
Setting up Install Process
No package libtirpc-devel available.
Error: Nothing to do
Is there guidance on how to work around this problem? RHEL 6.7 is explicitly supported by Ambari 2.5 and HDP 2.6. Installing unofficial packages is not allowed in our environment. Thanks for your help!
PS: I saw this community question which, however, is about RHEL 7.x: https://community.hortonworks.com/questions/96763/hdp-26-ambari-install-fails-on-rhel-7-on-libtirpc.html
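In case it helps others: libtirpc-devel usually comes from the RHEL "optional" channel, so a sketch of what could be checked (the repository id below is the common one for a subscription-manager-registered RHEL 6 host and may differ in your environment):
# See which repository (if any) would provide the package
yum provides libtirpc-devel
# Enable the optional channel that usually carries libtirpc-devel on RHEL 6
subscription-manager repos --enable=rhel-6-server-optional-rpms
# Retry the installation
yum install libtirpc-devel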
05-03-2017
08:58 AM
@Adi Jabkowsky This did the trick! Thank you. Now it works.