Member since: 12-14-2015
Posts: 89
Kudos Received: 7
Solutions: 7
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 592 | 08-20-2019 04:30 AM |
| | 773 | 08-20-2019 12:29 AM |
| | 1239 | 10-18-2018 05:32 AM |
| | 1038 | 12-15-2016 10:52 AM |
| | 244 | 11-10-2016 09:21 AM |
08-20-2019
06:33 AM
1 Kudo
Hi @EranK, can you please double-check the Windows Server version with your Active Directory team? The Windows Server releases include 2012 and 2012 R2, but there is no 2013. Hence, you might be using Windows Server 2012 (or 2012 R2), which matches the referenced documentation page. While I cannot provide you with a support matrix, from personal experience I know that a Windows Server 2012 KDC does work with Cloudera. However, pay close attention to the encryption types and choose ones that are supported / activated in your specific Active Directory. Regards, Benjamin
08-20-2019
06:11 AM
While the log output does not provide any insights (are these the logs from before the server crashed?), the journalctl output could hint at an OOM (out of memory) of the Cloudera Manager Server. You can try to change the heap to 4 GB and test whether the behaviour persists. To do so, edit /etc/default/cloudera-scm-server and change -Xmx2G to -Xmx4G in this line:
export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
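If it helps, this is roughly how the change and restart could look on the shell (a sketch assuming a systemd-based host and the default config path):
# Back up the current defaults file, then raise the CM Server heap from 2 GB to 4 GB
sudo cp /etc/default/cloudera-scm-server /etc/default/cloudera-scm-server.bak
sudo sed -i 's/-Xmx2G/-Xmx4G/' /etc/default/cloudera-scm-server
# Restart the Cloudera Manager Server so the new heap setting takes effect
sudo systemctl restart cloudera-scm-server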
08-20-2019
05:56 AM
Hi @hiveexport, can you please clarify whether you ran this command exactly as written, or which parts were anonymized to hide sensitive values?
08-20-2019
05:19 AM
Hi @R_SHETH, great to hear! Please mark the reply as the accepted answer if it solved your problem 🙂
08-20-2019
04:35 AM
Hi YurriL, can you please provide:
- the latest output from /var/log/cloudera-scm-server/cloudera-scm-server.log
- any errors or warnings from "journalctl -xe"
08-20-2019
04:30 AM
Based on your output (auto mode in alternatives), you can try to add it with a higher priority (the 2.4 spark-submit has priority 10):
/usr/sbin/alternatives --install /usr/bin/spark-submit spark-submit /opt/cloudera/parcels/SPARK2/bin/spark2-submit 100
Be aware that Cloudera Manager might (try to) overwrite this on CDH updates. Again, I would recommend using the packaged Spark that comes with CDH, and I have doubts that using CDS on CDH 6.x is supported (also see [CDS Requirements - CDH Versions] and [Migrating Apache Spark Before Upgrading to CDH 6]). Regards, Benjamin
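PS: To verify that the new alternative actually took precedence, something like this should work (the parcel path used above is only an example and has to match your installation):
# Show all registered alternatives for spark-submit and their priorities
/usr/sbin/alternatives --display spark-submit
# The currently resolved target should now point into the SPARK2 parcel
readlink -f /usr/bin/spark-submit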
08-20-2019
01:56 AM
You can override the "spark-submit" association with alternatives, e.g.:
/usr/sbin/alternatives --set spark-submit <path-to-spark2-submit>
# <path-to-spark2-submit> could be something like "/opt/cloudera/parcels/SPARK2-2.2.0-cloudera1-cdh5.13.3.p0.611179/bin/spark2-submit"
# This is a CDH 5 path into the Spark parcel directory. You need to adjust it to your own path (the one that spark2-shell points to).
# You can find this path with:
/usr/sbin/alternatives --display spark2-submit
However, from CDH 6.x on, it is normal to use spark-submit instead of spark2-submit, because only Spark 2 is included in CDH. It is also normal to use the version packaged with the CDH distribution, which seems to be 2.4.0 for your CDH 6.1.1. I am not sure whether using a different version is supported or recommended by Cloudera. How did you install that Spark 2.2 version in your CDH 6 cluster? Also see https://www.cloudera.com/documentation/enterprise/6/6.1/topics/spark.html
08-20-2019
01:45 AM
To use the system's own commands like "curl", you also need to import the AD certificate (or its root CA's certificate) into the system (non-Java) truststore. For RHEL 7, the procedure is as follows: copy the PEM file to /etc/pki/ca-trust/source/anchors/ and run "update-ca-trust extract".
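A minimal sketch of the procedure (the file name ad-root-ca.pem is just a placeholder for your exported CA certificate in PEM format):
# Copy the CA certificate into the system trust anchors directory
sudo cp ad-root-ca.pem /etc/pki/ca-trust/source/anchors/
# Rebuild the consolidated truststores so tools like curl pick up the new CA
sudo update-ca-trust extract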
08-20-2019
01:09 AM
Hi @lsouvleros, as you already pointed out: this is influenced by a number of factors and depends widely on your use case and existing organizational context. Compared to HDFS in a classic compute/storage-coupled Hadoop cluster, some of the discussion from here also applies: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sdx_vpc.html. This is because Isilon is network-attached storage and - similar to using Cloudera virtual clusters - this has some implications for performance, especially for workloads with high performance requirements. I have also seen environments where using Isilon instead of HDFS had an impact on Impala performance.
In terms of reliability and stability, you can argue either way - depending on your architecture. However, a multi-datacenter deployment is likely to be easier to realize with Isilon, due to its enterprise-grade replication and failover capabilities.
In terms of using storage space efficiently, Isilon has advantages. However, the higher cost compared to JBOD-based HDFS might make this point irrelevant.
For scalability, I guess it depends again on your organizational setup. You can easily scale up Isilon by buying more boxes from EMC; there are certainly really large Isilon deployments out there. On the other hand, scaling HDFS is also not hard and can help you realize huge deployments.
In the end it is a trade-off: higher costs but easier management with Isilon vs. lower costs but higher effort with HDFS. This is my personal opinion, and both EMC and Cloudera might have stronger arguments for their respective storage (e.g. [EMC link]). You can also look for the latest announcements on the blog. Regards, Benjamin
08-20-2019
12:29 AM
Hi @pollard, looking at the Impala configs, you can find the /varz servlet at the debug WebUI of the Impala StateStore, Catalog Server and Daemon. With the default ports these should be:
http://state-store-host:25010/varz
http://catalog-server-host:25020/varz
http://impala-daemon-host:25000/varz
On the Impala Daemon, you also have a servlet for Hadoop vars: http://impala-daemon-host:25000/hadoop-varz
Besides these servlets, Impala also prints its flags (which you were asking for in the second paragraph) during startup in the INFO logs of each service. This may help if your debug WebUIs are disabled for security reasons. For instance, for the Impala Daemon (/var/log/impalad/impalad.INFO):
I0819 17:43:55.279785 18999 logging.cc:156] Flags (see also /varz are on debug webserver):
--catalog_service_port=26000
--catalog_topic_mode=full
[…]
--symbolize_stacktrace=false
--v=1
--vmodule=
For other services, the /conf servlets at the WebUIs or the Cloudera Manager configs (see other reply) mostly apply. If this or the first answer was helpful to you, please set it as the accepted solution. Regards, Benjamin
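PS: As a quick sketch, the servlets can also be queried from the command line (the host names are placeholders and the default ports from above are assumed):
# Dump the effective flags of an Impala Daemon via its debug WebUI
curl -s http://impala-daemon-host:25000/varz
# Same for the Hadoop configuration values the daemon sees
curl -s http://impala-daemon-host:25000/hadoop-varz
# (on a Kerberos-protected WebUI, additionally pass: --negotiate -u :)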
08-19-2019
07:48 AM
From the screenshots you sent, both errors - regarding port binding and the missing directory - seem to originate from the edge server (where, according to your first screenshots, there should not be any DataNode). Can you please double-check that you are not trying to run a DataNode on your edge node? The logs may not be relevant then. From your other screenshot, there might be a problem in the interconnection between DataNodes and NameNode. Can you please check the NameNode log?
08-19-2019
07:32 AM
Hi pollard, most services expose a "/conf" servlet in their WebUI, which gives you the most complete set of actually used parameters. This should be the most promising source of truth. Impala has a similar servlet with the path "/varz". The instance process view of Cloudera Manager shows you the actually distributed config files - which often helps a lot but does not include default values. You can reach it from a service (e.g. Impala) by clicking on "Instances" -> (the instance you want to see, e.g. Impala Catalog Server on a node) -> "Processes". Regards, Benjamin
08-19-2019
05:57 AM
Hi SS, you could try to declare the disks of the additional nodes as SSD tier and flag the temporary data with the One_SSD storage policy. This way, the data should only reside on the declared "SSD" disks and thereby on the "burst nodes". However, keep in mind the performance implications of storing data only on a subset of your cluster: jobs that primarily use that data might create heavier network load and suffer from a lower aggregated IO bandwidth, thus leading to degraded performance. Regards, Benjamin
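PS: A rough sketch of the steps involved (the HDFS path is a placeholder; the SSD tag goes into dfs.datanode.data.dir of the burst nodes):
# In dfs.datanode.data.dir on the burst nodes, tag the directories with the SSD storage type, e.g.:
#   [SSD]/data/1/dfs/dn,[SSD]/data/2/dfs/dn
# Then flag the temporary data with the desired storage policy:
hdfs storagepolicies -setStoragePolicy -path /data/burst -policy ONE_SSD
# Verify which policy is in effect
hdfs storagepolicies -getStoragePolicy -path /data/burst
# Note: ONE_SSD keeps only one replica on SSD storage; ALL_SSD would place all replicas there.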
08-19-2019
05:37 AM
Hi, are you using an environment managed by Cloudera Manager? Setting parameters such as the secure DataNode user should not be required when using the Cloudera-Manager-guided Kerberos path. Please refer to https://www.cloudera.com/documentation/enterprise/5-11-x/topics/cm_sg_intro_kerb.html Please note that using root for DataNodes was required in HDFS to bind to privileged ports (<1024) in order to protect against attackers spinning up rogue DataNodes inside of YARN jobs. With SASL protection and TLS/SSL this is not required anymore and the user is "hdfs" again. Regards, Benjamin
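PS: For background, these are roughly the Hadoop-level properties (hdfs-site.xml) that allow a secure DataNode to run without root / privileged ports - the values are examples, not a complete configuration:
# SASL on the data transfer protocol instead of privileged ports:
#   dfs.data.transfer.protection = privacy     (or: authentication | integrity)
# TLS for the web endpoints:
#   dfs.http.policy = HTTPS_ONLY
# With SASL + TLS in place, a non-privileged DataNode port is fine:
#   dfs.datanode.address = 0.0.0.0:50010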
08-19-2019
02:45 AM
Hi folks, Isilon and CDH have been strongly promoted in recent years. However, for the current CDH releases (5.16.x, 6.x), support seems to be missing (CDH/Isilon support matrix). Can somebody share insights on when there will be support for these versions? Thank you, Benjamin
08-15-2019
10:01 PM
I am using CDH/CM 6.2. Will update the cluster and test again. However, according to the docs, it should already work since 5.14.
08-15-2019
03:11 AM
Hi community,
with security in mind, I am in the process of disabling any interfaces without proper authentication / authorization (or even encryption). I came across the debug web UIs of the Cloudera Management Service.
According to https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_ports_cm.html, the debug WebUIs can be disabled by setting the port property to -1. This works for Reports Manager (8083), Event Server (8084), Navigator Audit Server (8089) and Telemetry Publisher (10111).
This does not work, however, for Service Monitor (8086 / 9086 TLS), Activity Monitor (8087 / 9087 TLS) and Host Monitor (8091 / 9091 TLS). Setting the port to -1 leads to non-starting services without a proper ERROR in the log file.
The Cloudera Manager agent even tries to check whether the server successfully bound to port -1 and runs into errors:
[15/Aug/2019 12:06:03 +0000] 65646 Thread-14 process ERROR [918-cloudera-mgmt-HOSTMONITOR] Failed port check: Command '['ss', '-np', 'state', 'listening', '(', 'sport', '=', '-1', 'or', 'sport', '=', '9995', 'or', 'sport', '=', '9994', ')']' returned non-zero exit status 255
How do you disable the debug web UIs for those management services? Or is there a way to properly secure them with authentication and authorization?
Thanks and best regards
Benjamin
04-30-2019
06:48 AM
No, Sentry is set up and all other grants work. It is only the SELECT grant that causes problems and does not work in Impala.
04-28-2019
10:51 PM
Hi, I am using the same user for access from both Hive and Impala. Besides, both users are in the Sentry admin group list. Best, Benjamin
02-11-2019
06:40 AM
Quite strangely, the same GRANT query seems to work when run through HiveServer2 (e.g. using beeline). Is this an Impala bug?
02-11-2019
06:13 AM
Hi community,
I have CDH 5.16 and a cluster with Impala / Hive / Sentry (database-backed) set up.
In Impala, I have different databases and want to define policies so that a group / role can access all databases read-only.
I tried this grant, but it does not work:
"GRANT SELECT ON DATABASE ALL TO ROLE my_role;"
How can I define SELECT-permissions for all databases in Sentry?
Thanks!
Benjamin
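PS: For reference, the server-scoped grant that is commonly suggested for read access to all databases would look like this (a sketch; "server1" is an assumption and has to match the Sentry server name configured for the cluster):
# Read access on the server object cascades to all databases/tables governed by Sentry
impala-shell -q "GRANT SELECT ON SERVER server1 TO ROLE my_role;"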
11-21-2018
09:54 AM
Hi community, I've got a Spark 2.3 application in which I need to broadcast rather large (1-3 GB) objects. To do so, I am collecting the Datasets and broadcasting them. Performance measurements show that the driver spends a long time serializing the objects, which are indeed fairly complex. During broadcasting / serialization, the driver is busy with only this task. I am wondering how to reduce this waiting time. Is there a way to parallelize tasks such as broadcasting / serialization on the driver? It would for instance be helpful to perform multiple broadcasts in parallel, continue with other driver code during broadcasting, or have a way to parallelize an individual broadcast. Best, Benjamin
10-29-2018
07:47 AM
Yes, there is. In Spark 2.x set the parameter "spark.ui.killEnabled" to false in "Custom spark2-defaults" / spark-defaults.conf.
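A sketch of the resulting entry (in Ambari under "Custom spark2-defaults", or directly in spark-defaults.conf on an unmanaged installation):
# spark-defaults.conf - disables the kill links for jobs/stages in the Spark web UI
spark.ui.killEnabled false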
10-18-2018
05:32 AM
1 Kudo
Well, I have solved it: undeclared / auto-created queues seem not to inherit their preemption threshold / timeout settings from the parent queue but from the global default settings. These can be defined in Cloudera Manager by selecting "Default Settings" in the "Dynamic Resource Pool Configuration". Applications in a queue, however, seem to inherit the preemption settings from the queue they are in. Leaving this here for other users searching for such info.
10-18-2018
01:27 AM
Hi community, I am trying to isolate my users from each other in YARN (CDH 5.15.1). For that, I am using a queue root.users.<username>, to which each user is directed. The queues for the users are not pre-created but are created on submission - they are undeclared. To guarantee resources, I would like to activate Fair Share preemption on all user queues, although they are undeclared. Enabling preemption on the root.users queue did not achieve any preemption. I have set:
- Preemption is activated globally
- Fair Share Preemption Threshold of root.users is set to 0.5
- Fair Share Preemption Timeout of root.users is set to 5
- Preemptable of root.users is set to true
In my test, the cluster resources are fully allocated to root.users.alice. Then bob submits a job to root.users.bob but does not receive any resources. Best, Benjamin
07-31-2018
06:21 AM
Did you authenticate using keytabs or using a password-based kinit? Could you please send the output of "klist" and "klist -kte <keytab-file>"?
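For reference, a sketch of both variants (principal, realm and keytab path are placeholders):
# Password-based ticket
kinit alice@EXAMPLE.COM
# Keytab-based ticket
kinit -kt /path/to/alice.keytab alice@EXAMPLE.COM
# Show the current ticket cache
klist
# List the principals and encryption types stored in the keytab
klist -kte /path/to/alice.keytab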
03-16-2018
10:25 AM
Hi @Jinyu Li, your issue is likely caused by Hive permission inheritance. After creating the tables, the Sqoop app tries to change the owner/mode of the created HDFS files. Ranger permissions (even rwx) do not grant the right to change the POSIX owner/mode, which is why the operation fails. Such a failure is classified as an "EXECUTE" action by Ranger. You can find more details in the HDFS audit log, stored locally on the NameNode. Solution: could you please try to set "hive.warehouse.subdir.inherit.perms" to false and re-run the job? This stops Hive imports from trying to set permissions, which is fine when Ranger is the primary source of authorization. See https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive for more details. Best, Benjamin
11-02-2017
07:46 AM
Hi community, does anybody have experience with the following scenario in a Kerberos setup?
My scenario: run every Hadoop service as the same Linux user, let's call him myhadoopuser, and create the following principals for the cluster:
- service principals for all nodes: myhadoopuser/_HOST and HTTP/_HOST
- a shared user principal for HDFS, Ambari smoke test, Spark, ...
All of the above principals are mapped to myhadoopuser with auth_to_local.
My question is: does it work technically, and are there strong reasons not to do it? Potential issues I see:
- missing isolation among the Hadoop services (has anybody got an example of what would be possible and why this is bad?)
- inability to set different proxyuser privileges for different services
Thank you! Best, Roland
05-09-2017
01:21 PM
Hi guys, I have problems upgrading from HDP 2.5.0 to 2.6.0 on a RHEL 6.7 system. The Ambari upgrade to 2.5.0 is finished. When installing the packages for HDP 2.6, Ambari fails with this error:
Package Manager failed to install packages. Error: Execution of
'/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8-client' returned 1.
Error: Package: hadoop_2_6_0_3_8-hdfs-2.7.3.2.6.0.3-8.x86_64
(HDP-2.6.0.3)
Requires: libtirpc-devel
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
Traceback (most recent call last):
And indeed, the package is not available:
# yum install libtirpc-devel
Loaded plugins: product-id, security, subscription-manager
Setting up Install Process
No package libtirpc-devel available.
Error: Nothing to do
Is there guidance on how to work around this problem? RHEL 6.7 is explicitly supported by Ambari 2.5 and HDP 2.6. Installing unofficial packages is not allowed in our environment. Thanks for your help!
PS: I saw this community question which, however, is about RHEL 7.x: https://community.hortonworks.com/questions/96763/hdp-26-ambari-install-fails-on-rhel-7-on-libtirpc.html
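In case it helps others: libtirpc-devel usually comes from the RHEL "optional" channel, so a sketch of what could be checked (the repository id below is the common one for a subscription-manager-registered RHEL 6 host and may differ in your environment):
# See which repository (if any) would provide the package
yum provides libtirpc-devel
# Enable the optional channel that usually carries libtirpc-devel on RHEL 6
subscription-manager repos --enable=rhel-6-server-optional-rpms
# Retry the installation
yum install libtirpc-devel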
05-03-2017
08:58 AM
@Adi Jabkowsky This did the trick! Thank you. Now it works.