About benhadoop

pollard · 08-26-2019

Just a note about the CM API: from what I can tell, the API doesn't return any more parameters than what you can see in the Configuration tab for each app/service. Being able to see them in the INFO logs was exactly what was needed. However, it would be nice to be able to use the API to get the same info that the INFO logs provide. I could see some automation opportunities in the future...
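For anyone wanting to script this, here is a minimal sketch of pulling a service's configuration through the Cloudera Manager REST API (`/clusters/{cluster}/services/{service}/config` with `view=full`, which includes default values alongside explicitly set ones). It assumes API version `v19`, CM listening on its default HTTP port 7180, and basic authentication; host, cluster, service and the sample config names are placeholders. Consistent with the note above, this only returns what CM manages (what the Configuration tab shows), not everything printed in the INFO logs.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

CM_PORT = 7180        # CM's default HTTP port; adjust for TLS setups
API_VERSION = "v19"   # assumption: use a version your CM release supports

def config_url(host: str, cluster: str, service: str) -> str:
    """Build the service-level config endpoint; view=full includes defaults."""
    return (f"http://{host}:{CM_PORT}/api/{API_VERSION}/clusters/"
            f"{urllib.parse.quote(cluster, safe='')}/services/"
            f"{urllib.parse.quote(service, safe='')}/config?view=full")

def fetch_config(host: str, cluster: str, service: str,
                 user: str, password: str) -> dict:
    """GET the config JSON from CM using HTTP basic auth."""
    req = urllib.request.Request(config_url(host, cluster, service))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def effective_config(api_response: dict) -> Dict[str, Optional[str]]:
    """Map each config name to its explicit value, falling back to its default."""
    return {item["name"]: item.get("value", item.get("default"))
            for item in api_response.get("items", [])}
```

Usage would be `effective_config(fetch_config("cm.example.com", "Cluster 1", "hdfs", user, pw))`; keeping the parsing separate from the HTTP call makes it easy to diff two snapshots in an automation job.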

benhadoop · 08-20-2019

Hi @R_SHETH, great to hear! Please mark the reply as the accepted answer if it solved your problem 🙂

benhadoop · 08-20-2019

Hi @lsouvleros, as you already pointed out, this depends on a number of factors and is strongly influenced by your use case and existing organizational context.

Compared to HDFS in a classic Hadoop cluster with coupled compute and storage, some of the discussion here also applies: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_sdx_vpc.html. This is because Isilon is network-attached storage and, similar to using Cloudera Virtual Private Clusters, this has some implications for performance, especially for workloads with high performance requirements. I have also seen environments where using Isilon instead of HDFS affected Impala performance.

In terms of reliability and stability, you can argue either way, depending on your architecture. However, a multi-datacenter deployment is likely to be easier to realize with Isilon, due to its enterprise-grade replication and failover capabilities.

In terms of storage efficiency, Isilon has advantages. However, its higher cost compared to JBOD-based HDFS might make this point irrelevant.

For scalability, I guess it again depends on your organizational setup. You can easily scale Isilon by buying more boxes from EMC, and there are certainly very large Isilon deployments out there. On the other hand, scaling HDFS is not hard either and lets you build huge deployments.

In the end, it is a trade-off between higher costs but easier management with Isilon and lower costs but more effort with HDFS. This is my personal opinion; both EMC and Cloudera might have stronger arguments for their respective storage (e.g. [EMC link]). You can also look for the latest announcement on the blog.

Regards,
Benjamin

michalis · 08-17-2019

This was reported as a bug and has already been fixed in CM 6.3.0 and 6.2.1 as part of OPSAPS-49111.

benhadoop · 10-18-2018

Well, I have solved it: undeclared (auto-created) queues do not seem to inherit their preemption threshold/timeout settings from the parent queue, but from the global default settings. These can be defined in Cloudera Manager by selecting "Default Settings" in the "Dynamic Resource Pool Configuration". Applications in a queue, however, seem to inherit the preemption settings from the queue they are in. Leaving this here for other users searching for this info.
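The resolution order observed above can be modeled as a small lookup. This is a hypothetical Python sketch of the described behavior, not YARN Fair Scheduler code; queue names and numbers are made up, and declared queues are assumed to carry explicit values. In the Fair Scheduler allocation file, the global defaults correspond to top-level elements such as `defaultFairSharePreemptionTimeout` and `defaultFairSharePreemptionThreshold`, which is presumably what the CM "Default Settings" dialog writes.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Preemption:
    timeout_s: int    # fair share preemption timeout, in seconds
    threshold: float  # fair share preemption threshold, 0.0 to 1.0

def queue_settings(queue: str, declared: Dict[str, Preemption],
                   defaults: Preemption) -> Preemption:
    # Declared queues use their own configured values. Undeclared
    # (auto-created) queues fall back to the global defaults; per the
    # observation above, they do NOT inherit from their parent queue.
    return declared.get(queue, defaults)

def app_settings(app_queue: str, declared: Dict[str, Preemption],
                 defaults: Preemption) -> Preemption:
    # Applications take the settings of the queue they run in.
    return queue_settings(app_queue, declared, defaults)

DEFAULTS = Preemption(timeout_s=300, threshold=0.5)
DECLARED = {"root.users": Preemption(timeout_s=60, threshold=0.8)}
```

With these values, an auto-created `root.users.alice` queue gets the 300 s default rather than the 60 s of `root.users`, which is why tuning only the parent queue had no effect.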

benhadoop · 03-16-2018

Hi @Jinyu Li, your issue is likely caused by Hive permission inheritance. After creating the tables, the Sqoop job tries to change the owner/mode of the created HDFS files. Ranger permissions (even rwx) do not grant the right to change the POSIX owner/mode, which is why the operation fails. Ranger classifies such a failure as an "EXECUTE" action. You can find more details in the HDFS audit log, stored locally on the NameNode.

Solution: could you please try setting "hive.warehouse.subdir.inherit.perms" to false and re-running the job? This stops Hive imports from trying to set permissions, which is fine when Ranger is the primary source of authorization. See https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive for more details.

Best,
Benjamin
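As a concrete illustration of the suggested change, a minimal Python sketch that renders the property in the standard Hadoop `*-site.xml` layout (`<configuration>/<property>/<name>/<value>`). Where the entry actually goes (directly into hive-site.xml or through a hive-site override in your cluster manager) depends on your distribution and is an assumption here; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def site_property(name: str, value: str) -> ET.Element:
    """Build one <property> element in the Hadoop *-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop

config = ET.Element("configuration")
# Stop Hive from applying the parent directory's owner/permissions to new
# warehouse subdirectories; Ranger policies then govern access instead.
config.append(site_property("hive.warehouse.subdir.inherit.perms", "false"))
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```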

benhadoop · 04-04-2017

Thank you for that answer. I was not sure if there were any special cases, as Hive performed some custom checks for read/write rights until https://issues.apache.org/jira/browse/HIVE-7583 and https://issues.apache.org/jira/browse/HDFS-6570.

benhadoop · 03-28-2017

This is the answer I was hoping for. Thanks

benhadoop · 03-29-2017

Just to sum it up: I have now chosen to put regexes in the auth-to-local rules that match exactly those hosts used in a given cluster. While this adds operational overhead, it makes the cluster more secure. The folks at Cloudera have a good summary of this in their documentation: https://www.cloudera.com/documentation/enterprise/5-9-x/topics/sg_auth_to_local_isolate.html
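To illustrate the idea, a simplified Python model of how a host-restricted rule such as `RULE:[2:$1/$2@$0](hdfs/node0[1-3]\.cluster-a\.example\.com@EXAMPLE\.COM)s/.*/hdfs/` is evaluated: the principal's components are formatted, the whole formatted string must match the acceptance regex, and a sed-style substitution then yields the local user name. Host names and realm are hypothetical, and the sketch ignores the DEFAULT rule and other details of Hadoop's actual `KerberosName` implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AuthToLocalRule:
    n: int               # number of principal components the rule applies to
    fmt: str             # e.g. "$1/$2@$0"; $0 = realm, $1..$n = components
    acceptance: str      # regex the formatted string must match in full
    pattern: str         # sed-style s/pattern/replacement/
    replacement: str
    replace_all: bool = False  # trailing "g" flag

    def apply(self, principal: str) -> Optional[str]:
        name, _, realm = principal.partition("@")
        components = name.split("/")
        if len(components) != self.n:
            return None
        parts = [realm] + components
        base = re.sub(r"\$(\d+)", lambda m: parts[int(m.group(1))], self.fmt)
        if not re.fullmatch(self.acceptance, base):
            return None  # e.g. a host from another cluster
        return re.sub(self.pattern, self.replacement, base,
                      count=0 if self.replace_all else 1)

def map_principal(principal: str, rules: List[AuthToLocalRule]) -> Optional[str]:
    """Return the short name from the first matching rule, else None."""
    for rule in rules:
        result = rule.apply(principal)
        if result is not None:
            return result
    return None

# Hypothetical rule: only hdfs principals on cluster A's three hosts map to "hdfs".
RULES = [AuthToLocalRule(
    2, "$1/$2@$0",
    r"hdfs/node0[1-3]\.cluster-a\.example\.com@EXAMPLE\.COM",
    r".*", "hdfs")]
```

A principal from any other host (for example a second cluster sharing the same realm) falls through this rule, which is what isolates the clusters; the price is keeping the host list in sync as nodes are added.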

benhadoop · 05-03-2017

@Adi Jabkowsky This did the trick! Thank you. Now it works.

Last Visited	09-02-2019 06:07 AM

Member Since	12-14-2015 11:46 PM
Posts	89
Kudos received	7

Cloudera Community

Re: spark-submit still pointing to Spark-version a...

Re: CM interface to see exact settings

Re: Fair Share Preemption for undeclared (user) qu...

Re: Oozie not passing Ambari Service checks -> YAR...

Re: Ranger for YARN RM: Not using group membership

Re: Isilon HDFS vs CDH HDFS

Re: Disable Cloudera Management Debug WebUIs (Host...

Re: yarn can read/write to hdfs, but cannot execut...

Re: Hive Metastore Authorization and how it is con...

Re: Secure Ambari Infra (SolR) using Ranger Author...

Re: Possibility to use Principals across clusters

Re: Permissions problem in Capacity Scheduler view...