Member since: 02-21-2019
Posts: 69
Kudos Received: 45
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1452 | 06-06-2018 02:51 PM |
| | 4372 | 10-12-2017 02:48 PM |
| | 1366 | 08-01-2017 08:58 PM |
| | 29891 | 06-12-2017 02:36 PM |
| | 4657 | 02-16-2017 04:58 PM |
12-20-2016
08:58 PM
Hmm, okay, but I expect this to be transient, since we're talking about outgoing TCP connections; the connection that could have caused the issue is probably not there any longer. So when you start Zeppelin and get the "address already in use" error, it's important to also check netstat at around the same time. The other error you got when you tried to install Zeppelin again is not from Zeppelin but from Ambari trying to create the zeppelin user's home folder in HDFS. It looks like HDFS (WebHDFS in this case) is not working, so please check that (lenu.dom.hdp on port 50070).
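A quick way to check both points at the time of the failure (9995 is Zeppelin's default port on HDP; adjust if yours differs):
netstat -anp | grep 9995
curl -s "http://lenu.dom.hdp:50070/webhdfs/v1/?op=LISTSTATUS"
The first command should be run right when Zeppelin reports "address already in use"; the second simply confirms WebHDFS answers on the NameNode.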
12-20-2016
09:27 AM
2 Kudos
Hi @Narasimma varman After running ambari-server setup-ldap did you restart the Ambari Server? The localhost:33389 error means Ambari Server hasn't been restarted and it's using the default configuration.
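For reference, the usual sequence is (both are standard ambari-server sub-commands); the restart is what makes Ambari pick up the new LDAP settings:
ambari-server setup-ldap
ambari-server restart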
12-19-2016
03:42 PM
Hi @Hoang Le Use the following doc to configure CPU scheduling: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_yarn-resource-management/content/ch_cpu_scheduling.html It's also recommended to configure cgroups for this to be effective: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_yarn-resource-management/content/enabling_cgroups.html However, both can be a bit of a pain. Most of the time, increasing the container size (map, reduce and Tez) reduces the CPU load. Setting YARN container memory values is a balancing act between memory (RAM), processors (CPU cores), and disks, so that processing is not constrained by any one of these cluster resources. However, if the application usage pattern is known, the containers can be tuned to maximize resource utilization.
For example, if CPU usage and load average are too high, increasing the container size reduces the number of containers allocated per node. A good rule of thumb is to keep the load average below 4 times the number of physical cores. A likely cause of heavy CPU usage is garbage collection, and especially continuous garbage collection, which a larger container would alleviate. Increase the following by 50% and observe any changes (see the example below): tez.task.resource.memory.mb, hive.tez.container.size, mapreduce.map.memory.mb, mapreduce.reduce.memory.mb
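As an illustration, per-job overrides can be used to try the larger sizes before changing the cluster defaults. The values below assume 2048 MB containers being raised by roughly 50% to 3072 MB; the connection string and jar name are placeholders:
beeline -u "jdbc:hive2://hiveserver:10000/default" \
  --hiveconf tez.task.resource.memory.mb=3072 \
  --hiveconf hive.tez.container.size=3072
hadoop jar my-job.jar MyJob \
  -Dmapreduce.map.memory.mb=3072 \
  -Dmapreduce.reduce.memory.mb=3072
(The -D overrides only take effect if the MapReduce job parses generic options via ToolRunner.)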
12-19-2016
10:46 AM
Hi @Dmitry Otblesk What exactly did you check with netstat? If you've only checked which ports are used by listening services (netstat -l), I suggest checking all ports. I've seen cases where Hadoop services tried to listen on a port that was already in use as the source port of another TCP connection: netstat -anp|grep 9995
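For comparison, the two checks side by side (9995 being Zeppelin's default port on HDP):
netstat -lnp | grep 9995    # listening sockets only - may show nothing
netstat -anp | grep 9995    # all sockets, including an outgoing connection that happens to use 9995 as its source port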
12-19-2016
12:15 AM
2 Kudos
Hi @Connor O'Neal The main reason swap is enabled in the first place is to prevent the Linux OOM (Out-Of-Memory) Killer from terminating processes when memory pressure is too high (memory usage without buffers is close to 100%). The general recommendation for worker nodes is to have swap disabled.
The logic is that in a distributed system it's preferable to have the OS terminate a process (which the framework can easily recover) than to have one or two swapping processes (YARN containers) greatly degrade the performance of a distributed job running on the cluster.
If there's an internal policy that requires swap to be present, the least intrusive option is to set swappiness to 1, which reduces the likelihood of swapping as much as possible (only swap when absolutely necessary). The general recommendation for master nodes is to have swap enabled but reduce the likelihood of swapping.
If master services are abruptly terminated by the OOM killer (similar to kill -9), cluster availability is affected (especially if the services are not HA) and the possibility of data corruption increases (as the services are not allowed to terminate gracefully). In conclusion, the recommendation is to set swappiness to 1 on all cluster nodes and to discuss with your systems administrator the possibility of setting swappiness to 0 (equivalent to disabling swap) on the worker nodes.
This can be achieved on a running system with the following command:
echo 1 > /proc/sys/vm/swappiness
For a permanent setting that survives a reboot, add vm.swappiness=1 to /etc/sysctl.conf. A word of caution, though, regarding CentOS/RHEL 7: updating /etc/sysctl.conf there might not always work, because RHEL 7 introduces a new service called tuned which overrides values set in /etc/sysctl.conf.
Thus, if this service is active, create a file, for example /etc/tuned/hdp/tuned.conf, with the following content:
[main]
include=throughput-performance
[sysctl]
vm.swappiness=1
[vm]
transparent_hugepages=never
And run the following command:
tuned-adm profile hdp
The throughput-performance profile is already the default in RHEL7 so this only applies changes on top of it.
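To verify the change took effect afterwards (both commands are standard on RHEL 7):
tuned-adm active
cat /proc/sys/vm/swappiness
The first should report the hdp profile as active, the second should print 1.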
09-23-2016
01:14 PM
2 Kudos
I feel that this wasn't answered clearly. I stumbled across this recently and tested with various configurations and full packet captures with tcpdump. There are 3 possibilities when hive.server2.thrift.sasl.qop is set to auth-conf:
1. The client connects with ;saslQop=auth-conf - traffic is encrypted.
2. The client tries to connect with ;saslQop=auth - the connection is refused with a javax.security.sasl.SaslException: No common protection layer between client and server exception.
3. The client connects without any saslQop parameter set (especially the case with ODBC drivers and software such as Tableau, where you cannot easily set the JDBC parameters) - traffic is still encrypted.
I'm mentioning this because some documentation asks you to explicitly set saslQop on the client, but this isn't required, unless you want to enforce it on the client side so connections don't go unencrypted if the server setting ever changes.
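For example, a JDBC URL that enforces encryption from the client side would look like this (host, port and Kerberos principal are placeholders):
beeline -u "jdbc:hive2://hiveserver:10000/default;principal=hive/_HOST@EXAMPLE.COM;saslQop=auth-conf"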
09-21-2016
01:34 PM
HDFS has an inotify feature which essentially translates those log entries into events that can be consumed. https://issues.apache.org/jira/browse/HDFS-6634 Here's a Java-based example: https://github.com/onefoursix/hdfs-inotify-example Alternatively, rather than having Oozie monitor many directories and waste resources, a script can run 'hdfs dfs -ls -R /folder|grep|sed' every minute or so (a rough sketch is below), but that's still not event-based, so it depends how fast a reaction you need versus how easily you can implement/use the inotify API.
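A rough sketch of the polling approach, assuming /folder and a 60-second interval (all paths are placeholders):
touch /tmp/prev.txt
while true; do
  hdfs dfs -ls -R /folder | grep '^-' | awk '{print $8}' | sort > /tmp/now.txt
  comm -13 /tmp/prev.txt /tmp/now.txt   # files that appeared since the previous pass
  mv /tmp/now.txt /tmp/prev.txt
  sleep 60
done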
09-09-2016
09:20 AM
1 Kudo
1) What could be the root cause? I think it's just the wrong ldapsearch filter; it should be:
ldapsearch -h unix-ldap.company.com -p 389 -x -b "dc=company,dc=SE" "(&(cn=devdatalakeadm)(memberUid=ojoqcu))"
cn=devdatalakeadm,ou=Group,dc=company,dc=se is actually the full DN and you cannot search on it, as it's not an attribute.
2) Your problem is still the userDnTemplate; that's why you're still getting the LDAP authentication exception: ldapRealm.userDnTemplate = uid={0},cn=devdatalakeadm,ou=Group,dc=company,dc=se Why are you trying to search for the user inside the cn=devdatalakeadm subtree? That's not how users and groups are normally represented in LDAP (unless you did something very specific). Users and groups are usually in separate trees, and membership in your case is decided only by the memberUid attribute. If memberUid is ojoqcu, that doesn't mean uid=ojoqcu,cn=devdatalakeadm,ou=Group,dc=company,dc=se actually exists; the ojoqcu user could be in a separate tree/OU, like uid=ojoqcu,ou=User,dc=company,dc=se. To further help you find the correct userDnTemplate, I'd need the ldapsearch output for a user, just like the one you showed for groups (a query like the one below would do).
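Something along these lines should reveal the user's actual DN, which is what the userDnTemplate has to match (the uid value is taken from your example):
ldapsearch -h unix-ldap.company.com -p 389 -x -b "dc=company,dc=SE" "(uid=ojoqcu)" dn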
09-08-2016
10:06 PM
1. Yes, this means anonymous access has been allowed.
2. Make a copy (cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml), then edit the file and set that variable to false (see the sketch below).
3. When I put in a wrong userDnTemplate, I get the following, so it's something to look for in the logs:
LoginRestApi.java[postLogin]:99) - Exception in login:
org.apache.shiro.authc.AuthenticationException: Authentication token of type [class org.apache.shiro.authc.UsernamePasswordToken] could not be authenticated by any configured realms. Please ensure that at least one realm can authenticate these tokens.
But really, you should get the right LDAP template. It might not be a problem with CN vs. uid, but a problem with the path (for example, the user might be under ldapRealm.userDnTemplate=CN={0},ou=Users,dc=company,dc=SE, not ou=Group,dc=company,dc=SE). How do you use this LDAP in other projects/apps? Run ldapsearch against it: ldapsearch -h unix-ldap.company.com -p 389 -x -b "dc=company,dc=SE" (although you might not be allowed to bind anonymously). Ask your LDAP admin, etc. Good luck!
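A minimal sketch of point 2 (zeppelin.anonymous.allowed is the standard property name; the restart command assumes a tarball-style install):
cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
# in conf/zeppelin-site.xml, set:
#   <name>zeppelin.anonymous.allowed</name>
#   <value>false</value>
bin/zeppelin-daemon.sh restart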
09-08-2016
10:06 PM
1 Kudo
You should really install a newer Zeppelin version, as there have been quite a few changes and enhancements in terms of security; I wouldn't advise trying to secure that old Zeppelin version. The 0.6.0.2.4.2.0-258 build from the HDP 2.4.2 repo doesn't come with the org.apache.zeppelin.server.LdapGroupRealm class, so you won't be able to use it (the error you receive is absolutely normal; see the check below). If you don't want to upgrade to HDP 2.5, you can at least manually compile 0.6.2 from https://github.com/apache/zeppelin/tree/branch-0.6:
git clone https://github.com/apache/zeppelin.git -b branch-0.6
cd zeppelin/
mvn clean package -DskipTests -Pspark-1.6 -Phadoop-2.6 -Dhadoop.version=2.7.1
If you still get the UI when you cancel the login, that's probably because anonymous access is still allowed, so set zeppelin.anonymous.allowed to false in conf/zeppelin-site.xml.
Lastly, as a curiosity, I tried 0.6.0.2.4.2.0-258 from HDP 2.4.2 with ldapRealm = org.apache.shiro.realm.ldap.JndiLdapRealm and it works just fine, even if I sometimes get the Invalid ticket error in the logs. Your LDAP layout might be different and the shiro config wrong. Are you sure the userDN is uid={0} and not CN={0}? Are you sure the users are in the ou=Group,dc=company,dc=SE baseDN? Usually groups are just other entries in the LDAP, and group membership is controlled by a member attribute rather than by putting users in Group subtrees.
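To confirm whether the installed build ships the class at all, something like this should work (the jar path is an assumption based on a typical HDP 2.4.2 layout):
unzip -l /usr/hdp/current/zeppelin-server/lib/zeppelin-server*.jar | grep LdapGroupRealm
No output would mean the class really isn't there, as described above.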