Member since 11-29-2016
17 Posts
2 Kudos Received
2 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1026 | 12-13-2016 09:36 PM
 | 589 | 11-30-2016 03:37 PM
07-27-2017 01:02 AM
1 Kudo
Typically you either need to scale out due to HDFS disk usage, or you need to scale out for computational reasons. If I have 10 or so datanodes that are each allocated 80% of system memory for YARN, would all of them running at 100% of their YARN allocation for the majority of the day indicate that I need to scale out for computational reasons? Currently my HDFS is only at 60% utilization.
I am primarily running Tez jobs. CPU doesn't seem to be hit as hard, but my YARN memory allocation is constantly at 100%, and I have users complaining about slow-running jobs. I assume this is because they have to wait for other jobs to free up resources before their own jobs can run. Is there anything else I should look for in this situation? Running Ambari 2.5.1 and HDP 2.6.1.
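For reference, YARN's ResourceManager exposes a cluster-wide metrics endpoint that makes this easy to watch over time; a minimal sketch, assuming the default RM web port 8088 (replace <rm-host> with your ResourceManager host):
# snapshot of cluster memory pressure and queue backlog
curl -s http://<rm-host>:8088/ws/v1/cluster/metrics
# in the returned clusterMetrics JSON, compare allocatedMB to totalMB,
# and watch appsPending: a sustained backlog while memory sits at 100%
# usually means jobs are queuing for memory, not CPU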
Labels: Apache YARN
01-25-2017 06:40 PM
@apappu Sorry, I was just able to get to the office to try this out. This was the issue, thanks for your help!
Just out of curiosity, why can't a non-root user use port 443?
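For what it's worth, on Linux only root (or a process with the CAP_NET_BIND_SERVICE capability) can bind ports below 1024, which is why the non-root daemon fails on 443. Two hedged workarounds, sketched below (the java path is an assumption; adjust for your JDK):
# option 1: reconfigure Ambari's HTTPS port to something above 1024, e.g. 8443
# option 2: grant the JVM the privileged-port capability (can have side effects,
# e.g. the loader stops honoring LD_LIBRARY_PATH for the binary)
setcap 'cap_net_bind_service=+ep' /usr/lib/jvm/java/bin/java   # path assumed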
01-24-2017 10:13 PM
Hello, I enabled HTTPS for my Ambari Server before I changed it to run as a non-root daemon user. After enabling the non-root daemon, I'm getting the following error:
24 Jan 2017 17:06:48,001 WARN [main] AbstractLifeCycle:204 - FAILED SslSelectChannelConnector@0.0.0.0:443: java.net.SocketException: Permission denied
java.net.SocketException: Permission denied
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at org.eclipse.jetty.server.ssl.SslSelectChannelConnector.doStart(SslSelectChannelConnector.java:631)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.apache.ambari.server.controller.AmbariServer.run(AmbariServer.java:617)
at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:927)
It seems that even though I've put in all the sudo settings (starting here: https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-security/content/commands_server.html ), the non-root user still doesn't have enough permissions to read the certificate used for SSL binding. Does anyone know what is needed to resolve this permission issue? The SSL certificate and key are installed in /etc/ssl/certs/. I've been searching and I can't seem to find an answer to this. Thanks
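A quick way to test that theory is to probe readability as the daemon account; a minimal sketch (the user name "ambari" and the certificate file name are hypothetical placeholders):
# can the non-root daemon user actually read the certificate and key?
sudo -u ambari ls -l /etc/ssl/certs/
sudo -u ambari cat /etc/ssl/certs/server.crt > /dev/null && echo "cert readable"   # file name assumed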
Labels: Apache Ambari
01-19-2017 03:33 PM
@Oliver Fletcher Yup, this was the issue. I enabled LDAPS on our domain and it works now.
01-11-2017 07:38 PM
@lraheja Sure, it's no longer timing out; it's just back to what it was doing before.
kerberos-stack-2.txt
01-11-2017 06:26 PM
@lraheja I did not go through the ambari-server setup-ldap steps; I must've skipped past this somehow. After configuring this and restarting Ambari, the LDAP tests seem to be getting further but are now just timing out.
My krb5.conf is not configured at all; it's the default conf file. I assumed Ambari was going to configure this through the wizard. Is that not the case?
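For context, a hand-written krb5.conf for an AD realm shaped like the one in this thread might look roughly like the following. This is a minimal sketch built from the hq.domain.com placeholder used in this thread, not confirmed values; also note the Ambari Kerberos wizard only manages this file if the option to manage the Kerberos client krb5.conf is enabled.
[libdefaults]
  default_realm = HQ.DOMAIN.COM
  dns_lookup_kdc = false

[realms]
  # assumed: the AD domain controller doubles as KDC and admin server
  HQ.DOMAIN.COM = {
    kdc = hq.domain.com
    admin_server = hq.domain.com
  }

[domain_realm]
  .hq.domain.com = HQ.DOMAIN.COM
  hq.domain.com = HQ.DOMAIN.COM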
01-11-2017 06:03 PM
Hi @rguruvannagari, thanks for the reply.
I just confirmed with my AD admin that our AD is not set up for SSL at all. I was able to query AD using the ldapsearch tool with the same DN and LDAP URL I'm specifying. I'll keep trying different DNs.
01-11-2017 05:39 PM
Hello, I'm receiving this error:
Failed to connect to KDC - Failed to communicate with the Active Directory at LDAP://hq.domain.com/OU=Production,OU=domain,DC=hq,DC=domain,DC=com: simple bind failed: hq.domain.com:389
Update the KDC settings in krb5-conf and kerberos-env configurations to correct this issue.
I've been following this guide: https://www.ibm.com/support/knowledgecenter/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_kerb_activedir.html as well as the HDP documentation on this, using the automated Kerberos wizard. JCE has been distributed to all of the nodes, and I'm using Oracle JDK 1.8. Attached is the full stack trace: kerberos-stack.txt
The KDC Test Connection passes just fine, and I can see the expected network traffic between my domain controller and the Ambari server. The only notable difference is that I'm not using SSL on AD; I figure this should be fine and Ambari can just use the plaintext port 389. I realize this is a security concern, but I have no way around it right now, as I don't have control over this area of our domain setup. Could this be my issue? Any help is appreciated. Thanks.
EDIT: I was able to successfully query AD using the ldapsearch tool with the same DN and LDAP URL that I'm specifying, and with the same admin user.
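For anyone retracing that check, a simple-bind ldapsearch sketch against the DN from this post might look like the following (the bind account and search filter are hypothetical placeholders, not values from the original post):
ldapsearch -x -H ldap://hq.domain.com:389 \
  -D "admin@hq.domain.com" -W \
  -b "OU=Production,OU=domain,DC=hq,DC=domain,DC=com" \
  "(sAMAccountName=testuser)"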
12-13-2016 09:36 PM
1 Kudo
This is resolved. There were a couple of directories left over in /usr/hdp from the old version, and it seems Ambari uses this file path to determine which version is needed. I'm not sure how to articulate this further, but something metadata-wise was still being derived from those leftover files. Also, running "python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent" on every machine helped clean up things that were missed. I had forgotten to run this step.
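As a hedged sketch of that cleanup check (the tooling is standard on HDP/Ambari hosts, but paths can vary by install):
# only the target version directory (plus 'current') should remain here
ls /usr/hdp/
hdp-select versions
# re-run the host cleanup step that was missed
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent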
12-13-2016 04:51 AM
Hi @rgangappa, thanks for the reply. Unfortunately I've tried this already and it didn't fix the issue. I don't believe the HDP.repo file is the problem; the correct repo is being used. It's just that Ambari is instructing yum to install the wrong package, a package which does not exist in the correct repo that is loaded into yum.
12-13-2016 12:02 AM
Hello all, I'm doing an install of HDP 2.5.3 using Ambari 2.4.2.0-136, and the final install continues to fail because of this:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/atlas_client.py", line 57, in <module>
AtlasClient().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/atlas_client.py", line 45, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 567, in install_packages
retry_count=agent_stack_retry_count)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 54, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 51, in install_package
self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 86, in checked_call_with_retries
return self._call_with_retries(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 98, in _call_with_retries
code, out = func(cmd, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install atlas-metadata_2_5_0_0_1245' returned 1. Error: Nothing to do
The error "Nothing to do" is thrown because "atlas-metadata_2_5_0_0_1245" doesn't exist in the HDP 2.5.3 repo; manually trying to install it confirms this. When I do "yum search atlas" it shows this version: atlas-metadata_2_5_3_0_37, yet for some reason Ambari is not set to install that version, even though I specified 2.5.3 in the install wizard.
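A hedged sketch of the yum checks described above, using the package names reported in this post:
# confirm which atlas-metadata build the configured repo actually provides
yum clean all
yum search atlas-metadata                 # shows atlas-metadata_2_5_3_0_37
# the build Ambari asks for (left over from the earlier 2.5.0 install) is absent
yum install atlas-metadata_2_5_0_0_1245   # fails with "Error: Nothing to do"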
Here is what my HDP.repo looks like on every node:
[HDP-2.5]
name=HDP-2.5
baseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.3.0
path=/
enabled=1
gpgcheck=0
I ran through this exact same install two weeks ago (2.5, though it might have been 2.5.0; I had it grab the latest 2.5) and had zero problems. However, I should mention these machines were all used in that previous install, and I'm doing a re-install right now. I've ensured that everything was wiped off the machines based on information found here: https://community.hortonworks.com/questions/1110/how-to-completely-remove-uninstall-ambari-and-hdp.html and here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/ch_uninstalling_hdp_chapter.html
Ambari's pre-install checks all passed with no errors. Can anyone please help me figure this out? Reinstalling all of the machines from scratch is nearly impossible right now, but I don't think there is anything wrong with the repo installation on the machines. Ambari installed that correctly; it simply is not looking for the right packages. Note: DataNode / YARN services were installed properly; this seems to affect just Atlas and App Timeline Server. Thanks
11-30-2016 03:37 PM
I figured this out. It's because of my network layout. The HDP machines are all binding to the 172 addresses and listening there. When Ambari tries to connect to the NameNode on port 50070, for instance, it can't, because the NameNode is listening on 50070 on the 172 network, which Ambari isn't on. I resolved the issue by setting up multi-homing: https://community.hortonworks.com/articles/24277/parameters-for-multi-homing.html
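For context, the fix in that article comes down to the standard Hadoop bind-host parameters, which make daemons listen on all interfaces instead of only the address their hostname resolves to. A minimal sketch (the property names are standard HDFS/YARN configs; 0.0.0.0 is the usual multi-homing choice):
# hdfs-site.xml
dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
# yarn-site.xml
yarn.resourcemanager.bind-host=0.0.0.0
yarn.nodemanager.bind-host=0.0.0.0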
11-29-2016 03:05 PM
Hello All, this is a brand new install of HDP 2.5 using Ambari 2.4.1.0-22 on CentOS 7. Currently all my nodes are online, but only half of my metrics are showing up (see attachment): capture.png
I have a feeling it has something to do with my network setup. Currently all of the machines (name / secondary name / data nodes) are on their own private network (172.x.x.x), using their own set of switches (Nexus 3Ks) via a fiber card on the physical machines. Their second NICs (regular Ethernet) are attached to our regular network (192.x.x.x) so Ambari (running as a VM) can reach and orchestrate them, and so they can reach the internet (I'm not using a local repo).
My secondary name node is the metrics collector, and on that machine I see all of the data nodes establishing connections on port 6188 when I run "netstat -anp | grep 6188". I've looked at /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out for errors and can't find any; in the past I would typically see "Connection refused" when the monitor couldn't connect to the collector. Just to give you a better idea:
Ambari - 192.x.x.x (main network)
HDP machines - 172.x.x.x (Nexus / fiber network) / 192.x.x.x (main network)
NOTE: The Grafana dashboard is showing all possible metrics from what I can see, but that dashboard is hosted within the 172.x.x.x network on the secondary name node.
With this network setup, would I need to set the "-Dhttp.proxyHost=<yourProxyHost> -Dhttp.proxyPort=<yourProxyPort>" parameters? I can't see why I would; I keep seeing this as a common solution to my symptoms, but all of the machines can already access the internet via their secondary NIC on our main network. Maybe this network separation (with Ambari not on the fiber network) is somehow causing the metrics data to not return to Ambari?
In Ambari, the IP addresses it has for all of the nodes are the private IPs (attachment 2): capture2.png. Perhaps this is a hint to the problem, though that's what it should be set to. All of the machines have their private IPs set up in /etc/hosts, so they only see each other on the 172.x.x.x network (except Ambari, which uses regular DNS mapped to the 192.x.x.x IPs).
I've ensured SELinux is disabled on all of the machines, as well as the firewall daemon. NTP servers are configured and synced. I've restarted the Ambari Metrics collector / services multiple times, and I'm able to telnet from the data nodes to the collector port. Thanks
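A consolidated sketch of the connectivity checks described in this post (the collector host is a placeholder; 6188 is the AMS collector port mentioned above):
# on the metrics collector (secondary name node): confirm monitors are connected
netstat -anp | grep 6188
# from a data node: confirm the collector port is reachable
telnet <collector-host> 6188
# check the monitor log for connection errors
tail -n 100 /var/log/ambari-metrics-monitor/ambari-metrics-monitor.out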