Member since: 02-15-2018
Posts: 30
Kudos Received: 0
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 492 | 04-20-2018 10:17 AM |
05-16-2019
11:46 AM
Actually the KDC gets deployed by Cloudbreak and it is on the same host as the Ambari server, so there is no firewall involved. nc -v localhost 88 is successful. The KDC is running and I can also see that principals are created successfully. Resizing the cluster with Cloudbreak also works fine. So everything seems to be fine, except that we get an error when clicking on "Test KDC Connection".
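In case it helps with narrowing it down, here is a rough sketch of the manual checks one could run on the Ambari/KDC host; these are generic Kerberos checks with placeholder principal and realm names, not a description of what the "Test KDC Connection" button actually executes:
# Port 88 reachable (the KDC listens on TCP and UDP 88)
nc -vz localhost 88
# Ask the KDC for a ticket with a known principal
kinit admin/admin@EXAMPLE.COM
klist
# Optionally check the kadmin service as well (principal management goes through kadmind)
kadmin -p admin/admin@EXAMPLE.COM -q "list_principals"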
05-16-2019
10:00 AM
Hi all, when we deploy an HDP cluster in Cloudbreak with Kerberos security enabled using the test KDC, everything seems to work fine. But when we click on "Test KDC Connection" under Kerberos > Configs, it reports "Connection failed" with no further details. Is this an issue or can we ignore it? Thanks! Alex
05-02-2019
01:09 PM
Hi all, the Kerberos wizard of Ambari 2.7.3 seems to automatically activate HTTP authentication (SPNEGO) for most Hadoop web UIs. This makes it quite complicated to access the web UIs from non-cluster machines, especially on Windows. This automatic activation of HTTP authentication did not happen in earlier versions of Ambari (2.6.x). Is there a way to prevent this behavior, or do I have to go through all the services afterwards and deactivate HTTP authentication manually? Thx for any help! Alex
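For reference, the SPNEGO setting for the core Hadoop web UIs lives in the HTTP authentication properties of core-site; a minimal sketch of switching them back to simple auth (to be verified against your Ambari/HDP version, since some services carry their own UI authentication switches):
# core-site (Ambari > HDFS > Configs > Advanced core-site)
hadoop.http.authentication.type=simple
# allow UI access without an explicit user.name query parameter
hadoop.http.authentication.simple.anonymous.allowed=true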
04-05-2019
08:28 AM
Hi all, is it possible to move a cluster from one workspace to another? Let's say I have created a cluster in my personal workspace and want to move it to another workspace that more people have access to. Is this possible? Otherwise I have to think carefully in advance about which cluster to spawn in which workspace, because it cannot be changed later on. Thanks in advance! Alex
04-05-2019
06:08 AM
OK. In my opinion this feature should be extended (more customization options) and added to the wizard in future Cloudbreak versions, because custom hostnames are quite a common requirement.
04-04-2019
03:36 PM
Hi @mmolnar, a custom blueprint would not help in that case, because the problem is the creation of the VMs for the cluster. We want to attach a mix of different volumes to a VM (e.g. one volume backed by SSD and one by HDD) so that we can make use of HDFS data tiering. This seems to be impossible with Cloudbreak cluster creation: Cloudbreak always uses the same volume type within a VM.
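For context, the HDFS side of such a setup only needs each DataNode directory to be tagged with its storage type; the missing piece is getting Cloudbreak to attach the differently-backed volumes in the first place. A sketch of the hdfs-site entries, with example paths:
# hdfs-site: tag each DataNode data directory with its storage type
dfs.datanode.data.dir=[SSD]/hadoop/hdfs/data-ssd,[DISK]/hadoop/hdfs/data-hdd
# storage policies are then assigned per HDFS path, e.g.:
# hdfs storagepolicies -setStoragePolicy -path /warm-data -policy ONE_SSD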
04-04-2019
01:27 PM
Hi all, is there a way to import an existing cluster running on OpenStack into Cloudbreak for management? Maybe if we use the official Cloudbreak OS images for cluster deployment? The rationale behind the question is that we are not able to configure all cluster parameters the way we want (especially mixing different storage types within a VM to use HDFS data tiering) in the Cloudbreak cluster creation wizard. So we would deploy the cluster manually (using Ansible) and then import it into Cloudbreak for management. Is there any way to make that possible? Best Alex
04-04-2019
09:41 AM
Hi all, is there a way to define custom hostnames when deploying a cluster with the Cloudbreak web UI, or do we have to use the CLI for that? The documentation only mentions the CLI: https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.9.0/advanced-cluster-options/content/cb_custom-hostnames.html Is it possible to add the corresponding customDomain entries to the request that the Cloudbreak cluster creation wizard generates at the end, without having to switch to the CLI? Thanks! Alex
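For illustration only: if editing the generated request is possible, the custom-hostname block from the linked CLI documentation would presumably be pasted into that JSON. The field names and values below are an assumption drawn from my reading of the 2.9.0 docs, not a verified schema:
"customDomain": {
  "customDomain": "cluster.example.internal",
  "customHostname": "worker",
  "clusterNameAsSubdomain": false,
  "hostgroupNameAsHostname": true
}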
04-01-2019
03:20 PM
Is it possible to manually add more storage volumes with other storage types after cluster provisioning, using Ambari? Or will this conflict with Cloudbreak cluster monitoring? Is this feature planned for a future version of Cloudbreak?
03-28-2019
02:04 PM
Hi all, we have an on-prem OpenStack cloud (version Queens) with Cinder storage and want to use Cloudbreak for Hadoop cluster deployments.
Question 1: In the cluster creation wizard, under Hardware and Storage, we can choose from the VM flavours that exist in our OpenStack cloud. However, in the Storage section the storage types defined in our OpenStack cloud do not show up. Only one storage type, "HDD", appears in the dropdown list, and none of our defined storage types has this name. Let's say our storage types are "A", "B" and "C" with associated QoS specs (maxIOPs settings). Why do these storage types not show up in the dropdown list in the cluster creation wizard? And which storage type will be used if we select "HDD" (the only available option)? I can't find any configuration settings for storage types in the official Cloudbreak documentation.
Question 2: Is it possible to mix volumes with different storage types in the same VM? Let's say I want one fast volume (SSD storage type) and 3 slower volumes (HDD storage type). Such storage tiering is possible in HDFS, but I don't see an option to define such a setup in the Cloudbreak cluster wizard.
Thanks in advance for any help! Best, Alex
02-11-2019
11:27 AM
Hi mmolnar, OK, thanks for the clarification. It would then make sense to note in the documentation that the Data Lake deployment option is currently only suitable for AWS, Azure and Google, but not for OpenStack.
02-08-2019
02:58 PM
Hi @mmolnar, would this actually mean that workload clusters in our own on-prem OpenStack cloud would have to access or transfer the data from cloud storage for processing? That doesn't sound feasible. We want to run the data lake and the workload clusters in our on-prem OpenStack cloud. In this case it does not make sense to store all the data in a public cloud and transfer it on-prem for processing. It sounds to me like the data lake deployment option of CB is intended for AWS, Azure and Google, but not really suitable for on-prem OpenStack clouds. Would you agree?
02-08-2019
07:40 AM
Hi @mmolnar, is the cloud storage also used to actually store the data, for example the data stored in Hive? My understanding of a data lake is that it holds the actual data, not only metadata and shared security services. But that would mean that in the "data lake" setup option of Cloudbreak the data is actually not stored in HDFS and not on-prem, but in the cloud. Is that correct? Thx for your help!
01-29-2019
02:35 PM
Hi all, Cloudbreak has a nice option to deploy a so-called "Data Lake" and attach ephemeral workload clusters to it. However, this option requires available cloud storage (on AWS, Azure or Google). Instead, we want to deploy a "production data lake" on our on-premise OpenStack cloud that provides the storage for HDFS but also executes our production workloads. We would then like to attach "test clusters" to this production cluster, where we can run test workloads while accessing the data in the production cluster (aka the data lake), so that we do not have to copy data from one cluster to the other. Is there a way to set this up with Cloudbreak? To be clear, we do not want to use cloud storage from AWS, Azure or Google. Any ideas or hints? Thanks a lot! Alex
08-03-2018
07:59 AM
Hi @dbompart, our Spark version is 2.2.0 (HDP 2.6.3). I don't think it's an issue with how we set the parameter, because we actually set it this way in spark2-defaults: spark.yarn.executor.memoryOverhead=2500 And here is the output from the log: INFO YarnAllocator: Will request 12 executor container(s), each with 4 core(s) and 10692 MB memory (including 2500 MB of overhead) So the values seem to be right and the request matches my expectations (8g executor memory + 2.5g overhead). But I'm confused why the container is not killed when exceeding 10.5 GB but only at 12 GB: WARN YarnAllocator: Container killed by YARN for exceeding memory limits. 13.3 GB of 12 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead. Is there any further parameter that I'm missing here?
08-02-2018
08:02 AM
We are getting the following error when running a PySpark job on YARN: "Container killed by YARN for exceeding memory limits. 12.3 GB of 12 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead." My question is not so much why we get this error message, but where the value for the physical memory limit comes from. In our configuration we set:
spark.executor.memory=8g
spark.yarn.executor.memoryOverhead=2.5g
As far as I know, these are the two config values that influence the container size in YARN for a Spark job. So I would expect the physical memory limit to be 10.5 GB, not 12 GB as stated in the error message. Is there any further parameter that I'm not aware of? Thanks for the clarification! Alex
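One hedged explanation (an assumption to check against the cluster's YARN scheduler settings, not a confirmed answer): YARN rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so the granted container can be larger than executor memory plus overhead:
# requested by Spark:
#   spark.executor.memory              = 8192 MB
#   spark.yarn.executor.memoryOverhead = 2500 MB
#   request = 8192 + 2500 = 10692 MB
# YARN rounds the request up to the next multiple of yarn.scheduler.minimum-allocation-mb;
# with an assumed minimum allocation of 4096 MB:
#   ceil(10692 / 4096) * 4096 = 12288 MB = 12 GB   <- the limit quoted in the error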
05-23-2018
10:17 AM
I have not been able to fix this issue so far. The only solution I have found is to disable ResourceManager HA.
04-20-2018
10:17 AM
I managed to get it running by specifying all the required options using the right flags. The documentation still shows the command syntax that was used for configs.sh (before Ambari 2.6.0), which is no longer valid for configs.py. The documentation should be updated to the new configs.py syntax.
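For anyone else running into this, the working call uses explicit flags instead of positional arguments; roughly like the sketch below (user, password and port are placeholders, and configs.py --help shows the exact option names for your Ambari version):
/var/lib/ambari-server/resources/scripts/configs.py \
  --user=admin --password=admin \
  --port=8080 --protocol=http \
  --action=get \
  --host=m0201.cl.psiori.com \
  --cluster=psiori \
  --config-type=yarn-site \
  --file=yarn-site.json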
04-20-2018
08:46 AM
Hi, I'm trying to disable ResourceManager High Availability because I cannot find a solution to a problem with proxy server redirection (YARN exception: Could not determine the proxy server for redirection) and I want to check whether the problem goes away when not using YARN HA. I'm following the documentation (Disable ResourceManager High Availability), but I'm getting an error when executing the script in step 2. I call it like this: /var/lib/ambari-server/resources/scripts/configs.py get m0201.cl.psiori.com psiori yarn-site yarn-site.json The hostname of the Ambari server is "m0201.cl.psiori.com" and the cluster name is "psiori", but it fails with the error: configs.py: error: One of required options is not passed What am I doing wrong? And why can I not simply remove the corresponding configuration values mentioned in step 3 in the Ambari UI? Thx, Alex
- Tags:
- Hadoop Core
- YARN
04-20-2018
08:34 AM
Is there nobody with the same issue?
04-10-2018
12:21 PM
@ssathish I checked the source code of HDP 2.6.3.0-235 and I see that the code changes applied in the patch for https://issues.apache.org/jira/browse/YARN-7269 are already present. Do you see any other reason why this error still happens?
04-10-2018
11:36 AM
Another question: the release notes of HDP 2.6.3 list the patch for YARN-7269 as included in HDP 2.6.3. Shouldn't this mean that it is included in the "official" version in the public repo? The build version in the public repo is 235. So if the patch is not included in build number 235, isn't the information in the release notes confusing or even incorrect?
04-10-2018
11:21 AM
@ssathish Ok, thanks for the hint, that seems to be the reason! How can I find out which is the latest build number of 2.6.3.0? I checked the GitHub repo (https://github.com/hortonworks/hadoop-release/releases) but the numbering is a bit confusing. I can see that the latest release of 2.6.3 is 2.6.3.36, but which build number is that? And is it possible to upgrade the installed build of 2.6.3 (build 235 in our case) to the latest version of 2.6.3?
04-09-2018
12:52 PM
We are running a cluster with HDP 2.6.3.0-235. YARN is configured for HA with two ResourceManagers. When I run a spark-shell session and want to access the tracking URL of the Spark UI on the server where I started the spark-shell, I get the following error: HTTP ERROR 500
Problem accessing /. Reason:
Server Error
Caused by:
javax.servlet.ServletException: Could not determine the proxy server for redirection
at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.findRedirectUrl(AmIpFilter.java:205)
at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:145)
at org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:461)
at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.spark_project.jetty.server.Server.handle(Server.java:524)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:319)
at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
However, the shell is working fine and I can also access the Spark UI by following the link to the ApplicationMaster in the ResourceManager UI. We didn't have this problem before, and I assume it started after configuring YARN for HA (using the Ambari wizard). Attached you can find the yarn-site.xml (exported with Ambari). Is there any misconfiguration, or has somebody seen this problem before? Thx, Alex
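For reference, in an HA setup the AmIpFilter builds its redirect targets from the ResourceManager webapp addresses, so these are the yarn-site entries worth double-checking first (hostnames below are placeholders, not our actual values):
yarn.resourcemanager.ha.enabled=true
yarn.resourcemanager.ha.rm-ids=rm1,rm2
yarn.resourcemanager.webapp.address.rm1=rm-host1.example.com:8088
yarn.resourcemanager.webapp.address.rm2=rm-host2.example.com:8088
# if a standalone web proxy is deployed instead, yarn.web-proxy.address must be set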
- Tags:
- Hadoop Core
- proxy
- YARN
04-06-2018
11:22 AM
@Sindhu
We are already using HDP 2.6.3 (upgraded from 2.6.2 several weeks ago) but are still facing this issue. Could it also be related to something else?
In the YARN config in Ambari, I still see that the config params related to non-HA are still set, like
yarn.resourcemanager.hostname=host2
The HA-related configs are also present, like
yarn.resourcemanager.hostname.rm1=host1
yarn.resourcemanager.hostname.rm2=host2
But Ambari does not allow me to remove the old non-HA settings; it says that the field is required. Is that normal behavior? The YARN documentation for HA says that the old settings must be replaced with the HA-related ones.
04-05-2018
10:15 AM
We use JupyterHub with our HDP cluster. When starting a PySpark kernel we have the problem that the link to the Spark UI is not working. When following the link we get an HTTP ERROR 500 with the following detail: javax.servlet.ServletException: Could not determine the proxy server for redirection
at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.findRedirectUrl(AmIpFilter.java:205)
The Spark application itself is running and the Spark UI can also be accessed by following the link in the ResourceManager UI. We currently think that this problem is related to an issue with YARN in a secure HA setup: https://issues.apache.org/jira/browse/YARN-6625 This issue seems to be fixed in YARN 2.9.0, but the current version in HDP is 2.7.3. My question is: are you aware of this issue, and do you maybe know any workarounds?
04-04-2018
12:53 PM
I updated the KDC configuration. But I had to create a realm definition in kdc.conf under [realms] as well; just putting the configuration values under [kdcdefaults] didn't help. Still, I'm confused why this is necessary at all. Why does the Metrics Collector not simply request a new ticket instead of renewing the old one?
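For anyone reproducing this, a realm stanza of that kind could look roughly as follows (realm name and lifetimes are examples, not my exact file):
# /var/kerberos/krb5kdc/kdc.conf
[realms]
 PSIORI.COM = {
  max_life = 1d
  max_renewable_life = 7d
 }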
04-04-2018
09:32 AM
I modified the principals so that they can issue renewable tickets. I no longer get errors when renewing the ticket. But I'm wondering why this is necessary at all. None of the other principals in the KDC can issue renewable tickets, and all other services work fine. If a ticket is not renewable, the service could simply request a new ticket. Or do I misunderstand something here?
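For completeness, allowing renewable tickets for an existing principal can be done along these lines (durations are examples; note that the krbtgt principal's maximum renewable life caps all other principals):
# raise the renewable lifetime on the TGT principal first, then on the service principal
kadmin.local -q "modprinc -maxrenewlife 7days krbtgt/PSIORI.COM@PSIORI.COM"
kadmin.local -q "modprinc -maxrenewlife 7days amshbase/s0202.cl.psiori.com@PSIORI.COM"
# verify
kadmin.local -q "getprinc amshbase/s0202.cl.psiori.com@PSIORI.COM"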
04-04-2018
07:35 AM
Hi,
we have a Kerberos-secured cluster and are currently facing issues with Ambari Metrics.
After starting Ambari Metrics everything is fine, but after a couple of days we get alerts from Ambari like this: NameNode Service RPC Processing Latency (Hourly)
Unable to retrieve metrics from the Ambari Metrics service. When I check the logs of the Metrics Collector I can find entries like: 2018-03-28 11:19:47,013 WARN org.apache.hadoop.security.UserGroupInformation: Exception encountered while running the renewal command for amshbase/s0202.cl.psiori.com@PSIORI.COM.
(TGT end time:1522228847000, renewalFailures:
org.apache.hadoop.metrics2.lib.MutableGaugeInt@388f50cd,renewalFailuresTotal:
org.apache.hadoop.metrics2.lib.MutableGaugeLong@7d8dc9b8)
ExitCodeException exitCode=1: kinit: KDC can't fulfill requested option while renewing credentials
at org.apache.hadoop.util.Shell.runCommand(Shell.java:954)
at org.apache.hadoop.util.Shell.run(Shell.java:855)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1163)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1257)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1239)
at org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:987)
at java.lang.Thread.run(Thread.java:745)
2018-03-28 11:19:47,014 ERROR org.apache.hadoop.security.UserGroupInformation: TGT is expired. Aborting renew thread for amshbase/s0202.cl.psiori.com@PSIORI.COM.
In the following I then see aggregation errors: 2018-03-28 11:27:08,188 INFO TimelineClusterAggregatorMinute: Started Timeline aggregator thread @ Wed Mar 28 11:27:08 CEST 2018
2018-03-28 11:27:08,189 INFO TimelineClusterAggregatorMinute: Skipping aggregation function not owned by this instance.
2018-03-28 11:27:08,205 ERROR TimelineMetricHostAggregatorHourly: Exception during aggregating metrics.
java.sql.SQLTimeoutException: Operation timed out.
at org.apache.phoenix.exception.SQLExceptionCode$14.newException(SQLExceptionCode.java:364)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:831)
So this seems to be related to Kerberos. When I check the log of the KDC there is not much info: Mar 28 11:19:47 sql.cl.psiori.com krb5kdc[879](info): TGS_REQ (8 etypes {18 17 20 19
16 23 25 26}) 10.11.1.21: TICKET NOT RENEWABLE: authtime 0,
amshbase/s0202.cl.psiori.com@PSIORI.COM
for krbtgt/PSIORI.COM@PSIORI.COM,
KDC can't fulfill requested option
...
Mar 28 11:20:48 sql.cl.psiori.com krb5kdc[879](info): AS_REQ (4 etypes {18 17 16 23}) 10.11.1.21: ISSUE: authtime 1522228848, etypes {rep=18 tkt=18 ses=18}, amshbase/s0202.cl.psiori.com@PSIORI.COM for krbtgt/PSIORI.COM@PSIORI.COM
Mar 28 11:20:48 sql.cl.psiori.com krb5kdc[879](info): TGS_REQ (4 etypes {18 17 16 23}) 10.11.1.21: ISSUE: authtime 1522228848, etypes {rep=18 tkt=18 ses=18}, amshbase/s0202.cl.psiori.com@PSIORI.COM for nn/m0201.cl.psiori.com@PSIORI.COM
When I check the principal amshbase/s0202.cl.psiori.com@PSIORI.COM in the KDC I get the following: Principal: amshbase/s0202.cl.psiori.com@PSIORI.COM
Expiration date: [never]
Last password change: Mo Mär 19 11:24:05 CET 2018
Password expiration date: [never]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Mo Mär 19 11:24:05 CET 2018 (admin/admin@PSIORI.COM)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 2
Key: vno 1, aes256-cts-hmac-sha1-96
Key: vno 1, aes128-cts-hmac-sha1-96
MKey: vno 1
Attributes:
Policy: [none]
Is that normal? Maximum renewable life is set to 0, so ticket renewal is not possible. But that is also true for all other principals in the KDC, and all other services work normally.
This is the content of krb5.conf: [libdefaults]
renew_lifetime = 7d
forwardable = true
default_realm = PSIORI.COM
ticket_lifetime = 24h
dns_lookup_realm = false
dns_lookup_kdc = false
default_ccache_name = /tmp/krb5cc_%{uid}
#default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
#default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
[domain_realm]
.cl.psiori.com = PSIORI.COM
cl.psiori.com = PSIORI.COM
[logging]
default = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
kdc = FILE:/var/log/krb5kdc.log
[realms]
PSIORI.COM = {
admin_server = sql.cl.psiori.com
kdc = sql.cl.psiori.com
}
I have not applied any changes to kdc.conf, so it has the default content: [kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
EXAMPLE.COM = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal
arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal
des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
Is there any misconfiguration? Unfortunately the Hortonworks installation documentation doesn't give detailed information about how to configure the Kerberos KDC correctly; it just forwards to the official MIT KDC documentation.
When I restart the service, everything is fine again (for some time).
Any suggestions or help are very welcome.
Best regards, Alex