Member since: 01-09-2019
Posts: 401
Kudos Received: 163
Solutions: 80
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2016 | 06-21-2017 03:53 PM
 | 3081 | 03-14-2017 01:24 PM
 | 1952 | 01-25-2017 03:36 PM
 | 3121 | 12-20-2016 06:19 PM
 | 1545 | 12-14-2016 05:24 PM
08-08-2016
04:17 AM
So, you have one dead node where port 50010 is already taken by some process, so the datanode is not starting. It could be a case of the datanode process not shutting down cleanly. You can get the process ID from netstat and see if kill -9 clears that port.
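A minimal sketch of that check (50010 is the default datanode transfer port; adjust if yours differs):

```
# Find the process currently holding the datanode port
netstat -tlnp | grep ':50010'   # last column shows PID/program name

# If it is a stale datanode process, kill it to free the port,
# then restart the datanode
kill -9 <PID>
```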
08-08-2016
04:14 AM
How big are your dimension tables? For best speed, some denormalization will help. However, with the various improvements to Hive, and if your dimension tables are small enough for a map join, you may not see much difference between the two.
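As a rough illustration (table and column names are hypothetical), recent Hive versions convert the join to a map join automatically when the dimension table fits under a configured size threshold:

```
# Hypothetical star-join; Hive auto-converts it to a map join when
# dim_region fits under the threshold set below (bytes)
hive -e "
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask.size=100000000;
SELECT f.order_id, d.region_name
FROM fact_orders f
JOIN dim_region d ON (f.region_id = d.region_id);
"
```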
08-03-2016
02:36 PM
The best way to change dfs.datanode.data.dir is to decommission the datanode, change the value, and add it back. This avoids directly accessing files and moving them. You could move the contents from one directory to another (without decommissioning), but that is not recommended. For the namenode metadata directories, you can change the value and restart the standby NN first. Once the standby NN comes back (it will start using the new directory), restart the active NN; it also picks up the new directory on restart. Once both NNs are up, you can clear the contents of the old metadata directories.
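A sketch of the namenode side of that sequence (host names and paths are placeholders, and the daemon commands vary by Hadoop version and management tool):

```
# 1. Update dfs.namenode.name.dir in hdfs-site.xml on both namenodes
#    to point at the new metadata directory.

# 2. Restart the standby NN first; it starts writing to the new directory.
ssh nn-standby 'hadoop-daemon.sh stop namenode && hadoop-daemon.sh start namenode'

# 3. After the standby is healthy again, restart the active NN; it also
#    picks up the new directory on restart.
ssh nn-active 'hadoop-daemon.sh stop namenode && hadoop-daemon.sh start namenode'

# 4. With both NNs up, the old metadata directory contents can be cleared.
```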
07-27-2016
08:49 PM
2 Kudos
For smaller distcp jobs, I think the setup time for the dynamic strategy will be longer than for the uniform-size strategy. And if all maps run at similar speeds, you won't gain much from the dynamic strategy while still paying its setup time. However, not all maps run at similar speeds. With the dynamic strategy, slower maps get to process less data and faster maps process more. I don't have the exact amount of data where one works better than the other, but in general, on larger datasets and on heterogeneous clusters (where not all workers have the same hardware), the dynamic strategy has the advantage.
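For reference, the strategy is chosen on the distcp command line; the paths below are placeholders:

```
# Default: copy list split into uniform-size chunks, one per map
hadoop distcp hdfs://src/data hdfs://dst/data

# Dynamic: maps pull chunks from a shared queue, so faster maps do more work
hadoop distcp -strategy dynamic hdfs://src/data hdfs://dst/data
```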
07-09-2016
02:24 AM
@zblanco Can Knox SSO use an existing Kerberos ticket to authenticate?
07-07-2016
08:20 PM
Thanks. If you already have a JIRA for this, please post it so we can keep track.
07-07-2016
07:52 PM
Running into the error below on a secure cluster with KMS configured. The surprising part is that this exception occurs only for one user, who is no different from the other users who don't run into it. Any thoughts on when this error can occur?

java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:892)
at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2291)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64)
at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:874)
... 30 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 400, message: Bad Request
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274)
at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider$2.run(KMSClientProvider.java:879)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider$2.run(KMSClientProvider.java:874)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
... 31 more
Labels:
- Apache Hadoop
07-07-2016
07:18 PM
Working with AD integration via LDAP and wondering if Ambari can use an existing Kerberos ticket instead of an explicit login. This is an AD where users don't have passwords but hardware-key-based passcodes. Once they log in to their system, they have a valid Kerberos ticket from AD. Can this be used instead of asking the user to log in?
Labels:
- Apache Ambari
07-07-2016
04:24 PM
I am not sure what they mean by ORC not being a general-purpose format. Anyway, in this case you are still going through HCatalog (there are HCatalog APIs for MR and Pig). When I said you can transform this data as necessary, I mean things like creating new partitions, buckets, sort orders, and bloom filters, and even redesigning tables for better access. There will be data duplication with any data transform if you want to keep the raw data as well.
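A hypothetical sketch of such a redesign (all table, column, and bucket choices invented), using Hive DDL:

```
# Hypothetical ORC table redesigned with partitions, buckets, a sort order,
# and a bloom filter for faster access
hive -e "
CREATE TABLE events_opt (
  user_id BIGINT,
  event_type STRING,
  payload STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) SORTED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns'='user_id');
"
```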
07-07-2016
12:42 PM
1 Kudo
Avoid using the root user to run jobs in a kerberized cluster. There is a configuration to set the minimum UID, and unless you have a specific requirement to run as root, don't do it. The parameter is min.user.id and the default is 1000; Linux system users generally have UIDs below 1000.
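For reference, a quick way to check both sides of this (the config path is a common default and may differ on your distribution; the user name is a placeholder):

```
# UID of the submitting user; must be >= min.user.id (default 1000)
id -u someuser

# min.user.id is set in container-executor.cfg on each NodeManager
grep min.user.id /etc/hadoop/conf/container-executor.cfg
```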