Member since: 01-09-2019

- 401 Posts
- 163 Kudos Received
- 80 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2594 | 06-21-2017 03:53 PM |
| | 4288 | 03-14-2017 01:24 PM |
| | 2388 | 01-25-2017 03:36 PM |
| | 3832 | 12-20-2016 06:19 PM |
| | 2098 | 12-14-2016 05:24 PM |

08-08-2016 04:17 AM
So you have one dead node where port 50010 is already taken by some process, which is why the DataNode is not starting. It could be a case of the DataNode process not shutting down cleanly. You can get the process ID from netstat and see if kill -9 clears that port.
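A minimal sketch of that check, assuming a Linux host with netstat available (the PID shown is a placeholder from example output):

```bash
# Find the process currently holding the DataNode port (50010)
netstat -tlnp | grep :50010
# Example output: tcp  0  0 0.0.0.0:50010  0.0.0.0:*  LISTEN  12345/java

# If it is a stale DataNode process, kill it using the PID from the last column
kill -9 12345

# Confirm the port is free before restarting the DataNode
netstat -tlnp | grep :50010 || echo "port 50010 is free"
```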
						
					
08-08-2016 04:14 AM
How big are your dimension tables? For best speed, some denormalization will help. However, with the various improvements to Hive, and if your dimension tables are small enough for a map join, you may not see much difference between the two.
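For reference, a rough way to verify whether the map join kicks in; the table names (fact_orders, dim_customer), warehouse path, and threshold value are illustrative, not from the original question:

```bash
# Check the on-disk size of the dimension table (path is illustrative)
hdfs dfs -du -s -h /apps/hive/warehouse/dim_customer

# EXPLAIN shows whether Hive chooses a map join once auto-conversion is on;
# the size threshold below (bytes) is an example value, not a recommendation
hive -e "
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask.size=25000000;
EXPLAIN
SELECT f.order_id, d.customer_name
FROM fact_orders f
JOIN dim_customer d ON f.customer_id = d.customer_id;
"
```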
						
					
08-03-2016 02:36 PM
The best way to change dfs.datanode.data.dir is to decommission the DataNode, change the value, and add it back. This avoids directly accessing files and moving them around. You could move the contents from one directory to another without decommissioning, but that is not recommended.

For NameNode metadata directories, you can change the value and restart the standby NN first. Once the standby NN comes back (it will start using the new directory), restart the active NN; it picks up the new directory on restart as well. Once both NNs are up, you can clear the contents of the old metadata directories.
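Roughly, the decommission route looks like this, assuming dfs.hosts.exclude already points at an excludes file; the hostname and paths are placeholders:

```bash
# 1. Add the DataNode to the excludes file referenced by dfs.hosts.exclude
echo "dn1.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read the host lists and begin decommissioning
hdfs dfsadmin -refreshNodes

# 3. Wait until the node shows "Decommissioned" in the report
hdfs dfsadmin -report | grep -A1 dn1.example.com

# 4. Update dfs.datanode.data.dir in hdfs-site.xml on that node, remove it
#    from the excludes file, refresh again, and restart the DataNode
hdfs dfsadmin -refreshNodes
```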
						
					
07-27-2016 08:49 PM (2 Kudos)
For smaller distcp jobs, I think the setup time of the dynamic strategy will be longer than that of the uniform-size strategy. And if all maps run at similar speeds, you won't gain much from the dynamic strategy while still paying the setup cost.

However, not all maps run at similar speeds. With the dynamic strategy, slower maps end up processing less data and faster maps process more. I don't have the exact data size at which one works better than the other, but in general, on larger datasets and on heterogeneous clusters (where not all workers have the same hardware), the dynamic strategy has the advantage.
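For reference, the strategy is picked on the distcp command line; the cluster addresses and paths below are placeholders:

```bash
# Default behavior: uniform-size chunking, each map gets a similar byte count
hadoop distcp hdfs://nn1:8020/src/path hdfs://nn2:8020/dst/path

# Dynamic strategy: maps pull work from a shared queue of chunks, so faster
# maps end up copying more data than slower ones (at the cost of extra setup)
hadoop distcp -strategy dynamic -m 20 \
    hdfs://nn1:8020/src/path hdfs://nn2:8020/dst/path
```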
						
					
07-09-2016 02:24 AM
@zblanco Can Knox SSO use an existing Kerberos ticket to authenticate?
						
					
07-07-2016 08:20 PM
Thanks. If you already have a JIRA for this, please post it so we can keep track.
						
					
07-07-2016 07:52 PM
I'm running into the error below on a secure cluster with KMS configured. The surprising part is that this exception occurs only for one user, who is no different from the other users who don't hit it. Any thoughts on when this error can occur?

java.io.IOException: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:892)
        at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
        at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2291)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64)
        at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:874)
        ... 30 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 400, message: Bad Request
        at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274)
        at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
        at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
        at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider$2.run(KMSClientProvider.java:879)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider$2.run(KMSClientProvider.java:874)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        ... 31 more 
						
					
Labels: Apache Hadoop
    
	
		
		
07-07-2016 07:18 PM
I'm working with AD integration via LDAP and wondering whether Ambari can use an existing Kerberos ticket instead of an explicit login. This is an AD where users don't have passwords but hardware-key-based passcodes. Once they log in to their machine, they have a valid Kerberos ticket from AD. Can this be used instead of asking the user to log in?
						
					
Labels: Apache Ambari
    
	
		
		
07-07-2016 04:24 PM
I am not sure what they mean by ORC not being a general-purpose format. In any case, you are still going through HCatalog (there are HCatalog APIs for MapReduce and Pig).

When I said you can transform this data as necessary, I mean things like creating new partitions, buckets, sort orders, and bloom filters, and even redesigning tables for better access.

There will be data duplication with any data transform if you want to keep the raw data as well.
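A sketch of the kind of redesigned ORC table I mean; the table name, columns, and bucket count are made-up examples:

```bash
hive -e "
CREATE TABLE orders_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns'='customer_id');
"
```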
						
					
07-07-2016 12:42 PM (1 Kudo)
Avoid using the root user to run jobs in a Kerberized cluster. There is a configuration to set the minimum UID, and unless you have a specific requirement to run as root, don't use it. The parameter is min.user.id and the default is 1000; Linux system and service accounts generally have UIDs below 1000.
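The setting lives in container-executor.cfg on the NodeManager hosts; a quick way to check it (the config path varies by distribution):

```bash
# Show the minimum UID allowed to launch YARN containers, plus banned users
grep -E 'min.user.id|banned.users' /etc/hadoop/conf/container-executor.cfg
# Typical contents:
#   min.user.id=1000
#   banned.users=hdfs,yarn,mapred,bin
```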
						
					