Member since: 09-22-2016
Posts: 33
Kudos Received: 3
Solutions: 3
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 3774 | 04-19-2017 12:19 PM
 | 281 | 02-22-2017 05:37 PM
 | 4188 | 02-21-2017 02:25 PM
10-24-2017
08:52 PM
I have an issue in our environment with AD groups via usersync: we are planning to sync Ranger users with AD. Here is the problem:

AD group name: cfyG_GG-HDP_HadoopAdmins
SSSD-mapped group on the Linux machine: hadoopadmin

This command yields:

$ hdfs groups hdpadmin
hdpadmin : hdpadmin hadoopadmin hadoopdev hadoopusers

Now the problem is that I can save the AD group in lowercase in Ranger as cfyg_gg-hdp_hadoopadmins, but if I use this group to grant permissions it won't work, since the Linux group name is hadoopadmin, as mapped in SSSD. How can I overcome this issue? Any help is appreciated. Suri
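One approach worth considering (a sketch, not a confirmed fix for this environment) is to point Ranger usersync at the local OS instead of AD, so Ranger picks up the same group names that SSSD exposes to Linux and HDFS (hadoopadmin). The relevant ranger-ugsync-site.xml properties would look roughly like this; the sync interval is an example value:

<!-- ranger-ugsync-site.xml: read users/groups from the OS (i.e. via SSSD/NSS) -->
<property>
  <name>ranger.usersync.source.impl.class</name>
  <value>org.apache.ranger.unixusersync.process.UnixUserGroupBuilder</value>
</property>
<!-- example: re-sync every 60 seconds -->
<property>
  <name>ranger.usersync.sleeptimeinmillisbetweensynccycle</name>
  <value>60000</value>
</property>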
10-24-2017
08:44 PM
I have a similar issue in our environment: we are planning to sync Ranger users with AD. Here is the problem:

AD group name: cfyG_GG-HDP_HadoopAdmins
SSSD-mapped group on the Linux machine: hadoopadmin

This command yields:

$ hdfs groups hdpadmin
hdpadmin : hdpadmin hadoopadmin hadoopdev hadoopusers

Now the problem is that I can save the AD group in lowercase in Ranger as cfyg_gg-hdp_hadoopadmins, but if I use this group to grant permissions it won't work, since the Linux group name is hadoopadmin, as mapped in SSSD. How can I overcome this issue? Any help is appreciated. Suri
08-31-2017
09:23 PM
Hi @Wynner, thanks for the reply. Yes, I have set nifi.security.user.login.identity.provider to ldap-provider. My login-identity-providers.xml is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<loginIdentityProviders>
<provider>
<identifier>ldap-provider</identifier>
<class>org.apache.nifi.ldap.LdapProvider</class>
<property name="Authentication Strategy">SIMPLE</property>
<property name="Manager DN">CN=Administrator,CN=Users,DC=LABHADOOP,DC=COMPANY,DC=COM</property>
<property name="Manager Password">COMPANY2017</property>
<property name="Referral Strategy">FOLLOW</property>
<property name="Connect Timeout">10 secs</property>
<property name="Read Timeout">10 secs</property>
<property name="Url">ldap://xx.xx.xx.xx:389</property>
<property name="User Search Base">CN=Users,DC=LABHADOOP,DC=COMPANY,DC=COM</property>
<property name="User Search Filter">sAMAccountName={0}</property>
<property name="Identity Strategy">USE_USERNAME</property>
<property name="Authentication Expiration">12 hours</property>
</provider>
</loginIdentityProviders>

But for some reason it is not prompting for the login page, and there are no errors in the logs. Thanks, Suri
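One thing worth checking: NiFi only presents the login page when the UI runs over HTTPS; if only the HTTP port is configured, it falls back to anonymous access. A minimal sketch of the nifi.properties entries involved; the host, port, keystore paths, and passwords below are placeholders, not values from this environment:

# nifi.properties (sketch)
nifi.web.http.port=
nifi.web.https.host=0.0.0.0
nifi.web.https.port=9443
nifi.security.keystore=/opt/nifi/conf/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=changeit
nifi.security.truststore=/opt/nifi/conf/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=changeit
nifi.security.user.login.identity.provider=ldap-provider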
08-31-2017
09:10 PM
I have set up NiFi to use LDAP (AD) with no configuration issues. But it is not loading the login page; instead it logs in anonymously, as it did before LDAP. I did not see any issues in the log. Can someone help me fix it? Thanks, Suri
06-14-2017
11:03 AM
Does Hadoop (CDH and Kafka) support IPv6? Thanks, Suri
04-19-2017
12:19 PM
This command needs to be run as the kafka user.
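For reference, a minimal sketch of running the Sentry shell as the kafka service user; the keytab path and principal are placeholders for whatever this environment uses:

# get a Kerberos ticket for the kafka principal, then run the shell as kafka
sudo -u kafka kinit -kt /path/to/kafka.keytab kafka/host.example.com@EXAMPLE.COM
sudo -u kafka kafka-sentry -lr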
04-18-2017
11:45 AM
I am trying to integrate Kafka (2.1.x) with Sentry (CDH 5.0.1). When I run the "kafka-sentry -lr" command, I get the following errors. Any idea what could be wrong here? Note: we have enabled SSL and Kerberos, and both are working fine.

#kafka-sentry -lr
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/04/18 13:33:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/18 13:33:41 WARN security.UserGroupInformation: PriviledgedActionException as:user@RALM.COM (auth:KERBEROS) cause:org.apache.thrift.transport.TTransportException: Peer indicated failure: Problem with callback handler
17/04/18 13:33:41 ERROR tools.SentryShellKafka: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1711)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl$UgiSaslClientTransport.open(SentryGenericServiceClientDefaultImpl.java:99)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.<init>(SentryGenericServiceClientDefaultImpl.java:155)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientFactory.create(SentryGenericServiceClientFactory.java:31)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.run(SentryShellKafka.java:51)
at org.apache.sentry.provider.db.tools.SentryShellCommon.executeShell(SentryShellCommon.java:241)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.main(SentryShellKafka.java:96)
Caused by: org.apache.thrift.transport.TTransportException: Peer indicated failure: Problem with callback handler
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:199)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:307)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl$UgiSaslClientTransport.baseOpen(SentryGenericServiceClientDefaultImpl.java:115)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl$UgiSaslClientTransport.access$000(SentryGenericServiceClientDefaultImpl.java:71)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl$UgiSaslClientTransport$1.run(SentryGenericServiceClientDefaultImpl.java:101)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl$UgiSaslClientTransport$1.run(SentryGenericServiceClientDefaultImpl.java:99)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 6 more
The operation failed. Message: Peer indicated failure: Problem with callback handler
02-22-2017
05:37 PM
You can achieve this by setting an appropriate value for yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds in yarn-site.xml. YARN will then aggregate the logs for running jobs too. See https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml Suri
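A sketch of the corresponding yarn-site.xml entry; the one-hour interval is only an example (the default of -1 disables rolling aggregation, and very small values may be raised to a minimum by the NodeManager):

<!-- yarn-site.xml: upload logs of still-running applications every hour -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>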
02-22-2017
05:35 PM
I would like to index and search YARN aggregated application logs using Solr. Since the aggregated files are stored in TFile format, Solr is not able to read them for indexing. Is there a way to index these YARN aggregated log files? Any help would be appreciated. Suri
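One workaround is to let the yarn CLI decode the TFile format into plain text, which an indexer can then consume; a sketch, with a placeholder application ID and output path:

# dump the aggregated logs of one application as plain text
yarn logs -applicationId application_1487000000000_0001 > /tmp/app.log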
02-22-2017
09:29 AM
I would like to index and search YARN aggregated application logs using Solr (Cloudera Search). Since the aggregated files are stored in TFile format, Solr is not able to read them for indexing. Is there a way to index these YARN aggregated log files? Any help would be appreciated. Suri
02-22-2017
07:22 AM
I have Spark Streaming jobs running on the cluster. When I want to see the container logs, they are too slow to load in the ResourceManager WebUI. Is there an optimum file size to configure, and is there a better way to look at the streaming job logs than the ResourceManager WebUI? Suri
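One way to keep individual container logs small enough to load quickly is Spark's executor log rolling. A sketch of the relevant spark-defaults.conf properties; the 128 MB size and the retention count are example values only:

# roll executor logs by size and keep a bounded number of files per executor
spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 134217728
spark.executor.logs.rolling.maxRetainedFiles 5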
02-21-2017
04:35 PM
Hi, we need to find a way to maintain and search logs for long-running Spark streaming jobs on YARN. We have log aggregation disabled in our cluster. We are thinking about Solr/Elasticsearch, and maybe Flume or Kafka, to read the Spark job logs. Any suggestions on how to implement search on these logs and manage them easily? Thanks, Suri
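To make the proposed pipeline concrete, below is a hypothetical Flume agent that tails container log files and ships lines to a Kafka topic, from which an indexer could feed Solr or Elasticsearch. The log path, topic name, and broker list are assumptions, not values from this cluster:

# flume.conf (sketch)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# TAILDIR tails files matching a filename regex and remembers its position
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/spark-streaming/.*\.log
a1.sources.r1.positionFile = /var/lib/flume/taildir_position.json
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = spark-container-logs
a1.sinks.k1.kafka.bootstrap.servers = broker1:9092,broker2:9092
a1.sinks.k1.channel = c1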
02-21-2017
02:25 PM
You can achieve this by setting an appropriate value for yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds in yarn-site.xml. YARN will then aggregate the logs for running jobs too. See https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml Suri
02-21-2017
02:17 PM
Thank you, I will try it out.
02-21-2017
01:53 PM
@mbigelow but other sources say: "set yarn.log-aggregation.retain-check-interval-seconds to specify how often the log retention check should be run. By default, it is one-tenth of the log retention time." What I understood from this was that it only checks for retention and may not aggregate the logs on that interval. Did I understand that correctly? Suri
02-21-2017
01:49 PM
The documentation for YARN log aggregation says that logs are aggregated after an application completes. Does this rule out YARN log aggregation for Spark streaming jobs, given that streaming jobs run for a much longer duration and potentially never terminate? I want to get the Spark Streaming logs into HDFS before the job completes, since streaming jobs run forever. Is there a good way to get Spark log data into HDFS? Suri
02-21-2017
01:46 PM
Thanks, @mbigelow. So, if I set yarn.log-aggregation.retain-check-interval-seconds to 60 seconds, will it send the logs to HDFS (every 60 seconds) even when the job has not finished? (Since streaming jobs run forever.) Suri
02-21-2017
01:15 PM
The documentation for YARN log aggregation says that logs are aggregated after an application completes. Streaming jobs run for a much longer duration and potentially never terminate. I want to get the logs into HDFS for my streaming jobs before the application completes or terminates. What are the better ways to do this, since log aggregation only happens after the jobs are completed? Suri
02-21-2017
08:33 AM
We want to search for key phrases, and at the same time we want developers to be able to look into the raw logs for troubleshooting and to set alerts for specific errors.
02-21-2017
07:26 AM
@mbigelow You are right. We turned it off because of the long-running jobs. Do you know any other ways to implement log search, other than Solr/Elasticsearch? Suri
02-20-2017
04:53 PM
Hi, we need to find a way to maintain and search logs for long-running Spark streaming jobs on YARN. We have log aggregation disabled in our cluster. We are thinking about Solr/Elasticsearch, and maybe Flume or Kafka, to read the Spark job logs. Any suggestions on how to implement search on these logs and manage them easily? Thanks, Suri
12-17-2016
11:10 AM
1 Kudo
Hi Damion, can you tell us how you were able to solve this issue? Thanks, Suri
11-22-2016
05:38 PM
1 Kudo
As of now, HDFS does not create home directories automatically with AD integration. One way to create home directories automatically is with Hue: it has an option to create the home directory when setting up users to use Hue. If the home directory is already present, Hue will skip it. Thanks, Suri
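If Hue is not an option, a minimal sketch of creating a home directory by hand as the HDFS superuser; the username jdoe is a placeholder:

# create the user's HDFS home directory and hand over ownership
sudo -u hdfs hdfs dfs -mkdir -p /user/jdoe
sudo -u hdfs hdfs dfs -chown jdoe:jdoe /user/jdoe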
11-14-2016
03:14 AM
Thank you, srai.
11-09-2016
07:41 PM
What are the different measures and best practices for capacity planning my Hadoop cluster? We are planning to have large amounts of data coming in, so we would like to follow best practices in capacity planning and hardware to grow the cluster. Please advise on this. Thanks, Suri
11-09-2016
12:37 PM
Why do we need to configure encrypted client/server communication using TLS/SSL for HiveServer2? What are the use cases for this scenario? Also, please explain how to configure it without any issues. I already found some information here: http://www.cloudera.com/documentation/enterprise/latest/topics/sg_hive_encryption.html#concept_tp1_whc_dr Thanks, Suri
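For context, the core of the server-side setup in that documentation is a handful of hive-site.xml properties plus a keystore; a sketch only, with placeholder path and password:

<!-- hive-site.xml: enable TLS on the HiveServer2 endpoint -->
<property>
  <name>hive.server2.use.SSL</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.keystore.path</name>
  <value>/opt/hive/conf/hs2-keystore.jks</value>
</property>
<property>
  <name>hive.server2.keystore.password</name>
  <value>changeit</value>
</property>

Clients then connect with an ssl=true JDBC URL, e.g. jdbc:hive2://host:10000/default;ssl=true;sslTrustStore=/path/to/truststore.jks;trustStorePassword=changeit (paths again placeholders).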
11-09-2016
11:37 AM
I am not sure the board I selected for this topic is right, but my question is regarding capacity planning for clusters. What are the different measures and best practices for capacity planning my Hadoop cluster? We are planning to have large amounts of data coming in, so we would like to follow best practices in capacity planning and hardware to grow the cluster. Please advise on this. Thanks, Suri
10-02-2016
03:33 PM
Timothy, thank you for your response. But I am also looking for the best ways to replicate HDFS using DistCp. Suri
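For reference, a minimal DistCp invocation between two clusters; the NameNode hosts and paths are placeholders:

# -update copies only new/changed files; -delete removes files from the
# target that no longer exist on the source
hadoop distcp -update -delete hdfs://nn1:8020/data hdfs://nn2:8020/data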