Member since: 02-24-2016
Posts: 175
Kudos Received: 56
Solutions: 3
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 1500 | 06-16-2017 10:40 AM |
| | 12415 | 05-27-2016 04:06 PM |
| | 1359 | 03-17-2016 01:29 PM |
05-25-2016
12:25 PM
Just as an aside, if you also happen to be a paying Hortonworks support customer, I can't speak highly enough about SmartSense, which will analyse the configs of your cluster and provide you with performance, stability and security recommendations specific to your exact environment. This service is included in every support contract; for more info take a look at: http://hortonworks.com/services/smartsense/

There was also a recent session at the Dublin Hadoop Summit which is worth watching for general tuning suggestions and recommendations (not security specific): https://www.youtube.com/watch?v=sCB6HmfdTZ4
06-07-2016
02:09 PM
1 Kudo
We came across a similar issue, and our solution was to create a custom synchronization script which replaces the standard LDAP sync process. We define a "super-group" whose members are all groups that are visible/relevant to Hadoop. This is helpful for several reasons:

- It limits the group selection in Ranger itself.
- It limits the users that are pulled into Ranger: only members of one of the relevant groups will be visible to Ranger.
- It limits the amount of data that needs to be transferred during synchronization. (We have around 50k users in our Active Directory.)
- It gives us an efficient filter for LDAP queries. (We cannot filter by base DN because of AD policy.)

The synchronization process knows only the DN of the super-group: it fetches that one LDAP entry; from there it determines the members, which are the authorization groups, and then the members of each authorization group, which are the authorized users. A rough sketch of the lookup chain is shown below.
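To illustrate, here is a minimal shell sketch of that two-step lookup using the standard ldapsearch client; the host, bind DN and group DNs are placeholders, not our real values:

```bash
# Hypothetical placeholders: adjust the host, bind DN and super-group DN.
BIND_DN="CN=svc-ranger-sync,OU=ServiceAccounts,DC=example,DC=com"
SUPER_GROUP="CN=hadoop-supergroup,OU=Groups,DC=example,DC=com"

# Step 1: fetch only the super-group entry and list its members
# (-s base reads exactly one entry, so no subtree scan is needed).
ldapsearch -LLL -H ldaps://dc.example.com -D "$BIND_DN" -W \
  -b "$SUPER_GROUP" -s base "(objectClass=group)" member

# Step 2: for each member DN returned above (an authorization group),
# fetch that group's own members -- these are the authorized users.
ldapsearch -LLL -H ldaps://dc.example.com -D "$BIND_DN" -W \
  -b "CN=hadoop-admins,OU=Groups,DC=example,DC=com" -s base \
  "(objectClass=group)" member
```

The real script simply loops over the member DNs returned in step 1 instead of hard-coding a group as in step 2.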
05-19-2016
03:38 PM
Sure @Jitendra Yadav
09-11-2017
08:54 AM
@Rakesh Gupta Thank you very much. Smart debugging, sir. You saved my day today.
05-19-2016
11:37 AM
1 Kudo
Thanks @vshukla, @Timothy Spann, @Jitendra Yadav, @Yuta Imai
05-26-2016
06:24 AM
1 Kudo
You are going to use the hive account to run the Spark Thrift Server. So, if it is a manual install, then ./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf hive.server2.thrift.port=10015 will be run as user hive (with su hive) instead of user spark in a secure setup. Similarly, /var/run/spark and /var/log/spark should be readable and writable by hive. Just being able to list the contents as user hive is not enough; you need to be able to write to those folders. One good, easy way is to give 77x permissions on these folders: since spark:hadoop is the owner:group and hive belongs to the group hadoop, hive will have write access with this setup. A sketch of the commands is shown below.
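As a rough sketch (the 775 mode and the su invocation are assumptions based on the description above, not a verified install procedure):

```bash
# Give the hadoop group (which hive belongs to) write access to the
# Spark runtime and log directories owned by spark:hadoop.
chown -R spark:hadoop /var/run/spark /var/log/spark
chmod 775 /var/run/spark /var/log/spark

# Then start the thrift server as the hive user in the secure setup.
su hive -c "./sbin/start-thriftserver.sh --master yarn-client \
  --executor-memory 512m --hiveconf hive.server2.thrift.port=10015"
```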
05-19-2016
10:23 AM
This doc is just showing you an example. Instead of using the principal "spark/blue1@EXAMPLE.COM", we could also consider using the principal "app1/blue1@EXAMPLE.COM" for one application, then "app2/blue1@EXAMPLE.COM" for a second application, and so on.
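For example, assuming an MIT KDC (the keytab paths and the application class/jar below are purely illustrative):

```bash
# Create a per-application principal on the KDC and export its keytab.
kadmin -q "addprinc -randkey app1/blue1@EXAMPLE.COM"
kadmin -q "ktadd -k /etc/security/keytabs/app1.keytab app1/blue1@EXAMPLE.COM"

# Launch the first application under its own identity; a second app
# would pass app2/blue1@EXAMPLE.COM and its own keytab instead.
spark-submit --master yarn \
  --principal app1/blue1@EXAMPLE.COM \
  --keytab /etc/security/keytabs/app1.keytab \
  --class com.example.App1 app1.jar
```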
02-02-2018
08:11 PM
With HiveServer2 you can also submit jobs on Spark if Spark is configured as the execution engine in Hive, right?
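(For context, the engine can be selected per session when Hive on Spark is enabled; the JDBC URL and table below are placeholders:)

```bash
# Hypothetical host/port; hive.execution.engine is the relevant setting.
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" \
  -e "SET hive.execution.engine=spark; SELECT count(*) FROM some_table;"
```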
03-17-2016
01:29 PM
Well, found the solution 🙂 Posting the answer in case others face this issue in the future. The following line was missing: import sqlContext.implicits._ A minimal example is shown below.
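As a minimal sketch (Spark 1.x style, matching the sqlContext API mentioned above):

```scala
// In spark-shell, sc is already defined; create the SQLContext first.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._ // brings toDF() and $"col" syntax into scope

// Without the import above, .toDF is not available on a local Seq.
val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
```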
05-24-2016
04:28 PM
Thanks @Ancil McBarnett.