Member since: 01-25-2017
Posts: 25
Kudos Received: 4
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 2517 | 03-27-2017 07:57 AM |
05-10-2017
07:32 PM
Thank you very much for the answer 🙂
... View more
05-10-2017
06:42 AM
Thanks for the reply. But, as per my understanding, a one-way trust means the local KDC will trust the tickets generated by the central AD. Correct? If the Linux servers where Hadoop runs cannot contact AD for Kerberos, or if we have just an LDAP server in place of AD, how does the Kerberos auth happen? Will I have to create user principals manually in the local KDC?
... View more
05-08-2017
06:43 PM
@slachterman thanks for your reply. By "doesn't support" I mean that the authentication agent used on the Linux systems is not able to connect to the AD Kerberos ports, and there are policy restrictions on using Kerberos on the Linux boxes; I know it's a bit weird. So I was thinking about a solution that doesn't depend on the KDC services of AD but still keeps Kerberos enabled on the cluster. For example, suppose we have an AD and a local MIT KDC. The local KDC hosts the service principals, and AD manages the cluster users. If a user A exists in AD, I create a user principal A@MIT-KDC-Realm, and a one-way trust is established between AD and the KDC, will user A be able to successfully launch jobs and use services on the cluster?
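For what it's worth, a minimal sketch of what the one-way trust usually involves on the MIT KDC / cluster side. The realm names HADOOP.LOCAL and AD.EXAMPLE.COM are made up for illustration, and the matching trust object has to be created on the AD side with the same password:

```
# On the MIT KDC: add the cross-realm TGT principal so that users holding an
# AD TGT can obtain service tickets in the local realm. The password must
# match the trust password configured on the AD side. (Realm names are
# hypothetical.)
kadmin.local -q "addprinc krbtgt/HADOOP.LOCAL@AD.EXAMPLE.COM"

# In /etc/krb5.conf on the cluster nodes: declare both realms and the direct
# trust path from the AD realm to the local realm.
[realms]
  HADOOP.LOCAL = {
    kdc = kdc.hadoop.local
  }
  AD.EXAMPLE.COM = {
    kdc = ad.example.com
  }

[capaths]
  AD.EXAMPLE.COM = {
    HADOOP.LOCAL = .
  }

# Hadoop also needs hadoop.security.auth_to_local rules so that
# user@AD.EXAMPLE.COM maps to the short local user name.
```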
... View more
05-08-2017
02:39 PM
1 Kudo
Hi, in my environment the Hadoop nodes are integrated with AD for authentication, but AD doesn't support Kerberos. I understand that it's possible to have users and user principals serviced by AD and only the service principals serviced by a local KDC. The question is: is it possible to set up a local KDC server for both service and user principals while the actual users reside in AD? So I would need to host the Kerberos principals and manage the tickets of AD users in the local KDC. The AD user realm and the KDC realm will also be different. Any help would be appreciated 🙂
... View more
04-10-2017
11:53 AM
Looks like this is a better approach. I got some clear info from http://theckang.com/2015/remote-spark-jobs-on-yarn/ that matches your solution. Thanks much !
... View more
04-05-2017
08:22 PM
@Michael M That's cool! So, this setup needs Spark version > 2? Also, what would be the master IP and port when using Spark on YARN? I am not a dev, so please excuse me if these sound stupid 😄
... View more
04-05-2017
08:18 PM
1 Kudo
Livy is a nice option; it's just that we will have to make curl calls to the API outside the script(?). But something like what @Michael M suggested sounds more interesting.
... View more
04-03-2017
07:00 PM
@Kshitij Badani thanks for the reply. I forgot to mention, I am using Zeppelin and Jupyter right now. But an IDE is more full-featured and better suited to scenarios like module building. I have seen people using Spyder, PyCharm, Eclipse etc. locally, but I was looking to see if they could be integrated with a remote multi-node Hadoop cluster.
... View more
04-03-2017
02:30 PM
Has anyone ever used a Python IDE against a Spark cluster? Is there any way to install a Python IDE like Eclipse, Spyder etc. on a local Windows machine and submit Spark jobs to a remote cluster via pyspark? I can see that Spyder is available with Anaconda, but the Hadoop nodes where Anaconda is installed don't have GUI tools, and it's not possible to see the Spyder UI that is initialized on the remote Linux edge node. What is the best way to go about this?
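The approach the thread converges on later (client-mode PySpark against YARN from a local machine) roughly looks like the sketch below. It assumes Spark 2.x, a local Spark install matching the cluster version, and a local copy of the cluster's Hadoop/YARN client configs; all paths are hypothetical:

```python
import os

# Hypothetical local paths: a copy of the cluster's *-site.xml files and a
# local Spark install that matches the cluster's Spark version.
os.environ["HADOOP_CONF_DIR"] = r"C:\hadoop-conf"
os.environ["SPARK_HOME"] = r"C:\spark"

from pyspark.sql import SparkSession

# With YARN there is no host:port master URL; "yarn" is resolved from the
# yarn-site.xml found via HADOOP_CONF_DIR.
spark = (SparkSession.builder
         .master("yarn")
         .appName("ide-remote-test")
         .config("spark.submit.deployMode", "client")
         .getOrCreate())

# Quick sanity check that executors actually start on the remote cluster.
print(spark.range(100).count())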
... View more
Labels:
- Apache Spark
03-27-2017
09:01 AM
@ccasano I set up the queues like above. Say I have 4 queues, Q1 to Q4, each with min 25% and max 100%. If I start a job on Q1 and it grows up to 100% utilization, and later I launch the same task on Q2, the new task will grow only up to 25% (absolute configured capacity) and the old one will come back to 75%. Is there a way I can equally distribute the resources here? I.e., the second job should grow beyond its minimum capacity until the queues are balanced equally. Thanks in advance!
... View more
03-27-2017
07:57 AM
After setting the two parameters below in the custom yarn-site.xml, things started working.
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor
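The post doesn't include the values that were set; purely for illustration (these are assumed values, not the ones actually used), a more aggressive preemption tuning typically looks something like:
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity = 0.1
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor = 1.0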
... View more
03-21-2017
10:36 PM
Thanks @Michael Young for your answer. But I don't think that's how it works. As per my understanding, the case you talked about is when preemption is disabled, i.e., new tasks have to wait until the existing ones finish, and new ones cannot start if the available resources are less than the minimum requirement. I think the whole point of preemption is to avoid this scenario by forcefully killing containers held by existing jobs in over-utilized queues if they're not willing to release resources within 'x' amount of time. Please see here, STEP #3 reads "such containers will be forcefully killed by the ResourceManager to ensure that SLAs of applications in under-satisfied queues are met". To answer your other question, I have 4 queues, Q1 to Q4, each with 25% min capacity and 100% max capacity. Q2 is divided into Q21 and Q22 with 50% (min) each. All of them use FIFO.
... View more
03-21-2017
12:41 PM
1 Kudo
Hi there,
I have enabled preemption for YARN as per : https://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/
I observed that if the queues are already 100% occupied by Hive (Tez with container reuse enabled) or Spark jobs and a new job is submitted to any queue, it will not start until one of the existing tasks finishes. At the same time, if I try to launch the Hive CLI, it will also hang forever until some tasks finish and resources are deallocated.
If Tez container reuse is disabled, new jobs will start getting resources. This is not because of preemption, but because each container lasts only a few seconds and the new containers go to the new jobs. Spark is anyway not touched; it will not release any resources.
Does anyone have a hint as to why preemption is not happening? Also, how do I preempt Spark jobs?
The values are as follows:
yarn.resourcemanager.scheduler.monitor.enable = true
yarn.resourcemanager.scheduler.monitor.policies = org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval = 3000
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill = 15000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round = 0.1
... View more
Labels:
- Apache Spark
- Apache YARN
02-27-2017
01:42 PM
This fixed my problem. I am on HDP 2.5.0 🙂
... View more
02-03-2017
09:09 AM
>> In the YARN page, the cpu_wio._avg is the average metric value for all nodes in the YARN cluster (NodeManagers). The cpu_wio._max is the maximum of all the cpu_wio values from the YARN cluster.
This was my understanding too. But see below:
>> You can use the "System Servers" Grafana dashboard to delve deeper to check why higher values are seen in the graph. This metric is captured in the "CPU - IOWAIT/INTR" section in that dashboard.
The value shown here doesn't match the one I see on the Ambari dashboard. If the max I/O wait among all the NodeManagers is 5% in Grafana, then the max value shown on the Ambari dashboard is different, maybe 10 or 20 or even 40. Another crazy thing I noticed is that when we zoom in on a portion of the graph in Grafana, the % value changes. Not sure if it's the same for everyone.
... View more
02-02-2017
09:46 AM
@Aravindan Vijayan Thanks for your reply. I am attaching the metrics details below. cpu_wio._avg and cpu_wio._max are the metrics here. Also, I don't think the values shown there are the actual CPU I/O wait; rather, they are multiplied/added with some x factor. You can see this graph in Ambari Dashboard > YARN. Under the metrics graphs, you will see one like in the preview image above.
... View more
01-31-2017
02:37 PM
Yes, I understand the concept. I'm just confused about how it's calculated.
>> Which screen are you reporting these numbers from? Is this from a specific host page, or an overall metric?
Actually, there's a "CPU Wait Access" widget available on the "YARN" page in Ambari. I have 8 worker (data) nodes and 6 master nodes. What I can see is, if the maximum I/O wait on any of the worker nodes is 2% (shown in host-specific metrics/Grafana/Nagios etc.), then CPU Wait Access reports a peak of 40%. The same applies for any number.
... View more
01-31-2017
01:31 PM
Hey, does anyone know what "CPU Wait Access" means in the Ambari YARN metrics and how it's calculated? I understand that it's related to CPU I/O. But it looks like the actual (max) CPU I/O wait on the workers is multiplied by ~20 and the resulting value is shown as CPU Wait Access in my case. Any better idea? Thanks.
... View more
Tags:
- Ambari
- ambari-metrics
- Data Processing
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Ambari
01-27-2017
08:59 AM
1 Kudo
@gnovak I completely figured out the issue (not sure if I can call it an issue!). It was the "user-limit-factor". In my case, each queue is used by only one user. My assumption was that if the min capacity of a sub-leaf (Q41) is 25% and it can grow up to 100% of its parent queue Q4, then the max user-limit-factor value Q41 can have would be 4 (4*25 = 100%). But this is not true! It can grow beyond that, up to the absolute maximum configured capacity. So the math is: max user-limit-factor = absolute maximum configured capacity / absolute configured capacity. The absolute values can be found in the Scheduler section of the ResourceManager UI. Once I adjusted the user-limit-factor so a single user can take advantage of the whole capacity, problem solved! Thanks for your spark though!
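To make that formula concrete with the numbers from this thread (Q4 at 40% of the cluster, Q41 at 25% of Q4 with a 100% maximum):
absolute configured capacity of Q41 = 40% x 25% = 10%
absolute maximum configured capacity of Q41 = 100%
max user-limit-factor for Q41 = 100 / 10 = 10 (not 4)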
... View more
01-26-2017
07:11 PM
@gnovak Perfect illustration; this kind of doc is not available on the internet, I wish Hortonworks would pin it somewhere 🙂 In your case, was the user limit factor set to 1? I also suspect the apps, as to why they were not requesting more capacity. In my case, the workload was different. Q1 and Q2 had 1 app each with a small number of containers and a large amount of resources. Meanwhile, Q41 had one app with a larger number of containers but with minimal resources (containers with the minimum configured memory and vcores in YARN). Anyway, I'll investigate more by pushing the same load to all queues simultaneously and see. Thank you for your time, much appreciated 🙂!
... View more
01-25-2017
08:11 PM
http://hortonworks.com/blog/better-slas-via-resource-preemption-in-yarns-capacityscheduler/ This doc says - "preemption works in conjunction with the scheduling flow to make sure that resources freed up at any level in the hierarchy are given back to the right queues in the right level".
... View more
01-25-2017
07:48 PM
This could partially explain the reason, thanks for the spark. But I would still expect that, in a FIFO queue, resources are given out in a round-robin manner according to demand. Even then, there should be a more civilized/balanced distribution of resources across queues at the same level, and thereby the sub-leafs would get a fair portion. Confusing! 😞
... View more
01-25-2017
05:37 PM
Thanks @Jasper for your reply. But preemption is enabled; I can confirm that because the YARN jobs spawned in those queues say "Preemption enabled" in the ResourceManager. "I don't get why Q41 is only getting 10% and not 20%." ^ Actually I was talking about the absolute capacity, so it's calculated as 25% * 40% = 10% absolute. So the minimum is satisfied. Excess resources are then moved to the queues one level above (Q1, Q2 & Q3). So it seems to me like queues at a certain level have more priority than their underlying sub-leafs. Meaning, if the minimum capacity is satisfied for the sub-leafs, the ResourceManager puts their parent in a wait list and allocates more resources to other queues at the same level as the parent. This is what I observed; it doesn't make sense though!
... View more
01-25-2017
02:22 PM
Hi, I'm stuck with a problem and it would be really great if someone could help me! I'm running an HDP 2.5.0.0 cluster with the Capacity Scheduler. Let's say I have 4 queues, Q1, Q2, Q3 and Q4, defined under root. Q1, Q2 and Q3 are leaf queues and have minimum and maximum capacities of 20% and 40% respectively (the queues are similar). Q4 is a parent queue (minimum capacity 40%, max 100%) and has 4 leaf queues under it, let's say Q41, Q42, Q43 and Q44 (minimum 25, maximum 100 for all 4 sub-queues). All queues have the minimum user limit set to 100% and the user limit factor set to 1. Issue: when users submit jobs to Q1, Q2 and Q41 and the other queues are empty, I would expect Q1 and Q2 to be at 20%+ absolute capacity and Q4 at 40%+, roughly 25 (Q1), 25 (Q2) and 50 (Q41). But this is not happening. Q1 and Q2 always stay at 40%, and Q41 or Q4 is getting only 10% absolute capacity. Any idea why this is happening? Thanks.
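For reference, here is roughly what that layout looks like in capacity-scheduler.xml terms. The property names are the standard Capacity Scheduler ones; the queue names and percentages are the ones described above. Only Q1 and Q41 are spelled out; Q2/Q3 and Q42 to Q44 follow the same pattern:
yarn.scheduler.capacity.root.queues = Q1,Q2,Q3,Q4
yarn.scheduler.capacity.root.Q1.capacity = 20
yarn.scheduler.capacity.root.Q1.maximum-capacity = 40
yarn.scheduler.capacity.root.Q4.capacity = 40
yarn.scheduler.capacity.root.Q4.maximum-capacity = 100
yarn.scheduler.capacity.root.Q4.queues = Q41,Q42,Q43,Q44
yarn.scheduler.capacity.root.Q4.Q41.capacity = 25
yarn.scheduler.capacity.root.Q4.Q41.maximum-capacity = 100
yarn.scheduler.capacity.root.Q4.Q41.minimum-user-limit-percent = 100
yarn.scheduler.capacity.root.Q4.Q41.user-limit-factor = 1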
... View more
Labels:
- Apache YARN