Member since: 01-25-2017
Posts: 25
Kudos Received: 4
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4729 | 03-27-2017 07:57 AM
05-10-2017 07:32 PM
Thank you very much for the answer 🙂
05-10-2017 06:42 AM
Thanks for the reply. But as per my understanding, a one-way trust means the local KDC will trust tickets issued by the central AD, correct? If the Linux servers where Hadoop runs cannot contact AD for Kerberos, or if we have just an LDAP server in place of AD, how does the Kerberos auth happen? Will I have to create user principals manually in the local KDC?
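Just to be clear about what I mean by creating principals manually, here is a minimal sketch on the MIT KDC host; the realm EXAMPLE.COM and the user name userA are only placeholders:

    # On the local MIT KDC host: create a user principal by hand
    # (EXAMPLE.COM and userA are placeholders in this sketch)
    kadmin.local -q "addprinc userA@EXAMPLE.COM"

    # The user then authenticates against the local KDC, not AD
    kinit userA@EXAMPLE.COM

This per-user maintenance is exactly what I'm hoping the trust setup would avoid.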
05-08-2017 06:43 PM
@slachterman thanks for your reply. "Doesn't support" means the authentication agent used on the Linux systems is not able to connect to the AD Kerberos ports, and there are policy restrictions on using Kerberos against AD from the Linux boxes; I know it's a bit weird. So I was thinking about a solution that doesn't depend on the KDC services of AD while still keeping Kerberos enabled on the cluster. For example, suppose we have an AD and a local MIT KDC. The local KDC hosts the service principals, and AD manages the cluster users. If there is a user A in AD, and I create a user principal A@MIT-KDC-Realm, and a one-way trust is established between AD and the KDC, will user A be able to successfully launch jobs and use services on the cluster?
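For reference, here is my rough understanding of how that one-way trust would be set up, pieced together from the Hadoop docs; the realm names MIT.EXAMPLE.COM and AD.EXAMPLE.COM and the trust password are placeholders, so treat this as a sketch rather than tested steps:

    # On the MIT KDC: add the cross-realm principal so tickets issued
    # by AD are honoured in the local realm; the password must match
    # the trust password configured on the AD side
    kadmin.local -q "addprinc -pw <trust-password> krbtgt/MIT.EXAMPLE.COM@AD.EXAMPLE.COM"

    # On an AD domain controller, the matching one-way trust is
    # created with something like:
    #   netdom trust MIT.EXAMPLE.COM /Domain:AD.EXAMPLE.COM /add /realm /passwordt:<trust-password>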
05-08-2017 02:39 PM
1 Kudo
Hi, in my environment the Hadoop nodes are integrated with AD for authentication, but AD doesn't support Kerberos. I understand that it's possible to have users and user principals serviced by AD and only the service principals serviced by a local KDC. The question is: is it possible to set up a local KDC server for both service and user principals while the actual users reside in AD? That is, I would need to host the Kerberos principals and manage the tickets of AD users in the local KDC. The AD user realm and the KDC realm will also be different. Any help would be appreciated 🙂
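To make the layout concrete, this is roughly what I picture /etc/krb5.conf looking like on the cluster nodes; the realm and host names are placeholders and this is just a sketch, not a tested config:

    # Sketch of /etc/krb5.conf on a cluster node: the local MIT KDC is
    # the default realm and holds all principals; AD stays out of the
    # Kerberos path entirely (MIT.EXAMPLE.COM and kdc.example.com are
    # placeholders)
    [libdefaults]
      default_realm = MIT.EXAMPLE.COM

    [realms]
      MIT.EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
      }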
Labels:
- Apache Hadoop
- Kerberos
04-10-2017 11:53 AM
Looks like this is a better approach. I got some clear info from http://theckang.com/2015/remote-spark-jobs-on-yarn/ that matches your solution. Thanks much!
04-05-2017 08:22 PM
@Michael M That's cool! So this setup needs Spark version > 2? Also, what would be the master IP and port if using Spark on YARN? I am not a dev, so please excuse me if these sound stupid 😄
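In case it helps anyone answer: from what I've read so far, with Spark on YARN there is no master IP:port at all; the master is literally the string "yarn", and the ResourceManager address is looked up from the Hadoop config files. A minimal sketch of what I mean, with a placeholder path:

    # Point Spark at a copy of the cluster's Hadoop configs
    # (path is a placeholder in this sketch)
    export HADOOP_CONF_DIR=/path/to/cluster-conf   # core-site.xml, yarn-site.xml

    # No IP:port here; --master yarn makes Spark find the
    # ResourceManager from HADOOP_CONF_DIR
    spark-submit --master yarn --deploy-mode client my_job.py

Is that the right reading?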
04-05-2017 08:18 PM
1 Kudo
Livy is a nice option; it's just that we will have to make curl calls to the API outside the script(?). But something like what @Michael M described sounds more interesting.
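For the record, this is the kind of curl call I mean; a minimal sketch against Livy's batch REST API, where the Livy host and the HDFS path are placeholders (8998 is Livy's default port):

    # Submit a PySpark script to the cluster as a Livy batch
    curl -X POST \
         -H "Content-Type: application/json" \
         -d '{"file": "hdfs:///user/me/my_job.py"}' \
         http://livy-host:8998/batches

    # Poll the batch state using the id returned by the POST
    curl http://livy-host:8998/batches/0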
04-03-2017 07:00 PM
@Kshitij Badani thanks for the reply. I forgot to mention, I am using Zeppelin and Jupyter right now. But an IDE is more full-featured and better suited to scenarios like module building. I have seen people using Spyder, PyCharm, Eclipse, etc. locally, but I was looking to see if they could be integrated with a remote multi-node Hadoop cluster.
... View more
04-03-2017 02:30 PM
Has anyone ever used a Python IDE with a Spark cluster? Is there any way to install a Python IDE like Eclipse or Spyder on a local Windows machine and submit Spark jobs to a remote cluster via PySpark? I can see that Spyder is available with Anaconda, but the Hadoop nodes where Anaconda is installed don't have GUI tools, so it's not possible to see the Spyder UI that gets launched on the remote Linux edge node. What is the best way to go about this?
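To show what I've been attempting: the usual trick I've seen for making a local IDE's interpreter find PySpark is environment variables along these lines (shown as a Linux shell sketch; the paths and the py4j version are placeholders, and Windows would use set instead of export):

    # Make the cluster's Spark/PySpark importable by the IDE's interpreter
    export SPARK_HOME=/path/to/spark
    export HADOOP_CONF_DIR=/path/to/cluster-conf
    export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH

My doubt is whether this works across the network from a Windows desktop to a remote multi-node cluster.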
Labels:
- Apache Spark
03-27-2017 09:01 AM
@ccasano I set up the queues as above. Say I have 4 queues, Q1 to Q4, each with min 25% and max 100%. If I start a job on Q1, it grows up to 100% utilization; if I later launch the same task on Q2, the new task will grow only up to 25% (its absolute configured capacity) and the old one will come back to 75%. Is there a way I can distribute the resources equally here? I.e., the second job should grow beyond its minimum capacity until the queues are balanced equally. Thanks in advance!
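From reading the capacity scheduler docs, it looks like preemption might be what I'm after; these are the two yarn-site properties I'm planning to test (names taken from the Hadoop docs, so please correct me if this is the wrong lever):

    # Enable the scheduler monitor and the capacity preemption policy,
    # so over-capacity containers can be reclaimed for starved queues
    yarn.resourcemanager.scheduler.monitor.enable=true
    yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy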