Member since: 09-26-2015
Posts: 135
Kudos Received: 85
Solutions: 26
About
Steve is a Hadoop committer, mostly working on cloud integration.
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 3343 | 02-27-2018 04:47 PM |
| | 5864 | 03-03-2017 10:04 PM |
| | 3491 | 02-16-2017 10:18 AM |
| | 1847 | 01-20-2017 02:15 PM |
| | 11808 | 01-20-2017 02:02 PM |
03-14-2016
03:17 PM
1 Kudo
Use the `slider kill-container` command; it's how we test Slider apps' resilience to failure. There's also a built-in chaos monkey in Slider: you can configure the AM to randomly kill containers (and/or itself). See "Configuring the Chaos Monkey".
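As a rough sketch of what a chaos monkey does each interval (the function name, probability parameter, and in-memory container set here are illustrative, not Slider's actual configuration keys or API):

```python
import random


def chaos_monkey_tick(containers, p_container_kill, rng=random):
    """One chaos-monkey interval: with probability p_container_kill,
    pick one live container at random and remove ("kill") it.
    Returns the id of the killed container, or None if nothing died."""
    if containers and rng.random() < p_container_kill:
        victim = rng.choice(sorted(containers))
        containers.remove(victim)
        return victim
    return None
```

Run this on a timer with a small probability and you get the same effect as the built-in monkey: failures arrive at unpredictable times, so the AM's recovery path gets exercised continuously.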
03-14-2016
03:11 PM
2 Kudos
@Steven Hirsch, let me look at this.

1. Which version of HDP, OS, etc.?
2. Are there any logs in the Spark history server?
3. If you restart the Spark history server, do the jobs come up as incomplete?

The Spark history server in Spark <= 1.6 doesn't detect updates to incomplete jobs once they've been clicked on and loaded (SPARK-7889; will be fixed in Spark 2), but the UI listing complete/incomplete apps should work. If the YARN history server is used as the back end for Spark histories, the code there will check with YARN to see whether the application is still running. If the filesystem-based log mechanism is used, the Spark history server doesn't ask YARN about application state. Instead it just plays back the file until it gets to the end: if there isn't a logged "application ended" event there, the job will languish as "incomplete" forever.
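That playback behaviour can be illustrated in a few lines (a simplified stand-in for the history server's real provider; the `SparkListenerApplicationEnd` event name is what Spark writes into its JSON-lines event logs):

```python
import json


def is_complete(event_log_lines):
    """Replay a Spark event log (one JSON object per line) the way the
    filesystem-based history provider does: the application only counts
    as complete if an "application ended" event was actually written."""
    for line in event_log_lines:
        event = json.loads(line)
        if event.get("Event") == "SparkListenerApplicationEnd":
            return True
    return False
```

So an app killed before it could flush that final event stays "incomplete" no matter how many times the log is replayed.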
03-03-2016
11:44 AM
1 Kudo
Well, this is "interesting". I think it's that specific realmless principal, "HTTP/sandbox.hortonworks.com@": you don't have a TGT for that empty realm, so the request fails. I've heard of this before: https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/terrors.html

- Follow the instructions there; if that makes the problem go away, it's a sign that the krb5 setup in the sandbox needs fixing.
- If you use kdestroy to delete the HTTP/sandbox.hortonworks.com@ ticket, what does that do?
- Download KDiag and give it a run before and after the curl call: https://github.com/steveloughran/kdiag. `export HADOOP_JAAS_DEBUG=true` for extra info; grab stdout and stderr into a single file and attach it.
- What does your /etc/krb5.conf say? Mine explicitly sets dns_lookup_realm = false and dns_lookup_kdc = false.
- Set the env vars and JVM properties covered in the troubleshooting docs and see what's being negotiated: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/SecureMode.md
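If you're scripting checks over klist output, a quick way to spot that empty-realm case (a hypothetical helper, not part of any Hadoop or MIT tooling):

```python
def principal_realm(principal):
    """Split a Kerberos principal of the form primary[/instance]@REALM
    and return the realm. An empty string is the pathological
    "realmless" case: a principal ending in a bare '@'."""
    name, sep, realm = principal.rpartition("@")
    if not sep:
        raise ValueError("no @REALM component in: " + principal)
    return realm
```

A principal whose realm comes back empty is never going to match a TGT, which is exactly the failure mode above.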
02-16-2016
05:33 PM
1 Kudo
Note that even when running as OS user "yarn", an environment variable, HADOOP_USER_NAME, passes the name of the account submitting the work into that process. It is then picked up by the HDFS client, so the code should be able to work with HDFS directories as the submitter, with the same permissions. That is, as you may have guessed, completely insecure and open to abuse; to close that hole you need to make the leap to Kerberos, I'm afraid.
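The pickup can be sketched like this (a simplified stand-in for what the Hadoop client's user resolution does on an insecure cluster; the helper name is mine):

```python
import getpass
import os


def effective_hadoop_user():
    """Sketch of identity resolution on an insecure cluster:
    HADOOP_USER_NAME, when set, wins over the actual OS account.
    Anyone who can set an environment variable can be anyone."""
    return os.environ.get("HADOOP_USER_NAME") or getpass.getuser()
```

Which is precisely why "simple" authentication is only a labelling scheme, not a security mechanism.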
02-06-2016
06:30 PM
1 Kudo
A few seconds isn't going to matter, but Kerberos and the security system are fussy about clocks. You can usually set your network switch up as an NTP server, so all the machines can sync with it. Or turn one of your machines into the NTP server and, again, make it the reference time source. Ideally, if detached from the network, you could hook up a GPS unit and run gpsd to be about as accurate as pretty much everything else on the internet.
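The rule Kerberos applies can be sketched as follows (300 seconds is the usual krb5 `clockskew` default; the helper itself is illustrative):

```python
def within_clock_skew(client_time, kdc_time, max_skew_seconds=300):
    """Kerberos rejects tickets when the client and KDC clocks differ
    by more than the permitted skew; krb5's default tolerance is
    five minutes."""
    return abs(client_time - kdc_time) <= max_skew_seconds
```

Hence a few seconds of drift is harmless, while an unsynced machine that has wandered by minutes starts failing authentication in confusing ways.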
01-23-2016
05:27 AM
1 Kudo
@mkataria, thanks for this: "I'm sure HDO engineering team must be looking into this for a solution, since KEYRING is the future for kerberos cache and the stuff you can do with it like keylist and etc."

Hadoop uses the Kerberos libraries that come with the JVM. That's pretty problematic, because we have to delve into the private com.sun. and com.ibm. internal classes to get some things done, and that tends to break across major (and sometimes minor) releases. If we stay with the Oracle libraries, we're stuck with whatever they implement, which, as you've noticed, is pretty limited and out of date.

You may be interested to hear of Apache Kerby, an ongoing project to build a pure-Java Kerberos Domain Controller and, importantly for production Hadoop, a modern Kerberos client library, with CPU acceleration of encryption when available (i.e. the latest Intel parts, as Hadoop already does for HDFS encryption) and broader protocol support. I don't think it's ready for production use yet, and nobody has yet gone beyond looking at what it would take to switch Hadoop over to it. But be assured: people are looking at it.

Finally, what was the error message you got? Was it the classic "unable to find TGT for user" kind of text, or something new? I'm trying to build a list of common Kerberos error messages; adding a new message, or a new possible cause of an existing one, is something else to put in there.
01-19-2016
09:48 PM
@KRISHNANAND THOMMAND Can you email me (that's stevel at hortonworks.com) the exact spark-assembly JAR you've got with this problem? I want to make sure that I really am looking at the one where you are seeing the problem. Once I've got it, I'll have a look inside the JAR to see what's up. Thanks.
01-19-2016
09:46 PM
3 Kudos
Connection refused invariably means "there's no service at the destination". Here I'd assume that either the configuration of the KMS is wrong (its URL), or it's not currently running.
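You can reproduce the distinction locally (an illustrative helper; real diagnosis would point this at the KMS host and port taken from your configuration):

```python
import socket


def probe(host, port, timeout=2.0):
    """Try a TCP connect. "Connection refused" means the host is
    reachable but nothing is listening on that port, which is exactly
    the state a stopped (or mis-addressed) KMS leaves you in."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
```

A timeout, by contrast, usually points at a firewall or a wrong host; "refused" is the machine itself telling you no process has that port.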
01-15-2016
01:26 AM
Sorry, just catching up on this. ATS integration should be there, so I'm trying to work out why things don't work.
01-14-2016
05:57 PM
If you were logged in via Kerberos when you submitted the work, the clients usually pick up your credentials, then request Hadoop tokens from the various services. Try using kdestroy to remove your Kerberos tickets and then repeating your operations, to see what happens.