Hello, I am trying to explore the sandbox from a security perspective. This is something that I have been having much trouble with lately. This is purely for academic purposes; I do not intend to target any live systems.
I have been exploring Hadoop security as a research initiative for my thesis. I wanted to find some place in the sandbox of the Hadoop ecosystem that may be insecure. I have been using the penetration tools in Kali Linux in an attempt to find possible vulnerabilities, but I have not found any. Can someone please point me in the right direction? Perhaps a theoretical approach to how the infrastructure could become insecure, or an area I can explore that, if left unconfigured, could be a security risk? If you have any links that would help, I would greatly appreciate them. Is there any way that, with iptables and SELinux disabled, there may be a security risk? Is there any misconfigured part of the ecosystem that I can run some pen-testing tools against to see whether that scenario works? I have been reading about Hadoop security and the many vulnerabilities that may pose a risk, but I want to be able to actually try it out and see if that is the case. Failing that, I would like to come up with a theoretical approach to how a certain vulnerability could be exploited in the wrong hands. Thank you for your time.
@Michael Brown the vanilla sandbox is unsecured, so one idea is to first secure it by setting up Kerberos (for authentication), authorization policies (via Ranger), perimeter security (via Knox), and HDFS encryption (via Ranger KMS). Full details are available at http://docs.hortonworks.com -> HDP -> (version) -> security
Since it is unsecured, can you recommend an attack vector or some way to compromise the sandbox? Perhaps through one of the ports? What, essentially, can I exploit even without Kerberos?
This is very simple. Install the Hadoop client on your laptop and point the client config at the HDP NameNode; you will be able to access the data without any authentication.
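The same point can be seen over WebHDFS, without installing a client. The sketch below is a minimal illustration, not an official tool: it assumes the sandbox has no Kerberos (simple authentication), that WebHDFS is enabled on the NameNode, and that the NameNode HTTP port is 50070 (the Hadoop 2.x default). The hostname `sandbox.hortonworks.com`, the user `hdfs`, and the helper names `webhdfs_url` and `list_directory` are assumptions chosen for illustration. Under simple authentication the `user.name` query parameter is accepted as the caller's identity with no proof.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def webhdfs_url(host, path, op="LISTSTATUS", user="hdfs", port=50070):
    """Build a WebHDFS REST URL.

    With simple authentication the server trusts whatever user.name says.
    Host, port and user are assumptions; adjust them to your own sandbox.
    """
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def list_directory(host, path, user="hdfs"):
    """Return the entry names under an HDFS path (needs a reachable sandbox)."""
    with urlopen(webhdfs_url(host, path, user=user), timeout=10) as resp:
        body = json.load(resp)
    # LISTSTATUS responses nest the entries under FileStatuses/FileStatus.
    return [entry["pathSuffix"] for entry in body["FileStatuses"]["FileStatus"]]


# Building the URL needs no network access.
url = webhdfs_url("sandbox.hortonworks.com", "/user")
print(url)
```

Calling `list_directory("sandbox.hortonworks.com", "/user")` against your own sandbox should list the home directories if the assumptions hold. Once Kerberos is enabled, the `user.name` parameter is no longer sufficient and the same request is expected to be rejected.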
Please read the Hortonworks documentation on securing HDP. It covers authentication, authorization, and at-rest security.
What would be interesting is if you could poke holes in the above security, break it, and share the results with us.
One thing to look at is "shell in a box". Note that this feature applies only to the sandbox ("shell in a box" is not installed when you install HDP via Ambari).
If you think of the standard attacks on a server, they are:
-DoS/DDoS. There's not much in the way of defences here... I think there's a Facebook patch for detecting and rejecting a faulty client that generates too many requests against a core service (like the NameNode). Note that code running on a Hadoop cluster can act as a DoS system against other bits of your infra; it's why things like a caching DNS on every node are popular in some large deployments.
-SQL injection. Go for the web services there.
-Privilege escalation: run your code as a user, aim to go root or become another user.
-Object deserialization attacks. Research them if you don't know the topic, then look at where Kryo and Java serialization are used in Hive and Spark.
-Anything you can think of to go after the crypto systems.
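For anyone new to object deserialization attacks, the idea can be sketched in a few lines. This example uses Python's `pickle` as a stand-in; it is not Kryo or Java serialization, and it says nothing about whether Hive or Spark are affected. The class names `Payload` and `AllowlistUnpickler` are hypothetical, and the "attack" only calls the harmless `os.getcwd`. The point is that the byte stream, not the program reading it, decides which callable runs during loading.

```python
import io
import os
import pickle


class Payload:
    """Object whose serialized form asks the loader to call a function."""

    def __reduce__(self):
        # pickle stores "call os.getcwd()" in the stream; a real attacker
        # would pick something far less harmless.
        return (os.getcwd, ())


class AllowlistUnpickler(pickle.Unpickler):
    """Loader that refuses to resolve any global not explicitly allowed."""

    ALLOWED = {("builtins", "list"), ("builtins", "dict")}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


blob = pickle.dumps(Payload())

# Naive load: the callable named in the stream is invoked during loading,
# so the result is the return value of os.getcwd(), not a Payload object.
result = pickle.loads(blob)

# Restricted load: the same bytes are rejected before anything is called.
try:
    AllowlistUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

When reviewing a Java component, the analogous question is where untrusted bytes reach a deserializer and whether the set of classes it may instantiate is restricted.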
Keeping HDFS data secure and safe should be the key goal; focus on that, credential bypassing, privilege escalation, etc. If you find something new, do let us know by emailing email@example.com, or the team behind whichever app you've got into.