We are building a setup in which a server submits jobs on behalf of different users to a Livy server via the REST API. We have set up a Kerberos server to authenticate against Livy. However, we also want to prevent users from accessing other users' data, the filesystem, and the network.
My question is: how secure is Livy? Users can inject custom code to run on Livy, which gives them the ability to access the filesystem of the host the Livy server resides on. Even if we run Livy as a separate Unix user with very few filesystem permissions, this looks potentially dangerous to me: they might be able to read the keytab on the Livy server, and they could also inject malware and run it.
I know that each session created also gets its own JVM, so one session lives in one JVM and it is impossible to see another session's data without having the Kerberos ticket. But could I change the security settings of that JVM so that it can access only specific paths and specific IP addresses? Would that require changing the Livy source code?
And if we use HDFS with Active Directory to secure the data, so that users need a Kerberos key to access their files, how can I manage multiple principals on one server to get this working?
My conf file is as below:
livy.environment production
livy.impersonation.enabled true
livy.server.csrf_protection.enabled true
livy.server.port 8999
livy.server.session.timeout 3600000
livy.server.auth.kerberos.keytab /home/harun/Documents/incubator-livy/keytabs/new.keytab
livy.server.auth.kerberos.principal HTTP/livyserver.local@EXAMPLE.COM
livy.server.auth.type kerberos
#livy.server.launch.kerberos.keytab /home/harun/Documents/incubator-livy/conf/livy.headless.keytab
#livy.server.launch.kerberos.principal livy@EXAMPLE.COM
livy.server.access_control.enabled = true
livy.server.access_control.users = livy
livy.superusers=livy
PS: Does enabling launch.kerberos provide additional security to protect the keytab?
Any help with any of these questions is very much appreciated.
Thanks in advance
By default Livy launches the application on YARN, and the default master is usually set to yarn-cluster. This means an authenticated user can push code that could run on any cluster worker node that has a running NodeManager. These containers are launched by YARN, and the container process is always owned by the calling user (in this case, the user who made the request to Livy).
So the container process runs as the calling user and only has access to the resources that user is authorized for. There is no way a user could read a keytab from the /etc/security/keytab directory.
The same applies to HDFS data: unless the user has permissions on the files, they won't be able to access them. This holds without Livy as well, since a user could read the data directly with an HDFS/WebHDFS client.
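To make the impersonation point concrete, here is a minimal Python sketch of how a front-end server might build the session request it sends to Livy on behalf of a user. It only constructs the request and does not send it. The `proxyUser` field and the `X-Requested-By` header (needed for POST requests when CSRF protection is on) are part of Livy's REST API; the host name, the user name `alice`, and the helper `build_session_request` are placeholders I made up for illustration. The Kerberos/SPNEGO negotiation is deliberately left out, so a real call would still need a client that can authenticate.

```python
import json
import urllib.request

# Assumption: placeholder host; 8999 matches livy.server.port from the question.
LIVY_URL = "http://livyserver.local:8999"


def build_session_request(proxy_user, kind="pyspark", livy_url=LIVY_URL):
    """Build (but do not send) a POST /sessions request asking Livy to
    start an interactive session on behalf of proxy_user."""
    payload = {"kind": kind, "proxyUser": proxy_user}
    return urllib.request.Request(
        url=f"{livy_url}/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Required on POST when livy.server.csrf_protection.enabled is true.
            "X-Requested-By": proxy_user,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_session_request("alice")  # "alice" is a hypothetical user
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With livy.impersonation.enabled set to true, the resulting YARN containers should then run as the proxied user rather than as the Livy service account, which is why the user's code is confined to that user's own permissions.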
There are also other ways to push application code that are not limited to Livy, such as spark-submit and spark-shell. These work in a similar fashion, except that they tend to be used from edge nodes to which only a few users have access.
That said, if you would like to restrict access to Livy rather than rely only on authentication, look into Knox, Livy and Ranger integration. That way you can reduce the number of users who use Livy's REST API by authorizing only specific groups/users.