Understanding Security basic for dummies

kgautam · ‎11-08-2017

Distributed-system concepts are derived from concepts that already exist on a single operating system (machine). Understanding those aspects of an operating system therefore helps us understand the larger, more complex architecture of a distributed system.

Main Idea	OS concepts	Distributed System concept
resource	CPU , RAM , Network	YARN
filesystem	NTFS , ext3	HDFS
process	java , perl, python process	SPARK , MR
database	mysql	NoSql
authentication	PAM module	Knox
authorization	NSS module	Ranger

Holistically, security is built on three foundations:

Authentication
Authorization
Audit

Securely exchanging data during the above mentioned process is based on the concept of cryptography.

1. Symmetric-key

The same key is used to encrypt and decrypt the message.
Example : AES
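
The symmetric idea can be sketched with a toy cipher. This is not AES: it is a minimal illustration, assuming a hypothetical shared secret and a keystream built from SHA-256, that shows the same key both encrypting and decrypting.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Expand the key into `length` bytes using SHA-256 over a counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

shared_key = b"hypothetical shared secret"   # assumption: known to both sides
ciphertext = xor_cipher(shared_key, b"hello cluster")
plaintext = xor_cipher(shared_key, ciphertext)   # same key, same function
```

A real system would use a vetted AES implementation rather than this toy construction.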

2. Asymmetric-key

A message can be encrypted with the public key and decrypted with the private key, or vice versa.
A message encrypted with one key of the pair can only be decrypted with the other key; the same key can never be used for both steps.
The private key is kept in a safe location and the public key is distributed.
A message encrypted with the private key can be decrypted by anyone who holds the distributed public key.
Example : RSA encryption algorithm (PKCS#1)
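
The key-pair behaviour can be sketched with textbook RSA on small numbers. This is only an illustration: it assumes two Mersenne primes as a toy key pair and uses no padding, so it is not PKCS#1 and not secure.

```python
# Toy RSA key pair built from two Mersenne primes (illustration only).
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                           # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)

message = int.from_bytes(b"hi", "big")

# Public key encrypts, private key decrypts.
cipher = pow(message, e, n)
recovered = pow(cipher, d, n)

# Private key encrypts, public key decrypts (the basis of signatures).
signed = pow(message, d, n)
checked = pow(signed, e, n)

# Applying the public key twice does not give the message back.
not_recovered = pow(cipher, e, n)
```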

3. Hashing

For a given message a consistent hash is generated
From the hash the message can never be retrieved
Example : SHA family (SHA-1, SHA-256)
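
A small sketch of the hashing properties using Python's standard `hashlib`; the messages are made up for illustration.

```python
import hashlib

def digest(message: bytes) -> str:
    """Return the SHA-256 digest of the message as a hex string."""
    return hashlib.sha256(message).hexdigest()

first = digest(b"transfer 100 to alice")
second = digest(b"transfer 100 to alice")    # same input, same hash
changed = digest(b"transfer 900 to alice")   # one character changed
```

Here `first` and `second` are identical, while `changed` differs, and nothing in the digest lets you read the message back.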

===============================================================================================

Let's focus on the systems and mechanisms that enable authentication and authorization on the OS and for services.

SSL
SSH
LDAP
Kerberos
PAM
NSS
SSSD

1. SSL

The server has a public-private key pair. The public key is shared with any client that wants to interact with the server.
The Certificate Authority (CA) has its own public-private key pair. Its public key is distributed to anyone who wants it.
In Firefox, check Tools -> Page Info -> Security to see the public keys of known Certificate Authorities.
The server gets a certificate issued by the CA: the CA takes the server's public key, URL, and so on, and generates a hash from them. It then encrypts the hash with its own private key.
The server sends this certificate (which contains its public key) to the client. The client calculates the hash itself, decrypts the encrypted hash in the certificate with the CA's public key, and compares the two to check the validity of the server's public key.
The client generates a random symmetric key and encrypts it with the server's public key, so that it can only be decrypted by the server. The symmetric key is used to encrypt all subsequent messages.
The asymmetric keys are only used for the initial exchange of the symmetric key. Remember that the server cannot keep data confidential by encrypting it with its own private key, because anyone holding its public key can decrypt it.
The symmetric key is used for secure data transfer.
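
The certificate check described above can be sketched as follows. This is a simplified model, not the real X.509 or TLS procedure: the CA key pair is a toy RSA pair, and the certificate body is a hypothetical byte string.

```python
import hashlib

# Toy RSA key pair standing in for the CA's keys (illustration only).
p, q = 2**61 - 1, 2**89 - 1
ca_n, ca_e = p * q, 65537                  # CA public key, known to clients
ca_d = pow(ca_e, -1, (p - 1) * (q - 1))    # CA private key, kept by the CA

def cert_hash(cert_body: bytes) -> int:
    """Hash the certificate fields and reduce the result into the toy modulus."""
    return int.from_bytes(hashlib.sha256(cert_body).digest(), "big") % ca_n

def ca_sign(cert_body: bytes) -> int:
    """CA side: encrypt the hash with the CA private key."""
    return pow(cert_hash(cert_body), ca_d, ca_n)

def client_verify(cert_body: bytes, signature: int) -> bool:
    """Client side: decrypt with the CA public key and compare the hashes."""
    return pow(signature, ca_e, ca_n) == cert_hash(cert_body)

# Hypothetical certificate contents: server name plus server public key.
cert_body = b"CN=server.example.com;public-key=0xABCDEF"
signature = ca_sign(cert_body)
valid = client_verify(cert_body, signature)
tampered = client_verify(b"CN=evil.example.com;public-key=0x123456", signature)
```

With these inputs `valid` is true and `tampered` is false, because the tampered body no longer hashes to the value the CA signed.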

KeyStore (server side: private key + signed public-key certificate) and TrustStore (client side: CA public-key certificates)

1) The first and major difference between trustStore and keyStore is that trustStore is used by TrustManager and keyStore is used by the KeyManager class in Java. KeyManager and TrustManager perform different jobs: TrustManager determines whether the remote connection should be trusted, i.e. whether the remote party is who it claims to be, and KeyManager decides which authentication credentials should be sent to the remote host during the SSL handshake. If you are an SSL server, you use your private key during the key exchange and send the certificate corresponding to your public key to the client; this certificate is taken from the keyStore. On the SSL client side, if it is written in Java, the certificates stored in the trustStore are used to verify the identity of the server. SSL certificates most commonly come as a .cer file, which is added to a keyStore or trustStore with a key management utility such as keytool. See the javarevisited post How to add certificates into trustStore for a step-by-step guide on adding certificates into a keyStore or trustStore in Java.

2) Another difference, in rather simple terms, is that the keyStore contains private keys and is required only if you are running a server over SSL or have enabled client authentication on the server side. The trustStore, on the other hand, stores public keys or certificates from CAs (Certificate Authorities), which are used to trust the remote party or SSL connection.

Read more: http://javarevisited.blogspot.com/2012/09/difference-between-truststore-vs-keyStore-Java-SSL.html#ix...
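
The same split of roles can be sketched outside Java. The example below uses Python's standard `ssl` module as a rough analogue, not the Java KeyStore API itself; the function names and the certificate and key paths are hypothetical.

```python
import ssl

def make_server_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    """KeyStore role: the server loads its private key and signed certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx

def make_client_context(cafile=None) -> ssl.SSLContext:
    """TrustStore role: the client loads the CA certificates it trusts."""
    # With cafile=None the system's default CA bundle is used.
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=cafile)

client_ctx = make_client_context()
# A server would call, for example:
# make_server_context("/path/to/server.crt", "/path/to/server.key")
```

The client context requires a valid server certificate and checks the hostname by default, which corresponds to the TrustManager's job of deciding whether the remote party can be trusted.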

2. SSH

1. Server has private-public key pair. When a client connects it fetches public key from the server.
2. The client has to accept the server's public key, which is then saved in the known_hosts file.
3. Client and server agree on a symmetric key using the classic Diffie-Hellman algorithm.
4. Note that with this algorithm the symmetric key is known to both sides without ever being sent over the wire.
5. Client sends password encrypted using the symmetric key for authentication .
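
The Diffie-Hellman step can be sketched with plain modular arithmetic. The parameters here are toy values chosen for illustration (a Mersenne prime modulus and base 3), not the groups that SSH actually negotiates.

```python
import secrets

# Public parameters (toy values): a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3

client_secret = secrets.randbelow(P - 2) + 1   # never leaves the client
server_secret = secrets.randbelow(P - 2) + 1   # never leaves the server

client_public = pow(G, client_secret, P)       # sent to the server
server_public = pow(G, server_secret, P)       # sent to the client

# Each side combines its own secret with the other side's public value.
client_key = pow(server_public, client_secret, P)
server_key = pow(client_public, server_secret, P)
```

Both sides end up with the same key even though only the public values crossed the wire.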

For passwordless authentication
1. The client generates a public-private key pair. The client's public key is manually placed in the authorized_keys file on the server.
2. The server generates a random number, encrypts it with the client's public key from authorized_keys, and sends it to the client.
3. The client decrypts it with its private key, re-encrypts it with the symmetric key, and sends it back to the server.
4. The server decrypts it with the symmetric key; if it matches the original number, passwordless authentication succeeds.

5. For testing use testLink
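
The challenge-response steps above can be sketched as follows. This is a simplified model of the steps as described, not the actual SSH wire protocol: the client key pair is toy RSA, and XOR with a random number stands in for the symmetric session cipher.

```python
import secrets

# Toy RSA key pair for the client (illustration only).
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537                   # public part, as placed in authorized_keys
d = pow(e, -1, (p - 1) * (q - 1))     # private part, kept by the client

session_key = secrets.randbits(128)   # symmetric key both sides already share

# Server: pick a random challenge and encrypt it with the client's public key.
challenge = secrets.randbits(128)
encrypted_challenge = pow(challenge, e, n)

# Client: decrypt with the private key, re-encrypt with the session key (toy XOR).
decrypted = pow(encrypted_challenge, d, n)
response = decrypted ^ session_key

# Server: undo the session-key step and compare with the original challenge.
authenticated = (response ^ session_key) == challenge
```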

3. Kerberos.

1. The KDC (Key Distribution Center) contains two components: the AS (Authentication Server) and the TGS (Ticket Granting Server).
2. A key derived from the client's password is registered in the KDC in advance.
3. The client's password is never sent over the network.
4. The client's username is sent to initiate the interaction between the client and the Authentication Server.
5. The AS sends the client a symmetric session key encrypted with the key derived from the client's password.
6. A Kerberos realm is a set of managed nodes that share the same Kerberos database.
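
The exchange with the Authentication Server can be sketched in a few lines. This is a toy model under stated assumptions: PBKDF2 stands in for the Kerberos string-to-key function, XOR stands in for the real cipher, and the realm, user, and password are hypothetical.

```python
import hashlib
import secrets

def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the password (stand-in for Kerberos string-to-key)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def xor_bytes(key: bytes, data: bytes) -> bytes:
    """Toy cipher for 32-byte values; real Kerberos uses a proper cipher such as AES."""
    return bytes(k ^ b for k, b in zip(key, data))

salt = b"EXAMPLE.COMalice"                            # hypothetical realm + user
kdc_stored_key = derive_key("alice-password", salt)   # registered in the KDC beforehand

# AS: generate a session key and encrypt it with the client's long-term key.
session_key = secrets.token_bytes(32)
as_reply = xor_bytes(kdc_stored_key, session_key)

# Client: derive the same key locally from the typed password and decrypt the reply.
client_key = derive_key("alice-password", salt)
recovered_session_key = xor_bytes(client_key, as_reply)
```

Only someone who knows the password can recover the session key, and the password itself never travels over the network.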

4. LDAP :

The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry standard application protocol for accessing and maintaining distributed directory information services over an Internet Protocol (IP) network. Directory services play an important role in developing intranet and Internet applications by allowing the sharing of information about users, systems, networks, services, and applications throughout the network.

Most important terms used are :

1. DN - Distinguished Name (the unique path to an entry)
2. OU - Organizational Unit (e.g. a department)
3. DC - Domain Component (not domain controller, for once), e.g. com, org
4. CN - Common Name (the end entry itself)
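
To see how these terms fit together, here is a minimal sketch that splits a DN into its components. The DN is a made-up example and the parser is deliberately naive: it does not handle escaped commas or multi-valued components.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific first."""
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr.upper(), value))
    return pairs

dn = "cn=jdoe,ou=engineering,dc=example,dc=com"   # hypothetical entry
parts = parse_dn(dn)

# The DC components, read left to right, spell out the DNS-style domain.
domain = ".".join(value for attr, value in parts if attr == "DC")
```

Here the CN names the entry, the OU places it in a department, and the DC components give the domain `example.com`.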

================================================================================================

Authentication and authorization in OS

PAM
NSS
SSSD

1. PAM :

PAM is a framework that assists applications in performing what I'll call "authentication-related activities". The core pieces of PAM are a library (libpam) and a collection of PAM modules, which are dynamically linked library (.so) files in the folder /lib/security. PAM configuration files are stored in the /etc/pam.d/ directory.

2. NSS

The Name Service Switch (NSS) is a facility in Unix-like operating systems that provides a variety of sources for common configuration databases and name resolution mechanisms. These sources include local operating system files (such as /etc/passwd, /etc/group, and /etc/hosts), the Domain Name System (DNS), the Network Information Service (NIS), and LDAP.

NSS depends on the group, passwd, and shadow files for authorization.

Group : https://www.cyberciti.biz/faq/understanding-etcgroup-file/
Shadow: https://www.cyberciti.biz/faq/understanding-etcshadow-file/
Passwd : https://www.cyberciti.biz/faq/understanding-etcpasswd-file-format/
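
A quick way to see NSS at work is Python's standard `pwd` and `grp` modules, which call the C library's lookup functions and therefore go through whatever sources NSS is configured with. This is a small sketch for Unix-like systems; the helper name `lookup_user` and the second user name are hypothetical.

```python
import grp
import pwd

def lookup_user(name):
    """Resolve a user through NSS; return None if no configured source knows it."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None
    try:
        group_name = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        group_name = None   # primary group not resolvable
    return {
        "uid": entry.pw_uid,
        "gid": entry.pw_gid,
        "group": group_name,
        "home": entry.pw_dir,
        "shell": entry.pw_shell,
    }

root = lookup_user("root")
missing = lookup_user("no-such-user-hypothetical")
```

The caller does not know, and does not need to know, whether the answer came from /etc/passwd or from another source.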

Both PAM and NSS can be linked to LDAP. LDAP also has an independent ldap client which can be used to access LDAP directly.

It is possible for a user to exist in LDAP without existing locally on the operating system.

It helps to break things down like this in your head:

NSS - A module based system for controlling how various OS-level databases are assembled in memory. This includes (but is not limited to) passwd, group, shadow (this is important to note), and hosts. UID lookups use the passwd database, and GID lookups use the group database.
PAM - A module based system for allowing service based authentication and accounting. Unlike NSS, you are not extending existing databases; PAM modules can use whatever logic they like, though shell logins still depend on the passwd and group databases of NSS. (you always need UID/GID lookups)

The important difference is that PAM does nothing on its own. If an application does not link against the PAM library and make calls to it, PAM will never get used. NSS is core to the operating system, and the databases are fairly ubiquitous to normal operation of the OS.

Now that we have that out of the way, here's the curve ball: while pam_ldap is the popular way to authenticate against LDAP, it's not the only way.

If shadow is pointing at the ldap service within /etc/nsswitch.conf, any authentication that runs against the shadow database will succeed if the attributes for those shadow field mappings (particularly the encrypted password field) are present in LDAP and would permit login.
- This in turn means that pam_unix.so can potentially result in authentication against LDAP, as it authenticates against the shadow database. (which is managed by NSS, and may be pointing at LDAP)
If a PAM module performs calls against a daemon that in turn queries the LDAP database (say, pam_sss.so, which hooks sssd), it's possible that LDAP will be referenced.
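
Which of these paths applies on a given machine depends on /etc/nsswitch.conf. The sketch below parses a hypothetical sample of that file to show which sources back each database; the file contents are an assumption for illustration, not taken from a real system.

```python
def parse_nsswitch(text):
    """Map each database name to its ordered list of sources."""
    databases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line or ":" not in line:
            continue
        database, _, sources = line.partition(":")
        # Skip action items such as [NOTFOUND=return]; keep only source names.
        databases[database.strip()] = [
            s for s in sources.split() if not s.startswith("[")
        ]
    return databases

sample = """
# hypothetical /etc/nsswitch.conf
passwd: files sss
shadow: files ldap
group:  files sss
hosts:  files dns
"""
config = parse_nsswitch(sample)
shadow_uses_ldap = "ldap" in config.get("shadow", [])
```

With this sample, `shadow_uses_ldap` is true, which is the situation where pam_unix.so can end up authenticating against LDAP.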

3. SSSD

The sssd daemon acts as the spider in the web, controlling the login process and more. The login program communicates with the configured pam and nss modules, which in this case are provided by the SSSD package. These modules communicate with the corresponding SSSD responders, which in turn talk to the SSSD Monitor. SSSD looks up the user in the LDAP directory, then contacts the Kerberos KDC for authentication and to acquire tickets.

(PAM and NSS can also talk to LDAP directly using pam_ldap and nss_ldap respectively. However SSSD provides additional functionality.)

Of course, a lot of this depends on how SSSD has been configured; there are lots of different scenarios. For example, you can configure SSSD to do authentication directly with LDAP, or authenticate via Kerberos.

The sssd daemon does not actually do much that cannot be done with a system that has been "assembled by hand", but has the advantage that it handles everything in a centralised place. Another important benefit of SSSD is that it caches the credentials, which eases the load on servers and makes it possible to go offline and still login. This way you don't need a local account on the machine for offline authentication.

In a nutshell, SSSD seamlessly provides what nss_ldap, pam_ldap, pam_krb5, and nscd used to provide.
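
The credential caching idea can be sketched as follows. This is a toy model, not how SSSD is implemented: the class and function names are hypothetical, and a dictionary stands in for the LDAP/Kerberos backend.

```python
import hashlib
import hmac
import secrets

class CredentialCache:
    """Toy offline cache: stores a salted hash after each successful online login."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def remember(self, user, password):
        salt = secrets.token_bytes(16)
        self._entries[user] = (salt, self._hash(password, salt))

    def check(self, user, password):
        if user not in self._entries:
            return False
        salt, stored = self._entries[user]
        return hmac.compare_digest(stored, self._hash(password, salt))

def login(user, password, cache, backend_online, backend_check):
    """Try the remote backend first; fall back to the cache when it is unreachable."""
    if backend_online:
        if backend_check(user, password):
            cache.remember(user, password)
            return True
        return False
    return cache.check(user, password)

directory = {"alice": "s3cret"}   # stand-in for the remote directory

def backend(user, password):
    return directory.get(user) == password

cache = CredentialCache()
online_ok = login("alice", "s3cret", cache, True, backend)
offline_ok = login("alice", "s3cret", cache, False, backend)
offline_bad = login("alice", "wrong", cache, False, backend)
```

A user who has logged in once while online can log in again while offline, but only with the same password.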

Please follow this Link to start digging into how authentication, authorization, and audit are provided for a cluster.

Please keep in mind that there are multiple ways to log on to the cluster, and hence all of these paths need to be secured.
1. Ambari views
2. ssh onto a node
3. Login to a node through OS UI
4. Knox

All of the components should talk to an LDAP server to maintain a predefined set of users, and provide authorization and authentication using Ranger and Knox.

PFA : sssd.pdf
