Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4036 | 10-11-2017 09:33 PM
 | 3562 | 10-11-2017 07:46 PM
 | 2570 | 08-04-2017 01:37 PM
 | 2207 | 08-03-2017 03:36 PM
 | 2235 | 08-03-2017 12:52 PM
10-25-2016
05:18 PM
Take a look at the link below for a detailed explanation. In short, though: yes, a database dump and load of the keys is necessary, using the provided "exportKeysToJCEKS.sh" and "importKeysToJCEKS.sh" scripts. https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster.html
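A rough outline of the flow (the NameNode hosts and paths below are illustrative; see the linked article for the exact script usage):
# 1. On the source cluster's Ranger KMS host, export the encryption zone keys
#    with the provided exportKeysToJCEKS.sh script.
# 2. Copy the resulting JCEKS file to the destination KMS host and load it with
#    importKeysToJCEKS.sh, so both clusters share the same keys.
# 3. Copy the raw, still-encrypted bytes with distcp, preserving extended attributes:
hadoop distcp -px hdfs://source-nn:8020/.reserved/raw/secure/data hdfs://dest-nn:8020/.reserved/raw/secure/data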
10-20-2016
01:23 AM
@Normen Zoch
I'm assuming you're using Ambari 2.4.0 with HDP 2.5.0. Try the latest Ambari, version 2.4.1. This is a maintenance release that fixes some stability issues with 2.4.0.
Ambari 2.4.1 Repo:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/ambari_repositories.html
Ambari 2.4.0.1 & 2.4.1 Release Notes:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-release-notes/content/ambari_relnotes-2.4.1.0-fixed-issues.html
Ambari 2.4.1 Documentation:
https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.4.1.0/index.html
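A rough sketch of the maintenance upgrade on the Ambari server host (RHEL/CentOS assumed; first download the Ambari 2.4.1 ambari.repo from the repo link above into /etc/yum.repos.d/):
ambari-server stop
yum clean all
yum upgrade ambari-server
ambari-server upgrade   # upgrades the Ambari database schema
ambari-server start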
10-19-2016
01:57 PM
1 Kudo
This is due to an issue in the Docker-based HDP 2.5 Sandbox, which will be addressed in the next Sandbox release. In the meantime, try the solution in the link below: https://community.hortonworks.com/questions/62271/unable-to-add-apache-nifi-in-ambari.html
10-19-2016
01:04 PM
Yes. Rather than recreating users from scratch, though, you can synchronize your local LDAP with your corporate AD. Having said that, especially when it comes to security, you'll be governed by your organization's policies regarding what you can and can't do more so than by the technical aspects.
10-18-2016
05:26 PM
3 Kudos
First, let's clarify the difference between LDAP and AD.
LDAP is an application protocol for querying and modifying items in directory service providers (e.g., Active Directory). AD is a directory service provider that supports the LDAP protocol, amongst others.
https://jumpcloud.com/blog/difference-between-ldap-and-active-directory/
1) What is the use of having LDAP between AD, Hadoop and Kerberos integration?
You wouldn't actually have a separate LDAP provider; you would just use the LDAP protocol to talk to AD.
2) What are the advantages and disadvantages of integrating AD, Hadoop and Kerberos without LDAP?
See the answer above. You only use the LDAP protocol, not a separate LDAP directory service provider, to connect to AD.
3) What is the difference between implementing an MIT KDC and a direct AD setup?
You can go with either:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/_installing_and_configuring_the_kdc.html
A very general rule of thumb I follow is to use the AD KDC if the cluster size is less than 100 nodes.
If the cluster is larger than 100 nodes, then a local LDAP/KDC might be a better option. This is because the load on AD from hundreds of service accounts can cause performance and stability issues in AD. It's not so much the KDC; it is the combination of AD lookups/searches and the KDC being on AD that would be the challenge.
Can you please provide me the docs where I can understand the integration of a Hadoop cluster with Active Directory and Kerberos?
Take a look at these links for instructions on how to enable Kerberos on HDP and integrate with AD:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/_configuring_ambari_for_ldap_or_active_directory_authentication.html
http://hortonworks.com/blog/enabling-kerberos-hdp-active-directory-integration/
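Once Kerberos is enabled against AD, a quick sanity check from any cluster node looks something like this (the realm and principal are illustrative):
kinit alice@EXAMPLE.COM
klist
hdfs dfs -ls /user/alice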
10-18-2016
03:52 PM
2 Kudos
No, the temp/swap space is not encrypted, so we do have that vulnerability with intermediate shuffle data. However, this is temporary, short-lived data and may not pose a major issue depending on the customer/use case. If it is still a concern, put the temp space on a separate disk and encrypt that disk at the OS level.
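If you do go the OS-level route, a minimal sketch with LUKS (the device name and mount point are illustrative, and formatting destroys any existing data on the device):
# Encrypt and mount a dedicated disk for temp/intermediate data
cryptsetup luksFormat /dev/sdX
cryptsetup luksOpen /dev/sdX hadooptmp
mkfs.ext4 /dev/mapper/hadooptmp
mkdir -p /hadoop/tmp
mount /dev/mapper/hadooptmp /hadoop/tmp
# Then point the intermediate directories at the encrypted mount, e.g.
# yarn.nodemanager.local-dirs in yarn-site.xml (via Ambari), so shuffle/spill
# data lands on the encrypted disk.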
10-08-2016
04:18 PM
2 Kudos
Administrator Operations
The operations described in this section require superuser privileges.

Allow Snapshots: Allow snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.
Command:
hdfs dfsadmin -allowSnapshot $path
Arguments:
path – The path of the snapshottable directory.
See also the corresponding Java API void allowSnapshot(Path path) in HdfsAdmin.

Disallow Snapshots: Disallow snapshots of a directory from being created. All snapshots of the directory must be deleted before snapshots can be disallowed.
Command:
hdfs dfsadmin -disallowSnapshot $path
Arguments:
path – The path of the snapshottable directory.
See also the corresponding Java API void disallowSnapshot(Path path) in HdfsAdmin.
User Operations
This section describes user operations. Note that the HDFS superuser can perform all of these operations without satisfying the permission requirements of the individual operations.

Create Snapshots: Create a snapshot of a snapshottable directory. This operation requires owner privilege on the snapshottable directory.
Command:
hdfs dfs -createSnapshot $path $snapshotName
Arguments:
path – The path of the snapshottable directory.
snapshotName – The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033".
See also the corresponding Java API Path createSnapshot(Path path) and Path createSnapshot(Path path, String snapshotName) in FileSystem. The snapshot path is returned by these methods.
Delete Snapshots: Delete a snapshot from a snapshottable directory. This operation requires owner privilege on the snapshottable directory.
Command:
hdfs dfs -deleteSnapshot $path $snapshotName
Arguments:
path – The path of the snapshottable directory.
snapshotName – The snapshot name.
See also the corresponding Java API void deleteSnapshot(Path path, String snapshotName) in FileSystem.
Rename Snapshots: Rename a snapshot. This operation requires owner privilege on the snapshottable directory.
Command:
hdfs dfs -renameSnapshot $path $oldName $newName
Arguments:
path – The path of the snapshottable directory.
oldName – The old snapshot name.
newName – The new snapshot name.
See also the corresponding Java API void renameSnapshot(Path path, String oldName, String newName) in FileSystem.
Get Snapshottable Directory Listing: Get all the snapshottable directories where the current user has permission to take snapshots.
Command:
hdfs lsSnapshottableDir
This command takes no arguments; it lists all snapshottable directories where the current user has permission to take snapshots.
See also the corresponding Java API SnapshottableDirectoryStatus[] getSnapshottableDirListing() in DistributedFileSystem.
Get Snapshots Difference Report: Get the differences between two snapshots. This operation requires read access privilege for all files/directories in both snapshots.
Command:
hdfs snapshotDiff $path $fromSnapshot $toSnapshot
Arguments:
path – The path of the snapshottable directory.
fromSnapshot – The name of the starting snapshot.
toSnapshot – The name of the ending snapshot.
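Putting these together, a quick end-to-end example (the directory, snapshot names and file are illustrative):
# As the HDFS superuser, make the directory snapshottable
hdfs dfsadmin -allowSnapshot /data/sales
# As the directory owner, take a snapshot before and after a risky change
hdfs dfs -createSnapshot /data/sales before-cleanup
hdfs dfs -createSnapshot /data/sales after-cleanup
# Report the differences between the two snapshots
hdfs snapshotDiff /data/sales before-cleanup after-cleanup
# Restore an accidentally deleted file from the read-only snapshot copy
hdfs dfs -cp /data/sales/.snapshot/before-cleanup/report.csv /data/sales/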
**See Also**
HDFS Snapshots - 1) Overview
10-08-2016
04:18 PM
4 Kudos
HDFS Snapshots Overview
HDFS Snapshots are read-only, point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or on the entire file system. Common use cases for snapshots are data backup, protection against user errors, and disaster recovery.
The implementation of HDFS Snapshots is efficient in the following ways:
1) Snapshot creation is instantaneous. The cost is O(1), excluding the inode lookup time.
2) Additional memory is used only when modifications are made relative to a snapshot. Memory usage is O(M), where M is the number of modified files/directories.
3) Blocks in DataNodes are not copied. The snapshot files record only the block list and the file size.
4) Snapshots do not adversely affect regular HDFS operations, and there is only a minor performance impact when accessing snapshotted data, depending on the number of modifications. The snapshot data is computed by subtracting the modifications from the current data (snapshot data = current data – modifications), and modifications are recorded in reverse chronological order so that the current data can be accessed directly.
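For example, once a snapshot has been taken, its contents remain readable under the hidden .snapshot directory of the snapshottable directory (the path and snapshot name are illustrative):
hdfs dfs -ls /data/sales/.snapshot/s20130412-151029.033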
**See Also**
HDFS Snapshots - 2) Operations
10-08-2016
04:17 PM
2 Kudos
Documentation
It is recommended to create the following documents as part of the Upgrade Planning and Execution phases:
● Upgrade plan
● Upgrade guide
● Upgrade log (list of issues and their workarounds)
● Validation test results
● Hand-over documents

Learnings from the Field
Phased approach to upgrades
● Upgrade the Dev/Test cluster before touching the production environment. This will catch issues that are specific to your environments.
● While upgrading the Dev/Test cluster, create a runbook that records every issue encountered during the upgrade.
● Setting up a fresh cluster with the new version in Dev and testing applications on it is not enough. While that does cover application testing, it does not exercise the upgrade itself. It is recommended to upgrade Dev/Test along the same upgrade path planned for Production.

Updating Configurations
● Any major change to the configuration should be designed and tested in advance, for example the Capacity Scheduler.
● If you are not yet using the Capacity Scheduler, prepare for it as part of upgrade planning and test it in Dev first.
● Create and define the Capacity Scheduler queues before the upgrade; allocate time and do this ahead of the scheduled upgrade window. A quick way to verify the queues is shown below.
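A minimal sanity check of the queue setup from the command line (the queue name is illustrative):
yarn queue -status analytics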
**See Also**
HDP Upgrade Best Practices - 1) Plan and Assess
HDP Upgrade Best Practices - 2) Do the Upgrade
10-08-2016
04:17 PM
1 Kudo
Do the Upgrade

Preparation
● For each target cluster, find out as much detail as possible by reviewing the following:
● Hardware configuration, operating systems and network topology
● Current deployment, cluster topology and configuration of each component
● Security configuration
● User access and management
● Current data ingestion process (if applicable)
● Currently running applications and clients connecting to the cluster
● Find the applicable Upgrade Guide on docs.hortonworks.com
● Select validation applications
● Prepare an Upgrade Log to keep track of any upgrade issues and their workarounds
● [Optional but recommended] Prepare one or more Lab (virtual) clusters and install the current HDP stack and Ambari. Use these clusters for mock upgrades and rollbacks to troubleshoot any upgrade issues.

Upgrading Clusters
● To upgrade a single cluster, use the Upgrade Procedure given below.
● [Optional but recommended] Mock (lab) cluster upgrade: attempt the upgrade on a Lab cluster first. Some steps of the Upgrade Procedure can be skipped in order to concentrate on the critical parts.
● Test (Dev) cluster upgrade: before upgrading an important production cluster, it is strongly recommended to attempt the upgrade on a test cluster (e.g. Dev) that resembles the production cluster: running the current versions of HDP and Ambari, with a similar topology, components and configuration, but on a smaller number of nodes.
● Log every issue encountered during lab and test upgrades, along with its workaround, so that downtime during the main cluster upgrade is minimized.
● Main (Production) cluster upgrade:
● Book the upgrade date and time in advance
● Estimate the cluster downtime based on the results of the test upgrade. Note that, regardless of the preparation and any test upgrades, some new issues will appear.
● Inform all interested parties
● Confirm that Support is on stand-by
● Do the upgrade

A Single Cluster Upgrade Procedure

Prepare the Cluster for the Upgrade
● Run the identified validation applications before the upgrade, and record the results and execution times for each of them
● Get ready for the upgrade: correct any errors, alerts and warnings on the cluster
● Check the state of the HDFS filesystem and finalize it if not already finalized
● Capture the HDFS status and save the HDFS namespace (see the command sketch after this list)
● Back up the NameNode metadata and all databases supporting the cluster (Ambari, Hive Metastore, Oozie, Ranger, Hue)
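A rough sketch of the HDFS-side preparation commands (run as the hdfs superuser; the output and backup paths are illustrative):
# Check filesystem health and capture the current state
hdfs fsck / -files -blocks > /tmp/fsck-before-upgrade.log
hdfs dfsadmin -report > /tmp/dfsadmin-report-before-upgrade.log
# Finalize any previous upgrade that is still pending
hdfs dfsadmin -finalizeUpgrade
# Save the namespace (requires safe mode) and back up the NameNode metadata
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
hdfs dfsadmin -fetchImage /backup/namenode/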
Perform Upgrade
● Execute the cluster upgrade using the official HDP upgrade document
● Review the new properties; in particular, pay attention to changed property values, changed property names and new meanings of existing properties (if any)

Post Upgrade Validation
● Run the smoke test for each service and troubleshoot any issues
● Run the validation applications after the upgrade and record the results and execution times
● If any validation application fails, or execution times are much longer than before the upgrade, review and adjust the cluster properties, repeating the validation applications until they are stable and no slower than before the upgrade
● Record in the Upgrade Log any issues encountered and their workarounds

Final Steps
● Install any new HDP components not used before the upgrade, run the smoke test for each of them and troubleshoot any issues
● Finalize the HDFS upgrade
● Configure HA for selected components (such as NameNode, ResourceManager, HiveServer2, HBase, Oozie)
● Perform an Ambari takeover of any HDP components not previously managed by Ambari
● Enable Kerberos security: the KDC and existing principals and keytabs can be reused; add principals for the new components
● LDAP integration (Ambari, KDC, Ranger)

**See Also**
HDP Upgrade Best Practices - 1) Plan and Assess
HDP Upgrade Best Practices - 3) Documentation and Learnings