Member since: 10-06-2015
Posts: 273
Kudos Received: 202
Solutions: 81
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4036 | 10-11-2017 09:33 PM
 | 3562 | 10-11-2017 07:46 PM
 | 2570 | 08-04-2017 01:37 PM
 | 2207 | 08-03-2017 03:36 PM
 | 2235 | 08-03-2017 12:52 PM
10-25-2016
05:18 PM
Take a look at the link below for a detailed explanation. In short, though: yes, a database dump and load of the keys is necessary, using the provided "exportKeysToJCEKS.sh" and "importKeysToJCEKS.sh" scripts. https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster.html
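A rough outline of the flow (the NameNode hosts and paths below are illustrative; see the linked article for the exact script usage):
# 1. On the source cluster's Ranger KMS host, export the encryption zone keys
#    with the provided exportKeysToJCEKS.sh script.
# 2. Copy the resulting JCEKS file to the destination KMS host and load it with
#    importKeysToJCEKS.sh, so both clusters share the same keys.
# 3. Copy the raw, still-encrypted bytes with distcp, preserving extended attributes:
hadoop distcp -px hdfs://source-nn:8020/.reserved/raw/secure/data hdfs://dest-nn:8020/.reserved/raw/secure/data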
10-20-2016
01:23 AM
@Normen Zoch
I'm assuming you're using Ambari 2.4.0 with HDP 2.5.0. Try the latest Ambari, version 2.4.1. This is a maintenance release that fixes some stability issues with 2.4.0.
Ambari 2.4.1 Repo:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-installation/content/ambari_repositories.html
Ambari 2.4.0.1 & 2.4.1 Release Notes:
http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.1.0/bk_ambari-release-notes/content/ambari_relnotes-2.4.1.0-fixed-issues.html
Ambari 2.4.1 Documentation:
https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.4.1.0/index.html
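A rough sketch of the maintenance upgrade on the Ambari server host (RHEL/CentOS assumed; first download the Ambari 2.4.1 ambari.repo from the repo link above into /etc/yum.repos.d/):
ambari-server stop
yum clean all
yum upgrade ambari-server
ambari-server upgrade   # upgrades the Ambari database schema
ambari-server start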
10-19-2016
01:57 PM
1 Kudo
This is due to an issue in the Docker-based HDP 2.5 Sandbox, which will be addressed in the next Sandbox release. In the meantime, try the solution in the link below: https://community.hortonworks.com/questions/62271/unable-to-add-apache-nifi-in-ambari.html
10-19-2016
01:04 PM
Yes. Rather than recreating users from scratch, though, you can synchronize your local LDAP with your corporate AD. Having said that, especially when it comes to security, you'll be governed by your organization's policies regarding what you can and can't do more so than by the technical aspects.
10-18-2016
05:26 PM
3 Kudos
First, let's clarify the difference between LDAP and AD.
LDAP is an application protocol for querying and modifying items in directory service providers (e.g., Active Directory). AD is a directory service provider that supports the LDAP protocol, amongst others.
https://jumpcloud.com/blog/difference-between-ldap-and-active-directory/
1) What is the use of having LDAP between AD, Hadoop and Kerberos integration?
You wouldn't actually have a separate LDAP provider; you would just use the LDAP protocol to talk to AD.
2) What are the advantages and disadvantages of integrating AD, Hadoop and Kerberos without LDAP?
See the answer above. You only use the LDAP protocol, not a separate LDAP directory service provider, to connect to AD.
3) What is the difference between implementing an MIT KDC and a direct AD setup?
You can go with either:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/_installing_and_configuring_the_kdc.html
A very general rule of thumb I follow is to use the AD KDC if the cluster size is less than 100 nodes.
If the cluster is larger than 100 nodes, then a local LDAP/KDC might be a better option. This is because the load on AD from hundreds of service accounts can cause performance and stability issues in AD. It's not so much the KDC; it is the combination of AD lookups/searches and the KDC being on AD that would be the challenge.
Can you please provide me the docs where I can understand the integration of a Hadoop cluster with Active Directory and Kerberos?
Take a look at these links for instructions on how to enable Kerberos on HDP and integrate with AD:
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/_configuring_ambari_for_ldap_or_active_directory_authentication.html
http://hortonworks.com/blog/enabling-kerberos-hdp-active-directory-integration/
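Once Kerberos is enabled against AD, a quick sanity check from any cluster node looks something like this (the realm and principal are illustrative):
kinit alice@EXAMPLE.COM
klist
hdfs dfs -ls /user/alice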
10-18-2016
03:52 PM
2 Kudos
No, the temp/swap space is not encrypted, so we do have that vulnerability with intermediate shuffle data. However, this is temporary, short-lived data and may not pose a major issue depending on the customer/use case. If it is still a concern, put the temp space on a separate disk and encrypt that disk at the OS level.
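If you do go the OS-level route, a minimal sketch with LUKS (the device name and mount point are illustrative, and formatting destroys any existing data on the device):
# Encrypt and mount a dedicated disk for temp/intermediate data
cryptsetup luksFormat /dev/sdX
cryptsetup luksOpen /dev/sdX hadooptmp
mkfs.ext4 /dev/mapper/hadooptmp
mkdir -p /hadoop/tmp
mount /dev/mapper/hadooptmp /hadoop/tmp
# Then point the intermediate directories at the encrypted mount, e.g.
# yarn.nodemanager.local-dirs in yarn-site.xml (via Ambari), so shuffle/spill
# data lands on the encrypted disk.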
10-08-2016
04:18 PM
2 Kudos
Administrator Operations
The operations described in this section require superuser privileges.

Allow Snapshots: Allow snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.
Command:
hdfs dfsadmin -allowSnapshot $path
Arguments:
path – The path of the snapshottable directory.
See also the corresponding Java API void allowSnapshot(Path path) in HdfsAdmin.

Disallow Snapshots: Disallow snapshots of a directory from being created. All snapshots of the directory must be deleted before snapshots can be disallowed.
Command:
hdfs dfsadmin -disallowSnapshot $path
Arguments:
path – The path of the snapshottable directory.
See also the corresponding Java API void disallowSnapshot(Path path) in HdfsAdmin.
User Operations
This section describes user operations. Note that the HDFS superuser can perform all of these operations without satisfying the permission requirements of the individual operations.

Create Snapshots: Create a snapshot of a snapshottable directory. This operation requires owner privilege on the snapshottable directory.
Command:
hdfs dfs -createSnapshot $path $snapshotName
Arguments:
path – The path of the snapshottable directory.
snapshotName – The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033".
See also the corresponding Java API Path createSnapshot(Path path) and Path createSnapshot(Path path, String snapshotName) in FileSystem. The snapshot path is returned by these methods.
Delete Snapshots: Delete a snapshot from a snapshottable directory. This operation requires owner privilege on the snapshottable directory.
Command:
hdfs dfs -deleteSnapshot $path $snapshotName
Arguments:
path – The path of the snapshottable directory.
snapshotName – The snapshot name.
See also the corresponding Java API void deleteSnapshot(Path path, String snapshotName) in FileSystem.
Rename Snapshots: Rename a snapshot. This operation requires owner privilege on the snapshottable directory.
Command:
hdfs dfs -renameSnapshot $path $oldName $newName
Arguments:
path – The path of the snapshottable directory.
oldName – The old snapshot name.
newName – The new snapshot name.
See also the corresponding Java API void renameSnapshot(Path path, String oldName, String newName) in FileSystem.
Get Snapshottable Directory Listing: Get all the snapshottable directories where the current user has permission to take snapshots.
Command:
hdfs lsSnapshottableDir
This command takes no arguments; it lists all snapshottable directories where the current user has permission to take snapshots.
See also the corresponding Java API SnapshottableDirectoryStatus[] getSnapshottableDirListing() in DistributedFileSystem.
Get Snapshots Difference Report: Get the differences between two snapshots. This operation requires read access privilege for all files/directories in both snapshots.
Command:
hdfs snapshotDiff $path $fromSnapshot $toSnapshot
Arguments:
path – The path of the snapshottable directory.
fromSnapshot – The name of the starting snapshot.
toSnapshot – The name of the ending snapshot.
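Putting these together, a quick end-to-end example (the directory, snapshot names and file are illustrative):
# As the HDFS superuser, make the directory snapshottable
hdfs dfsadmin -allowSnapshot /data/sales
# As the directory owner, take a snapshot before and after a risky change
hdfs dfs -createSnapshot /data/sales before-cleanup
hdfs dfs -createSnapshot /data/sales after-cleanup
# Report the differences between the two snapshots
hdfs snapshotDiff /data/sales before-cleanup after-cleanup
# Restore an accidentally deleted file from the read-only snapshot copy
hdfs dfs -cp /data/sales/.snapshot/before-cleanup/report.csv /data/sales/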
**See Also**
HDFS Snapshots - 1) Overview
10-08-2016
04:18 PM
4 Kudos
HDFS Snapshots Overview
HDFS Snapshots are read-only, point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or on the entire file system. Common use cases for snapshots are data backup, protection against user errors, and disaster recovery.
The implementation of HDFS Snapshots is efficient in the following ways:
1) Snapshot creation is instantaneous. The cost is O(1), excluding the inode lookup time.
2) Additional memory is used only when modifications are made relative to a snapshot. Memory usage is O(M), where M is the number of modified files/directories.
3) Blocks in DataNodes are not copied. The snapshot files record only the block list and the file size.
4) Snapshots do not adversely affect regular HDFS operations, and there is only a minor performance impact when accessing snapshotted data, depending on the number of modifications. The snapshot data is computed by subtracting the modifications from the current data (snapshot data = current data – modifications), and modifications are recorded in reverse chronological order so that the current data can be accessed directly.
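For example, once a snapshot has been taken, its contents remain readable under the hidden .snapshot directory of the snapshottable directory (the path and snapshot name are illustrative):
hdfs dfs -ls /data/sales/.snapshot/s20130412-151029.033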
**See Also**
HDFS Snapshots - 2) Operations
10-08-2016
04:17 PM
2 Kudos
Documentation
It is recommended to create the following documents as part of the Upgrade Planning and Execution phases:
● Upgrade plan
● Upgrade guide
● Upgrade log (list of issues and their workarounds)
● Validation test results
● Hand-over documents

Learnings from the Field
Phased approach to upgrades
● Upgrade the Dev/Test cluster before touching the production environment. This will catch issues that are specific to your environments.
● While upgrading the Dev/Test cluster, create a runbook that records every issue encountered during the upgrade.
● Setting up a fresh cluster with the new version in Dev and testing applications on it is not enough. While that does cover application testing, it does not exercise the upgrade itself. It is recommended to upgrade Dev/Test along the same upgrade path planned for Production.

Updating Configurations
● Any major change to the configuration should be designed and tested in advance, for example the Capacity Scheduler.
● If you are not yet using the Capacity Scheduler, prepare for it as part of upgrade planning and test it in Dev first.
● Create and define the Capacity Scheduler queues before the upgrade; allocate time and do this ahead of the scheduled upgrade window. A quick way to verify the queues is shown below.
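A minimal sanity check of the queue setup from the command line (the queue name is illustrative):
yarn queue -status analytics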
**See Also**
HDP Upgrade Best Practices - 1) Plan and Assess
HDP Upgrade Best Practices - 2) Do the Upgrade
10-08-2016
04:17 PM
1 Kudo
Do the Upgrade

Preparation
● For each target cluster, find out as much detail as possible by reviewing the following:
● Hardware configuration, operating systems and network topology
● Current deployment, cluster topology and configuration of each component
● Security configuration
● User access and management
● Current data ingestion process (if applicable)
● Currently running applications and clients connecting to the cluster
● Find the applicable Upgrade Guide on docs.hortonworks.com
● Select validation applications
● Prepare an Upgrade Log to keep track of any upgrade issues and their workarounds
● [Optional but recommended] Prepare one or more Lab (virtual) clusters and install the current HDP stack and Ambari. Use these clusters for mock upgrades and rollbacks to troubleshoot any upgrade issues.

Upgrading Clusters
● To upgrade a single cluster, use the Upgrade Procedure given below.
● [Optional but recommended] Mock (lab) cluster upgrade: attempt the upgrade on a Lab cluster first. Some steps of the Upgrade Procedure can be skipped in order to concentrate on the critical parts.
● Test (Dev) cluster upgrade: before upgrading an important production cluster, it is strongly recommended to attempt the upgrade on a test cluster (e.g. Dev) that resembles the production cluster: running the current versions of HDP and Ambari, with a similar topology, components and configuration, but on a smaller number of nodes.
● Log every issue encountered during lab and test upgrades, along with its workaround, so that downtime during the main cluster upgrade is minimized.
● Main (Production) cluster upgrade:
● Book the upgrade date and time in advance
● Estimate the cluster downtime based on the results of the test upgrade. Note that, regardless of the preparation and any test upgrades, some new issues will appear.
● Inform all interested parties
● Confirm that Support is on stand-by
● Do the upgrade

A Single Cluster Upgrade Procedure

Prepare the Cluster for the Upgrade
● Run the identified validation applications before the upgrade, and record the results and execution times for each of them
● Get ready for the upgrade: correct any errors, alerts and warnings on the cluster
● Check the state of the HDFS filesystem and finalize it if not already finalized
● Capture the HDFS status and save the HDFS namespace (see the command sketch after this list)
● Back up the NameNode metadata and all databases supporting the cluster (Ambari, Hive Metastore, Oozie, Ranger, Hue)
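A rough sketch of the HDFS-side preparation commands (run as the hdfs superuser; the output and backup paths are illustrative):
# Check filesystem health and capture the current state
hdfs fsck / -files -blocks > /tmp/fsck-before-upgrade.log
hdfs dfsadmin -report > /tmp/dfsadmin-report-before-upgrade.log
# Finalize any previous upgrade that is still pending
hdfs dfsadmin -finalizeUpgrade
# Save the namespace (requires safe mode) and back up the NameNode metadata
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
hdfs dfsadmin -fetchImage /backup/namenode/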
Perform Upgrade
● Execute the cluster upgrade using the official HDP upgrade document
● Review the new properties; in particular, pay attention to changed property values, changed property names and new meanings of existing properties (if any)

Post Upgrade Validation
● Run the smoke test for each service and troubleshoot any issues
● Run the validation applications after the upgrade and record the results and execution times
● If any validation application fails, or execution times are much longer than before the upgrade, review and adjust the cluster properties, repeating the validation applications until they are stable and no slower than before the upgrade
● Record in the Upgrade Log any issues encountered and their workarounds

Final Steps
● Install any new HDP components not used before the upgrade, run the smoke test for each of them and troubleshoot any issues
● Finalize the HDFS upgrade
● Configure HA for selected components (such as NameNode, ResourceManager, HiveServer2, HBase, Oozie)
● Perform an Ambari takeover of any HDP components not previously managed by Ambari
● Enable Kerberos security: the KDC and existing principals and keytabs can be reused; add principals for the new components
● LDAP integration (Ambari, KDC, Ranger)

**See Also**
HDP Upgrade Best Practices - 1) Plan and Assess
HDP Upgrade Best Practices - 3) Documentation and Learnings