Member since: 08-16-2019
Posts: 18
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1186 | 08-28-2019 06:11 AM |
01-02-2023
10:41 AM
What I am trying to do is install a NiFi cluster onto 3 Ubuntu 22.04 clients. I first installed NiFi on each client independently, and then attempted to install a cluster using the same clients. I stopped the NiFi service running on each of the clients before attempting to install the cluster. Should I not install NiFi separately on each of the clients before attempting to install the cluster? Thank you for any guidance.
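For context, clustering standalone NiFi installations is usually a matter of pointing the existing nodes at each other rather than reinstalling. A minimal sketch of the cluster-related nifi.properties entries, assuming an unsecured three-node cluster with an external ZooKeeper at zk_host:2181 (all hostnames and ports here are placeholders):
# Hypothetical nifi.properties entries for one node (repeat per node with its own hostname)
nifi.web.http.host=node1.example.com
nifi.web.http.port=8080
nifi.cluster.is.node=true
nifi.cluster.node.address=node1.example.com
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk_host:2181
nifi.cluster.flow.election.max.candidates=3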
... View more
10-31-2022
03:31 AM
This article focuses on demonstrating how to post external user/group mappings into Cloudera Manager using a CM API call.
Target state on Cloudera Manager:
LDAP Group | CM Role |
---|---|
CMGroup1 | ROLE_ADMIN |
CMGroup2 | ROLE_CONFIGURATOR |
CMGroup3 | ROLE_AUDITOR, ROLE_LIMITED |
Step 1: Get the current authRoles
GET /authRoles
curl -g -X GET -u admin:admin -H "Content-Type: application/json" "http://cmhost:7180/api/v43/authRoles" > cm_authroles.json
Download the existing authRoles from CM and validate that the CM roles listed in the target state are available in the file cm_authroles.json.
Step 2: Create the user mapping template, for example cm_user_mapping.json as shown below:
{
"items": [
{
"name": "CMGroup1",
"type": "LDAP",
"authRoles": [
{ "name": "ROLE_ADMIN" }
]
},
{
"name": "CMGroup2",
"type": "LDAP",
"authRoles": [
{ "name": "ROLE_CONFIGURATOR" }
]
},
{
"name": "CMGroup3",
"type": "LDAP",
"authRoles": [
{ "name": "ROLE_AUDITOR" },
{ "name": "ROLE_LIMITED" }
]
}
]
}
Step 3: Post the cm_user_mapping.json to CM via the externalUserMappings API
curl -g -X POST -u admin:admin -H "Content-Type: application/json" -d @cm_user_mapping.json "http://cmhost:7180/api/v43/externalUserMappings"
More details about authRoles and external user mappings can be found in the official API documentation: https://archive.cloudera.com/cm7/7.2.4/generic/jar/cm_api/apidocs/index.html
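To confirm the mappings were applied, the same resource can be queried back. A minimal check, assuming the same host and credentials as above:
### Verify the external user/group mappings after the POST
curl -g -X GET -u admin:admin -H "Content-Type: application/json" "http://cmhost:7180/api/v43/externalUserMappings"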
... View more
10-31-2022
02:21 AM
Introduction
This article focuses on backup and restore of Atlas data during an HDP3 to CDP migration.
Steps to Backup on HDP3
Run the following commands on the Atlas server host of HDP3.
Command to get the metrics from the Atlas API or Atlas UI:
curl -k -g -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" "https://atlas_host:21443/api/atlas/admin/metrics" > atlas_metrics.json
Extract the entityActive types from atlas_metrics.json and turn them into a list; a sample is shown below.
# cat metrics_types.list
hive_db_ddl
hive_table
hive_db
hbase_namespace
hive_process
hive_storagedesc
hdfs_path
hbase_table
hive_column_lineage
hbase_column_family
hive_column
hive_process_execution
hive_table_ddl
Export API
Script to export all entities and save each type as a zip file:
mkdir /tmp/atlas_backup
cd /tmp/atlas_backup
for t in `cat metrics_types.list`
do
mkdir -p $t
curl -k -X POST -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d "{\"itemsToExport\": [{\"typeName\": \"$t\"}], \"options\": {\"matchType\": \"forType\", \"fetchType\": \"full\"}}" "https://atlas_host:21443/api/atlas/admin/export" > $t/Atlas-$t.zip
done
Note: Unzip and check one of the zip files; expect to see .json files with entity information.
Steps to Import on CDP
Remediation steps:
Unzip and extract the json files from the backup directory /tmp/atlas_backup. Expect to see .json files with entity information.
Replace the Atlas cluster_name in the .json files with the CDP Atlas cluster_name. Note: in CDP the default value of cluster_name is 'cm'.
Replace the HDFS Namespace directory, e.g. hdfs://HDFSNamespace:8020/
Replace any other patterns which are applicable in CDP, e.g. @cluster_name
(A sketch of scripting these replacements appears at the end of this article.)
Import API - Script to import all entities from the zip files:
cd /tmp/atlas_backup
for t in `ls /tmp/atlas_backup/*/*.zip`
do
curl -ivk -X POST -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d "{\"options\": {\"fileName\": \"$t\"}}" "https://atlas_host:21443/api/atlas/admin/importfile"
done
Command to get the metrics from the Atlas UI:
curl -k -g -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" "https://atlas_host:21443/api/atlas/admin/metrics" > atlas_metrics_final.json
Compare atlas_metrics.json with atlas_metrics_final.json.
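As referenced in the remediation steps above, the cluster_name and HDFS namespace replacements can be scripted once the zip files have been extracted. A minimal sketch, assuming the source cluster_name was 'HDP3Cluster' and the target namespace is hdfs://cdp_namespace:8020/ (both are placeholders to adjust for your environment); if you edit the extracted .json files, remember to re-zip them before running the import loop above:
### Hypothetical bulk replacement across the extracted entity json files
cd /tmp/atlas_backup
for f in */*.json
do
  sed -i 's/"HDP3Cluster"/"cm"/g' $f
  sed -i 's|hdfs://HDFSNamespace:8020/|hdfs://cdp_namespace:8020/|g' $f
  sed -i 's/@HDP3Cluster/@cm/g' $f
done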
... View more
07-29-2022
12:52 AM
Introduction
Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides a centralised platform to define, administer and manage security policies consistently across Hadoop components.
More details about Ranger can be found here: https://www.cloudera.com/products/open-source/apache-hadoop/apache-ranger.html
Ranger API policy documentation: https://ranger.apache.org/apidocs/index.html
This article focuses on export and import of Ranger policies using API calls during an HDP to CDP migration.
Export
List of services configured in Ranger:
### Command to get list of services
curl -s -u admin:pass -H "Accept: application/json" -H "Content-Type: application/json" -X GET "http://<hostname>:<ranger-port>/service/public/v2/api/service" > services.json
Export of Policies
### Export all policies
curl -X GET --header "text/json" -H "Content-Type: text/json" -o file.json -u admin:admin "http://<hostname>:<ranger-port>/service/plugins/policies/exportJson"
The exported file.json contains all policies, including tag-based policies.
Export of users and groups, which can be used for validation purposes:
## Api call to download all Users from Ranger
curl -s -u admin:pass -H "Accept: application/json" -H "Content-Type: application/json" -X GET "https://ranger.com/service/xusers/users" > users.json
## Api call to download all groups from Ranger
curl -s -u admin:pass -H "Accept: application/json" -H "Content-Type: application/json" -X GET "https://ranger.com/service/xusers/groups" > groups.json
Import
Importing policies into the target CDP cluster:
Step 1: Prepare the Ranger service and make sure all service plugins are configured.
Step 2: Prepare a servicesMapping.json file which maps Ranger service names from the HDP world to the CDP world.
cat /path/servicesMapping.json
{"cm_knox":"cm_knox","cm_hdfs":"cm_hdfs","cm_hbase":"cm_hbase","cm_yarn":"cm_yarn","cm_solr":"cm_solr","cm_kafka":"cm_kafka","cm_atlas":"cm_atlas","cm_hive":"cm_hive"} Step 3: Import the Ranger policies using Ranger API #To Import policies from JSON file with servicesMap
curl -i -X POST -H "Content-Type: multipart/form-data" -F 'file=@/path/file.json' -F 'servicesMapJson=@/path/servicesMapping.json' -u admin:admin http://<hostname>:<ranger-port>/service/plugins/policies/importPoliciesFromFile?isOverride=true
Preparation for HDP to CDP Migration - Known threats and To-dos
Local users/groups in HDP Ranger must be available in the target CDP cluster.
AD/LDAP users/groups in HDP Ranger must be available in the target CDP cluster.
Ranger services in the HDP cluster must be configured in the CDP cluster.
Before importing policies, CDP Ranger must be empty (make sure to delete the default policies that are created when the services are enabled).
Default policies must be reviewed and cleaned (e.g. public groups and all-resource policies are not ideal for production clusters).
Useful Links
Review and add the Ranger policies required in the CDP world, which can be found here: https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-hdp3/topics/amb3-add-ranger-policies-for-components-on-the-cdp-cluster.html
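To help with the user/group checks listed above, the names can be pulled out of the exported users.json and groups.json for comparison against the target cluster. A minimal sketch, assuming the Ranger responses contain the usual vXUsers/vXGroups arrays (verify against your Ranger version):
### Hypothetical extraction of user and group names for validation
python -c 'import json; print("\n".join(u["name"] for u in json.load(open("users.json")).get("vXUsers", [])))' > hdp_ranger_users.txt
python -c 'import json; print("\n".join(g["name"] for g in json.load(open("groups.json")).get("vXGroups", [])))' > hdp_ranger_groups.txt
### Repeat the same export against the CDP Ranger and diff the lists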
... View more
07-15-2022
05:43 AM
Introduction
Apache Atlas provides open metadata management and governance capabilities for organisations to build a catalog of their data assets, classify and govern these assets, and provide collaboration capabilities around these data assets for data scientists, analysts and the data governance team.
This article focuses on backup and restore of Atlas data during an HDP to CDP migration.
Migration Paths
HDP 2.6 to CDP
Export
Note: The following commands need to be executed on the HDP cluster.
Download the migration script onto the Atlas server host.
## Download the script
wget https://archive.cloudera.com/am2cm/hdp2/atlas-migration-exporter-0.8.0.2.6.6.0-332.tar.gz
## Untar
tar zxvf atlas-migration-exporter-0.8.0.2.6.6.0-332.tar.gz
##copy the contents
mkdir /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/
cp atlas-migration-exporter-0.8.0.2.6.6.0-332/* /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/
chown -R atlas:hadoop /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/
Before taking the backup, stop Atlas.
## Stop Atlas via the Ambari UI
## /root/atlas_metadata is my backup directory (this is an example)
## Run the migration script
[root@ccycloud-1 ~]# python /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/atlas_migration_export.py -d /root/atlas_metadata
atlas-migration-export: starting migration export. Log file location /var/log/atlas/atlas-migration-exporter.log
atlas-migration-export: initializing
atlas-migration-export: ctor: parameters: 3
atlas-migration-export: initialized
atlas-migration-export: exporting typesDef to file /root/atlas_metadata/atlas-migration-typesdef.json
atlas-migration-export: exported typesDef to file /root/atlas_metadata/atlas-migration-typesdef.json
atlas-migration-export: exporting data to file /root/atlas_metadata/atlas-migration-data.json
atlas-migration-export: exported data to file /root/atlas_metadata/atlas-migration-data.json
atlas-migration-export: completed migration export!
Make sure the exported data files in json format are found in the backup location.
## make sure the backup is available
[root@ccycloud-1 ~]# ls -ltrh /root/atlas_metadata
total 240K
-rw-r--r-- 1 root root 32K Sep 8 02:57 atlas-migration-typesdef.json
-rw-r--r-- 1 root root 205K Sep 8 02:57 atlas-migration-data.json
Import
Note: The following commands need to be executed on the CDP cluster.
Note: For this migration, Atlas must be empty before following the next steps.
To restore the Atlas metadata we need to start Atlas in migration mode. This can be done by configuring, via CM UI >> Atlas >> conf/atlas-application.properties_role_safety_valve:
atlas.migration.data.filename=/root/atlas_metadata
Start Atlas and wait until the migration status shows as completed. Once the migration is completed, remove the config atlas.migration.data.filename and restart Atlas again; the Atlas UI should then report a normal (non-migration) status.
HDP 3.X to CDP
Backup
Note: The following commands need to be executed on the HDP cluster.
Atlas metadata is stored in HBase and in Infra-Solr collections; both locations need to be backed up.
Hbase backup
### script this
hbase shell
disable 'atlas_janus'
snapshot 'atlas_janus', 'atlas_janus-backup-new'
enable 'atlas_janus'
disable 'ATLAS_ENTITY_AUDIT_EVENTS'
snapshot 'ATLAS_ENTITY_AUDIT_EVENTS','ATLAS_ENTITY_AUDIT_EVENTS-backup-new'
enable 'ATLAS_ENTITY_AUDIT_EVENTS'
exit
## Linux cli
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'atlas_janus-backup-new' -copy-to hdfs:///tmp/hbase_new_atlas_backups
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'ATLAS_ENTITY_AUDIT_EVENTS-backup-new' -copy-to hdfs:///tmp/hbase_new_atlas_backups
Infra-solr collections backup
## Getting Kerberos ticket
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Taking Dump of atlas collections e.g vertex_index (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection vertex_index --output /home/solr/backup/atlas/vertex_index/data --max-read-block-size 100000 --max-write-block-size 100000
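### Optional sketch: loop over the three Atlas collections one at a time,
### reusing the same zookeeper string, jaas file and backup root as the command above
for c in vertex_index edge_index fulltext_index
do
  /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection $c --output /home/solr/backup/atlas/$c/data --max-read-block-size 100000 --max-write-block-size 100000
done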
### Back up edge_index, fulltext_index and vertex_index one at a time (or with a loop like the one above)
Restore
Note: The following commands need to be executed on the CDP cluster.
Use the backups of HBase and the Infra-Solr collections and restore them into the CDP environment before starting Atlas in the CDP world.
Hbase restore tables from snapshot
## copy the backup directories to the target HDFS and run the restore commands
hbase shell
list_snapshots
restore_snapshot 'atlas_janus-backup-new'
restore_snapshot 'ATLAS_ENTITY_AUDIT_EVENTS-backup-new'
Infra-solr restore
## Getting Kerberos ticket
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Restoring your collections e.g vertex_index (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection vertex_index --output /home/solr/backup/atlas/vertex_index/data --max-read-block-size 100000 --max-write-block-size 100000
### make sure to restore all 3 collections, i.e. vertex_index, fulltext_index, edge_index
After restoring the HBase tables and Solr collections, start Atlas via the CM UI and validate the metrics.
Useful Links & Scripts
Command to download the Atlas metrics. These metrics can be used for validation before and after the migration.
# curl -g -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" "http://ccycloud-1.hsbcap2.root.hwx.site:21000/api/atlas/admin/metrics" > atlas_metrics.json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 261 0 261 0 0 64 0 --:--:-- 0:00:04 --:--:-- 64
# ls -ltrh atlas_metrics.json
-rw-r--r-- 1 root root 261 Sep 8 05:47 atlas_metrics.json
Atlas also comes with an API to export and import entities; sample code can be found here: https://atlas.apache.org/#/ImportExportAPI
A known issue during the migration is the namespace of the HBase tables: make sure the tables are restored into the default namespace, and if not, configure the right namespace in the Atlas configs as described here: https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-hdp/topics/amb-migrating-atlas-data.html
Hbase backup and restore guide:
https://community.cloudera.com/t5/Support-Questions/How-to-take-backup-of-Apache-Atlas-and-restore-it/m-p/300957
https://blog.cloudera.com/approaches-to-backup-and-disaster-recovery-in-hbase/
Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling and customisations.
... View more
05-15-2022
09:17 PM
1 Kudo
Introduction
AM2CM is an offline tool that converts an Ambari blueprint into a Cloudera Manager deployment template. Import the converted template into Cloudera Manager, start the services through the Cloudera Manager UI, and validate the cluster.
The latest version of the tool can be downloaded here: Software download matrix for 3.1.5 to CDP 7.1.x
Note: This article is written against am2cm version 2.0.4.0-4. The latest version might have new features which are not covered in this article.
Internals of working
The tool ships with the AM2CM script, config files, and lib directories.
am2cm-2.0.4.0-4 % ls -1
am2cm-2.0.0.2.0.4.0-4.jar
am2cm.sh
ambari7_blueprint.json
cm_migration.log
conf
configs_summary.log
kerberos_summary.json
lib
restore_collections.py
The AM2CM script takes the blueprint of the Ambari cluster as input and, based on the HDP version supplied by the user, decides which config mapping file to use. This mapping file consists of mapping rules from the Ambari world to the CM world. Config files such as user-settings.ini and service-config.ini are used as part of the mapping logic to generate the CM deployment json.
How to use
Command to execute:
am2cm-2.0.4.0-4 % sh am2cm.sh -bp conf/blueprint.json -dt cm_deployment.json
INPUT Ambari Blueprint : conf/blueprint.json
OUTPUT CM Template : cm_deployment.json
Starting blueprint to CM Template migration
What is source version 1.HDP2 2.HDP3/HDF352 (1 or 2)? 1
Total number of hosts in blueprint: 6
Your cluster has services (listed below) that are not handled by this migration tool.
AMBARI_METRICS
The tool will skip the above identified service related configs.
Do you want to proceed with migration (Y OR N)? (N):y
Processing: POWERSCALE
Processing: LIVY
Processing: SOLR
Processing: TEZ
Processing: HDFS
Processing: OOZIE
Processing: SQOOP_CLIENT
Processing: NIFIREGISTRY
Processing: ZOOKEEPER
Processing: HBASE
Processing: YARN
Processing: RANGER_KMS
Processing: KNOX
Processing: ATLAS
Processing: HIVE_ON_TEZ
Processing: RANGER
Processing: HIVE
Processing: KAFKA
Processing: NIFI
Processing: SPARK_ON_YARN
Adding: QUEUEMANAGER
CM Template is generated at : am2cm-2.0.4.0-4/cm_deployment.json
Kerberos summary file is generated at : am2cm-2.0.4.0-4/kerberos_summary.json
Successfully completed
The outputs of the command are:
cm_deployment.json - the cluster template for Cloudera Manager
kerberos_summary.json - the list of keytabs required for the CDP cluster
configs_summary.log - a summary of each config transformation from Ambari to Cloudera Manager
cm_migration.log - the AM2CM execution log for each service
Features
HDP and HDF services: AM2CM supports blueprints from HDP 2.6.5*, HDP 3* and HDF 3.5* clusters.
Advanced/Custom configs: AM2CM can handle Ambari configs in the advanced and custom sections of a service and migrate them to CM safety valves.
Host/Group mapping: AM2CM is able to translate config group mappings into CM role groups.
Adding new services: Using AM2CM > config > service-config.ini, we are able to add new services that are not available in the Ambari world. Example: YARN Queue Manager.
Hidden Features
Option to dry run: Getting a blueprint with hosts is not possible before upgrading Ambari to the 7.1.* version, so the am2cm dry_run feature is useful here. While running am2cm we can pass the argument `--dry_run`, which lets us supply a blueprint without host mapping details and get a deployment json for validation purposes (see the sketch after the limitations list below).
Ignore list: AM2CM > config > service-config.ini comes with a config ignore list, which can be extended to ignore configs from the blueprint; ignored configs will take the default CM values.
Override configs in blueprint: Using the AM2CM > config > cm-config-mapping.ini file, we can overwrite configs in the blueprint with new CM standards.
Install new components: Using AM2CM > config > service-config.ini, we are able to add new services that are not available in the Ambari world, for example Hue, Phoenix, etc.
New cluster standards: Using the AM2CM > config > user-settings.ini file, we can inject new CM configs, for example TLS configuration, Kerberos principals, etc.
Summary of configs: The output of AM2CM provides a log file with a summary of each config transformation from Ambari to CM. This is useful to compare the configs before and after migration.
Limitations
SSL/TLS configs: AM2CM does not migrate SSL and TLS configuration to the CM world.
Kerberos (rules, configs): AM2CM does not migrate the Kerberos configs or auth_to_local rules to CM.
Rack topology: If the host rack topology is not managed by Ambari, the blueprint does not hold rack information, so it will not be migrated to the CM deployment json.
Knox topologies: Most often Knox topologies are not managed by Ambari, so the blueprint does not have them and AM2CM will not migrate them.
Ranger plugins: Ranger plugin configs, for example the plugin name from Ambari, will not be migrated to the CM deployment template.
Backup and restore: AM2CM does not take care of backup and restore of service metadata, for example the Hive DB and Oozie DB; this has to be handled separately.
Config validations: AM2CM is not able to validate whether configs in Ambari are compatible with CM versions. This has to be done separately after deploying the template.
HA configs (HTTPFS): AM2CM is not able to migrate the HA configs of new services such as HTTPFS. Validate them manually on the CM side.
Config groups with correct names: AM2CM converts Ambari config groups into CM role groups, but with standard names. Once on the CM side, we have to manually validate and correct the names.
The above-mentioned limitations can be handled separately, either after deploying the template or by configuring AM2CM to inject the required configs.
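For reference, a sketch of the dry-run invocation mentioned under Hidden Features, assuming a blueprint exported without host mapping details (the input file name is a placeholder; verify the exact flag against your AM2CM version):
### Dry run: generate a CM template from a blueprint that has no host mapping details
sh am2cm.sh -bp conf/blueprint_without_hosts.json -dt cm_deployment_dryrun.json --dry_run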
Useful Links & Scripts
HDP3 to CDP: Transitioning HDP cluster to CDP Private Cloud Base cluster using the AM2CM tool
HDP2 to CDP: Transitioning HDP 2.6.5 cluster to CDP Private Cloud Base 7.1.x cluster using the AM2CM tool
Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling and customisations.
... View more
11-09-2021
11:08 AM
Migrating HDP clusters to CDP has been a journey many customers are going through these days.
Migrating Ranger Audits + Atlas collections in infra-solr to CDP has been a challenging task.
We hope the steps below will simplify your journey.
Preparation:
Sample API calls to get the current status of collections in infra-solr. This is an important step to visualise how big these collections are and as a result, get an idea of how long the migration will take.
Note: In the commands below,
-k is used if https is used rather than http
change http for https if the connection is secure
check the port is correct depending on the version of infra-solr being used or if you have customized the port
Infra-Solr API queries
Gather the list of collections in infra-solr
### To list collections
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/admin/collections?action=LIST'
Get the total number of records in your infra-solr collection. Example, ranger_audits
### To get total records in collection along with one entry
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=0'
Get the first and last record of the collection. Example, ranger_audits
###First record
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&sort=evtTime%20asc"
###Last record
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&sort=evtTime%20desc"
Get the number of records per day; this will help to estimate the load per day.
######audit count for each day Mapped by Day
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&facet.range=evtTime&facet=true&facet.range.start=NOW/DAY-30DAY&facet.range.end=NOW/DAY&facet.range.gap=%2B1DAY"
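With the total record count from the rows=0 query above and the throughput figures quoted later in this article (~1 million records per 7 minutes for Approach 2, ~1 million per minute for Approach 3), a rough duration estimate can be computed. A sketch with a placeholder count:
### Hypothetical back-of-the-envelope estimate; replace TOTAL_RECORDS with numFound from the query above
TOTAL_RECORDS=50000000
echo "Approach 2 (solrDataManager, ~7 min/million): ~$((TOTAL_RECORDS / 1000000 * 7)) minutes"
echo "Approach 3 (solrCloudCli dump, ~1 min/million): ~$((TOTAL_RECORDS / 1000000 * 1)) minutes"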
Approach 1: From Cloudera documentation guide HDP to CDP
Pre HDP Upgrade tasks
Backup Ambari Infra Solr
### pick an infra-solr server host
###Upgrade the infra-solr client to latest version
yum upgrade ambari-infra-solr-client -y
export CONFIG_INI_LOCATION=/root/ambari_solr_migration.ini
### Generating the config file
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host=ambari_hostname --port=8080 --cluster=hsbcap2 --username=admin --password=**** --backup-base-path=/root/hdp_solr_backup --java-home=/usr/lib/jvm/jre-1.8.0-openjdk/
### Backup the infra-solr collections
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode backup | tee backup_output.txt
### Deleting the collections, upgrading the infra-solr clients and servers, and restarting Ranger and Atlas, which will recreate the collections
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode delete | tee delete_output.txt
Post HDP tasks
Ambari infra-migrate and restore
###Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration.ini
### Restoring the collections
nohup /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode migrate-restore
nohup /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode transport
Backup Infra Solr collections
###Backing up the solr collections
##Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration-cdp.ini
### Generating the config file
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host ambari_hostname --port 8080 --cluster clustername --username admin --password ***** --backup-base-path /root/hdp_solr_backup_new --java-home /usr/lib/jvm/jre-1.8.0-openjdk/ --hdfs-base-path /opt/solrdata
#### backing up the solr collections
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup
## moving the data to HDFS
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action copy-to-hdfs
Post CDP tasks
amb_posttransition_solr
### Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration-cdp.ini
###Note: Please make sure the ini file is adjusted to the CDP Solr URL, znode location and other applicable properties
#### changing the permissions of HDFS location
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action change-ownership-in-hdfs
### Deleting the solr collections
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action delete-new-solr-collections
## Restoring the solr collections
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action full-restore
Approach 2: Backing up infra-solr data before your upgrade and restoring it after the CDP upgrade (Speed: 1 million records in 7 min)
Backing up the collections in infra-solr using the solrDataManager.py script.
This approach backs up records in batches of 0.1 million and deletes them from infra-solr, which helps offload data from infra-solr as the backup progresses.
The script can be run in save, archive, or delete mode.
Adjust the END_DATE accordingly if you wish to run the script multiple times.
The average speed at which records will be backed up is 1 million records in 7 mins
Run in nohup mode for collections with records more than 10 million records.
Step 1:
The backup of the infra-solr ranger_audits collection can be run anytime, even multiple times, before the Ambari upgrade. Set END_DATE accordingly when taking the backup.
### shell script name collection_local.sh
# Init values:
SOLR_URL=http://solr_host:solr_port/solr
END_DATE=2021-06-25T12:00:00.000Z
OLD_COLLECTION=ranger_audits
LOCAL_PATH=/home/solr/backup/ranger/ranger_audits/data
EXCLUDE_FIELDS=_version_
# comma separated exclude fields, at least _version_ is required
# provide these with -k and -n options only if kerberos is enabled for Infra Solr !!!
INFRA_SOLR_KEYTAB=/etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=infra-solr/$(hostname -f)@REALM
DATE_FIELD=evtTime
# -m MODE, --mode=MODE archive | delete | save
MODE=archive
/usr/lib/ambari-infra-solr-client/solrDataManager.py -m $MODE -v -c $OLD_COLLECTION -s $SOLR_URL -z none -r 100000 -w 100000 -f $DATE_FIELD -e $END_DATE -x $LOCAL_PATH -k $INFRA_SOLR_KEYTAB -n $INFRA_SOLR_PRINCIPAL --exclude-fields $EXCLUDE_FIELDS
Note: EXCLUDE_FIELDS is not available in the infra-solr scripts that come with Ambari 2.6.*. Please upgrade your infra-solr client or remove EXCLUDE from the script. ## i.e remove --exclude-fields $EXCLUDE_FIELDS
Step 2:
During the Ambari wizard upgrade of HDP to the Ambari-managed interim HDP-7.1.x version of CDP, you do not need to backup or restore collections.
Step 3:
After transitioning to Cloudera Manager running CDP and once all the services are started, we can trigger the restore script.
Ensure the Ranger collection is created before running this script.
The speed of the restore is around 1 million records every 3 minutes.
# Restoring data into solr:
# Init values:
SOLR_URL=http://solr_host:solr_port/solr
COLLECTION=ranger_audits
DIR_NAME=/home/solr/backup/ranger/ranger_audits/data
# provide these with -k and -n options only if kerberos is enabled for Infra Solr !!!
INFRA_SOLR_KEYTAB=/etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=infra-solr/$(hostname -f)@REALM
for FILE_NAME in $DIR_NAME/*.json
do
echo "Uploading file to solr - $FILE_NAME"
curl -k --negotiate -u : -H "Content-type:application/json" "$SOLR_URL/$COLLECTION/update/json/docs?commit=true&wt=json" --data-binary @$FILE_NAME
done
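Once the upload loop finishes, the restored record count can be checked against the value captured in the preparation section, reusing the variables defined above:
### Count records in the restored collection and compare with the pre-upgrade count
curl -k --negotiate -u : "$SOLR_URL/$COLLECTION/query?q=*:*&rows=0"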
Approach 3: Dumping the infra-solr collections before HDP upgrade and restoring them after CDP upgrade (Speed 1 million records in 1 min)
Backing up the collections in infra-solr using the solrCloudCli.sh script
The average speed at which records will be backed up is ~ 1 million records/minute
Run in nohup mode for collections with records more than 15 million records (or less to ensure the script doesn't terminate if you lose connectivity)
This script only works if your ambari Infra-solr client is v 2.7.x or higher.
Step 1:
Take the dump of the infra-solr collections before starting the Ambari upgrade.
## Getting Kerberos ticket
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Taking Dump of your collections e.g ranger_audits (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000
### Running in background using nohup
nohup /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000 2>&1 > /home/solr/backup/ranger/backup_ranger_audits.log &
Note: In this option, we are taking the complete backup of infra-solr collections and can then remove collections and data from infra-solr. A simple restart of Ranger Admin/Atlas will create new empty collections in infra-solr.
Step 2:
During the Ambari managed HDP upgrade steps, we do not need to backup or restore collections.
Step 3:
After migrating to CDP and once all the services are started we can trigger the restore script.
Make sure the ranger collection is created before running this script.
The restore runs at ~1 million records per minute.
## Getting Kerberos ticket
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Restoring your collections e.g ranger_audits (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000
### Running in background using nohup
nohup /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000 2>&1 > /home/solr/backup/ranger/restore_ranger_audits.log &
Summary
We have been using:
Approach 1 for Development clusters with significantly less audit data in infra-solr
Approach 2 for HDP 2.6.5 clusters
Approach 3 for HDP 3.1.5 clusters
===========================================================================
We recommend:
Approach 1 for clusters with less than 10 million records since it is easy to backup and restore as per Cloudera upgrade documentation
Approach 2 is slow when compared to other approaches (~ 1 million records in 7 minutes) but useful since:
When used in "archive" mode, it will clean up the data which it has backed up
One can choose the END_DATE, if required, to re-run the script multiple times before the upgrade date
Approach 3 for clusters with more than 10 million records due to its efficient way to dump the documents and restore them after upgrading to CDP. Note the downsides of Approach 3 are:
It will only work if the ambari-infra-solr client is v 2.7.x or higher
If your backup fails, you have to restart the entire script and backup from the start again (i.e. it doesn't delete from the collections as it goes)
This summary is written based on our experience working with HDP clusters that are being migrated to CDP. Please use the development environments on your estate to come up with your own estimates and choose the right option which suits your clusters and SLAs.
Thank you !!
... View more
08-28-2019
06:11 AM
From the HDF documentation it is clear that "NiFi does not perform user authentication over HTTP. Using HTTP, all users will be granted all roles." https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.4.1.1/nifi-security/content/user_authentication.html So it is mandatory to enable NiFi SSL in order to use ldap_login_identity_provider. Hemanth
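For illustration, a minimal sketch of the nifi.properties entries involved once SSL is enabled and LDAP is used as the login identity provider (the keystore/truststore paths and the provider identifier are placeholders; the provider itself is defined in login-identity-providers.xml):
# HTTPS endpoint (the HTTP port must be unset for authentication to apply)
nifi.web.https.host=nifi_host
nifi.web.https.port=9091
nifi.security.keystore=/path/to/keystore.jks
nifi.security.keystoreType=JKS
nifi.security.keystorePasswd=keystore_password
nifi.security.truststore=/path/to/truststore.jks
nifi.security.truststoreType=JKS
nifi.security.truststorePasswd=truststore_password
# Reference the LDAP provider defined in login-identity-providers.xml
nifi.security.user.login.identity.provider=ldap-provider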
... View more
08-17-2019
07:46 AM
Hi @araujo You are correct. Just added the jars from "CSD & Manifest Files" from https://docs.hortonworks.com/HDPDocuments/CFM/CFM-1.0.0/release-notes/content/download-locations.html Now I can install Nifi. Thanks.
... View more