Migrating HDP clusters to CDP is a journey many customers are going through these days. Migrating the Ranger audit and Atlas collections stored in infra-solr has proven to be a challenging task. We hope the steps below simplify your journey.
The sample API calls below report the current status of the collections in infra-solr. This is an important step: it shows how big the collections are and, as a result, gives an idea of how long the migration will take.
Note: In the commands below, replace solr_host and solr_port with your Infra Solr host and port.
### To list collections
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/admin/collections?action=LIST'
### To get the total record count of a collection (rows=0 returns only the numFound count, no documents)
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=0'
### First record (oldest, by evtTime)
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&sort=evtTime%20asc"
### Last record (newest, by evtTime)
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&sort=evtTime%20desc"
### Audit count per day for the last 30 days (faceted by evtTime)
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&facet.range=evtTime&facet=true&facet.range.start=NOW/DAY-30DAY&facet.range.end=NOW/DAY&facet.range.gap=%2B1DAY"
### Pick an infra-solr server host
### Upgrade the infra-solr client to the latest version
yum upgrade ambari-infra-solr-client -y
export CONFIG_INI_LOCATION=/root/ambari_solr_migration.ini
### Generating the config file
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host=ambari_hostname --port=8080 --cluster=clustername --username=admin --password=**** --backup-base-path=/root/hdp_solr_backup --java-home=/usr/lib/jvm/jre-1.8.0-openjdk/
### Backup the infra-solr collections
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode backup | tee backup_output.txt
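Once the backup mode finishes, it is worth a quick sanity check that the snapshot actually landed under the backup base path passed to migrationConfigGenerator.py above; the grep on the tee'd output is just a simple way to spot failures:
### Verifying the backup on disk
du -sh /root/hdp_solr_backup
grep -i "fail" backup_output.txt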
### Deleting the collections, upgrading the infra-solr clients and servers, and restarting Ranger and Atlas, which recreates empty collections
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode delete | tee delete_output.txt
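After the delete step and the Ranger/Atlas restarts, the LIST and count calls from the beginning of this article can be reused to confirm the collections were recreated empty:
### Confirming the collections were recreated empty
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/admin/collections?action=LIST'
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=0'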
Ambari infra-migrate and restore
### Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration.ini
### Restoring the collections (run the transport mode only after migrate-restore completes)
nohup /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode migrate-restore > migrate_restore_output.txt 2>&1
nohup /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode transport > transport_output.txt 2>&1
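Both modes can run for a long time on large collections, so from a second session you can follow progress in the log files named above:
### Monitoring progress from another session
tail -f migrate_restore_output.txt
tail -f transport_output.txt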
### Backing up the Solr collections and moving the backup to HDFS
### Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration-cdp.ini
### Generating the config file
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host ambari_hostname --port 8080 --cluster clustername --username admin --password ***** --backup-base-path /root/hdp_solr_backup_new --java-home /usr/lib/jvm/jre-1.8.0-openjdk/ --hdfs-base-path /opt/solrdata
### Backing up the Solr collections
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup
### Moving the backup data to HDFS
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action copy-to-hdfs
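After copy-to-hdfs completes, it is a good idea to confirm the snapshots landed under the --hdfs-base-path given above (/opt/solrdata in this example); this assumes the user running the check has HDFS access:
### Verifying the backup landed in HDFS
hdfs dfs -ls /opt/solrdata
hdfs dfs -du -s -h /opt/solrdata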
### Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration-cdp.ini
### Note: Make sure the ini file is adjusted to the CDP Solr URL, znode location, and any other applicable properties
### Changing the ownership of the HDFS location
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action change-ownership-in-hdfs
### Deleting the newly created empty Solr collections
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action delete-new-solr-collections
### Restoring the Solr collections
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action full-restore
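Once full-restore completes, the count query used earlier for sizing is a quick way to verify the restored collection; replace solr_host:solr_port with the CDP Solr endpoint from your adjusted ini file:
### Verifying the restored collection document count
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=0'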
The backup of the infra-solr ranger_audits collection can be taken at any time, even multiple times, before the Ambari upgrade. Set END_DATE accordingly each time you take a backup.
### shell script name collection_local.sh
# Init values:
SOLR_URL=http://solr_host:solr_port/solr
END_DATE=2021-06-25T12:00:00.000Z
OLD_COLLECTION=ranger_audits
LOCAL_PATH=/home/solr/backup/ranger/ranger_audits/data
EXCLUDE_FIELDS=_version_
# comma separated exclude fields, at least _version_ is required
# provide these with -k and -n options only if kerberos is enabled for Infra Solr !!!
INFRA_SOLR_KEYTAB=/etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=infra-solr/$(hostname -f)@REALM
DATE_FIELD=evtTime
# -m MODE, --mode=MODE archive | delete | save
MODE=archive
/usr/lib/ambari-infra-solr-client/solrDataManager.py -m $MODE -v -c $OLD_COLLECTION -s $SOLR_URL -z none -r 100000 -w 100000 -f $DATE_FIELD -e $END_DATE -x $LOCAL_PATH -k $INFRA_SOLR_KEYTAB -n $INFRA_SOLR_PRINCIPAL --exclude-fields $EXCLUDE_FIELDS
Note: EXCLUDE_FIELDS is not available in the infra-solr scripts that ship with Ambari 2.6.*. Please upgrade your infra-solr client, or remove the exclude option from the script.
## i.e., remove --exclude-fields $EXCLUDE_FIELDS
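For large collections, the script can run for hours, so running it in the background is usually safer. A minimal sketch; the log file name is just a suggestion:
### Running collection_local.sh in the background
nohup bash collection_local.sh > collection_local.log 2>&1 &
tail -f collection_local.log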
During the Ambari wizard upgrade of HDP to the Ambari-managed interim HDP-7.1.x version of CDP, you do not need to back up or restore collections.
After transitioning to Cloudera Manager running CDP, and once all the services are started, we can trigger the restore script.
# Restoring the saved data back to Solr:
# Init values:
SOLR_URL=http://solr_host:solr_port/solr
COLLECTION=ranger_audits
DIR_NAME=/home/solr/backup/ranger/ranger_audits/data
# if Kerberos is enabled for Infra Solr, get a ticket before running the loop:
INFRA_SOLR_KEYTAB=/etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=infra-solr/$(hostname -f)@REALM
kinit -kt $INFRA_SOLR_KEYTAB $INFRA_SOLR_PRINCIPAL
for FILE_NAME in $DIR_NAME/*.json
do
echo "Uploading file to solr - $FILE_NAME"
curl -k --negotiate -u : -H "Content-type:application/json" "$SOLR_URL/$COLLECTION/update/json/docs?commit=true&wt=json" --data-binary @$FILE_NAME
done
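After the loop finishes, a quick count query against the same collection confirms the documents are searchable again:
### Verifying the uploaded document count
curl -k --negotiate -u : "$SOLR_URL/$COLLECTION/query?q=*:*&rows=0"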
Take the dump of the infra-solr collections before starting the Ambari upgrade.
## Getting a Kerberos ticket
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Taking a dump of your collections, e.g., ranger_audits (speed: ~1 min per million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000
### Running in the background using nohup
nohup /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000 > /home/solr/backup/ranger/backup_ranger_audits.log 2>&1 &
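The dump can take a while on large collections; from another session you can watch the log and the output directory grow:
### Monitoring the dump progress
tail -f /home/solr/backup/ranger/backup_ranger_audits.log
du -sh /home/solr/backup/ranger/ranger_audits/data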
Note: In this option, we take a complete backup of the infra-solr collections and can then remove the collections and their data from infra-solr. A simple restart of Ranger Admin/Atlas will create new, empty collections in infra-solr.
During the Ambari-managed HDP upgrade steps, we do not need to back up or restore collections.
After migrating to CDP, and once all the services are started, we can trigger the restore script.
## Getting a Kerberos ticket
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Restoring your collections, e.g., ranger_audits (speed: ~1 min per million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000
### Running in the background using nohup
nohup /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000 > /home/solr/backup/ranger/restore_ranger_audits.log 2>&1 &
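After the restore completes, the count and last-record queries from the beginning of this article are a quick sanity check that the restored record range matches the source cluster; replace solr_host:solr_port with the CDP Solr endpoint:
### Verifying the restored collection
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=0"
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&sort=evtTime%20desc"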
This summary is based on our experience working with HDP clusters that are being migrated to CDP. Please use the development environments on your estate to come up with your own estimates, and choose the option that best suits your clusters and SLAs.
Thank you !!