Member since: 08-16-2019 | 18 Posts | 2 Kudos Received | 1 Solution
10-31-2022
03:31 AM
This article demonstrates how to post external user/group mappings to Cloudera Manager (CM) using the CM API.

Target state on Cloudera Manager:

LDAP Group | CM Role
CMGroup1 | ROLE_ADMIN
CMGroup2 | ROLE_CONFIGURATOR
CMGroup3 | ROLE_AUDITOR, ROLE_LIMITED

Step 1: Get the current authRoles

GET /authRoles

curl -g -X GET -u admin:admin -H "Content-Type: application/json" "http://cmhost:7180/api/v43/authRoles" > cm_authroles.json

Download the existing authRoles from CM and confirm that the CM roles listed in the target state above are present in cm_authroles.json.

Step 2: Create the user mapping template; an example cm_user_mapping.json is shown below:

{
"items": [
{
"name": "CMGroup1",
"type": "LDAP",
"authRoles": [
{ "name": "ROLE_ADMIN" }
]
},
{
"name": "CMGroup2",
"type": "LDAP",
"authRoles": [
{ "name": "ROLE_CONFIGURATOR" }
]
},
{
"name": "CMGroup3",
"type": "LDAP",
"authRoles": [
{ "name": "ROLE_AUDITOR" },
{ "name": "ROLE_LIMITED" }
]
}
]
}

Step 3: Post cm_user_mapping.json to CM via the externalUserMappings API:

curl -g -X POST -u admin:admin -H "Content-Type: application/json" -d @cm_user_mapping.json "http://cmhost:7180/api/v43/externalUserMappings"

More details about authRoles and external user mappings can be found in the official API documentation: https://archive.cloudera.com/cm7/7.2.4/generic/jar/cm_api/apidocs/index.html
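To confirm the mappings landed as intended, they can be read back and compared against the target table. A minimal sketch, assuming jq is installed and the same API version as above:

# Read back the external user mappings and print group -> roles (sketch, assumes jq)
curl -s -g -X GET -u admin:admin -H "Content-Type: application/json" "http://cmhost:7180/api/v43/externalUserMappings" | jq -r '.items[] | "\(.name): \([.authRoles[].name] | join(", "))"'
# Expected output, one line per LDAP group, e.g. CMGroup1: ROLE_ADMIN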
10-31-2022
02:21 AM
Introduction

This article focuses on backup and restore of Atlas data during an HDP3 to CDP migration.

Steps to back up on HDP3

Run the following commands on the Atlas server host of the HDP3 cluster.

Command to get the metrics from the Atlas API (or Atlas UI):

curl -k -g -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" "https://atlas_host:21443/api/atlas/admin/metrics" > atlas_metrics.json

Extract the entityActive section from atlas_metrics.json and turn it into a list of entity types; a sample is shown below:

# cat metrics_types.list
hive_db_ddl
hive_table
hive_db
hbase_namespace
hive_process
hive_storagedesc
hdfs_path
hbase_table
hive_column_lineage
hbase_column_family
hive_column
hive_process_execution
hive_table_ddl

Export API: script to export all entities of each type and save them as zip files.

mkdir /tmp/atlas_backup
cd /tmp/atlas_backup
for t in `cat metrics_types.list`
do
mkdir -p $t
curl -k -X POST -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d "{\"itemsToExport\": [{\"typeName\": \"$t\"}], \"options\": {\"matchType\": \"forType\", \"fetchType\": \"full\"}}" "https://atlas_host:21443/api/atlas/admin/export" > $t/Atlas-$t.zip
done

Note: Unzip and check one of the zip files; expect to see .json files with entity information.

Steps to import on CDP

Remediation steps:
Unzip and extract the .json files from the backup directory /tmp/atlas_backup; expect to see .json files with entity information.
Replace the Atlas cluster_name in the .json files with the CDP Atlas cluster_name. Note: in CDP the default value of cluster_name is 'cm'.
Replace the HDFS namespace directory, e.g. hdfs://HDFSNamespace:8020/
Replace any other patterns applicable in CDP, e.g. @cluster_name (a short sketch covering these replacements and the final metrics comparison is included at the end of this article).

Import API - script to import all entities from the zip files:

cd /tmp/atlas_backup
for t in `ls /tmp/atlas_backup/*/*.zip`
do
curl -ivk -X POST -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" -d "{\"options\": {\"fileName\": \"$t\"}}" "https://atlas_host:21443/api/atlas/admin/importfile"
done

Command to get the metrics from the Atlas API after the import:

curl -k -g -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" "https://atlas_host:21443/api/atlas/admin/metrics" > atlas_metrics_final.json

Compare atlas_metrics.json with atlas_metrics_final.json to confirm that the entity counts match.
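A minimal sketch of the remediation replacements and the final metrics comparison described above. OLD_CLUSTER_NAME, OLD_HDFS_NS and NEW_HDFS_NS are placeholders, jq is assumed to be installed, and the jq filter simply picks out the entityActive block wherever it sits in the metrics JSON; adjust everything to your environment.

# Sketch only: run against the extracted .json files in the backup directory
cd /tmp/atlas_backup
# Replace the HDP cluster_name with the CDP default 'cm' and rewrite the HDFS namespace
find . -name "*.json" -exec sed -i 's/OLD_CLUSTER_NAME/cm/g; s|hdfs://OLD_HDFS_NS:8020|hdfs://NEW_HDFS_NS:8020|g' {} +
# Compare active entity counts before and after the import
diff <(jq -S '[.. | .entityActive? // empty]' atlas_metrics.json) <(jq -S '[.. | .entityActive? // empty]' atlas_metrics_final.json)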
07-29-2022
12:52 AM
Introduction

Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides a centralised platform to define, administer and manage security policies consistently across Hadoop components.

More details about Ranger can be found here: https://www.cloudera.com/products/open-source/apache-hadoop/apache-ranger.html
Ranger API policy documentation: https://ranger.apache.org/apidocs/index.html

This article focuses on export and import of Ranger policies using API calls during an HDP to CDP migration.

Export

List of services configured in Ranger:

### Command to get list of services
curl -s -u admin:pass -H "Accept: application/json" -H "Content-Type: application/json" -X GET "http://<hostname>:<ranger-port>/service/public/v2/api/service" > services.json

Export of policies

### Export all policies
curl -X GET --header "text/json" -H "Content-Type: text/json" -o file.json -u admin:admin "http://<hostname>:<ranger-port>/service/plugins/policies/exportJson"

The exported file.json contains all policies, including tag-based policies.

Export of users and groups, which can be used for validation purposes:

## API call to download all users from Ranger
curl -s -u admin:pass -H "Accept: application/json" -H "Content-Type: application/json" -X GET "https://ranger.com/service/xusers/users" > users.json
## API call to download all groups from Ranger
curl -s -u admin:pass -H "Accept: application/json" -H "Content-Type: application/json" -X GET "https://ranger.com/service/xusers/groups" > groups.json

Import

Importing policies into the target CDP cluster.

Step 1: Prepare the Ranger service and make sure all service plugins are configured.

Step 2: Prepare a servicesMapping.json file that maps Ranger service names from the HDP cluster to the CDP cluster:

cat /path/servicesMapping.json
{"cm_knox":"cm_knox","cm_hdfs":"cm_hdfs","cm_hbase":"cm_hbase","cm_yarn":"cm_yarn","cm_solr":"cm_solr","cm_kafka":"cm_kafka","cm_atlas":"cm_atlas","cm_hive":"cm_hive"} Step 3: Import the Ranger policies using Ranger API #To Import policies from JSON file with servicesMap
curl -i -X POST -H "Content-Type: multipart/form-data" -F 'file=@/path/file.json' -F 'servicesMapJson=@/path/servicesMapping.json' -u admin:admin "http://<hostname>:<ranger-port>/service/plugins/policies/importPoliciesFromFile?isOverride=true"

Preparation for HDP to CDP migration - known risks and to-dos:

Local users/groups in HDP Ranger must be available in the target CDP cluster.
AD/LDAP users/groups in HDP Ranger must be available in the target CDP cluster.
Ranger services in the HDP cluster must be configured in the CDP cluster.
Ranger in CDP must be empty before importing policies (make sure to delete the default policies created when the service plugins are enabled).
Default policies must be reviewed and cleaned up (e.g. policies granting the public group access to all resources are not ideal for production clusters).
The exported users.json and groups.json can be used for this validation; a small sketch is included at the end of this article.

Useful Links

Review and add the Ranger policies required in CDP, which can be found here: https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-hdp3/topics/amb3-add-ranger-policies-for-components-on-the-cdp-cluster.html
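A small sketch for the validation mentioned above, comparing the exported user list with the target CDP Ranger. It assumes jq is installed; the vXUsers field name reflects the usual Ranger xusers response shape and may differ by version, and the target host/port are placeholders.

# Download users from the target Ranger and diff the names against the HDP export (sketch)
curl -s -u admin:pass -H "Accept: application/json" -X GET "http://<target-hostname>:<ranger-port>/service/xusers/users" > target_users.json
diff <(jq -r '.vXUsers[].name' users.json | sort) <(jq -r '.vXUsers[].name' target_users.json | sort)
# Names that appear only on the left side are missing on the CDP cluster; repeat the same idea for groups.json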
07-15-2022
05:43 AM
Introduction

Apache Atlas provides open metadata management and governance capabilities for organisations to build a catalog of their data assets, classify and govern these assets, and provide collaboration capabilities around these data assets for data scientists, analysts and the data governance team.

This article focuses on backup and restore of Atlas data during an HDP to CDP migration.

Migration Paths

HDP 2.6 to CDP

Export

Note: The following commands need to be executed on the HDP cluster.

Download the migration exporter onto the Atlas server host.

## Download the script
wget https://archive.cloudera.com/am2cm/hdp2/atlas-migration-exporter-0.8.0.2.6.6.0-332.tar.gz
## Untar
tar zxvf atlas-migration-exporter-0.8.0.2.6.6.0-332.tar.gz
##copy the contents
mkdir /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/
cp atlas-migration-exporter-0.8.0.2.6.6.0-332/* /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/
chown -R atlas:hadoop /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/

Before taking the backup, stop Atlas.

## Stop Atlas via the Ambari UI
## /root/atlas_metadata is my backup directory in this example
## Run the migration script
[root@ccycloud-1 ~]# python /usr/hdp/2.6.5.0-292/atlas/tools/migration-exporter/atlas_migration_export.py -d /root/atlas_metadata
atlas-migration-export: starting migration export. Log file location /var/log/atlas/atlas-migration-exporter.log
atlas-migration-export: initializing
atlas-migration-export: ctor: parameters: 3
atlas-migration-export: initialized
atlas-migration-export: exporting typesDef to file /root/atlas_metadata/atlas-migration-typesdef.json
atlas-migration-export: exported typesDef to file /root/atlas_metadata/atlas-migration-typesdef.json
atlas-migration-export: exporting data to file /root/atlas_metadata/atlas-migration-data.json
atlas-migration-export: exported data to file /root/atlas_metadata/atlas-migration-data.json
atlas-migration-export: completed migration export!

Make sure the exported data files (in JSON format) are present in the backup location:

## make sure the backup is available
[root@ccycloud-1 ~]# ls -ltrh /root/atlas_metadata
total 240K
-rw-r--r-- 1 root root 32K Sep 8 02:57 atlas-migration-typesdef.json
-rw-r--r-- 1 root root 205K Sep 8 02:57 atlas-migration-data.json

Import

Note: The following commands need to be executed on the CDP cluster.
Note: For this migration, Atlas must be empty before following the next steps.

To restore the Atlas metadata, start Atlas in migration mode. Configure this via CM UI >> Atlas >> conf/atlas-application.properties_role_safety_valve:

atlas.migration.data.filename=/root/atlas_metadata

Start Atlas and wait until the migration status shows as completed. Once the migration is complete, remove the atlas.migration.data.filename property and restart Atlas; it should then come up with a normal active status.

HDP 3.x to CDP

Backup

Note: The following commands need to be executed on the HDP cluster.

Atlas metadata is stored in HBase tables and Infra-Solr collections; both locations need to be backed up.

HBase backup

### script this
hbase shell
disable 'atlas_janus'
snapshot 'atlas_janus', 'atlas_janus-backup-new'
enable 'atlas_janus'
disable 'ATLAS_ENTITY_AUDIT_EVENTS'
snapshot 'ATLAS_ENTITY_AUDIT_EVENTS','ATLAS_ENTITY_AUDIT_EVENTS-backup-new'
enable 'ATLAS_ENTITY_AUDIT_EVENTS'
exit
## Linux cli
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'atlas_janus-backup-new' -copy-to hdfs:///tmp/hbase_new_atlas_backups
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'ATLAS_ENTITY_AUDIT_EVENTS-backup-new' -copy-to hdfs:///tmp/hbase_new_atlas_backups

Infra-Solr collections backup

## Getting Kerberos ticket
klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Taking Dump of atlas collections e.g vertex_index (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection vertex_index --output /home/solr/backup/atlas/vertex_index/data --max-read-block-size 100000 --max-write-block-size 100000
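### Optional sketch (not in the original article): loop over all three Atlas collections in one go;
### adjust zookeeper_host and the output base directory for your cluster.
for c in vertex_index edge_index fulltext_index
do
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection $c --output /home/solr/backup/atlas/$c/data --max-read-block-size 100000 --max-write-block-size 100000
done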
### Replace the collection name and back up edge_index, fulltext_index and vertex_index one at a time (or loop over them as in the sketch above).

Restore

Note: The following commands need to be executed on the CDP cluster.

Use the HBase and Infra-Solr backups to restore the data into the CDP environment before starting Atlas on CDP.

HBase: restore tables from snapshot

## copy the backup directories to the target HDFS and run the restore commands
hbase shell
list_snapshots
restore_snapshot 'atlas_janus-backup-new'
restore_snapshot 'ATLAS_ENTITY_AUDIT_EVENTS-backup-new'

Infra-Solr restore

## Getting Kerberos ticket
klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Restoring your collections e.g vertex_index (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection vertex_index --output /home/solr/backup/atlas/vertex_index/data --max-read-block-size 100000 --max-write-block-size 100000
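### Optional sketch (not in the original article): once all three collections are restored,
### verify their document counts (kerberized cluster assumed; adjust solr_host and solr_port).
for c in vertex_index edge_index fulltext_index
do
echo -n "$c: "
curl -s -k --negotiate -u : "http://solr_host:solr_port/solr/$c/query?q=*:*&rows=0" | grep -o '"numFound":[0-9]*'
done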
### make sure to restore all 3 collections, i.e. vertex_index, fulltext_index, edge_index

After restoring the HBase tables and Solr collections, start Atlas via the CM UI and validate the metrics.

Useful Links & Scripts

Command to download the Atlas metrics; these metrics can be used for validation before and after the migration.

# curl -g -X GET -u admin:admin -H "Content-Type: application/json" -H "Cache-Control: no-cache" "http://ccycloud-1.hsbcap2.root.hwx.site:21000/api/atlas/admin/metrics" > atlas_metrics.json
# ls -ltrh atlas_metrics.json
-rw-r--r-- 1 root root 261 Sep 8 05:47 atlas_metrics.json

Atlas also comes with an API to export and import entities; sample code can be found here: https://atlas.apache.org/#/ImportExportAPI

A known issue during the migration is the namespace of the HBase tables: make sure the tables are restored into the default namespace, or configure the correct namespace in the Atlas configs as described here: https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-hdp/topics/amb-migrating-atlas-data.html

HBase backup and restore guides:
https://community.cloudera.com/t5/Support-Questions/How-to-take-backup-of-Apache-Atlas-and-restore-it/m-p/300957
https://blog.cloudera.com/approaches-to-backup-and-disaster-recovery-in-hbase/

Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling and customisations.
05-15-2022
09:17 PM
1 Kudo
Introduction

AM2CM is an offline tool that converts an Ambari blueprint into a Cloudera Manager deployment template. You import the converted template into Cloudera Manager, start the services through the Cloudera Manager UI, and validate the cluster. The latest version of the tool can be downloaded here: Software download matrix for 3.1.5 to CDP 7.1.x

Note: This article is written against am2cm version 2.0.4.0-4. The latest version might have new features that are not covered in this article.

How it works

The tool ships with the AM2CM script, config files, and lib directories:

am2cm-2.0.4.0-4 % ls -1
am2cm-2.0.0.2.0.4.0-4.jar
am2cm.sh
ambari7_blueprint.json
cm_migration.log
conf
configs_summary.log
kerberos_summary.json
lib
restore_collections.py

The AM2CM script takes a blueprint from the Ambari cluster as input and, based on the HDP version supplied by the user, decides which config mapping file to use. These mapping files contain the mapping rules from the Ambari world to the CM world. Config files such as user-settings.ini and service-config.ini are used as part of the mapping logic to generate the CM deployment JSON.

How to use

Command to execute:

am2cm-2.0.4.0-4 % sh am2cm.sh -bp conf/blueprint.json -dt cm_deployment.json
INPUT Ambari Blueprint : conf/blueprint.json
OUTPUT CM Template : cm_deployment.json
Starting blueprint to CM Template migration
What is source version 1.HDP2 2.HDP3/HDF352 (1 or 2)? 1
Total number of hosts in blueprint: 6
Your cluster has services (listed below) that are not handled by this migration tool.
AMBARI_METRICS
The tool will skip the above identified service related configs.
Do you want to proceed with migration (Y OR N)? (N):y
Processing: POWERSCALE
Processing: LIVY
Processing: SOLR
Processing: TEZ
Processing: HDFS
Processing: OOZIE
Processing: SQOOP_CLIENT
Processing: NIFIREGISTRY
Processing: ZOOKEEPER
Processing: HBASE
Processing: YARN
Processing: RANGER_KMS
Processing: KNOX
Processing: ATLAS
Processing: HIVE_ON_TEZ
Processing: RANGER
Processing: HIVE
Processing: KAFKA
Processing: NIFI
Processing: SPARK_ON_YARN
Adding: QUEUEMANAGER
CM Template is generated at : am2cm-2.0.4.0-4/cm_deployment.json
Kerberos summary file is generated at : am2cm-2.0.4.0-4/kerberos_summary.json
Successfully completed

The outputs of the command are:
cm_deployment.json - the cluster template for Cloudera Manager
kerberos_summary.json - the list of keytabs required for the CDP cluster
configs_summary.log - a summary of each config transformation from Ambari to Cloudera Manager
cm_migration.log - the AM2CM execution log for each service

Features

HDP and HDF services: AM2CM supports blueprints from HDP 2.6.5*, HDP 3* and HDF 3.5* clusters.
Advanced/custom configs: AM2CM can handle Ambari configs in the advanced and custom sections of a service and migrate them to CM safety valves.
Host/group mapping: AM2CM translates Ambari config group mappings into CM role groups.
Adding new services: Using AM2CM > config > service-config.ini, new services that are not available in the Ambari world (for example, YARN Queue Manager) can be added.

Hidden features

Option to dry run: Getting a blueprint with host mappings is not possible before upgrading Ambari to a 7.1.* version. The am2cm --dry_run argument is useful here: it accepts a blueprint without host mapping details and produces a deployment JSON for validation purposes (see the sketch after this section).
Ignore list: AM2CM > config > service-config.ini ships with a config ignore list, which can be extended to ignore configs from the blueprint; ignored configs fall back to the CM defaults.
Override configs in the blueprint: Using AM2CM > config > cm-config-mapping.ini, configs in the blueprint can be overwritten to new CM standards.
Install new components: Using AM2CM > config > service-config.ini, services not available in the Ambari world (for example, Hue, Phoenix) can be added.
New cluster standards: Using AM2CM > config > user-settings.ini, new CM configs (for example, TLS configuration, Kerberos principals) can be injected.
Summary of configs: AM2CM produces a log file with a summary of each config transformation from Ambari to CM, which is useful for comparing the configs before and after migration.

Limitations

SSL/TLS configs: AM2CM does not migrate SSL/TLS configuration to the CM world.
Kerberos (rules, configs): AM2CM does not migrate Kerberos configs or auth_to_local rules to CM.
Rack topology: If host rack topology is not managed by Ambari, the blueprint does not hold rack information, so it is not migrated into the CM deployment JSON.
Knox topologies: Knox topologies are usually not managed by Ambari, so the blueprint does not contain them and AM2CM cannot migrate them.
Ranger plugins: Ranger plugin configs (for example, the plugin service name) from Ambari are not migrated into the CM deployment template.
Backup and restore: AM2CM does not back up or restore service metadata (for example, the Hive or Oozie databases); this has to be handled separately.
Config validations: AM2CM cannot validate whether Ambari configs are compatible with CM versions; this has to be done separately after deploying the template.
HA configs (HttpFS): AM2CM cannot migrate HA configs for new services such as HttpFS; validate them manually on the CM side.
Config groups with correct names: AM2CM converts Ambari config groups into CM role groups but gives them standard names; once on the CM side, validate and correct the names manually.

The limitations above can be handled separately, either after deploying the template or by configuring AM2CM to inject the required configs.
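A hedged example of the dry-run invocation mentioned above. The --dry_run flag spelling is taken from this article, so verify it against your am2cm version, and the blueprint file name is a placeholder:

# Generate a deployment template from a blueprint without host mappings, for validation only
sh am2cm.sh -bp conf/blueprint_without_hosts.json -dt cm_deployment_dryrun.json --dry_run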
Useful Links & Scripts

HDP3 to CDP: Transitioning HDP cluster to CDP Private Cloud Base cluster using the AM2CM tool
HDP2 to CDP: Transitioning HDP 2.6.5 cluster to CDP Private Cloud Base 7.1.x cluster using the AM2CM tool

Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling and customisations.
11-09-2021
11:08 AM
Migrating HDP clusters to CDP has been a journey many customers are going through these days.
Migrating Ranger Audits + Atlas collections in infra-solr to CDP has been a challenging task.
We hope the steps below will simplify your journey.
Preparation:
Sample API calls to get the current status of collections in infra-solr. This is an important step to visualise how big these collections are and as a result, get an idea of how long the migration will take.
Note: In the commands below,
-k is used if https is used rather than http
change http for https if the connection is secure
check the port is correct depending on the version of infra-solr being used or if you have customized the port
Infra-Solr API queries
Gather the list of collections in infra-solr
### To list collections
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/admin/collections?action=LIST'
Get the total number of records in your infra-solr collection. Example, ranger_audits
### To get total records in collection along with one entry
curl --negotiate -u: -k 'http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=0'
Get the first and last record of the collection. Example, ranger_audits
###First record
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&sort=evtTime%20asc"
###Last record
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&sort=evtTime%20desc"
Get the number of records per day; this will help to estimate the load per day.
######audit count for each day Mapped by Day
curl --negotiate -u: -k "http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=1&facet.range=evtTime&facet=true&facet.range.start=NOW/DAY-30DAY&facet.range.end=NOW/DAY&facet.range.gap=%2B1DAY"
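A small helper sketch (not part of the original post) that pulls numFound from the query above and turns it into rough backup-time estimates, using the speeds quoted for the approaches below (~7 minutes per million records for Approach 2, ~1 minute per million for Approach 3); the estimate rounds down to whole millions.

# Estimate backup duration from the ranger_audits record count (sketch, kerberized cluster assumed)
TOTAL=$(curl -s --negotiate -u: -k 'http://solr_host:solr_port/solr/ranger_audits/query?q=*:*&rows=0' | grep -o '"numFound":[0-9]*' | cut -d: -f2)
echo "ranger_audits records: $TOTAL"
echo "Approach 2 estimate: ~$((TOTAL / 1000000 * 7)) minutes"
echo "Approach 3 estimate: ~$((TOTAL / 1000000)) minutes"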
Approach 1: From Cloudera documentation guide HDP to CDP
Pre HDP Upgrade tasks
Backup Ambari Infra Solr
### pick an infra-solr server host
###Upgrade the infra-solr client to latest version
yum upgrade ambari-infra-solr-client -y
export CONFIG_INI_LOCATION=/root/ambari_solr_migration.ini
### Generating the config file
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host=ambari_hostname --port=8080 --cluster=hsbcap2 --username=admin --password=**** --backup-base-path=/root/hdp_solr_backup --java-home=/usr/lib/jvm/jre-1.8.0-openjdk/
### Backup the infra-solr collections
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode backup | tee backup_output.txt
### Delete the collections, upgrade the infra-solr clients and servers, and restart Ranger and Atlas, which will recreate the collections.
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode delete | tee delete_output.txt
Post HDP tasks
Ambari infra-migrate and restore
###Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration.ini
### Restoring the collections
nohup /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode migrate-restore
nohup /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode transport
Backup Infra Solr collections
###Backing up the solr collections
##Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration-cdp.ini
### Generating the config file
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host ambari_hostname --port 8080 --cluster clustername --username admin --password ***** --backup-base-path /root/hdp_solr_backup_new --java-home /usr/lib/jvm/jre-1.8.0-openjdk/ --hdfs-base-path /opt/solrdata
#### backing up the solr collections
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup
## moving the data to HDFS
/usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action copy-to-hdfs
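A quick sanity check before moving on (a sketch; the path matches the --hdfs-base-path used above):

## verify the collection backups landed in HDFS
hdfs dfs -ls -R /opt/solrdata | head -20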
Post CDP tasks
amb_posttransition_solr
### Exporting the config file
export CONFIG_INI_LOCATION=/root/ambari_solr_migration-cdp.ini
### Note: Please make sure the ini file is adjusted to the CDP Solr URL, znode location and other applicable properties
#### changing the permissions of HDFS location
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action change-ownership-in-hdfs
### Deleting the solr collections
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action delete-new-solr-collections
## Restoring the solr collections
python /root/am2cm/restore_collections.py --ini-file $CONFIG_INI_LOCATION --action full-restore
Approach 2: Backing up infra-solr data before your upgrade and restoring it after the CDP upgrade (speed: 1 million records in 7 min)
Backing up the collections in infra-solr using the solrDataManager.py script.
This approach takes backups in batches of 0.1 million records and deletes them from infra-solr, which helps offload data from infra-solr as the backup progresses.
The script can be run in save, archive, or delete mode.
Adjust the END_DATE accordingly if you wish to run the script multiple times.
The average speed at which records will be backed up is 1 million records in 7 mins
Run in nohup mode for collections with more than 10 million records.
Step 1:
The backup of the infra-solr ranger_audits collection can be run at any time, even multiple times, before the Ambari upgrade. Set END_DATE accordingly.
### shell script name collection_local.sh
# Init values:
SOLR_URL=http://solr_host:solr_port/solr
END_DATE=2021-06-25T12:00:00.000Z
OLD_COLLECTION=ranger_audits
LOCAL_PATH=/home/solr/backup/ranger/ranger_audits/data
EXCLUDE_FIELDS=_version_
# comma separated exclude fields, at least _version_ is required
# provide these with -k and -n options only if kerberos is enabled for Infra Solr !!!
INFRA_SOLR_KEYTAB=/etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=infra-solr/$(hostname -f)@REALM
DATE_FIELD=evtTime
# -m MODE, --mode=MODE archive | delete | save
MODE=archive
/usr/lib/ambari-infra-solr-client/solrDataManager.py -m $MODE -v -c $OLD_COLLECTION -s $SOLR_URL -z none -r 100000 -w 100000 -f $DATE_FIELD -e $END_DATE -x $LOCAL_PATH -k $INFRA_SOLR_KEYTAB -n $INFRA_SOLR_PRINCIPAL --exclude-fields $EXCLUDE_FIELDS
Note: EXCLUDE_FIELDS is not available in the infra-solr scripts that come with Ambari 2.6.*. Please upgrade your infra-solr client or remove EXCLUDE from the script. ## i.e remove --exclude-fields $EXCLUDE_FIELDS
Step 2:
During the Ambari wizard upgrade of HDP to the Ambari-managed interim HDP-7.1.x version of CDP, you do not need to backup or restore collections.
Step 3:
After transitioning to Cloudera Manager running CDP and once all the services are started, we can trigger the restore script.
Ensure the Ranger collection is created before running this script.
The speed of the restore is around 1 million every 3 minutes
# Restoring data to Solr:
# Init values:
SOLR_URL=http://solr_host:solr_port/solr
COLLECTION=ranger_audits
DIR_NAME=/home/solr/backup/ranger/ranger_audits/data
# provide these with -k and -n options only if kerberos is enabled for Infra Solr !!!
INFRA_SOLR_KEYTAB=/etc/security/keytabs/ambari-infra-solr.service.keytab
INFRA_SOLR_PRINCIPAL=infra-solr/$(hostname -f)@REALM
for FILE_NAME in $DIR_NAME/*.json
do
echo "Uploading file to solr - $FILE_NAME"
curl -k --negotiate -u : -H "Content-type:application/json" "$SOLR_URL/$COLLECTION/update/json/docs?commit=true&wt=json" --data-binary @$FILE_NAME
done
Approach 3: Dumping the infra-solr collections before HDP upgrade and restoring them after CDP upgrade (Speed 1 million records in 1 min)
Backing up the collections in infra-solr using the solrCloudCli.sh script
The average speed at which records will be backed up is ~ 1 million records/minute
Run in nohup mode for collections with more than 15 million records (or fewer, to ensure the script doesn't terminate if you lose connectivity).
This script only works if your ambari Infra-solr client is v 2.7.x or higher.
Step 1:
Take the dump of the infra-solr collections before starting the Ambari upgrade.
## Getting Kerberos ticket
klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Taking Dump of your collections e.g ranger_audits (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000
### Running in background using nohup
nohup /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/infra-solr --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --dump-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000 2>&1 > /home/solr/backup/ranger/backup_ranger_audits.log &
Note: In this option, we are taking the complete backup of infra-solr collections and can then remove collections and data from infra-solr. A simple restart of Ranger Admin/Atlas will create new empty collections in infra-solr.
Step 2:
During the Ambari managed HDP upgrade steps, we do not need to backup or restore collections.
Step 3:
After migrating to CDP and once all the services are started we can trigger the restore script.
Make sure the ranger collection is created before running this script.
The restore runs at a speed of ~1 million records per minute.
## Getting Kerberos ticket
klist -kt /etc/security/keytabs/ambari-infra-solr.service.keytab infra-solr/`hostname -f`@REALM
### Restoring your collections e.g ranger_audits (speed 1min/million records)
/usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000
### Running in background using nohup
nohup /usr/lib/ambari-infra-solr-client/solrCloudCli.sh --zookeeper-connect-string zookeeper_host:2181/solr-infra --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --upload-documents --collection ranger_audits --output /home/solr/backup/ranger/ranger_audits/data --max-read-block-size 100000 --max-write-block-size 100000 2>&1 > /home/solr/backup/ranger/restore_ranger_audits.log &
Summary
We have been using:
Approach 1 for Development clusters with significantly less audit data in infra-solr
Approach 2 for HDP 2.6.5 clusters
Approach 3 for HDP 3.1.5 clusters
===========================================================================
We recommend:
Approach 1 for clusters with less than 10 million records since it is easy to backup and restore as per Cloudera upgrade documentation
Approach 2 is slow when compared to other approaches (~ 1 million records in 7 minutes) but useful since:
When used in "archive" mode, it will clean up the data which it has backed up
One can choose the END_DATE, if required, to re-run the script multiple times before the upgrade date
Approach 3 for clusters with more than 10 million records due to its efficient way to dump the documents and restore them after upgrading to CDP. Note the downsides of Approach 3 are:
It will only work if the ambari-infra-solr client is v 2.7.x or higher
If your backup fails, you have to restart the entire script and backup from the start again (i.e. it doesn't delete from the collections as it goes)
This summary is written based on our experience working with HDP clusters that are being migrated to CDP. Please use the development environments on your estate to come up with your own estimates and choose the right option which suits your clusters and SLAs.
Thank you !!
08-28-2019
06:11 AM
From the HDF documentation it is clear that "NiFi does not perform user authentication over HTTP. Using HTTP, all users will be granted all roles." https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.4.1.1/nifi-security/content/user_authentication.html

So it is mandatory to enable NiFi SSL to have the ldap_login_identity_provider working.

Hemanth
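A quick way to confirm the relevant settings on a node is to grep the effective nifi.properties. A sketch only: the file path is a placeholder, and on CM-managed nodes the generated file lives under the current process directory.

# Check that HTTP is disabled, HTTPS is enabled, and a login identity provider is configured
grep -E 'nifi.web.http.port|nifi.web.https.port|nifi.security.user.login.identity.provider' /path/to/nifi.properties
# Expect nifi.web.http.port to be empty, and the HTTPS port plus login identity provider to be set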
08-27-2019
06:22 AM
Hello Team,
Is it mandatory to enable SSL for NiFi and NiFi Registry in order to configure LDAP authentication?
I followed all steps mentioned in article https://docs.hortonworks.com/HDPDocuments/CFM/CFM-1.0.1/securing-cfm/content/cfm-configure-ldap.html
When I open the NiFi UI, no login window pops up.
Regards,
Hemanth
Labels:
Apache NiFi
08-20-2019
06:12 AM
1 Kudo
Hi @Faerballert, Yes, you are correct. The issue is resolved after a restart of the NiFi node. Hemanth
08-20-2019
05:33 AM
Hi @Faerballert Thanks for the reply. I cannot find another instance running in the background.

[root@ip-172-31-15-238 ec2-user]# ps -ef | grep nifi
root 15458 15388 0 12:31 pts/0 00:00:00 grep --color=auto nifi
nifi 16931 11875 0 11:49 ? 00:00:00 /usr/bin/python2 /opt/cloudera/cm-agent/bin/cm proc_watcher 16939
nifi 16939 16931 0 11:49 ? 00:00:05 /usr/java/jdk1.8.0_181-cloudera//bin/java -cp /var/run/cloudera-scm-agent/process/45-nifi-NIFI_NODE:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/bootstrap/* -Xms12m -Xmx24m -Dorg.apache.nifi.bootstrap.config.log.dir=/var/log/nifi -Dorg.apache.nifi.bootstrap.config.pid.dir=/var/run/cloudera-scm-agent/process/45-nifi-NIFI_NODE -Dorg.apache.nifi.bootstrap.config.file=/var/run/cloudera-scm-agent/process/45-nifi-NIFI_NODE/bootstrap.conf org.apache.nifi.bootstrap.RunNiFi run
nifi 16940 16931 0 11:49 ? 00:00:00 /usr/bin/python2 /opt/cloudera/cm-agent/bin/cm redactor --fds 3 5
nifi 17181 16939 14 11:49 ? 00:05:55 /usr/java/jdk1.8.0_181-cloudera/bin/java -classpath /var/run/cloudera-scm-agent/process/45-nifi-NIFI_NODE:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/jul-to-slf4j-1.7.25.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/jetty-schemas-3.1.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/nifi-runtime-1.9.0.1.0.0.0-90.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/nifi-api-1.9.0.1.0.0.0-90.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/logback-core-1.2.3.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/jcl-over-slf4j-1.7.25.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/nifi-properties-1.9.0.1.0.0.0-90.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/slf4j-api-1.7.25.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/nifi-nar-utils-1.9.0.1.0.0.0-90.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/nifi-framework-api-1.9.0.1.0.0.0-90.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/javax.servlet-api-3.1.0.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/logback-classic-1.2.3.jar:/opt/cloudera/parcels/CFM-1.0.0.0/NIFI/lib/log4j-over-slf4j-1.7.25.jar -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx512m -Xms512m -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.egd=file:/dev/urandom -Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -XX:+UseG1GC -Djava.protocol.handler.pkgs=sun.net.www.protocol -Dnifi.properties.file.path=/var/run/cloudera-scm-agent/process/45-nifi-NIFI_NODE/nifi.properties -Dnifi.bootstrap.listen.port=36497 -Dapp=NiFi -Dorg.apache.nifi.bootstrap.config.log.dir=/var/log/nifi org.apache.nifi.NiFi -K /var/run/cloudera-scm-agent/process/45-nifi-NIFI_NODE/sensitive.key

I have killed the processes and restarted NiFi but still get the same error message.

Hemanth