Member since 09-29-2015
19 Posts
27 Kudos Received
2 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2173 | 03-26-2018 06:33 PM
 | 2307 | 04-13-2017 06:49 PM
10-11-2018
01:13 PM
From Ambari 2.7 (after an ambari-server upgrade), it is required to backup, migrate, and restore the Infra Solr collection data. This is required because from Ambari 2.7 Infra Solr uses Solr 7 instead of Solr 5, and the optimize command won't work on a Lucene 5 index, so the migration has to be done offline. In order not to get stuck on any step, it is useful to understand correctly what happens in the background, as the documentation only covers the happy path, which uses the /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh script (https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.0.0/bk_ambari-upgrade/content/backup_and_upgrade_ambari_infra_data.html). Fortunately, every step can be done manually as well. Backup/Migration/Restore is only needed for Atlas and Ranger; Log Search will have a totally new schema.
Prerequisites:
1. First of all, upgrading the Ambari Infra Solr service is the next step right after the Ambari server upgrade (backup the old Solr data, then upgrade the Infra Solr server packages). Ambari Infra is not shipped the same way as other services: it is not part of HDP/HDF, it is part of Ambari. Therefore, after the Ambari server upgrade, although the Infra Solr configuration will be upgraded, the service itself will still use the un-upgraded Infra Solr (Solr version 5). Because of that, after the Ambari upgrade DO NOT RESTART the Infra Solr service. Otherwise you will hit a ClassNotFound issue: https://community.hortonworks.com/content/supportkb/210579/error-nullorgapachesolrcommonsolrexception-error-l.html. The reason is that Ambari uploads a generated security.json during startup. It is worth mentioning that you can provide your own security.json as well by setting the "infra-solr-security-json/content" configuration property. The easiest way to fix that issue properly is to set "infra-solr-security-json/content" to the following (with the authorization part removed):
{
"authentication": {
"class": "org.apache.solr.security.KerberosPlugin"
}
}
And revert the infra-solr-env config to what it was before the Ambari server upgrade, which can be done by replacing the following line in infra-solr-env/content:
SOLR_AUTH_TYPE="kerberos"
with:
SOLR_KERB_NAME_RULES="{{infra_solr_kerberos_name_rules}}"
SOLR_AUTHENTICATION_CLIENT_CONFIGURER="org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer"
Also, if the Ambari version is at least 2.7.1, Solr will be upgraded to 7.4.0, which starts to use log4j2. As Solr has not been upgraded yet, the log4j config needs to be changed as well. In that case, replace the following line in infra-solr-env/content:
LOG4J_PROPS={{infra_solr_conf}}/log4j2.xml
with:
LOG4J_PROPS={{infra_solr_conf}}/log4j.properties
For automation, you can do the following instead of the steps above (fill the parameters with the proper cluster details):
## Update infra-solr-env/content
# save the infra solr env configs
/var/lib/ambari-server/resources/scripts/configs.py --action=get --cluster=cl1 --user=admin --password=admin --host=c7401.ambari.apache.org --port=8080 --config-type=infra-solr-env --file infra-solr-env.json
# replace SOLR_AUTH_TYPE in saved config file
sed -i 's/SOLR_AUTH_TYPE="kerberos"/SOLR_KERB_NAME_RULES="{{infra_solr_kerberos_name_rules}}"\nSOLR_AUTHENTICATION_CLIENT_CONFIGURER="org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer"\n/g' infra-solr-env.json
# replace LOG4J_PROPS
sed -i 's|LOG4J_PROPS={{infra_solr_conf}}/log4j2.xml|LOG4J_PROPS={{infra_solr_conf}}/log4j.properties|g' infra-solr-env.json
# update the config
/var/lib/ambari-server/resources/scripts/configs.py --action=set --cluster=cl1 --user=admin --password=admin --host=c7401.ambari.apache.org --port=8080 --config-type=infra-solr-env --file infra-solr-env.json
## Update infra-solr-security-json/content
/var/lib/ambari-server/resources/scripts/configs.py --action=set --cluster=cl1 --user=admin --password=admin --host=c7401.ambari.apache.org --port=8080 --config-type=infra-solr-security-json -k content -v '{"authentication": { "class": "org.apache.solr.security.KerberosPlugin"}}'
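Before touching the real saved config, the LOG4J_PROPS substitution can be tried on a scratch file. The following is only a minimal sketch: the scratch file and its single-line content are hypothetical stand-ins for the saved infra-solr-env.json, and `|` is used as the sed delimiter because the value itself contains `/`.

```shell
# Scratch file standing in for the saved infra-solr-env.json (hypothetical content)
tmp_file="$(mktemp)"
printf '%s\n' 'LOG4J_PROPS={{infra_solr_conf}}/log4j2.xml' > "$tmp_file"

# '|' is the sed delimiter here, so the '/' in the path needs no escaping
sed 's|LOG4J_PROPS={{infra_solr_conf}}/log4j2.xml|LOG4J_PROPS={{infra_solr_conf}}/log4j.properties|g' \
  "$tmp_file" > "$tmp_file.new"

# Show the rewritten line
cat "$tmp_file.new"
```

If the printed line points at log4j.properties, the same expression can be used with `sed -i` on the saved config file.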
Note that after the collection backup is done and the Solr server packages have been upgraded (after the "delete" phase), you need to set these configs back to make them work with Solr 7. Reverting the configs to the Solr 7 version by scripts:
## Update infra-solr-env/content
# save the infra solr env configs
/var/lib/ambari-server/resources/scripts/configs.py --action=get --cluster=cl1 --user=admin --password=admin --host=c7401.ambari.apache.org --port=8080 --config-type=infra-solr-env --file infra-solr-env.json
# replace SOLR_AUTH_TYPE in saved config file
sed -i 's/SOLR_KERB_NAME_RULES="{{infra_solr_kerberos_name_rules}}"/SOLR_AUTH_TYPE="kerberos"/g' infra-solr-env.json
sed -i 's/SOLR_AUTHENTICATION_CLIENT_CONFIGURER="org.apache.solr.client.solrj.impl.Krb5HttpClientConfigurer"//g' infra-solr-env.json
# replace LOG4J_PROPS
sed -i 's|LOG4J_PROPS={{infra_solr_conf}}/log4j.properties|LOG4J_PROPS={{infra_solr_conf}}/log4j2.xml|g' infra-solr-env.json
# update the config
/var/lib/ambari-server/resources/scripts/configs.py --action=set --cluster=cl1 --user=admin --password=admin --host=c7401.ambari.apache.org --port=8080 --config-type=infra-solr-env --file infra-solr-env.json
## Update infra-solr-security-json/content
/var/lib/ambari-server/resources/scripts/configs.py --action=set --cluster=cl1 --user=admin --password=admin --host=c7401.ambari.apache.org --port=8080 --config-type=infra-solr-security-json -k content -v ''
2. Make sure the Solr nodes are up and running, or at least that you have an active replica for every shard (one that is not in DOWN state). This is required because we gather Solr/cluster details not only from Ambari but also from the Solr nodes and znodes, and we want to run the commands on stable Solr servers.
3. Choose 1 node where a Solr server is installed and upgrade the ambari-infra-solr-client package there:
# For RHEL/CentOS/Oracle Linux:
yum clean all
yum upgrade ambari-infra-solr-client
# For SLES:
zypper clean
zypper up ambari-infra-solr-client
# For Ubuntu/Debian:
apt-get clean all
apt-get update
apt-get install ambari-infra-solr-client
That Infra Solr client will be used to run the required migration commands. Sooner or later you will need to do these steps on every host where the ambari-infra-solr-client package is installed. (This can be done through the 1 host that you chose, if the migration config generation in the next steps works well; if it is fine to do it manually, the yum/apt-get upgrade itself can be run on those hosts.)
After the client is upgraded, the following 4 scripts will be available for the infra client:
/usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py
/usr/lib/ambari-infra-solr-client/migrationHelper.py
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh
/usr/lib/ambari-infra-solr-client/solrIndexHelper.sh // that was there before, it was just updated
On the chosen Solr server host, you can consider updating some of these scripts (migrationConfigGenerator and migrationHelper) to the latest version by downloading them manually (just for that host). As this Solr upgrade is a one-time upgrade, the latest versions could contain fixes.
wget --no-check-certificate -O /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py https://raw.githubusercontent.com/apache/ambari-infra/master/ambari-infra-solr-client/src/main/python/migrationConfigGenerator.py
chmod +x /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py
wget --no-check-certificate -O /usr/lib/ambari-infra-solr-client/migrationHelper.py https://raw.githubusercontent.com/apache/ambari-infra/master/ambari-infra-solr-client/src/main/python/migrationHelper.py
chmod +x /usr/lib/ambari-infra-solr-client/migrationHelper.py
1. Gather required Ambari and Solr parameters (with examples)
After the Solr client has been upgraded on the Infra Solr host, and before starting the backup process, we need to gather some information from the Ambari configs and from the Solr znode. The migrationConfigGenerator script can do this. It avoids repeating similar configuration properties in every migrationHelper command: in the end we generate an ini file, which works as an input for migrationHelper. Worth knowing: the ini file can be filled in manually as well (in case the generation fails). An example command:
CONFIG_INI_LOCATION=ambari_solr_config_file.ini
/usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host c7401.ambari.apache.org --port 8080 --cluster cl1 --username admin --password admin --backup-base-path=/my/path --java-home /usr/jdk64/jdk1.8.0_112 # for ssl use -s and 8443 for port
Asciinema example: https://asciinema.org/a/188260?speed=2
Example output:
[ambari_server]
host = c7401.ambari.apache.org
port = 8080
cluster = cl1
protocol = http
username = admin
password = admin
[local]
java_home = /usr/jdk64/jdk1.8.0_112/
hostname = c7402.ambari.apache.org
shared_drive = false
[cluster]
kerberos_enabled = true
[infra_solr]
protocol = http
hosts = c7402.ambari.apache.org,c7403.ambari.apache.org
port = 8886
zk_connect_string = c7401.ambari.apache.org:2181
znode = /infra-solr
user = infra-solr
keytab = /etc/security/keytabs/ambari-infra-solr.service.keytab
principal = infra-solr/c7402.ambari.apache.org
zk_principal_user = zookeeper
[ranger_collection]
enabled = true
ranger_config_set_name = ranger_audits
ranger_collection_name = ranger_audits
ranger_collection_shards = 2
ranger_collection_max_shards_per_node = 4
backup_ranger_config_set_name = old_ranger_audits
backup_ranger_collection_name = old_ranger_audits
backup_path = /my/path/ranger
[atlas_collections]
enabled = true
config_set = atlas_configs
fulltext_index_name = fulltext_index
fulltext_index_shards = 2
fulltext_index_max_shards_per_node = 4
edge_index_name = edge_index
edge_index_shards = 2
edge_index_max_shards_per_node = 4
vertex_index_name = vertex_index
vertex_index_shards = 2
vertex_index_max_shards_per_node = 4
backup_fulltext_index_name = old_fulltext_index
backup_edge_index_name = old_edge_index
backup_vertex_index_name = old_vertex_index
backup_path = /my/path/atlas
[logsearch_collections]
enabled = true
hadoop_logs_collection_name = hadoop_logs
audit_logs_collection_name = audit_logs
history_collection_name = history
2. Steps after we have the config ini file
2. a.) Backup (if required)
The documentation says that from this point you can use the /usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh script to create the backup, but in this example we won't do that, in order to understand what the script does. The backup command of the ambariSolrMigration.sh script runs the following scripts:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action upgrade-solr-clients
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup
The first command upgrades the infra solr clients on every host where INFRA_SOLR_CLIENT components are installed by Ambari. It can be replaced with yum/apt upgrade commands on those hosts (as described above).
The backup command is a bit more complicated. It sends an Ambari command to every Ambari agent where Infra Solr servers are installed, and runs Solr backup commands through the replication handler (https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#backup-api). Note that the Backup collection API is not available in Solr 5, which is why we use the REST API on the replication endpoint. Also note that if you call the backup on a collection, the replication handler will randomly choose 1 shard replica to save on the filesystem, so we need to use the backup API on every core one-by-one on the right hosts (it is enough to back up only 1 replica of a shard). For this we need to figure out the host location of the cores, and the shard/replica mapping as well. This information is figured out by the migrationHelper command, but if you are stuck with the command for any reason, you can gather it manually from the "/infra-solr" znode for every collection (state.json). Example of how to get the state.json (e.g. for Ranger):
source /etc/ambari-infra-solr/conf/infra-solr-env.sh
export SOLR_ZK_CREDS_AND_ACLS="${SOLR_AUTHENTICATION_OPTS}" # in case of kerberos
export ZK_HOST="c7401.ambari.apache.org:2181/infra-solr"
# kinit first with infra-solr user if kerberos is enabled
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh --zkhost "${ZK_HOST}" -cmd get /collections/ranger_audits/state.json
A state.json could look like this (for ranger_audits; from this point every example uses the ranger_audits collection):
{"ranger_audits":{
"replicationFactor":"2",
"shards":{
"shard1":{
"range":"80000000-ffffffff",
"state":"active",
"replicas":{
"core_node1":{
"core":"ranger_audits_shard1_replica1",
"base_url":"http://c7402.ambari.apache.org:8886/solr",
"node_name":"c7402.ambari.apache.org:8886_solr",
"state":"active",
"leader":"true"},
"core_node3":{
"core":"ranger_audits_shard1_replica2",
"base_url":"http://c7403.ambari.apache.org:8886/solr",
"node_name":"c7403.ambari.apache.org:8886_solr",
"state":"active"}}},
"shard2":{
"range":"0-7fffffff",
"state":"active",
"replicas":{
"core_node2":{
"core":"ranger_audits_shard2_replica1",
"base_url":"http://c7402.ambari.apache.org:8886/solr",
"node_name":"c7402.ambari.apache.org:8886_solr",
"state":"active"},
"core_node4":{
"core":"ranger_audits_shard2_replica2",
"base_url":"http://c7403.ambari.apache.org:8886/solr",
"node_name":"c7403.ambari.apache.org:8886_solr",
"state":"active",
"leader":"true"}}}},
"router":{"name":"compositeId"},
"maxShardsPerNode":"22",
"autoAddReplicas":"false"}}
Shard / replica mapping visually: (note that the replicas for one shard are on different hosts)
The next step on the happy path is to backup the collections:
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode backup
Which basically runs the following migrationHelper commands:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action upgrade-solr-clients
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action check-docs
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action backup
Here, the first and the third commands are the important ones. The first command sends commands to those ambari-agent hosts where ambari-infra-solr-clients are installed, and executes an apt/zypper or yum upgrade against the ambari-infra-solr-client packages. This way you do not need to ssh into every (related) host to upgrade those packages (if you hit any issue with that command, you can do that as a workaround). The third command is responsible for sending Ambari Solr backup commands to the ambari-agents where Infra Solr servers are installed. The backup calls are done per host / core where required. It is important to backup only 1 replica of a shard (it takes less time, and the others are replicas anyway; if you need 1 more copy of a shard you can just manually copy the backup somewhere else).
The backup commands can be done manually as well, but for that you will need to figure out the right core URLs + leaders. You can use the state.json output above to figure that out. The Solr URL prefix for the backup command (see: https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#backup-api) for 1 replica (let's say core_node1) can be created from its "base_url" and "core" (for core_node1, that is "http://c7402.ambari.apache.org:8886/solr/ranger_audits_shard1_replica1"). With this knowledge, a manual backup command for 1 shard looks like this (with the manual way, you need this step for every shard):
# do on host c7402.ambari.apache.org , where the core is located
BACKUP_PATH="..." # set an existing path for the backup
BACKUP_CORE_NAME="ranger_audits_shard1_replica1" # that will be used for the snapshot name, can be anything, using core name is recommended
su infra-solr # login with the infra-solr user - can be a custom one
# kinit if cluster is kerberized
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
curl --negotiate -k -u : "http://c7402.ambari.apache.org:8886/solr/ranger_audits_shard1_replica1/replication?command=BACKUP&location=$BACKUP_PATH&name=$BACKUP_CORE_NAME"
# that will generate an index snapshot with location: $BACKUP_PATH/snapshot.$BACKUP_CORE_NAME
Note that if you go the manual way, from this point you will need to do every remaining step manually. We also need to backup the znode data (as with Solr 7 we will have a new schema / solrconfig.xml). The manual way looks like this (if migrationHelper cannot be used):
export JAVA_HOME=/usr/jdk64/1.8.0_112 # or other jdk8 location
export ZK_CONN_STR=... # without znode, e.g.: myhost1:2181,myhost2:2181,myhost3:2181
# note 1: --transfer-mode copyToLocal or --transfer-mode copyFromLocal can be used if you want to use the local filesystem
# note 2: use --jaas-file option only if the cluster is kerberized
infra-solr-cloud-cli --transfer-znode -z $ZK_CONN_STR --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --copy-src /infra-solr/configs/ranger_audits --copy-dest /infra-solr/configs/old_ranger_audits
That will create an old_ranger_audits znode under /infra-solr/configs, which can be used (later) to create a collection with the old schema. (Important: you only need to backup the znode for Ranger; it is not required for Log Search or Atlas.)
2. b.) Delete & cleanup old collections
If the backup has finished successfully for every shard, then from this point, as you have the data, you can get rid of the Solr 5 collections and delete the Solr configs (those will be regenerated during the Ranger/Atlas/LogSearch restart). With the happy path, that can be done with one call:
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode delete --skip-solr-client-upgrade
If a backup was not required before this step, you can remove the --skip-solr-client-upgrade flag from the command. The flag is used to avoid re-upgrading the solr-clients (as that was done during the backup; if no backup was done, you will need to upgrade those clients). The delete command executes the following scripts:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action delete-collections
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action upgrade-solr-instances
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restart-solr
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restart-ranger
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action upgrade-logsearch-portal
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action upgrade-logfeeders
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restart-logsearch
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restart-atlas
The first command is the most important one. It deletes all the collections (Ranger: ranger_audits, Atlas: vertex/fulltext/edge index, LogSearch: audit_logs, service_logs, history) and also deletes (or upgrades) the Solr configs in ZooKeeper for those collections (during collection re-creation, those will be re-uploaded when the LogSearch/Atlas/Ranger services start). Done manually, these steps look like the following (shown with ranger_audits, but do this against all collections):
su infra-solr # infra-solr user - if you have a custom one, use that
# use kinit and --negotiate option for curl only if the cluster is kerberized
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
curl --negotiate -k -u : "http://c7402.ambari.apache.org:8886/solr/admin/collections?action=DELETE&name=ranger_audits"
For ranger_audits, it is not necessary to delete the Solr configs from the znode; it is enough to upgrade the existing one (the schema):
sudo -u infra-solr -i
# If kerberos enabled
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
## UPLOAD NEW SCHEMA
# Setup env for zkcli.sh
export JAVA_HOME=/usr/jdk64/1.8.0_112 # or other jdk8 location
export ZK_CONN_STR=... # without znode, e.g.: myhost1:2181,myhost2:2181,myhost3:2181
source /etc/ambari-infra-solr/conf/infra-solr-env.sh
# Run that command only if kerberos is enabled.
export SOLR_ZK_CREDS_AND_ACLS="${SOLR_AUTHENTICATION_OPTS}"
# Upload the new schema
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh --zkhost "${ZK_HOST}" -cmd putfile /configs/ranger_audits/managed-schema /usr/lib/ambari-infra-solr-client/migrate/managed-schema
Also note that in the backup steps we have already backed up the Ranger configs (in /infra-solr/configs/old_ranger_audits), so that one will use the old schema. For Atlas, delete/upgrade is not needed for the Solr configs. For Log Search, it is required to delete all of the Solr configs, which looks like the following for the hadoop_logs/audit_logs/history collections:
su infra-solr # infra-solr user - if you have a custom one, use that
# ZOOKEEPER CONNECTION STRING from zookeeper servers
export ZK_CONN_STR=... # without znode,e.g.: myhost1:2181,myhost2:2181,myhost3:2181
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
zookeeper-client -server $ZK_CONN_STR rmr /infra-solr/configs/hadoop_logs
zookeeper-client -server $ZK_CONN_STR rmr /infra-solr/configs/audit_logs
zookeeper-client -server $ZK_CONN_STR rmr /infra-solr/configs/history
The remaining migrationHelper steps can be done manually with yum/apt/zypper commands (e.g. "yum upgrade ambari-infra-solr" where required; do the same for the ambari-logsearch-portal and ambari-logsearch-logfeeder packages), and the restart commands can be done from the Ambari UI. Before restarting Infra Solr (after the upgrade is done), do not forget to revert the changed infra-solr-env and infra-solr-security-json configurations (if those had to be changed; see the Prerequisites part of the article). From this point you can go ahead with upgrading HDP or HDF. Restore and migrate can be done offline at any time; just make sure you have the backups of the old collections.
3. Migrate & Restore
If you need your old data from the Atlas and Ranger collections, first migrate the backups, then restore them to new collections (so do not touch the collections that were re-created by Log Search / Atlas / Ranger during the service restarts). With the happy path, the migration and restore can be done together in one command:
/usr/lib/ambari-infra-solr-client/ambariSolrMigration.sh --ini-file $CONFIG_INI_LOCATION --mode migrate-restore # you can use --keep-backup flag as well if you do not want to delete the backup files yet
That command will execute the following commands:
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action check-docs # if one of the collections is not available this can fail - you can skip this command if you use migrationHelper.py commands directly instead of the happy-path script
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action migrate
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action restore
/usr/bin/python /usr/lib/ambari-infra-solr-client/migrationHelper.py --ini-file $CONFIG_INI_LOCATION --action rolling-restart-solr --batch-interval 60
There are 2 important commands here: the second and the third (migrate and restore).
3. a.) Migrate
Migrate sends Ambari commands to the hosts where you have snapshots, which run the IndexUpgraderTool (https://lucene.apache.org/solr/guide/6_6/indexupgrader-tool.html) through the solrIndexHelper script. The solrIndexHelper script can also be executed manually against the snapshots:
export JAVA_HOME=/usr/jdk64/1.8.0_112
# if /tmp/ranger-backup is your backup location
infra-lucene-index-tool upgrade-index -d /tmp/ranger-backup -f -b -g
# -b flag work as a filter, it will use the snapshot.* folders for index upgrade
# with 'infra-lucene-index-tool help' command you can checkout the command line options
You can also use a java command directly on the index; the Lucene libraries can be found at /usr/lib/ambari-infra-solr-client/migrate (both Lucene 6 and Lucene 7 libraries):
java -cp /usr/lib/ambari-infra-solr-client/migrate/lucene-backward-codecs-6.6.2.jar:/usr/lib/ambari-infra-solr-client/migrate/lucene-core-6.6.2.jar org.apache.lucene.index.IndexUpgrader [-delete-prior-commits] [-verbose] /path/to/index
Note that the index upgrade can take a lot of time (1GB/min), so it is worth running the migration commands with nohup, in the background. If someone skipped the backup entirely and the Infra Solr instances were upgraded, it is still possible to fix the Lucene 5 index by running the IndexUpgrader on your index (but make sure Solr is stopped).
3. b.) Restore
The restore command does multiple things. It creates the new collections, then restores your migrated data into those collections. That means you will need at least the same number of shards that you had before the backup (in the original collections). Those exact values were saved by the migrationConfigGenerator tool, but if you did the steps manually, you need to count how many shards you had per collection. So first, let's create a collection that we can restore into. Here we will use old_ranger_audits: remember that we saved the old_ranger_audits schema earlier, which we can use for this collection. However, we need the new solrconfig.xml to be compatible with Solr 7, even though we use the old schema, so the newly created solrconfig.xml needs to be copied to the configs/old_ranger_audits znode:
su infra-solr
# kinit only if kerberos is enabled for the cluster
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
export JAVA_HOME=/usr/jdk64/1.8.0_112 # or other jdk8 location
export ZK_CONN_STR=... # without znode, e.g.: myhost1:2181,myhost2:2181,myhost3:2181
# note 1: jaas-file option required only if kerberos is enabled for the cluster
# note 2: copy new solrconfig.xml as the old one won't be compatible with solr 7
infra-solr-cloud-cli --transfer-znode -z $ZK_CONN_STR --jaas-file /etc/ambari-infra-solr/conf/infra_solr_jaas.conf --copy-src /infra-solr/configs/ranger_audits/solrconfig.xml --copy-dest /infra-solr/configs/old_ranger_audits/solrconfig.xml
# note: it is enough to use 1 replica for that collection
curl --negotiate -k -u : "http://c7402.ambari.apache.org:8886/solr/admin/collections?action=CREATE&name=old_ranger_audits&numShards=2&replicationFactor=1&maxShardsPerNode=4&collection.configName=old_ranger_audits"
It is possible that your snapshots and the newly created shards will be on different hosts. You can solve this by deleting the core data from your local filesystem (e.g. in /opt/ambari_infra_solr/data/old_ranger_audits*, or wherever your Solr data dir is located for that collection; you can safely delete data from there as that collection is empty), and editing the old_ranger_audits state.json (download the znode content, then upload it) to use the right hosts for each shard (you need to replace the base_url and node_name values in the state.json). After the state.json update, you will need to restart the Solr instances. Note that these hacks are only required if you want to manually copy your index files to the new cores, and you should do so only if the Solr REST API RESTORE action (https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#restore-api) did not work. For 1 shard, the Solr restore command can look like this (core names can be gathered from the state.json, similarly to the backup phase, but look at the old_ranger_audits collection, not the ranger_audits collection):
su infra-solr
BACKUP_PATH=... # backup location, e.g.: /tmp/ranger-backup
OLD_BACKUP_COLLECTION_CORE=... # choose a core to restore
BACKUP_CORE_NAME=... # choose a core from backup cores - you can find these names as : <backup_location>/snapshot.$BACKUP_CORE_NAME
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
curl --negotiate -k -u : "http://c7403.ambari.apache.org:8886/solr/$OLD_BACKUP_COLLECTION_CORE/replication?command=RESTORE&location=$BACKUP_PATH&name=$BACKUP_CORE_NAME"
4. Transport inactive restored collection data to active collections
From this point, although you have your old data in Solr, those collections won't be actively used, so as a very last step the data needs to be transported to the active collections (i.e. transfer the old_ranger_audits data to the ranger_audits collection). As the schema is a bit different for the new collections, the easiest way to do this is to query the old_ranger_audits collection and use the response as doc inputs for ranger_audits. That can be done with the solrDataManager script; see this asciinema video: https://asciinema.org/a/188396?speed=2
See more docs: https://github.com/apache/ambari-infra/blob/master/ambari-infra-solr-client/README.md
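Before transporting the data, it can be useful to confirm that the restored collection actually holds documents, for example by running a `select?q=*:*&rows=0` query against old_ranger_audits and reading numFound from the JSON response. The sketch below only shows the parsing part: `num_found` is a hypothetical helper and the sample response is made up for illustration, not real cluster output.

```shell
# num_found: extract the numFound value from a Solr JSON select response read on stdin.
# Hypothetical helper for illustration only.
num_found() {
  sed -n 's/.*"numFound":\([0-9][0-9]*\).*/\1/p'
}

# Made-up response in the shape Solr returns for wt=json
sample_response='{"responseHeader":{"status":0},"response":{"numFound":9500,"start":0,"docs":[]}}'
printf '%s\n' "$sample_response" | num_found

# On a real (kerberized) cluster the response would come from something like:
# curl --negotiate -k -u : "http://c7402.ambari.apache.org:8886/solr/old_ranger_audits/select?q=*:*&rows=0&wt=json" | num_found
```

Comparing this number before and after the transport gives a quick sanity check that no documents were lost.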
05-10-2018
06:40 PM
1 Kudo
Hi @J Koppole, you can do this before using zkcli.sh of infra-solr:
sudo -u infra-solr -i
kinit -kt /etc/security/keytabs/ambari-infra-solr.service.keytab $(whoami)/$(hostname -f)
source /etc/ambari-infra-solr/conf/infra-solr-env.sh
export SOLR_ZK_CREDS_AND_ACLS="${SOLR_AUTHENTICATION_OPTS}"
03-26-2018
10:01 PM
Seems like it's required. I guess it was mostly used for bootstrapping. When that solrconfig.xml file changes, it's not that clear whether we already have the collection created or we are just doing an update. I think in the future the upload of configs and the create / reload of collections should be moved to the Ranger side (a background bootstrap process); then Ambari should just configure the files and not do anything with the Solr API from the agent side.
03-26-2018
06:33 PM
Hi @Hajime, if you restart Ranger, it should upload a new solrconfig.xml to ZooKeeper with the changed max retention days. Then you can probably reload the collection with the Solr API (RELOAD action on collections, see: https://lucene.apache.org/solr/guide/6_6/collections-api.html).
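A reload call could look roughly like the sketch below. The host, port and collection name are assumptions borrowed from the examples used elsewhere on this page, and the curl call is left commented out so the snippet only prints the URL it would hit.

```shell
# Assumed values; adjust to your own cluster
SOLR_URL="http://c7402.ambari.apache.org:8886/solr"
COLLECTION="ranger_audits"

# Collections API RELOAD action for the given collection
RELOAD_URL="${SOLR_URL}/admin/collections?action=RELOAD&name=${COLLECTION}"
echo "$RELOAD_URL"

# kinit and --negotiate are only needed if the cluster is kerberized
# curl --negotiate -k -u : "$RELOAD_URL"
```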
02-16-2018
02:20 PM
Sorry for the late response. The usage of configs.py can be useful here: https://cwiki.apache.org/confluence/display/AMBARI/Modify+configurations
02-16-2018
02:19 PM
Sorry for the late response, I did not notice. You probably need to extend SOLR_OPTS with -Dsolr.hdfs.security.kerberos.principal= ... in the infra-solr-env template (you can use `hostname -f` there), then use it like this in the xml: <str name="solr.hdfs.security.kerberos.principal">${solr.hdfs.security.kerberos.principal:}</str>
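As a rough illustration, the SOLR_OPTS extension in the infra-solr-env template might look like the lines below. The principal name and the EXAMPLE.COM realm are placeholders, not values from any real cluster; only the system property name comes from the reply above.

```shell
# Placeholder principal: service name and realm are assumptions, replace with your own
HOST_FQDN="$(hostname -f 2>/dev/null || hostname)"
SOLR_HDFS_PRINCIPAL="infra-solr/${HOST_FQDN}@EXAMPLE.COM"

# Append the system property so solrconfig.xml can resolve ${solr.hdfs.security.kerberos.principal:}
SOLR_OPTS="${SOLR_OPTS:-} -Dsolr.hdfs.security.kerberos.principal=${SOLR_HDFS_PRINCIPAL}"
echo "$SOLR_OPTS"
```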
02-06-2018
03:33 PM
@Julian Blin, I think you need to set the 'infra-solr-env/infra_solr_kerberos_name_rules' property in order to use the rules for Solr.
12-18-2017
05:33 PM
5 Kudos
If Ranger is set up to use Solr to store audits, it is necessary to configure Solr and the Ranger Solr collection properly in order to keep the system stable. There is no magic number for the number of Solr nodes/shards, because with Ranger the required numbers depend on the load (generated by master components). If there are lots of ongoing jobs on your cluster, it will generate more audits. (How busy your cluster is matters more than how many machines you have; of course, with more resources your system can handle more load.)
Ambari Infra Solr load test on Ranger Solr collection
In order to get the proper (approximate) Solr settings for Ranger audits, we had to do some scale/load testing. For this purpose we also need something to visualize the results, because we need to understand what happened during the load. Here we can use AMS + Grafana. Ambari Infra uses Solr 5.5.2, and that release has no Solr Metrics API, so we need a component that sends Solr metrics to AMS. Solr has JMX support, so we can create a component which gathers Solr JMX details and pushes the metrics to AMS.
Ambari mpack to periodically send JMX data to AMS (with pre-defined Grafana dashboards): https://github.com/oleewere/ams-solr-metrics-mpack (unofficial, I created it for myself to help with scale/load testing)
Ambari cluster setup for load testing:
- Services: Ranger, Infra Solr, Zookeeper, AMS (distributed mode), HBase, KNOX, Kafka, HDFS, YARN, Hive, Solr Metrics (the mpack that you can see above)
- Number of nodes: ~60 nodes, 8 Solr nodes
- Node details: 4 CPU, 15G RAM
- Solr heap: 5G (anything else is default)
After the first day's run (I started with a relatively small load, only ~1 million Ranger audit records per day, with just one shard):
- Number of audits: 9.5 million
- Index size: ~1GB
- CPU load - average: 0.5%, max: 2%
- Heap - average: 265MB, max: 302MB
- Number of threads - average/max: 31
- Connections - ESTABLISHED: ~10, CLOSE_WAIT: 1, TIME_WAIT: 1
- Transaction log size - average: 132MB, max: 170MB
Based on these metrics, we can estimate that 10 million audit docs take about 1GB of index. As you can see, the system values seem to be stable and did not hit any limits, but that is expected with this small load. From this point we can increase the load. It is also worth mentioning how to add new shards to the Ranger Solr audit collection. As Ranger uses compositeId for routing, you will need to split the shards (if the number is not right originally). For example, if you have a shard called shard1 in the ranger_audits collection, the split shard request can look like this:
http://<solr_address>:8886/solr/admin/collections?action=SPLITSHARD&collection=ranger_audits&shard=shard1&async=1000&wt=json
Wait until the request finishes; you can check the status with:
http://<solr_address>:8886/solr/admin/collections?action=REQUESTSTATUS&requestid=1000&wt=json
That will put your original core into inactive status; you can use the UNLOAD action on it if you would like to delete it. (Note: both new cores will be on the same machine, so you may need to move one of the cores somewhere else, e.g. create a new replica from it and delete the old one.)
As we have multiple nodes, we can start the load again with higher throughput: ~265 million docs/day. The result after a few days' run (both shards on the same node):
- Number of audits: ~615 million
- Index size: ~70G (35G + 35G)
- CPU load - average: 74%, max: 88%
- Heap - average: 1.7 GB, max: 1.88 GB
- Number of threads - average: 183, max: 257
- Connections - ESTABLISHED: ~110, CLOSE_WAIT: ~90, TIME_WAIT: 6K
- Transaction log size - average: 132MB, max: 256MB
As you can see, a lot of network connections were created (6K in TIME_WAIT), so with high load you can run out of network connections. To fix that you can use the following system network settings (sysctl):
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
(Note: as net.ipv4.tcp_tw_recycle has been removed from later versions of the Linux TCP stack, you can increase net.core.somaxconn and net.ipv4.tcp_fin_timeout to allow more half-open connections instead of recycling/reusing kernel structures.)
(Other recommended system settings: https://community.hortonworks.com/articles/90955/how-to-fix-ambari-infra-solr-throwing-outofmemorye.html)
We also noticed that there were often long GC pauses. We did not use G1 GC (which can be used to set a low GC pause time, default is 250 msec), so we changed the configuration to use it (in the infra-solr-env/content property):
GC_TUNE="-XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:G1HeapRegionSize=3m -XX:MaxGCPauseMillis=250 -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages -XX:+AggressiveOpts"
Then I moved one of the cores to another Solr node in order to test them under different circumstances. (I stopped one of them and started it again on the last day of the load test.) The result after a few days' run:
- Number of audit records: ~1.2 billion
Node 1 (load ran for more days):
- Index size: ~83G
- CPU load - max: 60%
- Heap - average: 2.7 GB, max: 3.6 GB
- Number of threads - average: 75, max: 102
- Connections - ESTABLISHED: 48, CLOSE_WAIT: 59, TIME_WAIT: 17
- Transaction log size: 200-400 MB (note: daily load started at ~14:00)
Node 2 (ran for 1 day):
- Index size: ~45G
- CPU load - max: 26%
- Heap - average: 1 GB, max: 1.2 GB
- Number of threads - average: 35, max: 38
- Connections - ESTABLISHED: 28, CLOSE_WAIT: 4, TIME_WAIT: 4
- Transaction log size: 2-4 MB
Notice that the TIME_WAIT connections look much better; that is because of the sysctl settings that I modified before.
Conclusion
Solr is the right tool for handling tokenized static data, but you need to find the proper configuration for dynamically growing data (which is the case with Ranger audits). As the goal is to keep the system stable, it is better to start with the right number of cores/shards/replicas, because splitting shards / adding new replicas can be really costly, and most of the time it becomes "required" only when your Solr cluster has already reached its limits. (We also need to keep the transaction log size as small as possible; one way of doing this is horizontal scaling.) Based on the metrics that we have and the TTL (time-to-live) value of Ranger audits (default: 90 days), we can approximately recommend the following settings for our Solr cluster:
- Use G1 GC (set in infra-solr-env/content) for low pause times
- For production, set JVM memory to ~10-12 G
- Number of shards: keep data below 25G / shard; oversharding is OK. You can predict how many shards you will need, as you can count on ~10 million docs / GB (the ratio measured in the load test above), so TTL * (number of docs per day) / (docs per GB) =~ max index size
- Every shard should have at least one additional replica (this is useful if you want to remove a host: you can delete the replicas from there, then add new ones on another host)
- Shards per node: 2-3, but it can be higher, based on how much memory Solr uses
- OS settings: reuse sockets (as you can run out of network connections); you can find the sysctl settings above
Appendix (performance factors in solrconfig.xml):
1. Limit the indexing buffer size:
All documents are kept in memory until they exceed the RAM buffer size (defined in solrconfig.xml):
<ramBufferSizeMB>100</ramBufferSizeMB>
Once it is exceeded, Solr creates a new segment / merges the index into the new segment. (100 MB is the default.) You can set the limit based on the number of docs as well:
<maxBufferedDocs>1000</maxBufferedDocs>
If the RAM buffer size crosses the limit, the changes are flushed. Note: with <maxIndexingThreads> you can also control the number of threads that are used for indexing (default: 8).
- High frequency commits: use more CPU time
- Low frequency commits: use more memory of your instance
2. Commits:
Commit: makes sure updates are stored on disk.
Automatic commits: when enabled, docs are automatically written to storage (based on some conditions); a hard commit will replicate the index to all nodes (in a cluster environment). The conditions are <maxTime> or <maxDocs>; choose lower values if there are continuous index updates in your system. There is also an <openSearcher> option: if it is true, committed changes become visible immediately (a new searcher is opened).
Soft commits: faster than hard commits; they make index changes visible for searches, but do not sync the index across nodes (near real time). Power failure -> data loss. The soft commit <maxTime> should be set lower than the hard commit time.
Update log: enables transaction logs, which are used for recovery of updates (replay during startup) and durability. It is recommended to have a hard commit size limit based on the update log size.
Example:
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
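The rule above (the soft commit maxTime should be lower than the hard commit maxTime) can be checked mechanically. The following is only a sketch: the XML fragment is a hypothetical copy of the example, the function names are made up for illustration, and it assumes literal numbers in <maxTime> (a real solrconfig.xml may use ${...} property placeholders, which this does not resolve).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the updateHandler example in the text.
UPDATE_HANDLER_XML = """
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
"""


def commit_times_ms(xml_text):
    """Return (hard_commit_ms, soft_commit_ms) from an updateHandler fragment."""
    root = ET.fromstring(xml_text)
    hard = int(root.findtext("autoCommit/maxTime"))
    soft = int(root.findtext("autoSoftCommit/maxTime"))
    return hard, soft


def soft_commit_is_shorter(xml_text):
    """True if the soft commit interval is lower than the hard commit interval."""
    hard, soft = commit_times_ms(xml_text)
    return soft < hard


if __name__ == "__main__":
    print(commit_times_ms(UPDATE_HANDLER_XML))
    print(soft_commit_is_shorter(UPDATE_HANDLER_XML))
```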
3. Optimizing index merge:
Merge factor: this value tells Lucene how many segments should be built before merging them into a single one (<mergeFactor> in <indexConfig>).
- High merge factor (e.g. 20): improves indexing speed, but results in more index files, so searches will be slower.
- Low merge factor (e.g. 5): improves searches, but the more frequent segment merges slow down indexing.
4. Caches:
Common parameters:
- class: solr.LRUCache, solr.LFUCache, solr.FastLRUCache
- initialSize: initial capacity of the cache (HashMap)
- size: max size of the cache
- autowarmCount: when a new searcher is opened, its cache can be pre-populated; this sets the number of items that will be generated from the old caches
4.1 Field cache (per node): used for sorting and faceting. This is the low-level Lucene field cache, which is not managed by Solr, so it has no configuration options. It stores field values.
4.2 Filter cache (per core): this cache is responsible for storing documents (ids) for filter queries. If you use faceting, it can improve performance. (If the number of docs grows continuously, it may be worth disabling it, especially if you use a lot of different filter queries very often.)
4.3 Document cache (per core): this cache stores Lucene documents that are fetched from disk, which can reduce disk I/O. The size needs to be larger than max_results * max_concurrent_queries. (Requires a relatively small heap.)
4.4 Field value cache (per core): this cache is mainly for faceting. It is similar to the field cache, but it supports multi-valued fields. If it is not declared in solrconfig.xml, it is generated automatically with initial size 10 (up to 10000).
4.5 Query result cache (per core): this cache stores the top N query results (an ordered set of document ids, therefore it uses much less memory than the filter cache). (Requires a relatively small heap.)
Cache sizing: a smaller cache size can help to avoid full garbage collections.
Disabling caches: comment out the caching section.
Cache example (in solrconfig.xml):
<query>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  ...
</query>
5. TTL (time to live):
<updateRequestProcessorChain ...>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">_ttl_</str>
    <str name="value">+7DAYS</str>
  </processor>
  <processor class="solr.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">8600</int>
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">_expire_at_</str>
  </processor>
  ...
</updateRequestProcessorChain>
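To make the shard sizing rule from the conclusion concrete, here is a small Python sketch of the arithmetic. It assumes the ratio measured in the load test (about 10 million audit docs per 1GB of index) and the 25G / shard limit recommended above; the function name and parameters are illustrative only, and the result is a rough estimate, not a guarantee.

```python
import math


def estimate_shards(docs_per_day, ttl_days, docs_per_gb=10_000_000, max_gb_per_shard=25):
    """Estimate the maximum index size (GB) and the number of shards needed.

    docs_per_gb defaults to the ~10 million docs / GB ratio measured in the
    load test; max_gb_per_shard defaults to the 25G / shard recommendation.
    Both are assumptions that should be adjusted to your own measurements.
    """
    max_docs = docs_per_day * ttl_days          # docs kept until the TTL expires
    index_size_gb = max_docs / docs_per_gb      # approximate index size
    shards = max(1, math.ceil(index_size_gb / max_gb_per_shard))
    return index_size_gb, shards


if __name__ == "__main__":
    # The ~265 million docs/day load from the test with the default 90 day TTL:
    size_gb, shards = estimate_shards(265_000_000, 90)
    print(size_gb, shards)  # 2385.0 96
```

With a much smaller load (for example ~1 million docs per day), the same calculation stays well below the 25G limit, so a single shard (plus a replica) would be enough.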
12-05-2017
09:22 PM
1 Kudo
@Attila Kanto, I think you can do something like this in a script:
#!/usr/bin/expect
spawn ambari-server setup-sso
expect "Some question:"
send "answer\r"
...
expect eof
05-29-2017
02:42 PM
7 Kudos
Log Search is a log analysis / monitoring tool which is shipped with Ambari. Log Search has 2 components: Log Search Portal (server + web) and Log Feeder. The latter is responsible for monitoring specific log files and shipping the parsed log lines to Solr. To define which files should be monitored and how the parsing should work for those files, you will need input config descriptors. (The list of input config descriptors is located in /etc/ambari-logsearch-logfeeder/conf/logfeeder.properties, where all of them are defined in the logfeeder.config.files configuration property; on an Ambari-managed Log Search service, they can be found in the logfeeder-properties/logfeeder.config.files configuration entry.)
If you have a specific custom service (see: https://cwiki.apache.org/confluence/display/AMBARI/Custom+Services), to support it inside the Log Search application you will need a *-logsearch-conf.xml file inside the {SERVICE_NAME}/{SERVICE_VERSION}/configuration folder (* can be a custom name; Ambari will generate an input.config-*.json file based on that name inside /etc/ambari-logsearch-logfeeder/conf/). This *-logsearch-conf.xml should contain 3 properties:
- service_name
- component_mappings
- content
Here is an example (zookeeper):
<configuration supports_final="false" supports_adding_forbidden="true">
<property>
<name>service_name</name>
<display-name>Service name</display-name>
<description>Service name for Logsearch Portal (label)</description>
<value>Zookeeper</value>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>component_mappings</name>
<display-name>Component mapping</display-name>
<description>Logsearch component logid mapping list (e.g.: COMPONENT1:logid1,logid2;COMPONENT2:logid3)</description>
<value>ZOOKEEPER_SERVER:zookeeper</value>
<on-ambari-upgrade add="true"/>
</property>
<property>
<name>content</name>
<display-name>Logfeeder Config</display-name>
<description>Metadata jinja template for Logfeeder which contains grok patterns for reading service specific logs.</description>
<value>{ "input":[
{ "type":"zookeeper",
"rowtype":"service",
"path":"{{default('/configurations/zookeeper-env/zk_log_dir', '/var/log/zookeeper')}}/zookeeper*.log"} ],
"filter":[ {
"filter":"grok",
"conditions":{
"fields":{"type":["zookeeper"]}
},
"log4j_format":"%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n", "multiline_pattern":"^(%{TIMESTAMP_ISO8601:logtime})",
"message_pattern":"(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{SPACE}\\[%{DATA:thread_name}\\@%{INT:line_number}\\]%{SPACE}-%{SPACE}%{GREEDYDATA:log_message}",
"post_map_values": {
"logtime": {
"map_date":{
"target_date_pattern":"yyyy-MM-dd HH:mm:ss,SSS"
}
}
}
}
]}
</value>
<value-attributes>
<type>content</type>
<show-property-name>false</show-property-name>
</value-attributes>
<on-ambari-upgrade add="true"/>
</property>
</configuration>
The first property is service_name: it will be the label for the custom service inside Log Search, which appears on the troubleshooting page.
The second one is component_mappings. It is important for 2 reasons: if you click on the custom service label in the Log Search portal, it selects the proper components for filtering; the other reason is that we need to map the specific logIds (defined in the service descriptors) to Ambari components (as you can see, Ambari component names are different from Log Search names: ZOOKEEPER_SERVER <-> zookeeper_server). A component can have multiple logIds, as there may be multiple log files to monitor for a specific component.
The last property is content, which is a template that will be generated during Log Feeder startup (this also means you will need to restart the Log Feeders if you have just added your new service to the cluster with the proper *-logsearch-conf configuration). The first thing you need here is an "input", which describes the log file(s) monitored by the Log Feeder. ("rowtype" should be service, "type" is the logId, and "path" is the log location pattern; a regex can be used there. As you can see, there is Python code used in the path; it gets the zookeeper log directory from the Ambari configuration, which is important if the log directory changes.)
The second important block is the "filter" part. There you will need to choose "grok" (there is a "json" one as well, but that only works if you have the logsearch-log4j-appender on your classpath). With Grok you can describe how the log lines should be parsed, and which fields will be mapped to specific Solr fields. There are 2 important fields here:
- multiline_pattern: if this pattern matches, the actual line will be appended to the last one (log_message)
- message_pattern: defines how to parse the specific fields and map them to Solr fields; here logtime and log_message are required, level is optional, but recommended
After the parsing is done, you can modify the mappings with post_map_values (as you see in the example, we re-map the date to use a specific pattern in order to save dates in a specific format inside Solr). For more details about input configurations see: https://github.com/apache/ambari/blob/trunk/ambari-logsearch/ambari-logsearch-logfeeder/docs/inputConfig.md
To figure out the proper pattern for your log files, you can use: https://grokdebug.herokuapp.com/
There are some built-in grok patterns used by Log Search; you can find them here: https://github.com/apache/ambari/blob/trunk/ambari-logsearch/ambari-logsearch-logfeeder/src/main/resources/grok-patterns. They can be included in the debugging tool if you click on "Add custom patterns".
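To get a feeling for what the message_pattern in the zookeeper example extracts, here is a rough Python approximation using a plain regular expression. This is not how Log Feeder evaluates grok; the sub-patterns are simplified stand-ins for TIMESTAMP_ISO8601, LOGLEVEL, DATA, INT and GREEDYDATA, and the sample log line is made up for illustration.

```python
import re

# Simplified stand-in for the grok message_pattern of the zookeeper example.
# Real grok patterns (e.g. TIMESTAMP_ISO8601) accept more variants than this.
MESSAGE_RE = re.compile(
    r"^(?P<logtime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s*-\s*"
    r"(?P<level>[A-Z]+)\s*"
    r"\[(?P<thread_name>.*?)@(?P<line_number>\d+)\]\s*-\s*"
    r"(?P<log_message>.*)",
    re.DOTALL,
)


def parse_line(line):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = MESSAGE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical log line in the log4j format shown in the example.
sample = "2017-05-29 14:42:00,123 - INFO  [main:QuorumPeerMain@127] - Starting quorum peer"

if __name__ == "__main__":
    print(parse_line(sample))
```

A line that does not start with a timestamp (for example a stack trace line) does not match, and parse_line returns None for it.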