Member since: 06-27-2017
Posts: 25
Kudos Received: 2
Solutions: 0
01-03-2024
01:48 PM
You say the property was removed, but removed from what/where? I cannot find it.
02-25-2022
03:51 PM
1 Kudo
@regeamor As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post. Thanks!
01-05-2022
01:45 PM
@regeamor Has the article helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
05-26-2021
01:58 PM
I misunderstood the original question; I understand what you are asking now. I do not know if there is a way to clear out the old keys. They don't use much storage, so I've never heard of anyone having an issue with that. Michael
12-17-2020
12:48 PM
Solution: change
#$admin = HBaseAdmin.new(connection)
to
$admin = connection.getAdmin()
which leads to other changes. New script:

#
# hbase org.jruby.Main list_small_regions.rb min_size <namespace.tablename>
#
# Note: Please replace namespace.tablename with your namespace and table, eg NS1.MyTable.
# This value is case sensitive.

require 'digest'
require 'java'

java_import org.apache.hadoop.hbase.HBaseConfiguration
java_import org.apache.hadoop.hbase.client.HBaseAdmin
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.HRegionInfo
java_import org.apache.hadoop.hbase.client.Connection
java_import org.apache.hadoop.hbase.client.ConnectionFactory
java_import org.apache.hadoop.hbase.client.Table
java_import org.apache.hadoop.hbase.util.Bytes

def list_bigger_regions(table_name)
  cluster_status = $admin.getClusterStatus()
  master = cluster_status.getMaster()
  biggers = []
  cluster_status.getServers.each do |s|
    cluster_status.getLoad(s).getRegionsLoad.each do |r|
      # getRegionsLoad returns an array of arrays, where each array is 2 elements
      # Filter out any regions that don't match the requested tablename
      next unless r[1].get_name_as_string =~ /#{table_name}\,/
      if r[1].getStorefileSizeMB() > $low_size
        if r[1].get_name_as_string =~ /\.([^\.]+)\.$/
          biggers.push $1
        else
          raise "Failed to get the encoded name for #{r[1].get_name_as_string}"
        end
      end
    end
  end
  biggers
end

def regions_to_merge?(table_name)
  bigger_regions = list_bigger_regions(table_name)
  #regions = $admin.getTableRegions(Bytes.toBytes(table_name))
  regions = $admin.getTableRegions(table_name)
  filtered_regions = regions.reject do |r|
    bigger_regions.include?(r.get_encoded_name)
  end
  puts "#{table_name},#{regions.length},#{bigger_regions.length},#{filtered_regions.length - 1}"
  filtered_regions.length
end

limit_batch = 1000
do_merge = false

config = HBaseConfiguration.create()
connection = ConnectionFactory.createConnection(config)
#$admin = HBaseAdmin.new(connection)
$admin = connection.getAdmin()

# Handle command line parameters
$low_size = 1
if ARGV[0].to_i >= $low_size
  $low_size = ARGV[0].to_i
end

all_tables = $admin.listTableNames
if ARGV.length > 1
  p "ARGV - #{ARGV[1]}"
  # regions_to_merge? expects a TableName, so build one from the argument
  tables = [ TableName.valueOf("#{ARGV[1]}") ]
else
  tables = all_tables
end

puts "TABLE_NAME,TOT_REGIONS,REGIONS_OK,REGIONS_NEED_MERGED"
tables.each do |table|
  #table_name = table.getName()
  regions_to_merge?(table)
end
$admin.close
11-03-2020
07:30 AM
Thanks. What I am experiencing is that the complete file, if 300GB, has to be assembled before upload to S3, which requires either 300GB of memory or disk. DistCp does not create a part file per block, and I have not witnessed any file splitting being done. Multipart uploads require you to get an upload ID, upload many part files with a numeric extension, and at the end ask S3 to put them back together. I do not see any of this being done. I admit I do not know much about all this and it could be happening out of my sight.
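For context, this is roughly what the multipart-upload flow described above looks like with boto3; the bucket, key, local path, and part size below are made-up placeholders, not values from my environment.

import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "backups/big_file.dat"   # hypothetical bucket and key
part_size = 128 * 1024 * 1024                       # arbitrary 128 MB part size

# 1. Ask S3 for an upload ID
upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]

# 2. Upload the file as independently numbered parts
parts = []
with open("/data/big_file.dat", "rb") as f:          # hypothetical local source file
    part_number = 1
    while True:
        chunk = f.read(part_size)
        if not chunk:
            break
        resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                              UploadId=upload_id, Body=chunk)
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        part_number += 1

# 3. Ask S3 to stitch the parts back together into one object
s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})

If distcp writes through the s3a connector, this chunking is normally done inside the connector itself (tuned by settings such as fs.s3a.multipart.size), so it may indeed be happening out of my sight.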
03-04-2019
09:12 PM
Hello @regeamor Thank you for posting the query with us. Basically, when you enable dynamic allocation, Spark gracefully removes containers that have been idle for the configured timeout (60s by default). When you lower that value, executors are removed more frequently, depending on executor usage (i.e. the tasks being allocated to those executors and the number of tasks required). https://spark.apache.org/docs/latest/job-scheduling.html#graceful-decommission-of-executors Also, the above behaviour is handled by Spark itself (YARN container pre-emption won't be aware of it).
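To illustrate, here is a minimal PySpark sketch of the settings involved; the application name and the 30s value are just examples (the default spark.dynamicAllocation.executorIdleTimeout is 60s), not a recommendation for your workload.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dynamic-allocation-example")                         # hypothetical app name
    .config("spark.dynamicAllocation.enabled", "true")             # turn on dynamic allocation
    .config("spark.shuffle.service.enabled", "true")               # external shuffle service, typically required on YARN
    .config("spark.dynamicAllocation.executorIdleTimeout", "30s")  # lower than the 60s default, so idle executors are released sooner
    .getOrCreate()
)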
03-20-2018
02:45 PM
#!/usr/bin/env python
import ssl, sys, time
from cm_api.api_client import ApiResource
from cm_api.endpoints.types import ApiClusterTemplate
from cm_api.endpoints.cms import ClouderaManager
from cm_api.endpoints import clusters, events, hosts, external_accounts, tools
from cm_api.endpoints import types, users, timeseries, roles, services

# Cloudera Manager often runs with a self-signed certificate; skip verification
ssl._create_default_https_context = ssl._create_unverified_context

try:
    cm = ApiResource("CM_SERVER", "7183", "admin", "CM_PASS", "true", "15")
    cluster = cm.get_cluster("CLUSTER_NAME")
except:
    print "Failed to log into cluster %s" % ("CLUSTER_NAME")
    sys.exit(0)

servers = ["server1.company.com", "server2.company.com", "server3.company.com"]

# Collect the SOLR_SERVER roles running on the target hosts
s = cluster.get_service("solr")
ra = []
for r in s.get_roles_by_type("SOLR_SERVER"):
    hostname = cm.get_host(r.hostRef.hostId).hostname
    if hostname in servers:
        ra.append([hostname, r])
ra.sort()

print "\nWill restart %s SOLR instances" % len(ra)
for hostname, r in ra:
    print "\nRestarting SOLR on %s" % (hostname)
    s.restart_roles(r.name)
    r = s.get_role(r.name)
    wait = time.time() + 180  # three minutes
    while r.roleState != "STARTED":
        print "Role State = %s" % (r.roleState)
        print "Waiting for role state to be STARTED"
        print time.strftime("%H:%M:%S")
        if time.time() > wait:
            print "SOLR failed to restart on %s" % (hostname)
            sys.exit(1)
        time.sleep(10)
        r = s.get_role(r.name)
    print "SOLR restarted on %s" % (hostname)

print "\nAll SOLR roles restarted"
sys.exit(0)