About vxu

vxu · ‎02-28-2017

Caveat: This feature has been validated by HWX engineer manually but we don't officially support it at the moment. Environment: HDP-2.5.3.0-37 Ambari-2.4.2.0-136 JDK1.8 Kerberos enabled Ranger enabled Due to security limitations, we can only launch flume agent processes in Ambari. STEP 1: Create/modify flume configuration file. Ambari -> Flume -> Configs -> flume.conf # Flume agent config #### Global #### demo.sources = logtcp logudp demo.channels = kafka_channel demo.sinks = sink #### Sources #### demo.sources.logtcp.type = multiport_syslogtcp demo.sources.logtcp.ports = 9515 demo.sources.logtcp.host = 0.0.0.0 demo.sources.logtcp.keepFields = true demo.sources.logtcp.selector.type=replicating demo.sources.logtcp.channels= kafka_channel demo.sources.logudp.type = syslogudp demo.sources.logudp.port = 9515 demo.sources.logudp.host = 0.0.0.0 demo.sources.logudp.keepFields = true demo.sources.logudp.selector.type=replicating demo.sources.logudp.channels = kafka_channel #### Sinks #### demo.sinks.sink.type = logger demo.sinks.sink.channel = kafka_channel #### Channels #### demo.channels.kafka_channel.type = org.apache.flume.channel.kafka.KafkaChannel demo.channels.kafka_channel.kafka.bootstrap.servers = node1.vxu.com:6667,node2.vxu.com:6667,node3.vxu.com:6667 demo.channels.kafka_channel.kafka.topic = flume_topic demo.channels.kafka_channel.kafka.producer.security.protocol = SASL_PLAINTEXT demo.channels.kafka_channel.kafka.producer.sasl.mechanism = GSSAPI demo.channels.kafka_channel.kafka.consumer.security.protocol = SASL_PLAINTEXT demo.channels.kafka_channel.kafka.consumer.sasl.mechanism = GSSAPI STEP 2: Add kafka jaas file(s) Create a flume_kafka_jaas.conf in /etc/flume/conf/: KafkaClient { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true serviceName="kafka" keyTab="/etc/security/keytabs/kafka.service.keytab" principal="kafka/node1.vxu.com@VXU.COM"; }; STEP 3: Modify flume-env template Ambari -> Flume -> Configs -> Advanced flume-env -> flume-env template ... # Enviroment variables can be set here. export JAVA_HOME={{java_home}} # Give Flume more memory and pre-allocate, enable remote monitoring via JMX export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.monitoring.type=http -Dflume.monitoring.port=34545 -Djava.security.auth.login.config=/etc/flume/conf/flume_kafka_jaas.conf" # Note that the Flume conf directory is always included in the classpath. # Add flume sink to classpath if [ -e "/usr/lib/flume/lib/ambari-metrics-flume-sink.jar" ]; then export FLUME_CLASSPATH=$FLUME_CLASSPATH:/usr/lib/flume/lib/ambari-metrics-flume-sink.jar fi export HIVE_HOME={{flume_hive_home}} export HCAT_HOME={{flume_hcat_home}} Note: After changing the flume configs, you need to clear the /etc/flume/conf/demo directory and kill all previous flume agent processes. Otherwise, new configs may not take effect.

vxu · ‎08-24-2016

@Brandon Wilson mqureshi's explanation is correct, technically, you can have unlimited number of snapshots in hbase, but it will put much pressure on hdfs. It would not only just occupy some disk space, but it would create a huge amount of hfiles that might slow down the NameNode. Let's assume that you have a 10-CF HTable with 50k regions, each CF has 5 hfiles in average, which means you would have totally 2.5million hfiles for this single table. The first time you create a snapshot, all 2.5m hfiles will be referenced. When you do another snapshot in the next day(after some routine compactions, of course), another 2 or more million new hfiles will probably be referenced. Remember: old hfiles would not be removed until the snapshot is removed. In this case, you will get more than 15 million referenced hfiles after a week, which would be a really bad news for namenode.

vxu · ‎08-23-2016

Hi @Raja Ray, To answer your questions: 1. If I put data in temporary hbase cluster during main hbase cluster downtime, then how I will merge data from temporary cluster to main cluster when main cluster will be up and running. If there are only Put operations during the main cluster downtime, you can use CopyTable tool or Export& Bulkload tool to migrate data from temporary cluster back to main cluster after it's up. But if there are both Put and Delete operations during the main cluster downtime, the best way to migrate data is to set up hbase replication from temporary cluster to main cluster. This will read all WALs(Write-ahead-log) and replay both Puts and Deletes on the main cluster after it's up. 2. When I am restoring data from hdfs hfile location to new location, then how I will recover memstore data. Memstore is a place in RS to keep incoming data. It will start growing when new write operations are coming. If you mean the blockcache of the hfile, that will be reload into memory when new read operations are coming. 3. If I shutdown restart hbase service, is memstore data being flushed to hdfs hfile that time? Yes, memstore would be forced to flush to hfile before RS is shutdown. Make sure hdfs path '/apps/hbase/data/WALs/' is empty after hbase being shutdown, so that all memstore data has been flushed into hfiles. Thanks, Victor

vxu · ‎08-22-2016

Hi @Raja Ray, I checked but HBase rolling upgrade won't help here either, because HMaster and RS both use this 'hbase.rootdir' in the runtime and only changing part of them would cause data inconsistencies. So my suggestion would be create a smaller temporary hbase cluster to handle all the production requests and do a quick restart on the main hbase cluster. Modifying 'hbase.rootdir' really needs downtime. Hope that will help. Thanks, Victor

vxu · ‎08-22-2016

In other words, there's no 'hot switch' for this 'hbase.rootdir' parameter. If you want to change it, you have to restart hbase to make it work.

vxu · ‎08-22-2016

Ok, I understand. But even if you just want to change hdfs root directory for a running hbase cluster, you'll need a restart to make it work. Do you mean you've already change the root path to '/apps/hbase/data2' before starting your current hbase cluster?

vxu · ‎08-22-2016

Hi @Raja Ray, 1. Which version of hbase are you using? 2. When performing my steps, is there any specific error log that you can share with me? 3. Could you elaborate on your use case? Thanks, Victor

vxu · ‎08-22-2016

Hi @Raja Ray, here are the steps for recover Hfiles in another hdfs directory: 1. Shutdown the hbase with old hdfs path. 2. Change 'hbase.rootdir' to new path and restart hbase. 3. Create table 'CUTOFF2', so that new htable structure will be created in new hdfs path, and of course, it's empty. 4. Use distcp to copy hfile(s) from old path to new path in case the hfile(s) are very huge. 5. Do a 'hbase hbck' on the new hbase, and there should be something wrong with the 'CUTOFF2'. 6. Do a 'hbase hbck -repair' on the problematic table and it will finalize the recovery. 7. Done

vxu · ‎08-10-2016

Hi @pan bocun, I guess you want to start REST for hbase: Use one of the following commands to start the REST server in the foreground or background. The port is optional, and defaults to 8080. # Foreground $ bin/hbase rest start -p <port> # Background, logging to a file in $HBASE_LOGS_DIR $ bin/hbase-daemon.sh start rest -p <port> Reference: http://hbase.apache.org/book.html#_rest

vxu · ‎07-29-2016

Hi @Sunile Manjee, all the configurations of AMS hbase are in /etc/ams-hbase/conf/ on ambari metrics collector node. You can also view and change that in Ambari Metrics -> Configs -> 'Advanced ams-hbase-env' / 'Advanced ams-hbase-site'. The best way to check AMS metadata info is to execute the following line on ambari metrics collector node: # hbase --config /etc/ams-hbase/conf/ shell Then use 'list', 'desc' or other useful commands to view the metadata you need.

Online	Offline
Last Visited	‎12-07-2017 08:18 AM

Member Since	‎02-21-2016 10:18 PM
Last Visited	‎12-07-2017 08:18 AM
Posts	30
Kudos received	25

Cloudera Community

Re: Is there a limit to the number of snapshots I ...

Re: how to recover hbase using hdfs data directory

Re: cant start hbase reset service in hbase shell

Re: Ambari Metrics Collector DB Metadata

Flume with Secured Kafka Channel in HDP 2.5

Re: Is there a limit to the number of snapshots I ...

Re: how to recover hbase using hdfs data directory

Re: how to recover hbase using hdfs data directory

Re: how to recover hbase using hdfs data directory

Re: how to recover hbase using hdfs data directory

Re: how to recover hbase using hdfs data directory

Re: how to recover hbase using hdfs data directory

Re: cant start hbase reset service in hbase shell

Re: Ambari Metrics Collector DB Metadata