Member since: 02-21-2016
Posts: 30
Kudos Received: 26
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
| 867 | 08-24-2016 09:30 PM
| 4029 | 08-22-2016 06:13 AM
| 1084 | 08-10-2016 09:45 AM
| 1159 | 07-29-2016 05:14 PM
06-01-2017
04:57 PM
Hi Craig, this is indeed a useful tool. Thanks! AFAIK, HDFS snapshots can add to the small-file count. Does your script take snapshots into account, or are they already ruled out during the FSImage->TSV phase?
02-28-2017
02:25 PM
1 Kudo
Caveat: This feature has been validated manually by an HWX engineer, but it is not officially supported at the moment.

Environment:
- HDP-2.5.3.0-37
- Ambari-2.4.2.0-136
- JDK 1.8
- Kerberos enabled
- Ranger enabled

Due to security limitations, we can only launch Flume agent processes through Ambari.

STEP 1: Create/modify the Flume configuration file (Ambari -> Flume -> Configs -> flume.conf):

# Flume agent config
#### Global ####
demo.sources = logtcp logudp
demo.channels = kafka_channel
demo.sinks = sink
#### Sources ####
demo.sources.logtcp.type = multiport_syslogtcp
demo.sources.logtcp.ports = 9515
demo.sources.logtcp.host = 0.0.0.0
demo.sources.logtcp.keepFields = true
demo.sources.logtcp.selector.type = replicating
demo.sources.logtcp.channels = kafka_channel
demo.sources.logudp.type = syslogudp
demo.sources.logudp.port = 9515
demo.sources.logudp.host = 0.0.0.0
demo.sources.logudp.keepFields = true
demo.sources.logudp.selector.type = replicating
demo.sources.logudp.channels = kafka_channel
#### Sinks ####
demo.sinks.sink.type = logger
demo.sinks.sink.channel = kafka_channel
#### Channels ####
demo.channels.kafka_channel.type = org.apache.flume.channel.kafka.KafkaChannel
demo.channels.kafka_channel.kafka.bootstrap.servers = node1.vxu.com:6667,node2.vxu.com:6667,node3.vxu.com:6667
demo.channels.kafka_channel.kafka.topic = flume_topic
demo.channels.kafka_channel.kafka.producer.security.protocol = SASL_PLAINTEXT
demo.channels.kafka_channel.kafka.producer.sasl.mechanism = GSSAPI
demo.channels.kafka_channel.kafka.consumer.security.protocol = SASL_PLAINTEXT
demo.channels.kafka_channel.kafka.consumer.sasl.mechanism = GSSAPI

STEP 2: Add the Kafka JAAS file. Create flume_kafka_jaas.conf in /etc/flume/conf/:

KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
serviceName="kafka"
keyTab="/etc/security/keytabs/kafka.service.keytab"
principal="kafka/node1.vxu.com@VXU.COM";
};
STEP 3: Modify the flume-env template (Ambari -> Flume -> Configs -> Advanced flume-env -> flume-env template):

...
# Environment variables can be set here.
export JAVA_HOME={{java_home}}
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.monitoring.type=http -Dflume.monitoring.port=34545 -Djava.security.auth.login.config=/etc/flume/conf/flume_kafka_jaas.conf"
# Note that the Flume conf directory is always included in the classpath.
# Add flume sink to classpath
if [ -e "/usr/lib/flume/lib/ambari-metrics-flume-sink.jar" ]; then
export FLUME_CLASSPATH=$FLUME_CLASSPATH:/usr/lib/flume/lib/ambari-metrics-flume-sink.jar
fi
export HIVE_HOME={{flume_hive_home}}
export HCAT_HOME={{flume_hcat_home}}

Note: After changing the Flume configs, you need to clear the /etc/flume/conf/demo directory and kill all previous Flume agent processes; otherwise, the new configs may not take effect.
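For reference, a minimal cleanup sketch is below. It assumes the agent group is named "demo" as in the config above and that Flume uses the default HDP paths; adjust for your cluster.

# Stop running Flume agent processes (Flume's main class is org.apache.flume.node.Application)
ps -ef | grep '[f]lume.node.Application' | awk '{print $2}' | xargs -r kill
# Clear the generated per-agent config directory so Ambari regenerates it
rm -rf /etc/flume/conf/demo

Then restart the Flume service from Ambari so the agent picks up the new configs.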
Tags: Data Ingestion & Streaming, Flume, How-To/Tutorial, Kafka, Kerberos
08-24-2016
09:30 PM
3 Kudos
@Brandon Wilson mqureshi's explanation is correct: technically, you can have an unlimited number of snapshots in HBase, but doing so puts a lot of pressure on HDFS. Snapshots not only occupy disk space, they also pin a huge number of hfiles that can slow down the NameNode. Assume you have a 10-CF HTable with 50k regions, and each CF holds 5 hfiles on average; that is 2.5 million hfiles for this single table. The first time you create a snapshot, all 2.5 million hfiles are referenced. When you take another snapshot the next day (after some routine compactions, of course), another 2 million or more new hfiles will probably be referenced. Remember: old hfiles are not removed until the snapshot referencing them is removed. At that rate you will have more than 15 million referenced hfiles after a week, which is really bad news for the NameNode.
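If you want to see how many files a given snapshot pins, the SnapshotInfo tool can report that; a quick example is below (the snapshot name is a placeholder):

# List the hfiles referenced by a snapshot, with size statistics
hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot my_table_snapshot -files -stats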
08-23-2016
09:34 AM
2 Kudos
Hi @Raja Ray, to answer your questions:

1. If I put data in the temporary HBase cluster during main-cluster downtime, how will I merge data from the temporary cluster into the main cluster once it is back up?
If there are only Put operations during the downtime, you can use the CopyTable tool, or Export plus bulk load, to migrate data from the temporary cluster back to the main cluster after it is up. But if there are both Put and Delete operations during the downtime, the best way to migrate the data is to set up HBase replication from the temporary cluster to the main cluster. Replication reads all WALs (write-ahead logs) and replays both Puts and Deletes on the main cluster after it is up.

2. When I restore data from the HDFS hfile location to a new location, how will I recover memstore data?
The memstore is the area in a RegionServer that holds incoming data; it starts growing as new writes arrive. If you mean the block cache of the hfile, that is reloaded into memory as new reads arrive.

3. If I restart the HBase service, is memstore data flushed to HDFS hfiles at that time?
Yes, the memstore is forced to flush to hfiles before a RegionServer shuts down. Make sure the HDFS path '/apps/hbase/data/WALs/' is empty after HBase shuts down, which confirms all memstore data has been flushed into hfiles.

Thanks, Victor
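For the Put-only case, a minimal CopyTable invocation might look like the sketch below; the table name, ZooKeeper quorum, and znode path are placeholders for your main cluster's values. Run it from the temporary cluster:

# Copy 'mytable' from the temporary cluster to the main cluster
hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr=main-zk1,main-zk2,main-zk3:2181:/hbase-unsecure mytable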
08-22-2016
03:46 PM
Hi @Raja Ray, I checked, but HBase rolling upgrade won't help here either, because the HMaster and RegionServers both use 'hbase.rootdir' at runtime, and changing it on only some of them would cause data inconsistencies. So my suggestion would be to create a smaller temporary HBase cluster to handle all the production requests while you do a quick restart of the main HBase cluster. Modifying 'hbase.rootdir' really requires downtime. Hope that helps. Thanks, Victor
08-22-2016
03:31 PM
In other words, there's no hot swap for the 'hbase.rootdir' parameter. If you want to change it, you have to restart HBase for the change to take effect.
08-22-2016
03:27 PM
OK, I understand. But even if you just want to change the HDFS root directory for a running HBase cluster, you'll need a restart to make it work. Do you mean you had already changed the root path to '/apps/hbase/data2' before starting your current HBase cluster?
08-22-2016
03:10 PM
Hi @Raja Ray, 1. Which version of hbase are you using? 2. When performing my steps, is there any specific error log that you can share with me? 3. Could you elaborate on your use case? Thanks, Victor
08-22-2016
06:13 AM
2 Kudos
Hi @Raja Ray, here are the steps to recover hfiles into another HDFS directory: 1. Shut down HBase with the old HDFS path. 2. Change 'hbase.rootdir' to the new path and restart HBase. 3. Create table 'CUTOFF2' so that the new table structure is created under the new HDFS path; it is empty, of course. 4. Use distcp to copy the hfile(s) from the old path to the new path, in case the hfile(s) are very large. 5. Run 'hbase hbck' on the new HBase; it should report something wrong with 'CUTOFF2'. 6. Run 'hbase hbck -repair' on the problematic table to finalize the recovery. 7. Done. A sketch of the copy-and-repair commands is below.
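The sketch assumes the old rootdir is /apps/hbase/data and the new one is /apps/hbase/data2, with the default table layout underneath; adjust the NameNode address and paths for your cluster:

# Step 4: copy the table's hfiles from the old rootdir to the new one
hadoop distcp hdfs://nn1:8020/apps/hbase/data/data/default/CUTOFF2 \
  hdfs://nn1:8020/apps/hbase/data2/data/default/CUTOFF2
# Steps 5-6: check integrity, then repair the reported inconsistencies
hbase hbck CUTOFF2
hbase hbck -repair CUTOFF2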
08-10-2016
09:45 AM
2 Kudos
Hi @pan bocun, I guess you want to start the REST server for HBase. Use one of the following commands to start it in the foreground or background. The port is optional and defaults to 8080.
# Foreground
$ bin/hbase rest start -p <port>
# Background, logging to a file in $HBASE_LOGS_DIR
$ bin/hbase-daemon.sh start rest -p <port>
Reference: http://hbase.apache.org/book.html#_rest
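Once the REST server is up, you can sanity-check it with curl; the host and port below are assumptions:

# Fetch version info and cluster status from the REST API
curl http://localhost:8080/version
curl -H "Accept: application/json" http://localhost:8080/status/cluster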
07-29-2016
05:14 PM
Hi @Sunile Manjee, all the configurations of AMS HBase are in /etc/ams-hbase/conf/ on the Ambari Metrics Collector node. You can also view and change them in Ambari Metrics -> Configs -> 'Advanced ams-hbase-env' / 'Advanced ams-hbase-site'. The best way to check AMS metadata is to execute the following on the Ambari Metrics Collector node:
# hbase --config /etc/ams-hbase/conf/ shell
Then use 'list', 'desc', or other useful commands to view the metadata you need.
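For example, a short session might look like this (the table name is illustrative; actual AMS table names vary by version):

# hbase --config /etc/ams-hbase/conf/ shell
hbase(main):001:0> list                  # show all AMS tables
hbase(main):002:0> desc 'METRIC_RECORD'  # show column families and settings
hbase(main):003:0> status 'summary'      # quick health summary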
07-29-2016
09:54 AM
3 Kudos
Hi @Ashnee Sharma, I don't think there is a single command to back up all tables at once. To back up the whole HBase cluster, you need to shut it down first and use the distcp tool to back up all HBase data on HDFS. To back up a single table, you can use Replication, CopyTable, or Export. You can find the details here: http://hbase.apache.org/book.html#ops.backup My suggestion is to write a small script that backs up all the tables one by one on a live cluster, as in the sketch below.
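A rough sketch of such a script follows. It lists tables from the default namespace directory on HDFS (the rootdir and backup paths are assumptions for an HDP layout) and exports each one:

#!/bin/bash
# Export every table in the 'default' namespace, one at a time
for table in $(hdfs dfs -ls /apps/hbase/data/data/default | awk 'NR>1 {print $NF}' | xargs -n1 basename); do
  echo "Exporting ${table}..."
  hbase org.apache.hadoop.hbase.mapreduce.Export "${table}" "/backup/hbase/${table}"
done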
05-17-2016
01:06 PM
5 Kudos
QUESTION: How are the Ambari-2.2.1.1 local accounts, such as "admin", protected? What about the various components' configuration data managed by Ambari-2.2.1.1?

ANSWER:

Ambari local account credentials: These are stored in the Ambari database as the SHA256 hash of the (randomly salted) password.

Service configuration password properties: These are stored in the Ambari database in blobs of JSON-formatted data, in plaintext. When returned via API calls, properties marked as passwords are masked rather than displayed as plaintext. When sent to the agents, they are stored in plaintext in the command.json files under /var/lib/ambari-agent/data (readable only by root and the user that runs ambari-agent).

Ambari-specific database and LDAP credentials: These are stored in plaintext in the ambari.properties file by default, but can be encrypted via 'ambari-server setup-security'. If encrypted, they are stored in a Java keystore (JCEKS), which uses 3DES in CBC mode with PKCS #5 padding to encrypt its keys. The master key for this keystore is either stored in plaintext on the Ambari server host or prompted for when Ambari starts.
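To encrypt those credentials, the setup-security wizard is the entry point; a brief example is below (the exact menu option wording may differ between Ambari versions):

# Run on the Ambari server host; choose the option to encrypt
# passwords stored in ambari.properties and supply a master key
ambari-server setup-security
# Restart Ambari so the encrypted credentials take effect
ambari-server restart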
Tags: Ambari, configuration, FAQ, Security
04-20-2016
06:18 PM
In the bash command line:
$ export HBASE_CLASSPATH=${HBASE_CLASSPATH}:<path_to_jar_compiled_using_your_code>
$ ${HBASE_HOME}/bin/hbase <package_name_of_your_class>.HbasePut
04-19-2016
11:51 AM
If you've installed the HBase client on your test environment, you can use these commands to test your code:
export HBASE_CLASSPATH=$HBASE_CLASSPATH:<jar_compiled_using_your_code>
$HBASE_HOME/bin/hbase <package_name>.HbasePut
04-13-2016
01:04 PM
Hi Nilesh, have you checked the YARN scheduler? Is the default queue out of resources?
04-07-2016
10:34 AM
2 Kudos
There is no specific 'removeStoragePolicy' command, so what you want, I guess, is to get back to the default value, right? The workaround is to use '-setStoragePolicy' to set the policy to 'HOT' (the default), or to check the parent directory's policy first with '-getStoragePolicy' and then apply that with '-setStoragePolicy'.
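For example, with placeholder paths:

# Check the parent directory's policy first
hdfs storagepolicies -getStoragePolicy -path /data/parent
# Reset the child directory to the default policy
hdfs storagepolicies -setStoragePolicy -path /data/parent/child -policy HOT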
04-04-2016
10:12 AM
Your HMaster seems to be initializing very slowly. Are there any exceptions or errors in the HMaster's logs?
04-01-2016
03:56 PM
Hi, could you provide more details, such as the slow RegionServer's log and the HMaster's log? By the way, are there any regions in transition (RIT) on the HMaster's UI page?