Short description: In this article I create a simple Producer that publishes messages (tweets) to a Kafka topic, and a simple Consumer that subscribes to that topic and reads the messages.

Create the Kafka topic:

./ --create --topic 'kafka-tweets' --partitions 3 --replication-factor 3 --zookeeper <zookeeper node:zk port>

Install the necessary packages in your Python project venv:

pip install kafka-python twython

Producer:

import json

from kafka import KafkaProducer
from twython import Twython


def main():
    # Load credentials from json file
    with open("twitter_credentials.json", "r") as file:
        creds = json.load(file)
    # Instantiate the Twitter client
    python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])
    # Search query
    query = {'q': 'cloudera', 'result_type': 'mixed', 'count': 100}
    # result is a Python list of tweets (each tweet is a dict)
    result = python_tweets.search(**query)['statuses']
    injest_data(result)

To get access to the Twitter API I need my credentials, which are stored in "twitter_credentials.json". I then use twython to search for 100 tweets that contain the word "cloudera". The result is a Python list of tweets (each one a dict), which is the input of injest_data(), where I connect to Kafka and send the messages to the topic "kafka-tweets".

def injest_data(tweets):
    # Serialize each dict to a string via json and encode it to bytes via utf-8
    p = KafkaProducer(bootstrap_servers='<kafka-broker>:6667', acks='all',
                      value_serializer=lambda m: json.dumps(m).encode('utf-8'),
                      batch_size=1024)
    for item in tweets:
        p.send('kafka-tweets', value=item)
    p.close()

Consumer:

import json

from kafka import KafkaConsumer


def consume():
    # Consume the latest messages and auto-commit offsets; the deserializer decodes raw bytes from utf-8 and parses the JSON
    consumer = KafkaConsumer('kafka-tweets',
                             bootstrap_servers=['<kafka-broker>:6667'],
                             value_deserializer=lambda m: json.loads(m.decode('utf-8')),
                             consumer_timeout_ms=10000)
    for message in consumer:
        # message.value is already deserialized by value_deserializer; message.key is raw bytes (None here)
        print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                             message.offset, message.key,
                                             message.value))
    consumer.close()

We subscribe to the "kafka-tweets" topic and then read the messages.

Output (1 message):

tweets:0:484: key=None value={u'contributors': None, u'truncated': True, u'text': u'Urgent Requirement for an Infrastructure & Platform Engineer to work with one of our top financial clients!!\nApply\u2026', u'is_quote_status': False, u'in_reply_to_status_id': None, u'id': 1124041974875664390, u'favorite_count': 1, u'source': u'<a href="" rel="nofollow">Twitter Web Client</a>', u'retweeted': False, u'coordinates': None, u'entities': {u'symbols': [], u'user_mentions': [], u'hashtags': [], u'urls': [{u'url': u'', u'indices': [120, 143], u'expanded_url': u'', u'display_url': u'\u2026'}]}, u'in_reply_to_screen_name': None, u'id_str': u'1124041974875664390', u'retweet_count': 5, u'in_reply_to_user_id': None, u'favorited': False, u'user': {u'follow_request_sent': None, u'has_extended_profile': False, u'profile_use_background_image': False, u'time_zone': None, u'id': 89827370, u'default_profile': False, u'verified': False, u'profile_text_color': u'000000', u'profile_image_url_https': u'', u'profile_sidebar_fill_color': u'000000', u'is_translator': False, u'geo_enabled': True, u'entities': {u'url': {u'urls': [{u'url': u'', u'indices': [0, 22], u'expanded_url': u'', u'display_url': u''}]}, u'description': {u'urls': []}}, u'followers_count': 82, u'protected': False, u'id_str': u'89827370', u'default_profile_image': False, u'listed_count': 8, u'lang': u'en', u'utc_offset': None, u'statuses_count': 2508, u'description': u'Beachhead is a Premier IT recruiting firm based in Toronto, Canada. 
Follow for exciting opportunities in Financial, Retail and Telecommunication sector.\U0001f600', u'friends_count': 59, u'profile_link_color': u'0570B3', u'profile_image_url': u'', u'notifications': None, u'profile_background_image_url_https': u'', u'profile_background_color': u'000000', u'profile_banner_url': u'', u'profile_background_image_url': u'', u'name': u'BeachHead', u'is_translation_enabled': False, u'profile_background_tile': False, u'favourites_count': 19, u'screen_name': u'BeachHeadINC', u'url': u'', u'created_at': u'Sat Nov 14 00:02:15 +0000 2009', u'contributors_enabled': False, u'location': u'Toronto, Canada', u'profile_sidebar_border_color': u'000000', u'translator_type': u'none', u'following': None}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'lang': u'en', u'created_at': u'Thu May 02 20:04:25 +0000 2019', u'in_reply_to_status_id_str': None, u'place': None, u'metadata': {u'iso_language_code': u'en', u'result_type': u'recent'}} Code available in:
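
The value_serializer passed to the producer and the value_deserializer passed to the consumer are inverse functions. A minimal sketch of that round trip, runnable without a broker, is shown here; the sample tweet is a made-up, trimmed-down dict used only for illustration and is not real API output.

```python
import json

# Same lambdas as the ones passed to KafkaProducer and KafkaConsumer above
serialize = lambda m: json.dumps(m).encode('utf-8')
deserialize = lambda m: json.loads(m.decode('utf-8'))

# Hypothetical tweet used only for this sketch
tweet = {'id': 1, 'text': 'hello cloudera', 'retweet_count': 5}

payload = serialize(tweet)       # bytes, which is what Kafka stores
restored = deserialize(payload)  # back to a dict on the consumer side

print(type(payload).__name__, restored == tweet)
```

Because the consumer already receives a dict in message.value, individual fields such as message.value['text'] can be read directly.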
The steps to set up YARN to run Docker containers can be found in Part 1: article

In this article I will show how to run the Hive components (HiveServer2, Metastore) as Docker containers in YARN. The Metastore will use a MySQL 8 database, also running as a Docker container on a local host.

Pre-requisites:

1. Pull the mysql-server image from Docker Hub, run the image as a Docker container, then create the hive database and grant permissions to the hive user:

docker pull mysql/mysql-server
docker run -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=admin --restart=always --name mysqld mysql/mysql-server
docker exec -it mysqld bash
bash-4.2# mysql -u root --password=admin
mysql> CREATE DATABASE hive;
mysql> CREATE USER 'hive' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' WITH GRANT OPTION;

2. Create the user 'hive' and add it to the 'hadoop' group:

useradd hive
usermod -aG hadoop hive

Dockerize Hive:

1. Create a yum repo file "hdp.repo" that contains the HDP- and HDP-UTILS- repositories:

[HDP-]
name=HDP Version - HDP-
baseurl=
gpgcheck=1
gpgkey=
enabled=1
priority=1

[HDP-UTILS-]
name=HDP-UTILS Version - HDP-UTILS-
baseurl=
gpgcheck=1
gpgkey=
enabled=1
priority=1

2. Create the Dockerfile:

FROM centos:7
ENV JAVA_HOME /usr/lib/jvm/jre-1.8.0-openjdk
COPY hdp.repo /etc/yum.repos.d/
COPY mysql-connector-java-8.0.14-1.el7.noarch.rpm /root/
RUN yum updateinfo \
&& yum install -y sudo java-1.8.0-openjdk-devel hadoop-yarn hadoop-mapreduce hive hive-metastore tez \
&& yum clean all
RUN yum localinstall -y /root/mysql-connector-java-8.0.14-1.el7.noarch.rpm
RUN cp /usr/share/java/mysql-connector-java-8.0.14.jar /usr/hdp/

Note: For the Metastore to connect to our MySQL database we need a JDBC connector. I've downloaded it from here and copied the .rpm file to the same directory as my Dockerfile, so it is installed on the image.

3. Build the image:

docker build -t hive .

4. Tag the image and push it to the local Docker registry:

Tag the image as "<docker registry server>:5000/hive_local". This creates an additional tag for the existing image. When the first part of the tag is a hostname and port, Docker interprets this as the location of a registry.

docker tag hive <docker registry server>:5000/hive_local
docker push <docker registry server>:5000/hive_local

Now that our hive image is created, we will create a YARN Service configuration file (Yarnfile) with all the details of our service.

Deployment:

1. Copy core-site.xml, hdfs-site.xml and yarn-site.xml to the hive user's home directory in HDFS:

su - hive
hdfs dfs -copyFromLocal /etc/hadoop/conf/core-site.xml .
hdfs dfs -copyFromLocal /etc/hadoop/conf/hdfs-site.xml .
hdfs dfs -copyFromLocal /etc/hadoop/conf/yarn-site.xml .

2. Create the Yarnfile (hive.json):

{
  "name": "hive",
  "lifetime": "-1",
  "version": "",
  "artifact": {
    "id": "<docker registry server>:5000/hive_local",
    "type": "DOCKER"
  },
  "configuration": {
    "env": {
      "HIVE_LOG_DIR": "var/log/hive",
      "YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS": "/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro",
      "HADOOP_HOME": "/usr/hdp/"
    },
    "properties": {
      "": "host"
    },
    "files": [
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/core-site.xml",
        "src_file": "core-site.xml"
      },
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/yarn-site.xml",
        "src_file": "yarn-site.xml"
      },
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/hdfs-site.xml",
        "src_file": "hdfs-site.xml"
      },
      {
        "type": "XML",
        "dest_file": "/etc/hive/conf/hive-site.xml",
        "properties": {
          "hive.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
          "hive.zookeeper.namespace": "hiveserver2",
          "hive.server2.zookeeper.publish.configs": "true",
          "": "true",
          "": "true",
          "hive.metastore.warehouse.dir": "/user/${USER}/warehouse",
          "javax.jdo.option.ConnectionUserName": "hive",
          "javax.jdo.option.ConnectionPassword": "hive",
          "hive.server2.enable.doAs": "false",
          "hive.metastore.schema.verification": "true",
          "hive.metastore.db.type": "MYSQL",
          "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
          "javax.jdo.option.ConnectionURL": "jdbc:mysql://<mysql-server docker host>:3306/hive?createDatabaseIfNotExist=true",
          "hive.metastore.event.db.notification.api.auth": "false",
          "hive.metastore.uris": "thrift://hivemetastore-0.${SERVICE_NAME}.${USER}.${DOMAIN}:9083"
        }
      }
    ]
  },
  "components": [
    {
      "name": "hiveserver2",
      "number_of_containers": 1,
      "launch_command": "sleep 25; /usr/hdp/current/hive-server2/bin/hiveserver2",
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      "configuration": {
        "files": [
          {
            "type": "XML",
            "dest_file": "/etc/hive/conf/hive-site.xml",
            "properties": {
              "hive.server2.thrift.port": "10000",
              "hive.server2.thrift.http.port": "10001"
            }
          }
        ],
        "env": {
          "HADOOP_OPTS": "-Xmx1024m -Xms512m"
        }
      }
    },
    {
      "name": "hivemetastore",
      "number_of_containers": 1,
      "launch_command": "sleep 5;/usr/hdp/current/hive-metastore/bin/schematool -initSchema -dbType mysql;/usr/hdp/current/hive-metastore/bin/hive --service metastore",
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      "configuration": {
        "files": [
          {
            "type": "XML",
            "dest_file": "/etc/hive/conf/hive-site.xml",
            "properties": {
              "hive.metastore.uris": "thrift://${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}:9083"
            }
          }
        ],
        "env": {
          "HADOOP_OPTS": "-Xmx1024m -Xms512m"
        }
      }
    }
  ]
}

3. Deploy the application using the YARN Services API:

yarn app -launch hive hive.json

Test access to Hive:

The Registry DNS service that runs on the cluster listens for inbound DNS requests. Those requests are standard DNS requests from users or other DNS servers (for example, DNS servers that have the RegistryDNS service configured as a forwarder).

With this setup, we can connect via beeline to the "${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}" hostname as long as our client uses the corporate DNS server. More details here

Because this is not configured in this test, we need to manually find where the hiveserver2 Docker container is running:

curl -X GET 'http://<RM-host>:8088/app/v1/services/hive?' | python -m json.tool

In the containers information for component instance hiveserver2-0 we will find:

"containers": [
"bare_host": "<hiveserver2-0 host hostname>",

On the host where the container is running, connect to Hive via beeline:

su - hive
beeline -u "jdbc:hive2://<hostname -I>:10000/default"

References:

Files are also available in the following GitHub repo:
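
The manual lookup of "bare_host" in the REST response can also be scripted. The following is a minimal sketch, assuming the response nests "containers" under each entry of "components" and that each container carries "component_instance_name" and "bare_host". The function name find_bare_host, the sample document and the host name worker1.example.com are hypothetical.

```python
import json


def find_bare_host(service, instance_name):
    """Return the bare_host of the container whose component_instance_name matches, else None."""
    for component in service.get("components", []):
        for container in component.get("containers", []):
            if container.get("component_instance_name") == instance_name:
                return container.get("bare_host")
    return None


# Hypothetical, heavily trimmed response; a real one has many more fields
sample = json.loads("""
{"name": "hive",
 "components": [
   {"name": "hiveserver2",
    "containers": [{"component_instance_name": "hiveserver2-0",
                    "bare_host": "worker1.example.com"}]}]}
""")

print(find_bare_host(sample, "hiveserver2-0"))
```

In practice the dict would come from parsing the output of the curl command instead of the inline sample.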
Pre-requisites:

1. Install Docker on all NodeManager hosts and configure it:

It is recommended to install the version of Docker that is provided by your operating system vendor. The Docker package has been known by several names: docker-engine, docker, and docker-ce.

yum install docker

If you have issues installing Docker, the following document can be followed:

Edit '/etc/docker/daemon.json' and add the following options:

{
"live-restore" : true,
"debug" : true,
"dns": ["<YARN registry dns ip addr>"]
}

If not using HTTPS, configure each of the cluster hosts to skip HTTPS checks by adding the following line to '/etc/docker/daemon.json':

"insecure-registries": ["<docker registry server>:5000"]

2. Create a private local Docker registry (optional)

2.1 Designate a server in the cluster for use by the Docker registry. Minimal resources are required, but sufficient disk space is needed to store the images and metadata. Docker must be installed and running.

2.2 Start the registry:

docker run -d -p 5000:5000 --restart=always --name registry registry:2

2.3 (Optional) By default, data will only be persisted within the container. If you would like to persist the data on the host, you can customize the bind mounts using the -v option:

docker run -d -p 5000:5000 --restart=always -v /host_registry_path:/var/lib/registry --name registry registry:2

3. Configure YARN to run Docker containers:

Our cluster is now ready to run Dockerized applications.

Dockerize HBase:

1. Create a yum repo file "hdp.repo" that contains the HDP- and HDP-UTILS- repositories:

[HDP-]
name=HDP Version - HDP-
[HDP-UTILS-]
name=HDP-UTILS Version - HDP-UTILS-
priority=1

2. Create the Dockerfile:

FROM centos:7
ENV JAVA_HOME /usr/lib/jvm/jre-1.8.0-openjdk
COPY hdp.repo /etc/yum.repos.d/
RUN yum updateinfo && yum install -y sudo java-1.8.0-openjdk-devel hbase phoenix hadoop-yarn hadoop-mapreduce && yum clean all

3. Build the image:

docker build -t hbase .

4. Tag the image and push it to the local Docker registry:

Tag the image as "<docker registry server>:5000/hbase_local". This creates an additional tag for the existing image. When the first part of the tag is a hostname and port, Docker interprets this as the location of a registry.

docker tag hbase <docker registry server>:5000/hbase_local
docker push <docker registry server>:5000/hbase_local

Now that our hbase image is created, we will create a YARN Service configuration file (Yarnfile) with all the details of our service.

Deployment:

1. Copy core-site.xml and hdfs-site.xml to the user's home directory in HDFS:

su - ambari-qa
hdfs dfs -copyFromLocal /etc/hadoop/conf/core-site.xml .
hdfs dfs -copyFromLocal /etc/hadoop/conf/hdfs-site.xml .

2. Create the Yarnfile (hbase.json):

{
  "name": "hbase",
  "lifetime": "10800",
  "version": "",
  "artifact": {
    "id": "<docker registry server>:5000/hbase_local",
    "type": "DOCKER"
  },
  "configuration": {
    "env": {
      "HBASE_LOG_DIR": "var/log/hbase",
      "HADOOP_HOME": "/usr/hdp/"
    },
    "properties": {
      "": "host"
    },
    "files": [
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/core-site.xml",
        "src_file": "core-site.xml"
      },
      {
        "type": "TEMPLATE",
        "dest_file": "/etc/hadoop/conf/hdfs-site.xml",
        "src_file": "hdfs-site.xml"
      },
      {
        "type": "XML",
        "dest_file": "/etc/hbase/conf/hbase-site.xml",
        "properties": {
          "hbase.cluster.distributed": "true",
          "hbase.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
          "hbase.rootdir": "${SERVICE_HDFS_DIR}/hbase",
          "zookeeper.znode.parent": "${SERVICE_ZK_PATH}",
          "hbase.master.hostname": "hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}",
          "": "16010"
        }
      }
    ]
  },
  "components": [
    {
      "name": "hbasemaster",
      "number_of_containers": 1,
      "launch_command": "sleep 15; /usr/hdp/current/hbase-master/bin/hbase master start",
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      "readiness_check": {
        "type": "HTTP",
        "properties": {
          "url": "http://${THIS_HOST}:16010/master-status"
        }
      },
      "configuration": {
        "env": {
          "HBASE_MASTER_OPTS": "-Xmx1024m -Xms512m"
        }
      }
    },
    {
      "name": "regionserver",
      "number_of_containers": 3,
      "launch_command": "sleep 15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver start",
      "resource": {
        "cpus": 1,
        "memory": "512"
      },
      "configuration": {
        "files": [
          {
            "type": "XML",
            "dest_file": "/etc/hbase/conf/hbase-site.xml",
            "properties": {
              "hbase.cluster.distributed": "true",
              "hbase.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
              "hbase.rootdir": "${SERVICE_HDFS_DIR}/hbase",
              "zookeeper.znode.parent": "${SERVICE_ZK_PATH}",
              "hbase.master.hostname": "hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}",
              "": "16010",
              "": "16020",
              "hbase.regionserver.port": "16030",
              "hbase.regionserver.hostname": "${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}"
            }
          }
        ],
        "env": {
          "HBASE_REGIONSERVER_OPTS": "-XX:CMSInitiatingOccupancyFraction=70 -Xmx512m -Xms256m"
        }
      }
    },
    {
      "name": "hbaseclient",
      "number_of_containers": 1,
      "launch_command": "sleep infinity",
      "resource": {
        "cpus": 1,
        "memory": "512"
      }
    }
  ],
  "quicklinks": {
    "HBase Master Status UI": "http://hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}:16010/master-status"
  }
}

3. Deploy the application using the YARN Services API:

yarn app -launch hbase hbase.json

4. Go to "Services" in the YARN RM UI and select hbase:

We have 1 hbasemaster, 3 regionserver and 1 hbaseclient components running.

5. We can also use the YARN Service REST API to get the state of our hbase service:

curl -X GET 'http://<RM host>:8088/app/v1/services/hbase?' | python -m json.tool

{
"artifact": {
"id": "<docker registry server>:5000/hbase_local",
"type": "DOCKER"
"components": [
"artifact": {
"id": "<docker registry server>:5000/hbase_local",
"type": "DOCKER"
"configuration": {
"env": {
"HADOOP_HOME": "/usr/hdp/",
"HBASE_LOG_DIR": "var/log/hbase",
"HBASE_MASTER_OPTS": "-Xmx1024m -Xms512m"
"files": [
"dest_file": "/etc/hbase/conf/hbase-site.xml",
"properties": {
"hbase.cluster.distributed": "true",
"hbase.master.hostname": "hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}",
"": "16010",
"hbase.rootdir": "${SERVICE_HDFS_DIR}/hbase",
"hbase.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
"zookeeper.znode.parent": "${SERVICE_ZK_PATH}"
"type": "XML"
"dest_file": "/etc/hadoop/conf/core-site.xml",
"properties": {},
"src_file": "core-site.xml",
"type": "TEMPLATE"
"dest_file": "/etc/hadoop/conf/hdfs-site.xml",
"properties": {},
"src_file": "hdfs-site.xml",
"type": "TEMPLATE"
"properties": {
"": "host"
"containers": [
"bare_host": "",
"component_instance_name": "hbasemaster-0",
"hostname": "hbasemaster-0.hbase.ambari-qa.OPENSTACKLOCAL.COM",
"id": "container_e21_1553187523351_0006_01_000002",
"ip": "",
"launch_time": 1553266732354,
"state": "READY"
"dependencies": [],
"launch_command": "sleep 15; /usr/hdp/current/hbase-master/bin/hbase master start",
"name": "hbasemaster",
"number_of_containers": 1,
"quicklinks": [],
"readiness_check": {
"properties": {
"url": "http://${THIS_HOST}:16010/master-status"
"type": "HTTP"
"resource": {
"additional": {},
"cpus": 1,
"memory": "1024"
"restart_policy": "ALWAYS",
"run_privileged_container": false,
"state": "STABLE"
"artifact": {
"id": "pandrade-4:5000/hbase_local",
"type": "DOCKER"
"configuration": {
"env": {
"HADOOP_HOME": "/usr/hdp/",
"HBASE_LOG_DIR": "var/log/hbase",
"HBASE_REGIONSERVER_OPTS": "-XX:CMSInitiatingOccupancyFraction=70 -Xmx512m -Xms256m"
"files": [
"dest_file": "/etc/hbase/conf/hbase-site.xml",
"properties": {
"hbase.cluster.distributed": "true",
"hbase.master.hostname": "hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}",
"": "16010",
"hbase.regionserver.hostname": "${COMPONENT_INSTANCE_NAME}.${SERVICE_NAME}.${USER}.${DOMAIN}",
"": "16020",
"hbase.regionserver.port": "16030",
"hbase.rootdir": "${SERVICE_HDFS_DIR}/hbase",
"hbase.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
"zookeeper.znode.parent": "${SERVICE_ZK_PATH}"
"type": "XML"
"dest_file": "/etc/hadoop/conf/core-site.xml",
"properties": {},
"src_file": "core-site.xml",
"type": "TEMPLATE"
"dest_file": "/etc/hadoop/conf/hdfs-site.xml",
"properties": {},
"src_file": "hdfs-site.xml",
"type": "TEMPLATE"
"properties": {
"": "host"
"containers": [
"bare_host": "",
"component_instance_name": "regionserver-2",
"hostname": "regionserver-2.hbase.ambari-qa.OPENSTACKLOCAL.COM",
"id": "container_e21_1553187523351_0006_01_000005",
"ip": "",
"launch_time": 1553266732359,
"state": "READY"
"bare_host": "",
"component_instance_name": "regionserver-0",
"hostname": "regionserver-0.hbase.ambari-qa.OPENSTACKLOCAL.COM",
"id": "container_e21_1553187523351_0006_01_000003",
"ip": "",
"launch_time": 1553266732358,
"state": "READY"
"bare_host": "",
"component_instance_name": "regionserver-1",
"hostname": "regionserver-1.hbase.ambari-qa.OPENSTACKLOCAL.COM",
"id": "container_e21_1553187523351_0006_01_000004",
"ip": "",
"launch_time": 1553266732358,
"state": "READY"
"dependencies": [],
"launch_command": "sleep 15; /usr/hdp/current/hbase-regionserver/bin/hbase regionserver start",
"name": "regionserver",
"number_of_containers": 3,
"quicklinks": [],
"resource": {
"additional": {},
"cpus": 1,
"memory": "512"
"restart_policy": "ALWAYS",
"run_privileged_container": false,
"state": "STABLE"
"artifact": {
"id": "pandrade-4:5000/hbase_local",
"type": "DOCKER"
"configuration": {
"env": {
"HADOOP_HOME": "/usr/hdp/",
"HBASE_LOG_DIR": "var/log/hbase"
"files": [
"dest_file": "/etc/hbase/conf/hbase-site.xml",
"properties": {
"hbase.cluster.distributed": "true",
"hbase.master.hostname": "hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}",
"": "16010",
"hbase.rootdir": "${SERVICE_HDFS_DIR}/hbase",
"hbase.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
"zookeeper.znode.parent": "${SERVICE_ZK_PATH}"
"type": "XML"
"dest_file": "/etc/hadoop/conf/core-site.xml",
"properties": {},
"src_file": "core-site.xml",
"type": "TEMPLATE"
"dest_file": "/etc/hadoop/conf/hdfs-site.xml",
"properties": {},
"src_file": "hdfs-site.xml",
"type": "TEMPLATE"
"properties": {
"": "host"
"containers": [
"bare_host": "",
"component_instance_name": "hbaseclient-0",
"hostname": "hbaseclient-0.hbase.ambari-qa.OPENSTACKLOCAL.COM",
"id": "container_e21_1553187523351_0006_01_000006",
"ip": "",
"launch_time": 1553266732370,
"state": "READY"
"dependencies": [],
"launch_command": "sleep infinity",
"name": "hbaseclient",
"number_of_containers": 1,
"quicklinks": [],
"resource": {
"additional": {},
"cpus": 1,
"memory": "512"
"restart_policy": "ALWAYS",
"run_privileged_container": false,
"state": "STABLE"
"configuration": {
"env": {
"HADOOP_HOME": "/usr/hdp/",
"HBASE_LOG_DIR": "var/log/hbase"
"files": [
"dest_file": "/etc/hadoop/conf/core-site.xml",
"properties": {},
"src_file": "core-site.xml",
"type": "TEMPLATE"
"dest_file": "/etc/hadoop/conf/hdfs-site.xml",
"properties": {},
"src_file": "hdfs-site.xml",
"type": "TEMPLATE"
"dest_file": "/etc/hbase/conf/hbase-site.xml",
"properties": {
"hbase.cluster.distributed": "true",
"hbase.master.hostname": "hbasemaster-0.${SERVICE_NAME}.${USER}.${DOMAIN}",
"": "16010",
"hbase.rootdir": "${SERVICE_HDFS_DIR}/hbase",
"hbase.zookeeper.quorum": "${CLUSTER_ZK_QUORUM}",
"zookeeper.znode.parent": "${SERVICE_ZK_PATH}"
"type": "XML"
"properties": {
"": "host"
"id": "application_1553187523351_0006",
"kerberos_principal": {},
"lifetime": 10200,
"name": "hbase",
"quicklinks": {
"HBase Master Status UI": "http://hbasemaster-0.hbase.ambari-qa.OPENSTACKLOCAL.COM:16010/master-status"
"state": "STABLE",
"version": ""
}

Our HBase service is stable and all Docker containers are in READY state.

HBase Master UI:

Find the host on which the hbasemaster container is running and access the UI:

<hbase master container host>:16010/master-status

References:

Files are also available in the following GitHub repo:
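
The check that the service is STABLE and every container is READY can be done programmatically on the REST response. This is a minimal sketch using only field names that appear in the output above ("state", "components", "containers", "number_of_containers"). The function name service_is_ready and the trimmed sample document are hypothetical.

```python
def service_is_ready(service):
    """True if the service is STABLE and every component has all its containers in READY state."""
    if service.get("state") != "STABLE":
        return False
    for component in service.get("components", []):
        containers = component.get("containers", [])
        # Every requested container must be present...
        if len(containers) != component.get("number_of_containers"):
            return False
        # ...and each of them must report READY
        if any(c.get("state") != "READY" for c in containers):
            return False
    return True


# Hypothetical, trimmed-down version of the response shown above
sample = {
    "name": "hbase",
    "state": "STABLE",
    "components": [
        {"name": "hbasemaster", "number_of_containers": 1,
         "containers": [{"component_instance_name": "hbasemaster-0", "state": "READY"}]},
        {"name": "regionserver", "number_of_containers": 3,
         "containers": [
             {"component_instance_name": "regionserver-0", "state": "READY"},
             {"component_instance_name": "regionserver-1", "state": "READY"},
             {"component_instance_name": "regionserver-2", "state": "READY"}]},
    ],
}

print(service_is_ready(sample))
```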
@Rambabu Chamakuri It seems that the permissions on the VERSION file are wrong, see below:

/mnt/dn/sdl/datanode/current/VERSION (Permission denied)

Check the permissions on this VERSION file:

ls -lh /mnt/dn/sdl/datanode/current/VERSION

The file should be owned by "hdfs:hdfs" with permissions set to 644. If that is not the case, change them accordingly:

chown hdfs:hdfs /mnt/dn/sdl/datanode/current/VERSION
chmod 644 /mnt/dn/sdl/datanode/current/VERSION

Then restart the DataNode. Let me know if this helps in solving your issue.
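
The same ownership and mode check can be scripted, for example to run it across several data directories. This is a minimal sketch for Unix hosts using only the Python standard library. The function names describe_permissions and needs_fix are hypothetical, and the demo runs on a temporary file because the real VERSION file only exists on a DataNode.

```python
import grp
import os
import pwd
import stat
import tempfile


def describe_permissions(path):
    """Return (owner, group, mode) for path, with mode as an octal string such as '644'."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid without a passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid without a group entry
    mode = format(stat.S_IMODE(st.st_mode), "o")
    return owner, group, mode


def needs_fix(path, owner="hdfs", group="hdfs", mode="644"):
    """True if path does not have the expected owner, group and mode."""
    return describe_permissions(path) != (owner, group, mode)


# Demo on a temporary file with the mode recommended above
with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o644)
    demo_owner, demo_group, demo_mode = describe_permissions(tmp.name)
    print(demo_owner, demo_group, demo_mode)
```

On a DataNode, needs_fix('/mnt/dn/sdl/datanode/current/VERSION') would return True when the chown or chmod above still has to be applied.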
@Michael Bronson The practice of using 2x memory for swap space is very old and out of date. It was useful at a time when systems had, for example, 256MB of RAM, and it does not apply today. Using swap space on Hadoop nodes, workers or masters, is not recommended because it will not prevent you from having issues, even when swap is only used once RAM usage hits the threshold defined by the swappiness parameter.

From the referred post:

"If you have the need to use more memory, or expect to need more, than the amount of RAM which has been purchased. And can accept severe degradation in failure. In this case you would need a lot of swap configured. Your better off buying the right amount of memory."

"The fear with disabling swap on masters is that an OOM (out of memory) event could affect cluster availability. But that will still happen even with swap configured, it just will take slightly longer. Good administrator/operator practices would be to monitor RAM availability, then fix any issues before running out of memory"

If you really have a requirement to configure it on your master nodes, then just set the swap as you like, for example 1/4 of total system memory, and set the swappiness value to 0.
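
As an illustration of the "1/4 of total system memory" example, here is a minimal sketch that derives that figure from /proc/meminfo-style text. The function names parse_meminfo and suggested_swap_kb and the sample values are hypothetical, and the 0.25 fraction is only the example given above, not a general rule.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name.strip()] = int(parts[0])
    return info


def suggested_swap_kb(mem_total_kb, fraction=0.25):
    """Swap size in kB as a fraction of total RAM (1/4 by default, as in the example)."""
    return int(mem_total_kb * fraction)


# Hypothetical sample; on a real host read the text from /proc/meminfo instead
sample = """MemTotal:       16384000 kB
SwapTotal:       2097148 kB
"""

info = parse_meminfo(sample)
print(suggested_swap_kb(info["MemTotal"]))  # 4096000
```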
@Michael Bronson For worker/data nodes it is not recommended to use swap; as you said, "swap is very slow memory , so ambari cluster will be negative affected when major swap resource will be in used". Please refer to this post for a very good explanation:
