04-19-2019 05:35 AM
Configuring Hive (streaming)
The Hive Streaming API allows data to be pumped continuously into Hive. Incoming data can be committed in small batches of records into an existing Hive partition or table, and once committed it becomes immediately visible to all subsequently initiated Hive queries. Streaming support is built on top of ACID-based insert/update support in Hive.
Streaming requirements
The following settings are required in hive-site.xml to enable ACID support for streaming:
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true (see more important details here)
hive.compactor.worker.threads > 0
hive.support.concurrency = true
"stored as orc" must be specified during table creation; ORC is currently the only supported storage format.
tblproperties("transactional"="true") must be set on the table during creation.
The Hive table must be bucketed, but not sorted, so something like "clustered by (colName) into 10 buckets" must be specified during table creation. Ideally the number of buckets equals the number of streaming writers.
The user running the client streaming process must have the necessary permissions to write to the table or partition and to create partitions in the table.
Limitations
Out of the box, the streaming API currently supports only delimited input data (CSV, tab-separated, etc.) and strict-syntax JSON. Support for other input formats can be added through additional implementations of the RecordWriter interface. Only ORC is supported as the format of the destination table.
Creating the Hive database and tables
I will create all tables as the nifi user, since this is the user with which data will be ingested. Because I am using Ranger, I do not expect permission issues here.
Configuration
The table layout depends on the data we are capturing. For example, the data here is captured in the form "hostname,lat,long,year,month,day,hour,min,sec,temp,pressure", separated by commas, so the table is defined as below.
1. Create the Hive database and table (change the location as needed) and enter the details in the PutHive3Streaming processor.
CREATE database sensor_data;
CREATE TABLE `sensor_data_orc`(
`hostname` string,
`lat` float,
`long` float,
`year` int,
`month` int,
`day` int,
`hour` int,
`min` int,
`second` int,
`temp` float,
`hum` float)
CLUSTERED BY (day) INTO 2 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES ('transactional'='true');
2. At this point we are almost done; the only remaining step is to start the NiFi workflow and MiNiFi. If something does not work, check the service logs for errors or warnings. Once running, you will see data move through the NiFi pipeline and eventually land in HDFS via Hive. You can now log in to beeline or use Zeppelin to run SQL queries on this table; a quick beeline check is sketched after the series links below. Happy hadooping!
Links to series Part 1, Part 2, Part 3, Part 4, Part 5
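As a quick sanity check once data is flowing, the streamed rows can be counted from beeline; a minimal sketch is below (the HiveServer2 hostname, port and user are placeholders for your environment):
# Count the rows streamed into the ACID table (HiveServer2 host/port are placeholders)
beeline -u "jdbc:hive2://hiveserver2.example.com:10000/sensor_data" -n nifi \
  -e "SELECT COUNT(*) FROM sensor_data_orc;"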
04-19-2019 05:16 AM
Configuring MiNiFi
Now we will install MiNiFi, which, once configured, will run on the node capturing the data (a Raspberry Pi 3 here). All downloads are available at https://nifi.apache.org/minifi/download.html; I have used version 0.5.0.
1. Download the tar file on the node and unpack it.
wget http://apache.mirrors.hoobly.com/nifi/minifi/0.5.0/minifi-0.5.0-bin.tar.gz
tar -xzvf minifi-0.5.0-bin.tar.gz
2. Next, we will create the MiNiFi flow, which is designed on the NiFi Web UI. Log in to the NiFi UI (the one you installed on the HDP cluster).
3. On NiFi's web UI, place an Input Port named "From raspberry" and connect it to a MergeContent processor.
4. On the same canvas, create a new process group and double-click on it to enter it. Working inside this new process group lets us create a clean template that contains only the flow components meant to run in MiNiFi.
5. Inside the process group, create a Remote Process Group (RPG) and give it the URL of your NiFi instance. Also change the protocol to HTTP.
6. Create an ExecuteProcess processor and connect it to the RPG, selecting the "From raspberry" input port created earlier.
* The input port ("From raspberry") mentioned here must be the same as the one configured in the NiFi workflow. If MiNiFi is unable to send data to the NiFi input port, try reconfiguring it.
7. Configure the ExecuteProcess processor and enter the absolute path of the script that will run and capture the data. This script is placed on the Raspberry Pi (or whichever node captures the data).
Critical steps: now that we have created the workflow on the UI, we need to convert it into a MiNiFi-readable format, for which we will use the toolkit below.
8. Create a template by right-clicking on the process group, then download it from the upper right corner.
9. On the Raspberry Pi node, download the converter toolkit:
wget http://apache.claz.org/nifi/minifi/0.5.0/minifi-toolkit-0.5.0-bin.tar.gz
10. Copy the downloaded template to the Raspberry Pi and run the command below from the toolkit:
./bin/config.sh transform /path/to/template.xml /output/path/config.yml
11. Replace the config.yml file in the MiNiFi conf directory (/usr/minifi-0.5.0/conf) with this generated file.
12. Now we will set up the script that runs and captures the data. Since the sensor data on the Raspberry Pi is fetched by Python, I have used a wrapper shell script. You can use your own bash script and provide its absolute path in the ExecuteProcess processor (refer to point 7).
# cat /home/pi/get_environment.sh (my bash script which runs the Python script on the Raspberry Pi)
#Wrapper script#
out=`python /home/pi/get_environment.py`
echo $out|tr ' ' ','
# cat /home/pi/get_environment.py
from sense_hat import SenseHat
import datetime
sense = SenseHat()
sense.clear()
hostname = "real_rasp"
date = datetime.datetime.now()
year=date.year
month=date.month
day=date.day
hour=date.hour
min=date.minute
sec=date.second
long=77.1025
lat=28.7041
pressure = '%.2f' % sense.get_pressure()
temp = '%.2f' % sense.get_temperature()
print hostname,lat,long,year,month,day,hour,min,sec,temp,pressure
13. Once this is done, it is time to start MiNiFi (a start/verify sketch follows the series links below).
14. Validation: go to the NiFi UI and start the input port. If data is reaching the input port, you will see it in the queue next to it.
15. Once you see data in the queue, stop the input port for now.
Sample yaml file attached: config.txt
Links to series Part 1, Part 2, Part 3, Part 4, Part 5
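Step 13 above only says to start MiNiFi; a minimal sketch of starting it and confirming the flow loaded is below (paths assume the /usr/minifi-0.5.0 layout used in step 11):
# Start the MiNiFi agent on the Raspberry Pi and confirm the transformed flow was picked up
cd /usr/minifi-0.5.0
./bin/minifi.sh start           # launches the MiNiFi Java agent using conf/config.yml
./bin/minifi.sh status          # quick health check of the running agent
tail -n 50 logs/minifi-app.log  # look for the ExecuteProcess and Remote Process Group starting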
04-18-2019 06:11 PM
Creating the NiFi workflow
Use the steps below to create the whole workflow.
1. Open the NiFi Web UI, add the processors below to the canvas and configure them as described.
a. MergeContent
b. InferAvroSchema
c. PutHive3Streaming
2. Now let's configure them one by one. Right-click each processor and configure it as shown below.
a. MergeContent
b. InferAvroSchema - in the CSV header definition, put all the headers of your data separated by your delimiter.
c. PutHive3Streaming - here we need to configure the record reader: right-click on the processor > Configure > Record Reader > Create new service.
d. Add a CSVReader and save.
e. Click on the record reader again and enable it by clicking the enable (lightning bolt) icon on the right side.
f. Connect all three processors and address any errors.
For configuration reference, you can import the attached workflow; I have attached mine in case you want to load it directly into your NiFi: Rasp_new.xml (a template-upload sketch follows the series links below).
Links to series Part 1, Part 2, Part 3, Part 4, Part 5
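As an alternative to importing Rasp_new.xml through the UI, the attached template can be uploaded over the NiFi REST API; a rough sketch is below (the NiFi host, port and root process group id are placeholders - the root group id is visible in the browser URL or via /nifi-api/flow/process-groups/root):
# Upload the attached workflow template into NiFi via the REST API (host, port and group id are placeholders)
curl -X POST \
  -F template=@Rasp_new.xml \
  "http://nifi-host.example.com:9090/nifi-api/process-groups/<root-process-group-id>/templates/upload"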
04-18-2019 05:55 PM
NiFi installation on HDP
We will go through step-by-step instructions to set up your own data pipeline. First, to save time and resources, we will install NiFi on the HDP cluster using the HDF management pack (mpack).
1. Install and set up the HDP cluster.
2. Install the HDF mpack (versions may vary) on the Ambari server node and restart the Ambari server.
wget http://public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.2.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.2.0.0-520.tar.gz
cp -r /var/lib/ambari-server/resources /var/lib/ambari-server/resources.backup
ambari-server install-mpack --mpack=hdf-ambari-mpack-3.2.0.0-520.tar.gz --verbose
ambari-server restart
3. Once the Ambari server has restarted, we will be able to add the NiFi service to the same HDP cluster. Two different clusters can be used for HDP and HDF; however, we are using only one to reduce complexity.
4. Select the NiFi service.
5. Resume the installation after typing in the master key, the encryption password and the NiFi CA token password.
6. Once it is done, validate the NiFi installation by opening the Web UI: http://<hostname>:9090/nifi (a command-line check is sketched after the series links below).
Links to series Part 1, Part 2, Part 3, Part 4, Part 5
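If you prefer to validate from the command line, a quick sketch is below (the Ambari host, credentials and cluster name are placeholders for your environment):
# Check the NiFi service state registered in Ambari (credentials/cluster name are placeholders)
curl -s -u admin:admin "http://ambari-host.example.com:8080/api/v1/clusters/MYCLUSTER/services/NIFI?fields=ServiceInfo/state"
# Confirm the NiFi UI answers on its HTTP port
curl -s -o /dev/null -w "%{http_code}\n" "http://nifi-host.example.com:9090/nifi/"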
04-18-2019 05:49 PM
INTRODUCTION - From Edge to AI
This tutorial is designed to walk you through the process of creating a workflow that reads data from edge sensors (a Raspberry Pi 3 here) and ingests it into Hive through a NiFi workflow. The MiNiFi flow pushes data to a remote NiFi instance, which then ingests the data into Hive tables using the Hive streaming feature. Once ingested, the data can be analyzed with the Hive SQL interface, Spark, Zeppelin, and numerous other tools designed for the Hadoop platform. Along the way we will go through the installation and configuration of the following components.
NiFi - Apache NiFi, originally called Niagara Files, is an integrated data logistics platform for automating the movement of data between disparate systems. It provides real-time control that makes it easy to manage the movement of data between any source and any destination. It is data-source agnostic, supporting disparate and distributed sources of differing formats, schemas, protocols, speeds and sizes, such as machines, geolocation devices, click streams, files, social feeds, log files, videos and more. More details here: Apache doc, Hortonworks doc
MiNiFi - MiNiFi is a subproject of NiFi designed to solve the difficulties of managing and transmitting data feeds to and from the source of origin, often the first/last mile of digital signal, enabling edge intelligence to adjust flow behavior and bi-directional communication. Since the first mile of data collection (the far edge) is very distributed and likely involves a very large number of end devices (i.e. IoT), MiNiFi carries over all the main capabilities of NiFi, with the exception of immediate command and control, creating a design-and-deploy paradigm that makes uniform management of a vast number of devices more practical. It also means that MiNiFi has a much smaller footprint than NiFi, in the range of less than 40 MB depending on which option is selected: MiNiFi with the Java agent, or the native C++ agent. More details here: Apache doc, Hortonworks doc
Hive with ACID support. More info here: Confluence, Hortonworks doc
Edge sensors (Raspberry Pi 3 here): Raspberry Pi 3, Sense HAT
Links to series Part 1, Part 2, Part 3, Part 4, Part 5
04-04-2019 11:48 PM
I am capturing the steps to install a supplementary Spark version on your HDP cluster. Installing a version not shipped by Ambari is unsupported and not recommended; however, customers sometimes need it for testing purposes. Here are the steps:
1. Create the spark user on all nodes and add it to the hdfs group.
useradd -G hdfs spark
2. Create the conf and log directories.
mkdir -p /etc/spark2.4/conf
mkdir -p /var/log/spark2.4
chmod 755 /var/log/spark2.4/
chown spark:hadoop /var/log/spark2.4
mkdir -p /var/run/spark2.4
chown spark:hadoop /var/run/spark2.4
chmod 775 /var/run/spark2.4
mkdir /var/lib/spark2/
chown spark:spark /var/lib/spark2/
3. Create a directory in /usr/hdp and cd to it (as root).
mkdir -p /usr/hdp/<your hdp version>/spark2.4
cd /usr/hdp/<your hdp version>/spark2.4
4. Download the tar file from http://apache.claz.org/spark/spark-2.4.0/.
wget http://apache.claz.org/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
5. Extract the Spark tar file.
tar -xzvf spark-2.4.0-bin-hadoop2.7.tgz
mv spark-2.4.0-bin-hadoop2.7/* .
rm -rf spark-2.4.0-bin-hadoop2.7* [Clean up the directory]
6. Change ownership to root (as root).
chown root:root /usr/hdp/3.1.0.0-78/spark2.4
7. Modify the configuration files.
cd /usr/hdp/3.1.0.0-78/spark2.4/conf
cp log4j.properties.template log4j.properties
7.1 Create spark-defaults.conf and add the lines below to it.
cp spark-defaults.conf.template spark-defaults.conf
==========
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraJavaOptions -XX:+UseNUMA
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 7d
spark.history.fs.cleaner.maxAge 90d
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.store.path /var/lib/spark2/shs_db
spark.history.ui.port 18081
spark.io.compression.lz4.blockSize 128kb
spark.master yarn
spark.shuffle.file.buffer 1m
spark.shuffle.io.backLog 8192
spark.shuffle.io.serverThreads 128
spark.shuffle.unsafe.file.output.buffer 5m
spark.sql.autoBroadcastJoinThreshold 26214400
spark.sql.hive.convertMetastoreOrc true
spark.sql.hive.metastore.jars /usr/hdp/current/spark2-client/standalone-metastore/*
spark.sql.hive.metastore.version 3.0
spark.sql.orc.filterPushdown true
spark.sql.orc.impl native
spark.sql.statistics.fallBackToHdfs true
spark.sql.warehouse.dir /apps/spark2.4/warehouse
spark.unsafe.sorter.spill.reader.buffer.size 1m
spark.yarn.historyServer.address <hostname of this node>:18081 [Make sure port 18081 is not used by any other process]
spark.yarn.queue default
==========
7.2 Edit the spark-env.sh file.
==========
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
# Options read in YARN client mode
#SPARK_EXECUTOR_INSTANCES="2" #Number of workers to start (Default: 2)
#SPARK_EXECUTOR_CORES="1" #Number of cores for the workers (Default: 1).
#SPARK_EXECUTOR_MEMORY="1G" #Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
#SPARK_DRIVER_MEMORY="512M" #Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
#SPARK_YARN_APP_NAME="spark" #The name of your application (Default: Spark)
#SPARK_YARN_QUEUE="default" #The hadoop queue to use for allocation requests (Default: default)
#SPARK_YARN_DIST_FILES="" #Comma separated list of files to be distributed with the job.
#SPARK_YARN_DIST_ARCHIVES="" #Comma separated list of archives to be distributed with the job.
# Generic options for the daemons used in the standalone deploy mode
# Alternate conf dir. (Default: ${SPARK_HOME}/conf)
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/usr/hdp/<your-hadoop-version>/spark2.4/conf}
# Where log files are stored.(Default:${SPARK_HOME}/logs)
export SPARK_LOG_DIR=/var/log/spark2.4
# Where the pid file is stored. (Default: /tmp)
export SPARK_PID_DIR=/var/run/spark2.4
#Memory for Master, Worker and history server (default: 1024MB)
export SPARK_DAEMON_MEMORY=2048m
# A string representing this instance of spark.(Default: $USER)
SPARK_IDENT_STRING=$USER
# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0
export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/<your-hadoop-version>/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/<your-hadoop-version>/hadoop/conf}
# The java implementation to use.
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112 [Replace with your Java version]
============
8. Change the ownership of all the config files to spark:spark.
9. Symlinks - create the symlinks below (pointing at the spark2.4 directory installed above).
ln -s /usr/hdp/3.1.0.0-78/spark2.4/conf/ /etc/spark2
ln -s /etc/hive/conf/hive-site.xml /usr/hdp/3.1.0.0-78/spark2.4/conf/hive-site.xml [make sure the Hive client is installed on this node]
10. Create the HDFS directories.
hadoop fs -mkdir /spark2-history
hadoop fs -chown spark:hadoop /spark2-history
hadoop fs -chmod -R 777 /spark2-history
hadoop fs -mkdir /apps/spark2.4/warehouse
hadoop fs -chown spark:spark /apps/spark2.4/warehouse
hadoop fs -mkdir /user/spark
hadoop fs -chown spark:spark /user/spark
hadoop fs -chmod -R 755 /user/spark
Copy the Hadoop jersey jars into the Spark jars directory.
cp /usr/hdp/3.1.0.0-78/hadoop/lib/jersey-* /usr/hdp/3.1.0.0-78/spark2.4/jars/
Start the Spark history server.
cd /usr/hdp/3.1.0.0-78/spark2.4/sbin/
./start-history-server.sh
11. Run a sample Spark job.
export SPARK_HOME=/usr/hdp/3.1.0.0-78/spark2.4/
spark-submit --deploy-mode cluster --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 10
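To confirm the sample job really ran through YARN with the new build, and that the history server configured in step 7.1 is reachable, a quick check like the sketch below can be used (the history-server hostname is a placeholder):
# The SparkPi example registers in YARN as "Spark Pi"; confirm it finished
yarn application -list -appStates FINISHED | grep -i "Spark Pi"
# The history server port was set to 18081 in spark-defaults.conf
curl -s -o /dev/null -w "%{http_code}\n" "http://history-server.example.com:18081/"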
02-01-2019 07:26 PM
Hive pre-upgrade tool command fails because it is not able to access HDFS
The command cannot find the right configuration for the Hadoop core components, primarily because the Hadoop core configuration files are missing from the command's classpath.
Resolution steps
There are different ways to expose the Hadoop core configuration to the tool; one of them is described below.
1. Follow the HWX document below to make sure you have properly completed the steps (check the appropriate link for your version): https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.3.0/bk_ambari-upgrade-major/content/prepare_hive_for_upgrade.html
2. Follow the document up to step 3 of "Procedure for compacting Hive tables (no Kerberos)"; before running the pre-upgrade tool command, source the Hadoop and Hive env files.
source /etc/hadoop/conf/hadoop-env.sh
source /etc/hive/conf/hive-env.sh
3. Now, while running the pre-upgrade tool command, include two more paths in the classpath (/etc/hadoop/conf:/etc/hive/conf), so the command looks similar to the one below.
java -Djavax.security.auth.useSubjectCredsOnly=false (optional) -cp /etc/hadoop/conf:/etc/hive/conf:/usr/hdp/$STACK_VERSION/hive/lib/derby-10.10.2.0.jar:/usr/hdp/$STACK_VERSION/hive/lib/*:/usr/hdp/$STACK_VERSION/hadoop/*:/usr/hdp/$STACK_VERSION/hadoop/lib/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/*:/usr/hdp/$STACK_VERSION/hadoop-mapreduce/lib/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/*:/usr/hdp/$STACK_VERSION/hadoop-hdfs/lib/*:/usr/hdp/$STACK_VERSION/hadoop/etc/hadoop/*:/tmp/hive-pre-upgrade-<your version>.jar:/usr/hdp/$STACK_VERSION/hive/conf/conf.server org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool
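The command above assumes $STACK_VERSION is already exported; one way of setting it, assuming hdp-select is available on the node as it is on HDP hosts, is sketched below:
# Resolve the installed HDP stack version used in the classpath above
export STACK_VERSION=$(hdp-select status hive-server2 | awk '{print $3}')
echo "Pre-upgrade tool will use stack version: $STACK_VERSION"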
11-15-2018 07:17 PM
While upgrading the cluster from HDP-2.6 to HDP-3.0, the ambari-infra-solr migration config could not be generated due to the error below.
[root@hostname ambari-infra-solr-client]# /usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host <ambari server> --port 8443 -s --cluster DEV --username rchaman --password rchaman --backup-base-path=/tmp/back/ --java-home /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre --ranger-hdfs-base-path=/ranger/audit
Start generating config file: ambari_solr_migration.ini ...
Get Ambari cluster details ...
Set JAVA_HOME: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-0.el7_5.x86_64/jre
Service detected: ZOOKEEPER
Zookeeper connection string: zk1.openstack:2181,zk2.openstack:2181,zk3.openstack:2181
Service detected: AMBARI_INFRA_SOLR
Infra Solr znode: /infra-solr_test
2018-11-14 14:38:59,104 - Get clusterstate.json: waiting for 5 seconds before retyring again (retry count: 1)
2018-11-14 14:39:04,169 - Get clusterstate.json: waiting for 5 seconds before retyring again (retry count: 2)
Solution - This solution applies if you see the error below in the solr.log file. The issue occurs when Infra Solr is restarted manually; basically, the Ambari Infra service should not be stopped unless Ambari itself does it. When Ambari uploads the latest version of security.json to ZooKeeper, it is incompatible with the old version of Infra Solr.
The two versions use different classes. For example (look at the authorization section):
Before upgrade
{
"authentication": {
"class": "org.apache.solr.security.KerberosPlugin"
},
"authorization": {
"class": "org.apache.ambari.infra.security.InfraRuleBasedAuthorizationPlugin",
After upgrade
{
"authentication": {
"class": "org.apache.solr.security.KerberosPlugin"
},
"authorization": {
"class": "org.apache.solr.security.InfraRuleBasedAuthorizationPlugin",
Due to this mismatch you might see this error in solr.log (class not found: org.apache.solr.security.InfraRuleBasedAuthorizationPlugin):
Timestamp [main] WARN [ ] org.apache.solr.core.CoreContainer (CoreContainer.java:401) - Couldn't add files from /opt/ambari_infra_solr/data
/lib to classpath: /opt/ambari_infra_solr/data/lib
Timestamp [main] ERROR [ ] org.apache.solr.servlet.SolrDispatchFilter (SolrDispatchFilter.java:141) - Could not start Solr. Check solr/home
property and the logs
Timestamp [main] ERROR [ ] org.apache.solr.common.SolrException (SolrException.java:159) - null:org.apache.solr.common.SolrException: Error
loading class 'org.apache.solr.security.InfraRuleBasedAuthorizationPlugin'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:558)
at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:627)
at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:592)
at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:585)
at org.apache.solr.core.CoreContainer.initializeAuthorizationPlugin(CoreContainer.java:245)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:420)
at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:158)
Step 1: Locate the security.json files on the node running the Ambari Infra instance, and in the two files listed below replace the new class with the old class.
/etc/ambari-infra-solr/conf/custom-security.json
/var/lib/ambari-agent/cache/common-services/AMBARI_INFRA_SOLR/0.1.0/package/templates/infra-solr-security.json.j2
Replace org.apache.solr.security.InfraRuleBasedAuthorizationPlugin with org.apache.ambari.infra.security.InfraRuleBasedAuthorizationPlugin
Then restart Ambari Infra and try to rerun the command.
[root@rchaman ~]# export CONFIG_INI_LOCATION=ambari_solr_migration.ini
[root@rchaman ~]# /usr/bin/python /usr/lib/ambari-infra-solr-client/migrationConfigGenerator.py --ini-file $CONFIG_INI_LOCATION --host ambari-server --port 8443 -s --cluster DEV --username rchaman --password password --backup-base-path=/tmp/back/ --java-home /opt/jdk1.8.0_112/jre --ranger-hdfs-base-path=/ranger/audit
Start generating config file: ambari_solr_migration.ini ...
Get Ambari cluster details ...
Set JAVA_HOME: /opt/jdk1.8.0_112/jre
Service detected: ZOOKEEPER
Zookeeper connection string: zk1.openstack:2181,zk2.openstack:2181,zk3.openstack:2181
Service detected: AMBARI_INFRA_SOLR
Infra Solr znode: /infra-solr
Service detected: RANGER
Ranger Solr collection: ranger_audits
Ranger backup path: /tmp/back/ranger
Service detected: ATLAS
Atlas Solr collections: fulltext_index, edge_index, vertex_index
Atlas backup path: /tmp/back/atlas
Kerberos: enabled
Config file generation has finished successfully
Important - it is NOT RECOMMENDED to restart Ambari Infra during the upgrade process. Restart Ambari Infra only when you hit an issue and a restart is unavoidable. Once done, resume with the steps mentioned in this link. (A sketch for checking which security.json is currently in ZooKeeper follows below.)
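Before editing the two files above, you may want to confirm which authorization class is currently stored in ZooKeeper; a sketch using Solr's zkcli is below (the zkcli.sh path assumes the standard ambari-infra-solr layout, and the ZooKeeper connection string and znode are the ones reported by migrationConfigGenerator.py above - adjust to your cluster):
# Dump the security.json currently stored under the Infra Solr znode and inspect the authorization class
/usr/lib/ambari-infra-solr/server/scripts/cloud-scripts/zkcli.sh \
  -zkhost zk1.openstack:2181,zk2.openstack:2181,zk3.openstack:2181 \
  -cmd get /infra-solr/security.json | grep -A 2 '"authorization"'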
10-18-2018 10:51 PM
This article describes how to access the AMS embedded HBase in a secured (Kerberized) environment. It is an extension of this document.
For an unsecured cluster, please read the above document; in a secured environment, however, you need to run one extra export command. The final command follows the syntax below.
<sqlline.py path> <ams_collector_hostname>:<embedded_zk_port>:<znode>
/usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline.py rchaman.ambari.apache.org:61181:/ams-hbase-unsecure
Please find the steps below to get these parameters for your environment.
1. Make sure the Phoenix client packages are installed on the host from which you are running the command; I suggest running it from the AMS collector host. If Phoenix is not installed, install it as follows:
yum install phoenix -y
2. Find the absolute path of the sqlline.py script.
[root@c4140-node4 ~]# rpm -qa |grep phoenix
phoenix_2_6_4_0_91-4.7.0.2.6.4.0-91.noarch
[root@c4140-node4 ~]# rpm -ql phoenix_2_6_4_0_91-4.7.0.2.6.4.0-91.noarch|grep sqlline.py
/usr/hdp/2.6.4.0-91/phoenix/bin/sqlline.py
3. Get the value of "zookeeper.znode.parent" from the Ambari Metrics Collector configuration. In Ambari, go to "Ambari Metrics" -> "Configs" -> "Advanced ams-hbase-site" and search for "zookeeper.znode.parent". In a secured cluster the value is normally "/ams-hbase-secure".
4. Use 61181 as the embedded ZooKeeper port; ams_collector_hostname is the host on which the AMS collector is running in embedded mode.
5. Finally, once we have this information, run the export command that instructs Phoenix to read the AMS HBase configuration rather than the default HBase configuration.
export HBASE_CONF_DIR=/etc/ambari-metrics-collector/conf
6. Now run the command.
[ams@c4140-node4 ~]$ /usr/hdp/2.6.4.0-91/phoenix/bin/sqlline.py localhost:61181:/ams-hbase-secure
Setting property: [incremental, false]
Connected to: Phoenix (version 4.7)
Driver: PhoenixEmbeddedDriver (version 4.7)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
195/195 (100%) Done
Done
sqlline version 1.1.8
0: jdbc:phoenix:localhost:61181:/ams-hbase-se> !tables
+------------+--------------+--------------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+---------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE | IMMUTABLE_ROWS | SALT_BUCKETS | MULTI_TENANT | VIEW_STATEMENT | VIEW_TYPE | INDEX_TYPE | TRANSACTIONAL | IS_NAME |
+------------+--------------+--------------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+---------+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | | false | 2 | false | | | | false | false |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | | false | null | false | | | | false | false |
| | | CONTAINER_METRICS | TABLE | | | | | | true | null | false | | | | false | false |
| | | HOSTED_APPS_METADATA | TABLE | | | | | | false | null | false | | | | false | false |
| | | INSTANCE_HOST_METADATA | TABLE | | | | | | false | null | false | | | | false | false |
| | | METRICS_METADATA | TABLE | | | | | | false | null | false | | | | false | false |
| | | METRIC_AGGREGATE | TABLE | | | | | | true | null | false | | | | false | false |
| | | METRIC_AGGREGATE_DAILY | TABLE | | | | | | true | null | false | | | | false | false |
| | | METRIC_AGGREGATE_HOURLY | TABLE | | | | | | true | null | false | | | | false | false |
| | | METRIC_AGGREGATE_MINUTE | TABLE | | | | | | true | null | false | | | | false | false |
| | | METRIC_RECORD | TABLE | | | | | | true | null | false | | | | false | false |
| | | METRIC_RECORD_DAILY | TABLE | | | | | | true | null | false | | | | false | false |
| | | METRIC_RECORD_HOURLY | TABLE | | | | | | true | null | false | | | | false | false |
| | | METRIC_RECORD_MINUTE | TABLE | | | | | | true | null | false | | | | false | false |
+------------+--------------+--------------------------+---------------+----------+------------+----------------------------+-----------------+--------------+-----------------+---------------+---------------+-----------------+------------+-------------+----------------+---------+
0: jdbc:phoenix:localhost:61181:/ams-hbase-se>
Note: if we do not export the AMS HBase config, Phoenix will read the default HBase config and show the error below.
[ams@c4140-node4 ~]$ /usr/hdp/2.6.4.0-91/phoenix/bin/sqlline.py localhost:61181:/ams-hbase-secure
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost:61181:/ams-hbase-secure none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost:61181:/ams-hbase-secure
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.4.0-91/phoenix/phoenix-4.7.0.2.6.4.0-91-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.4.0-91/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
18/10/18 22:37:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/18 22:37:53 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/10/18 22:37:54 WARN ipc.AbstractRpcClient: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
18/10/18 22:37:54 FATAL ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
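Since the secured collector speaks Kerberos, the sqlline session also needs a valid ticket for the ams user; a minimal sketch is below (the keytab path and principal extraction are typical HDP defaults and may differ on your cluster - verify with klist -kt first):
# Switch to the ams user and obtain a ticket before launching sqlline (keytab path is an assumption - verify it exists)
su - ams
kinit -kt /etc/security/keytabs/ams.collector.keytab \
  "$(klist -kt /etc/security/keytabs/ams.collector.keytab | awk 'NR==4 {print $4}')"
export HBASE_CONF_DIR=/etc/ambari-metrics-collector/conf
/usr/hdp/2.6.4.0-91/phoenix/bin/sqlline.py localhost:61181:/ams-hbase-secure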
08-23-2018 11:11 PM
For HDFS analysis, an Activity Analyzer needs to be deployed to each NameNode in the cluster. These instances automatically begin processing the fsimage on startup and reprocess the latest fsimage data once every 24 hours. By default, when deployed on a NameNode, these Activity Analyzers do not process YARN, MapReduce or Tez utilization data; this is to reduce the amount of processing done on servers hosting critical services like the NameNode. Unless we install the analyzer on the NameNode, it will not be able to find the fsimage in its directory and will fail with the error below.
2018-08-23 14:28:04,401 ERROR [pool-3-thread-1] ActivityManager:78 - Failed to process activity id null of type HDFS
java.lang.NullPointerException
at java.io.File.<init>(File.java:277)
at com.hortonworks.smartsense.activity.hdfs.HDFSImageProcessor.processActivity(HDFSImageProcessor.java:71)
at com.hortonworks.smartsense.activity.ActivityManager$1.call(ActivityManager.java:69)
at com.hortonworks.smartsense.activity.ActivityManager$1.call(ActivityManager.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-08-23 14:28:04,403 INFO [Thread-3] ActivityManager:172 - Shutting down activity manager...
2018-08-23 14:28:04,425 WARN [Thread-6] HeapMemorySizeUtil:55 - hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size
Solution - install an Activity Analyzer on both NameNodes. A quick check of where the fsimage lives on a NameNode host is sketched below.
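To see where the fsimage lives on a given host (and hence whether an Activity Analyzer placed there will find it), a simple check is sketched below:
# On a NameNode host: resolve the NameNode metadata directory and list the fsimage files the analyzer reads
NN_DIR=$(hdfs getconf -confKey dfs.namenode.name.dir | sed 's|file://||g' | cut -d, -f1)
ls -lh "$NN_DIR"/current/fsimage_* 2>/dev/null || echo "No fsimage found - this host is not a NameNode"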