Member since: 10-02-2017
Posts: 112
Kudos Received: 71
Solutions: 11
My Accepted Solutions
Title | Views | Posted
---|---|---
| 2040 | 08-09-2018 07:19 PM
| 2271 | 03-16-2018 09:21 AM
| 2593 | 03-07-2018 10:43 AM
| 616 | 02-19-2018 11:42 AM
| 3111 | 02-02-2018 03:58 PM
09-10-2018
02:04 PM
1. It is the Ambari alert framework that is creating the ticket, as it does a regular health check by connecting to HS2 using beeline; beeline needs a Kerberos ticket to connect to HS2. 2. The frequency is every 2 minutes. 3. HiveServer2 always generates its TGT in memory and never in the disk cache. 4. Even if you delete the ticket, your HS2 will work fine. 5. If you have more than one HiveServer2, you will see the ticket being generated on only one of the instances.
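If you want to verify point 4 yourself, a minimal sketch (run as the hive service user on the HS2 host; credential cache locations vary by setup):
klist        # show the ticket in the current credential cache
kdestroy     # remove it; HS2 keeps working because its TGT lives in memory, not in this cache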
09-10-2018
02:01 PM
1. Can you please log in to the host running the NameNode. 2. "id username" shows the groups the user belongs to. Do you see those groups present in Ranger for the user? There is also a possibility that LDAP is configured directly and the groups are being pulled from LDAP.
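For example (the user name below is a placeholder), you can compare OS-level and Hadoop-level group resolution on the NameNode host:
id someuser
hdfs groups someuser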
09-10-2018
01:52 PM
Preemption of a query is not enabled up to HDP 2.6.4. Kindly use 2.6.5 or a higher version to utilize the preemption feature of LLAP.
09-10-2018
01:50 PM
Try setting the parameters before running the query. Play with mapred.min.split.size and mapred.max.split.size for optimal performance: set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; set mapred.min.split.size=100000000;
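A minimal sketch of such a session, assuming a roughly 100 MB minimum and 256 MB maximum split size (tune both to your data; the table name is a placeholder):
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size=100000000;
set mapred.max.split.size=256000000;
select count(*) from your_table;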
09-10-2018
11:45 AM
1. Spark dynamic allocation: I believe your Zeppelin is configured to spawn as many executors as possible for Spark. Kindly enable dynamic allocation for Spark in Zeppelin. 2. YARN queue user limit: can you also check what your YARN queue configuration is? You can limit the number of containers that can be used by a given user via the user limit factor.
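A minimal sketch of the Spark properties involved for point 1 (set in the Zeppelin Spark interpreter or spark-defaults; the executor bounds are illustrative values, not recommendations):
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=10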
09-10-2018
11:42 AM
By default your Spark job will spawn one task for each file, hence it will be highly parallel. This will also be inefficient, as tasks take time to spawn for every file if there are not enough executors to run 2500 tasks in parallel (or 2500 * X tasks, where X is the number of days). Various approaches: 1. Try writing a CombineParquetFileInputFormat (http://bytepadding.com/big-data/spark/combineparquetfileinputformat/) so that one task can read multiple files located on the same host or rack. 2. Run a merge job before reading the files.
09-10-2018
11:35 AM
1 Kudo
Kindly find the heap sizing of all the services for an 80-node HDP cluster (HDP 2.6.4): https://community.hortonworks.com/content/kbentry/209792/tuning-heap-sizing-for-a-80-node-hdp-cluster.html https://community.hortonworks.com/articles/209789/making-hiveserver2-instance-handle-more-than-500-c.html
09-10-2018
11:29 AM
1. Spark dynamic allocation: I believe your Zeppelin is configured to spawn as many executors as possible for Spark. Kindly enable dynamic allocation for Spark in Zeppelin. 2. YARN queue user limit: can you also check what your YARN queue configuration is? You can limit the number of containers that can be used by a given user via the user limit factor.
09-10-2018
11:22 AM
Can you try running MSCK REPAIR TABLE tablename and then re-execute your query?
08-31-2018
03:37 PM
12 Kudos
Thanks to Christoph Gutsfeld, Matthias von Görbitz and Rene Pajta for all their valuable pointers for writing this article. The article provides a detailed and thorough understanding of Hive LLAP.

Understanding YARN

YARN is essentially a system for managing distributed applications. It consists of a central Resource Manager, which arbitrates all available cluster resources, and a per-node Node Manager, which takes direction from the Resource Manager. The Resource Manager and Node Manager follow a master-slave relationship. The Node Manager is responsible for managing the available resources on a single node. YARN defines its unit of work in terms of a container, which is available on each node. The Application Master negotiates containers with the scheduler (one of the components of the Resource Manager); containers are launched by the Node Manager.

Understanding YARN memory configuration
Memory allocated for all YARN containers on a node: the total amount of memory that can be used by the Node Manager on each node for allocating containers.
Minimum container size: the minimum amount of RAM that will be allocated to a requested container. Any container requested will be allocated memory in a multiple of the minimum container size.
Maximum container size: the maximum amount of RAM that can be allocated to a single container. Maximum container size <= memory allocated for all YARN containers on a node.
LLAP daemons run as YARN containers, hence the LLAP daemon size should be >= the minimum container size but <= the maximum container size.
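As an illustration, these three settings map to the following yarn-site.xml properties (the values are only example numbers for a worker node with roughly 200 GB usable for YARN, not recommendations):
<property><name>yarn.nodemanager.resource.memory-mb</name><value>204800</value></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>4096</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>204800</value></property>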
Understanding CPU configuration

Percentage of physical CPU allocated for all containers on a node: X% of the total CPU that can be used by containers. The value should never be 100%, as CPU is also needed by the DataNodes, the Node Manager and the OS.
Minimum container vcores: the minimum number of vcores that will be allocated to a given container.
Maximum container vcores: the maximum number of vcores that can be allocated to a container.
CPU isolation: this enables cgroups, forcing containers to use exactly the number of CPUs allocated to them. If this option is disabled, a container is free to occupy all the CPUs available on the machine.
The LLAP daemon runs as one big YARN container, hence always ensure that the maximum container vcores is set equal to the number of vcores available to run YARN containers (roughly 80% of the total number of CPUs available on that host). If CPU isolation is enabled it becomes even more important to set the maximum container vcores to an appropriate value.

Hive LLAP Architecture

https://cwiki.apache.org/confluence/display/Hive/LLAP
Known as Live Long
and Process, LLAP provides a hybrid execution model. It consists of a
long-lived daemon which replaces direct interactions with the HDFS Data Node,
and a tightly integrated DAG-based framework.
Functionality such as caching, pre-fetching, some query processing and access
control are moved into the daemon. Small/short queries are largely
processed by this daemon directly, while any heavy lifting will be performed in
standard YARN containers. Similar to the Data
Node, LLAP daemons can be used by other applications as well, especially if a
relational view on the data is preferred over file-centric processing. The
daemon is also open through optional APIs (e.g., Input Format) that can be
leveraged by other data processing frameworks as a
building block.

Hive LLAP consists of the following components:

Hive Interactive Server: a Thrift server which provides a JDBC interface to connect to Hive LLAP.
Slider AM: the Slider application which spawns, monitors and maintains the LLAP daemons.
Tez AM query coordinator: the Tez AM which accepts the incoming requests of the users and executes them in the executors available inside the LLAP daemons (JVMs).
LLAP daemons: to facilitate caching and JIT optimization, and to eliminate most of the startup costs, a daemon runs on the worker nodes of the cluster. The daemon handles I/O, caching, and query fragment execution.

LLAP configuration in detail
Component | Parameter | Conf section | Rule and comments
---|---|---|---
Slider AM size | slider_am_container_mb | hive-interactive-env | = yarn.scheduler.minimum-allocation-mb
Tez AM coordinator size | tez.am.resource.memory.mb | tez-interactive-site | = yarn.scheduler.minimum-allocation-mb
Number of coordinators | hive.server2.tez.sessions.per.default.queue | Settings | Number of concurrent queries LLAP supports; this results in spawning an equal number of Tez AMs
LLAP daemon size | hive.llap.daemon.yarn.container.mb | hive-interactive-site | yarn.scheduler.minimum-allocation-mb <= daemon size <= yarn.scheduler.maximum-allocation-mb; rule of thumb: always set it to yarn.scheduler.maximum-allocation-mb
Number of daemons | num_llap_nodes_for_llap_daemons | hive-interactive-env | Number of LLAP daemons running; this determines the total cache and executors available to run any query on LLAP
Executor size | hive.tez.container.size | hive-interactive-site | 4-6 GB is the recommended value; for each executor you need to allocate one vcore
Number of executors | hive.llap.daemon.num.executors | | Determined by the "Maximum VCores in YARN"
LLAP daemon configuration in detail

Component | Parameter | Section | Rule and comments
---|---|---|---
Maximum YARN container size | yarn.scheduler.maximum-allocation-mb | YARN settings | This is the maximum amount of memory a container can be allocated; it is recommended to run the LLAP daemon as one big container on a node
Daemon size | hive.llap.daemon.yarn.container.mb | hive-interactive-site | yarn.scheduler.minimum-allocation-mb <= daemon size <= yarn.scheduler.maximum-allocation-mb; rule of thumb: always set it to yarn.scheduler.maximum-allocation-mb
Headroom | llap_headroom_space | hive-interactive-env | MIN(5% of daemon size, 6 GB); off-heap, but part of the LLAP daemon
Heap size | llap_heap_size | hive-interactive-env | Number of executors * hive.tez.container.size
Cache size | hive.llap.io.memory.size | hive-interactive-site | Daemon size - heap size - headroom; off-heap, but part of the LLAP daemon

LLAP queue size = Slider AM size + number of Tez containers * hive.tez.container.size + LLAP daemon size * number of LLAP daemons.
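A worked example of these formulas, purely illustrative (all numbers are assumptions, not recommendations): assume yarn.scheduler.maximum-allocation-mb = 64 GB, hive.tez.container.size = 4 GB, 12 executors per daemon, 10 daemons, 3 coordinators and a 1 GB Slider AM.
Daemon size = yarn.scheduler.maximum-allocation-mb = 64 GB
Heap size = 12 executors * 4 GB = 48 GB
Headroom = MIN(5% of 64 GB, 6 GB) = ~3 GB
Cache size = 64 GB - 48 GB - 3 GB = ~13 GB
LLAP queue size = 1 GB + 3 * 4 GB + 10 * 64 GB = ~653 GB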
LLAP on YARN: interactive query configuration

LLAP YARN queue configuration. Key configurations to set:
1. User limit factor = 1
2. Capacity and max capacity = 100

[Screenshots in the original article: LLAP daemon in detail, sizing rules of thumb, parameter tuning, cache]

Managing LLAP through the command line utility (Flex)
List Slider jobs: slider list
Get Slider status: slider status llap0
Get the diagnostic status of a Slider app: slider diagnostics --application --name llap0 --verbose
Scale down an LLAP daemon: slider flex llap0 --component LLAP -1
Scale up a new LLAP daemon: slider flex llap0 --component LLAP +1
Stop the Slider app: slider stop llap0
(llap0 is the default Slider application name for LLAP; substitute your own application name.)

Troubleshooting: finding which hosts the LLAP daemons are running on
Ambari -> Hive -> HiveServer2 Interactive UI -> Running Instances

Behavior
1. In HDP 2.6.4 preemption of queries is not supported.
2. If multiple concurrent queries have exhausted the queue, then any incoming query will be in a waiting state.
3. All queries running on Hive LLAP can be seen in the Tez UI.
08-30-2018
04:46 PM
1 Kudo
Why use spark-submit if spark-shell is there? 1. spark-shell spawns executors on random nodes, hence the chances of data locality are very low. 2. spark-submit spawns the executors based on the nodes where the data is stored, hence spark-submit will be more performant compared to spark-shell. 3. spark-shell is good when data exploration needs to be done, as it provides an interactive CLI to run your code.
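For reference, the two invocations look like this (the class name, jar name and executor count are placeholders):
spark-shell --master yarn --num-executors 4
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp --num-executors 4 myapp.jar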
08-30-2018
03:42 PM
3 Kudos
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. With Spark running on Apache Hadoop YARN, developers everywhere can now create applications to exploit Spark's power, derive insights, and enrich their data science workloads within a single, shared dataset in Hadoop. The Hadoop YARN-based architecture provides the foundation that enables Spark and other applications to share a common cluster and dataset while ensuring consistent levels of service and response. Spark is now one of many data access engines that work with YARN in HDP. https://hortonworks.com/apache/spark/ Spark code samples: http://bytepadding.com/spark/

Take away
1. Spark is a library and not a service.
2. Spark interacts with multiple services like HDFS and YARN to process data.
3. The Spark client also has a YARN client wrapped within it.
4. Spark can be configured to run both locally and on the cluster.
5. The Spark context is the entry point to interact with a Spark process.
6. Spark is a JVM-based execution engine.

Take away
1. Each line of your code is parsed to prepare a Spark plan.
2. sc.textFile => results in fetching the meta info from the NameNode about where the file blocks are located, and requesting YARN for containers on those hosts. The text file format also provides the information about the record delimiter used (the newline character in the case of text).
3. The transformations are all grouped together in a task. The transformations are serialized on the driver and sent to the executors. Do appreciate that all transformation and object creation happens on the driver and is subsequently sent to the executors.
4. reduceBy results in data shuffling, known as stages in Spark.
5. saveAsTextFile interacts with the NameNode to get information about where to save the file, and saves the file on HDFS.
08-30-2018
02:51 PM
3 Kudos
Hadoop Distributed File System
HDFS is a Java-based file system that provides scalable and reliable data storage, and it was designed to span large clusters of commodity servers. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks. When that quantity and quality of enterprise data is available in HDFS, and YARN enables multiple data access applications to process it, Hadoop users can confidently answer questions that eluded previous data platforms. HDFS is a scalable, fault-tolerant, distributed storage system that works closely with a wide variety of concurrent data access applications, coordinated by YARN. HDFS will "just work" under a variety of physical and systemic circumstances. By distributing storage and computation across many servers, the combined storage resource can grow linearly with demand while remaining economical at every amount of storage.

Take away
1. HDFS is based on a master-slave architecture, with the NameNode (NN) being the master and the DataNodes (DN) being the slaves.
2. The NameNode stores only the meta information about the files; the actual data is stored on the DataNodes.
3. Both the NameNode and DataNode are processes and not any super fancy hardware.
4. The DataNode uses the underlying OS file system to save the data.
5. You need to use an HDFS client to interact with HDFS. The HDFS client always talks to the NameNode for meta info and subsequently talks to the DataNodes to read/write data. No data I/O happens through the NameNode.
6. HDFS clients never send data to the NameNode, hence the NameNode never becomes a bottleneck for any data I/O in the cluster.
7. The HDFS client has the "short-circuit" feature enabled, hence if the client is running on a node hosting a DataNode it can read the file directly from that DataNode, making the complete read/write local.
8. To make it even simpler, imagine the HDFS client as a web client and HDFS as a whole as a web service with predefined operations such as GET, PUT, COPYFROMLOCAL etc.

How is a 400 MB file saved on HDFS with an HDFS block size of 100 MB? The diagram shows how the first block is saved. In the case of replication, each block will be saved on 3 different DataNodes. The meta info is saved on the NameNode (a replication factor of 3 is used, hence each block is saved thrice).
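You can see the block layout for yourself on a real file (the file name and path below are placeholders); fsck prints every block and the DataNodes holding its replicas:
hdfs dfs -put big_file.dat /tmp/big_file.dat
hdfs fsck /tmp/big_file.dat -files -blocks -locations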
Block placement strategy
Place the first replica somewhere: either on a random node (if the HDFS client is outside the Hadoop/DataNode cluster) or on the local node (if the HDFS client is running on a node where a DataNode is running, using the "short-circuit" optimization). Place the second replica in a different rack (this ensures that if the power supply of one rack goes down, the block can still be read from the other rack). Place the third replica in the same rack as the second replica (this ensures that if a YARN container can be allocated on a given host, the data will be served from a host in the same rack; data transfer within the same rack is faster than across racks). If there are more replicas, spread them across the rest of the racks.

YARN (Yet Another Resource Negotiator): does it ring a bell? 'Yet Another Hierarchically Organized Oracle', YAHOO.
YARN is essentially a system for managing distributed applications. It consists of a central Resource Manager (RM), which arbitrates all available cluster resources, and a per-node Node Manager (NM), which takes direction from the Resource Manager. The Node Manager is responsible for managing the available resources on a single node. http://hortonworks.com/hadoop/yarn/

Take away
1. YARN is based on a master-slave architecture, with the Resource Manager being the master and the Node Managers being the slaves.
2. The Resource Manager keeps the meta info about which jobs are running on which Node Manager and how much memory and CPU are consumed, and hence has a holistic view of the total CPU and RAM consumption of the whole cluster.
3. The jobs run on the Node Managers and never execute on the Resource Manager, hence the RM never becomes a bottleneck for any job execution. Both RM and NM are processes, not some fancy hardware.
4. A container is a logical abstraction for CPU and RAM.
5. YARN schedules containers (CPU and RAM) over the whole cluster. Hence, if an end user needs CPU and RAM in the cluster, they need to interact with YARN.
6. While requesting CPU and RAM, you can specify the host on which you need it.
7. To interact with YARN you need to use a YARN client.

How HDFS and YARN work in tandem
1. The NameNode and Resource Manager processes are hosted on two different hosts, as they hold key meta information.
2. The DataNode and Node Manager processes are co-located on the same hosts.
3. A file is saved onto HDFS (DataNodes), and to access a file in a distributed way one can write a YARN application (MR2, Spark, Distributed Shell, Slider application) using the YARN client, and use the HDFS client to read the data.
4. The distributed application can fetch the file locations (meta info from the NameNode) and ask the Resource Manager (YARN) to provide containers on the hosts which hold the file blocks.
5. Do remember the short-circuit optimization provided by HDFS: if the distributed job gets a container on a host which holds the file block and tries to read it, the read will be local and not over the network.
6. The same file, which if read sequentially would have taken 4 seconds (at 100 MB/sec), can be read in 1 second, as the distributed process runs in parallel on different YARN containers (Node Managers), reading 4 x 100 MB in one second.
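A couple of standard YARN client commands that make this visible from the command line:
yarn node -list            # Node Managers and the containers running on each
yarn application -list     # applications currently holding containers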
08-30-2018
01:31 PM
check_mk is what most use. It is easy to configure and provides you with a nice UI with history saved. The check_mk agents consume very little CPU and RAM, hence avoiding any negative impact on other applications running on the host.
08-18-2018
06:08 PM
3 Kudos
Cluster: in Solr, a cluster is a set of Solr nodes operating in coordination with each other via ZooKeeper, and managed as a unit. A cluster may contain many collections. See also SolrCloud.
Collection: in Solr, one or more documents grouped together in a single logical index using a single configuration and schema. In SolrCloud a collection may be divided up into multiple logical shards, which may in turn be distributed across many nodes; in a single-node Solr installation, a collection may be a single core.
Commit: to make document changes permanent in the index. In the case of added documents, they become searchable after a commit.
Core: an individual Solr instance (represents a logical index). Multiple cores can run on a single node. See also SolrCloud.

Key take away
1. Solr works on a non master-slave architecture; every Solr node is master of its own. Solr nodes use ZooKeeper to learn about the state of the cluster.
2. A Solr node (JVM) can host multiple cores.
3. The core is the place where the Lucene (index) engine is running. Every core has its own Lucene engine.
4. A collection is divided into shards.
5. A shard is represented as a core (a part of the JVM) in a Solr node (JVM).
6. Every Solr node keeps sending heartbeats to ZooKeeper to inform it about its availability.
7. Usage of the local FS provides the most stable and best I/O for Solr.
8. A replication factor of 2 should be maintained in local mode to avoid any data loss.
9. Do remember that every replica will have a core attached to it, and also occupies space on disk.
10. If a collection is divided into 3 shards with a replication factor of 3, in total 9 cores will be hosted across the Solr nodes, and the data saved on the local FS will be 3x.
11. A Solr node doesn't publish data to Ambari Metrics by default. A Solr metrics process (a separate process from the Solr node) needs to run on every host where a Solr node is hosted. The metrics process fetches data from the Solr node and pushes it to Ambari Metrics.

Solr on HDFS
1. Solr nodes should be co-located with DataNodes for best performance.
2. Because the DataNodes are also used by Spark and HBase, this setup can easily result in an unstable SolrCloud.
3. Because of heavy CPU consumption on the DataNodes, Solr nodes can fail to maintain their heartbeat connection to ZooKeeper, resulting in the Solr node being removed from the Solr cluster.
4. Watch the Solr logs to make sure short-circuit writes are being used.
5. At the collection level you are compelled to use a replication factor of 2, else a restart of one node will result in the collection being unavailable.
6. A replication factor of 2 at the collection level and a replication factor of 3 at the HDFS level can significantly impact write performance.
7. Ensure the replication factor of the Solr HDFS directory is set to 1.

Lucene indexing on a single core: picture taken from https://www.quora.com/What-is-the-internal-architecture-of-Apache-solr
Reference: https://lucene.apache.org/solr/guide/6_6/solr-glossary.html#solrclouddef
08-10-2018
10:15 AM
Can you please check whether the jar that you have added has classes that clash with the classes of Hive. 1. I can see Jersey and Eclipse Jetty. 2. Can you please ensure the version of these jars (and any shared jar) is the same as what is present in your platform.
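One way to compare, as a sketch (the jar name is a placeholder; the lib path is the typical HDP client location and may differ in your install):
jar tf your-udf.jar | grep -Ei 'jersey|jetty'
ls /usr/hdp/current/hive-client/lib | grep -Ei 'jersey|jetty'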
08-09-2018
07:24 PM
Let's have a look at the write path without WAL. 1. Data written by a client is written into the heap (RAM) of the RegionServer and flushed to disk only after certain criteria are met, not immediately. 2. Data held in the heap of the RegionServer will be lost if HBase crashes before it is written to disk. 3. Disabling WAL has the same effect whether you use Phoenix or interact with HBase directly through the API or the hbase shell, as WAL is an HBase feature and not a Phoenix one. 4. Disabling WAL will improve write throughput, but it comes at the cost of potential data loss.
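For reference, from the Phoenix side this trade-off is usually made with the DISABLE_WAL table option (table and column names below are placeholders):
CREATE TABLE my_table (id BIGINT NOT NULL PRIMARY KEY, val VARCHAR) DISABLE_WAL=true;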
08-09-2018
07:19 PM
1. COUNT will result in a full table scan and hence the query is slow. 2. A WHERE on the primary key will be fast, as it does a lookup and not a scan. 3. A WHERE used on any column other than the primary key will result in an HBase full table scan. 4. Analyze the table once to speed up count queries, but it will not affect a WHERE on a non-primary key.
08-09-2018
07:17 PM
1. What is the concurrency set for LLAP? How many Tez AMs can you see in the Resource Manager UI? 2. Can you kindly look at the Tez UI and the Grafana LLAP dashboard to find out whether the first query is occupying all the LLAP executors? As your LLAP size is quite small, I believe even preemption is not helping to launch two jobs simultaneously.
08-09-2018
06:57 PM
1 Kudo
Sqoop is used for importing data from multiple sources onto HDFS. One of the most common use cases is a Hive import:
sqoop import --connect jdbc:mysql://localhost:3306/sqoop --username root -P --split-by id --columns id,name --table customer --target-dir /user/cloudera/ingest/raw/customers --fields-terminated-by "," --hive-import --create-hive-table --hive-table sqoop_workspace.customers
If you want to specify a particular queue for the sqoop job, -Dmapred.job.queuename=queueName needs to be added immediately after the import keyword:
sqoop import -Dmapred.job.queuename=queueName --connect jdbc:mysql://localhost:3306/sqoop --username root -P --split-by id --columns id,name --table customer --target-dir /user/cloudera/ingest/raw/customers --fields-terminated-by "," --hive-import --create-hive-table --hive-table sqoop_workspace.customers
This will launch the sqoop job in the specified queue, but the hive job will still be launched in the default queue. To launch the hive job in a specific queue, make a copy of tez-site.xml and set the queue name to the queue in which you want the hive job to be executed. Property of tez-site.xml:
<property> <name>tez.queue.name</name> <value>customQueueName</value> </property>
Then export HIVE_CONF_DIR=PATH OF DIR WHERE CUSTOM tez-site.xml is placed, and run the sqoop job with that export in effect. Do remember to add -Dmapred.job.queuename=queueName (immediately after import) to set the sqoop queue name, and the custom tez-site.xml for the hive queue name.
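Putting the two pieces together, a minimal sketch (the conf directory, queue name "etl" and connection details are placeholders; /etc/tez/conf is the typical HDP location of tez-site.xml):
mkdir -p /tmp/custom-hive-conf
cp /etc/tez/conf/tez-site.xml /tmp/custom-hive-conf/     # then edit tez.queue.name inside the copy
export HIVE_CONF_DIR=/tmp/custom-hive-conf
sqoop import -Dmapred.job.queuename=etl --connect jdbc:mysql://localhost:3306/sqoop --username root -P --split-by id --columns id,name --table customer --target-dir /user/cloudera/ingest/raw/customers --fields-terminated-by "," --hive-import --create-hive-table --hive-table sqoop_workspace.customers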
08-03-2018
06:31 PM
Option 2 is not the perfect approach but it is foolproof, as it is never possible for the HDFS directory to be empty while Hive has data.
08-03-2018
06:12 PM
Various scenarios are possible: 1. The first job is consuming all vcores or memory, hence there are no resources left for the next job to spawn. 2. Preemption has been disabled for the queue. 3. The user limit is 100% for a given user, hence the job is allowed to occupy the complete queue.
08-03-2018
06:12 PM
Various scenarios are possible: 1. The first job is consuming all vcores or memory, hence there are no resources left for the next job to spawn. 2. Preemption has been disabled for the queue. 3. The user limit is 100% for a given user, hence the job is allowed to occupy the complete queue.
08-03-2018
06:09 PM
1. select * from database.table limit 1 will never perform a full table scan. You can verify this on the Resource Manager: no new job is spawned. 2. You can find the HDFS path corresponding to the table and run hdfs dfs -du on that path to know the size of the directory.
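A quick sketch of point 2 (the warehouse path below is only a typical default location; use the Location value reported for your table):
DESCRIBE FORMATTED database.table;          -- the Location field shows the HDFS path
hdfs dfs -du -s -h /apps/hive/warehouse/database.db/table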
07-31-2018
07:40 PM
1 Kudo
The general perception is that one needs to thoroughly know the code base for understanding and debugging an HDP service. Do appreciate the fact that HDP services, including the execution engines (MR, Tez, Spark), are all JVM-based processes. The JVM, along with the operating system, provides various knobs to know what the process is doing and how it is performing at run-time. The steps mentioned below can be applied to understanding and debugging any JVM process in general. Let's take the example of how HiveServer2 works, assuming one is not much acquainted with, or does not have a deep understanding of, the code base. We are assuming one knows what the service does, but is not aware of how it works internally.

1. Process resource usage
Gives a complete overview of the CPU and memory usage pattern of the process, providing a quick insight into the health of the process.

2. How to figure out which services the process interacts with
Add JMX parameters to the service, allowing you to visualize what is happening within the JVM at run-time using jconsole or jvisualvm. This ensures the JVM is broadcasting its metrics on port 7001 and can be connected to using jconsole or jvisualvm. No security is enabled; one can add SSL certificates too for authentication.

What can we infer from the kind of threads we see in jconsole?
- The number of threads has peaked at 461; currently only 170 are active.
- The abandoned-connection cleaner is cleaning up lost connections.
- kafka-kerberos-refresh-thread: as HS2 supports Kerberos, the TGT needs to be refreshed after the renewal period. It means we don't have to refresh the Kerberos ticket manually.
- HS2 is interacting with Kafka and Atlas, as can be seen from the threads above.
- CuratorFramework is the class used for talking to ZooKeeper, which means HS2 is interacting with ZooKeeper.
- HS2 is interacting with Solr and Hadoop services (RMI).
- HS2 is sending audits to Ranger, which means HS2 is interacting with Ranger.
- HS2 has HiveServer2-Handler threads which are used for reading from the Thrift socket (these are the threads used for responding to connections).

This view provides an overall picture of what is happening inside the JVM. GC: ParNew GC has been happening on the young generation and 10 minutes have been spent on minor GC; ConcurrentMarkSweep is used for the tenured generation and 1 minute has been spent there. If you look at the VM summary (total uptime of 3 days 3 hours) you can find when the VM was started; since then 10 minutes have been spent on minor GC and 1 minute on major GC, providing a good overview of how the process is tuned to handle GC and whether the heap space is sufficient.

Hence the above information shows HS2 interacting with: 1. Ranger 2. Solr 3. Kafka 4. Hadoop components (NameNode, DataNode) 5. Kerberos 6. As it is running Tez, it should also be interacting with the Resource Manager and Node Managers 7. Heap and GC performance.

Let's take a deeper dive into how exactly the JVM is performing over time. Use jvisualvm to find real-time info on threads and sampling of CPU and RAM. All of the above information can be derived from the command line too:
1. Find the pid of the process: ps -eaf | grep hiveserver2 (the process name) will fetch you the pid of the process.
2. Find memory usage and GC in real time: total heap = Eden space (S0C + S1C + EC) + tenured gen (OC); currently used heap = Eden space (S0U + S1U + EU) + tenured gen (OU). YGC = 15769 says how many times young GC has run so far; FGC = 83 says how many times full GC has run. If you see the counts increasing too frequently, it is time to optimize.
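Those columns come from jstat; a minimal sketch (the pid is a placeholder, 5000 is the sampling interval in milliseconds):
jstat -gc <pid> 5000       # raw sizes: S0C/S1C/EC/OC, S0U/S1U/EU/OU, YGC, FGC
jstat -gcutil <pid> 5000   # same data as percentages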
Use jmap to find the instances present in the heap, along with their count and memory footprint. To know the state of the running threads, use jstack. With top -p pid, RES is the memory the process consumes in RAM, which should roughly equal the heap consumed. Find the ports the process is listening on and which clients it is connected to. To find any errors being reported by the service to a client, scan the network packets: tcpdump -D to list the interfaces the machine has, then tcpdump -i <network interface> port 10000 -A (e.g. tcpdump -i eth0 port 10000 -A). Scan port 10000, on which HS2 is listening, and look at the packets exchanged to find which client is hitting the exceptions; any GSS exception, or any other exception reported by the server to a client, can be seen in the packets. To know the CPU and memory consumed by the process: top -p pid. To know the disk I/O reads/writes done by the process: iotop -p pid.

Some OS commands to know how the host running the process is doing:
1. iostat -c -x 3 : display the CPU and disk I/O
2. mpstat -A ALL : get utilization of individual CPUs
3. iostat -d -x 5 : disk utilization
4. ifstat -t -i <interface> : network utilization

Take away
1. Any exception you see in the process logs for clients can be tracked in the network packets, hence you don't need to enable debug logs (use tcpdump).
2. A process consumes exceptionally high memory under heavy GC, especially frequent full and minor GC.
3. In Hadoop, every service client has a retry mechanism (no client fails after one retry), hence search for retries in the log and try to optimize for that.
4. jconsole and jvisualvm reveal all the important info on threads, memory, CPU utilization and GC.
5. Keep a check on the CPU, network, disk and memory utilization of the process to get a holistic overview of the process.
6. In case of failure, take a heap dump and analyse it using jhat for deeper debugging. jhat will generally consume 6x the memory of the size of the dump (a 20 GB heap dump will need 120 GB of heap for jhat to run: jhat -J-mx20g hive_dump.hprof).
7. Always refer to the code to correlate the process behavior with its memory footprint.
8. The jmap output will help you understand which data structures are consuming most of the heap, and you can point to where in the code those data structures are used. jhat will help you get the tree of a data structure.
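As a sketch, the jmap/jstack/heap-dump commands referenced above look like this (the pid and output file names are placeholders):
jmap -histo:live <pid> | head -50                              # top classes by instance count and footprint
jstack <pid> > hs2_threads.txt                                 # thread states
jmap -dump:live,format=b,file=hive_dump.hprof <pid>            # heap dump for later analysis with jhat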
07-30-2018
05:15 PM
6 Kudos
Heap sizing and garbage collection tuning play a central role in deciding how healthily and efficiently the cluster resources will be utilized. One of the biggest challenges is to fine-tune the heap to make sure you are neither underutilizing nor overutilizing the resources. The following heap sizing has been made after an in-depth analysis of the health of the individual services. Do remember this is the baseline; you can add more heap to your resources depending on the kind of workload you execute.

Key take away
1. All the services present in HDP are JVM-based, and all need appropriate heap sizing and GC tuning.
2. HDFS and YARN are the base of all the services, hence make sure NN, DN, RM and NM are given sufficient heap.
3. Rule of thumb: heap sizing up to 20-30 GB doesn't require any special tuning; anything above that value needs fine tuning.
4. 99% of the time your cluster is not working efficiently because the services are suffering GC and appear RED in the UI.
5. Most of the services have short-lived connections and interactions, hence always provide enough space to the young generation, in the order of 5-10 GB (depending on load and concurrency).
6. HiveServer2, Spark Thrift Server and the LLAP daemons need special attention, as these services interact with each and every component in the cluster; slowness of any component will impact the connection establishment time of these services.
7. Look for "retries/retry" in your logs to know which services are slow, and fine-tune them.

[Per-service heap recommendations in the original article's tables: HDFS, YARN, MapReduce2, HBase, Oozie, ZooKeeper, Ambari Infra, Ambari Metrics (HBase), Atlas, Spark Thrift Server, Hive]
https://community.hortonworks.com/articles/209789/making-hiveserver2-instance-handle-more-than-500-c.html
Ambari and Ambari Views should run with a heap of at least 15 GB for them to be fast.
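As an illustration of the kind of settings involved (the sizes are example values only, set through the *-env templates in Ambari), a NameNode heap and GC configuration might look like:
export HADOOP_NAMENODE_OPTS="-Xms16g -Xmx16g -XX:NewSize=4g -XX:MaxNewSize=4g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 ${HADOOP_NAMENODE_OPTS}"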
07-30-2018
04:27 PM
8 Kudos
HiveServer2 is a Thrift server which is a thin service layer to interact with the HDP cluster in a seamless fashion. It supports both JDBC and ODBC drivers to provide a SQL layer to query the data. An incoming SQL query is converted to either a Tez or MR job, and the results are fetched and sent back to the client. No heavy lifting work is done inside HS2; it just acts as the place that holds the Tez/MR driver, scans metadata info and applies Ranger policies for authorization. Configurations: Tez parameters
Tez Application Master Waiting Period (in seconds) -- Specifies the amount of time in seconds that the Tez Application Master (AM) waits for a DAG to be submitted before shutting down. For example, to set the waiting period to 15 minutes (15 minutes x 60 seconds per minute = 900 seconds): tez.session.am.dag.submit.timeout.secs=900
Enable Tez Container Reuse -- This configuration parameter determines whether Tez will reuse the same container to run multiple queries. Enabling this parameter improves performance by avoiding the memory overhead of reallocating container resources for every query. tez.am.container.reuse.enabled=true
Tez Container Holding Period (in milliseconds) -- Specifies the amount of time in milliseconds that a Tez session will retain its containers. For example, to set the holding period to 15 minutes (15 minutes x 60 seconds per minute x 1000 milliseconds per second = 900000 milliseconds): tez.am.container.session.delay-allocation-millis=900000 A holding period of a few seconds is preferable when multiple sessions are sharing a queue. However, a short holding period negatively impacts query latency.

HiveServer2 parameters: heap configuration, GC tuning, database scan disabling and session initialization parameters (shown in the original article's screenshots).

Tuning OS parameters (on the nodes where HS2, the metastore and ZooKeeper are running):
net.core.somaxconn=16384
net.core.netdev_max_backlog=16384
net.ipv4.tcp_fin_timeout=10

Disconnecting idle connections to lower the memory footprint (values can be set in minutes and seconds). Defaults:
hive.server2.session.check.interval=6h
hive.server2.idle.operation.timeout=5d
hive.server2.idle.session.check.operation=true
hive.server2.idle.session.timeout=7d

Proactively closing connections:
hive.server2.session.check.interval=60000 (1 min)
hive.server2.idle.session.timeout=300000 (5 mins)
hive.server2.idle.session.check.operation=true

Connection pool (change it depending on how concurrent the connections are):
hive.server2.thrift.max.worker.threads=1000 (if these get exhausted, no incoming request will be served)

Things to watch out for:
1. Making more than 60 connections to HS2 from a single machine will result in failures, as ZooKeeper will rate-limit it. https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim.html
2. Don't forget to disable the DB scan for every connection.
3. Watch out for memory leak bugs (HIVE-20192, HIVE-19860); make sure the version of HS2 you are using is patched.
4. Watch out for the connection limit on your backend RDBMS. https://community.hortonworks.com/articles/208966/optimized-mysql-db-config-for-ambari-hive-to-handl.html
5. Depending on your usage you need to fine-tune heap and GC, so keep an eye on the full GC and minor GC frequency.
6. Usage of "add jar" leads to a class-loader memory leak in some versions of HS2, so please keep an eye on it.
7. Do remember that in Hadoop, for any service, the client always retries; hence look for retry logs in HS2 and optimize the service to handle connections seamlessly.
8. HS2 has no upper threshold in terms of the number of connections it can accept; it is limited by the heap and by the responsiveness of the other services it talks to.
9. Keep an eye on CPU consumption on the machine where HS2 is hosted to make sure the process has enough CPU to work with high concurrency.
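The OS values above are applied with sysctl, for example (run as root; persist them in /etc/sysctl.conf so they survive a reboot):
sysctl -w net.core.somaxconn=16384
sysctl -w net.core.netdev_max_backlog=16384
sysctl -w net.ipv4.tcp_fin_timeout=10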
07-27-2018
07:12 PM
The article provides an optimized MySQL config (usage of an enterprise RDBMS is encouraged) that can be used in case MySQL is the backend DB for Ambari, Hive and Ranger. The configuration is good enough to handle 1000 concurrent users on HS2 and Ambari. The config file my.cnf is generally located at /etc/my.cnf.

my.cnf
******************************************************************************************************************************
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
max_allowed_packet = 32M
max_connections = 2000
open_files_limit = 10000
tmp_table_size = 64M
max_heap_table_size = 64M
query_cache_type = ON
query_cache_size = 128M
table_open_cache_instances = 16
back_log = 1000
default_storage_engine = InnoDB
innodb-buffer-pool-size = 10G
innodb_log_file_size = 1024M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_buffer_pool_instances = 4
innodb_thread_concurrency = 10
innodb_checksum_algorithm=crc32
# MyISAM, in case there is one
key-buffer-size = 64M
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
*****************************************************************************************************

Key take aways:
1. innodb-buffer-pool-size = 10G: this is one of the most important parameters, as it says how much data can be cached in RAM.
2. mysqldump -u root --all-databases -p > dump.sql will dump all the databases to a file (which can be used for recovery) and, in the process, will fetch all the data from disk and cache it in RAM, making all queries lightning fast.
3. max_connections = 2000 concurrent connections, as the database is connected to by all HS2 instances. This can become a limiting factor and may result in connection timeouts for clients connecting to HS2.
4. Do keep in mind that the number of connections MySQL can handle directly impacts the number of users the HS2 server can handle concurrently.
5. For details of all MySQL config parameters kindly follow this excellent file: http://www.speedemy.com/files/mysql/my.cnf
6. https://community.hortonworks.com/articles/80635/optimize-ambari-performance-for-large-clusters.html
07-27-2018
06:54 PM
1 Kudo
Ambari schedules a connection check to HS2 every 2 minutes. The beeline client is executed to connect to the HS2 JDBC string with a timeout on the client side. If beeline is not able to connect within the timeout period, the beeline process is killed. This shows your HS2 is not configured properly, as it is taking more than 60 seconds to connect, which is quite unusual. HS2 should connect within 4-30 seconds max for it to be usable by external tools like Power BI, Alation and Ambari Hive Views. Please follow these blogs in detail to know more about how to debug the issue: https://community.hortonworks.com/content/kbentry/202479/hiveserver2-connection-establishment-phase-in-deta.html https://community.hortonworks.com/content/kbentry/202485/hiveserver2-configurations-deep-dive.html Some steps to find where the bottleneck is: 1. Enable debug mode for beeline. 2. Execute beeline with the HS2 JDBC string; it will provide a detailed view of the time taken to connect to AD, ZooKeeper, HS2 and MySQL. 3. Tune the HS2 parameters (start Tez sessions at initialization = true, disable the database scan for each connection). 4. Bump up the heap of HS2. This blog has all that you need: https://community.hortonworks.com/content/kbentry/209789/making-hiveserver2-instance-handle-more-than-500-c.html
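For steps 1 and 2, a minimal sketch of a manual probe (the JDBC URL below is a placeholder for your own HS2 connection string):
beeline --verbose=true -u "jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM" -e "select 1"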
07-15-2018
04:58 PM
7 Kudos
HiveServer2 is a Thrift server which is a thin service layer to interact with the HDP cluster in a seamless fashion. It supports both JDBC and ODBC drivers to provide a SQL layer to query the data. An incoming SQL query is converted to either a Tez or MR job, and the results are fetched and sent back to the client. No heavy lifting work is done inside HS2; it just acts as the place that holds the Tez/MR driver, scans metadata info and applies Ranger policies for authorization.

HiveServer2 maintains two types of pools
1. Connection pool: any incoming connection is handled by a HiveServer2-Handler thread and is kept in this pool, which is unlimited in nature but restricted by the number of threads available to service an incoming request. An incoming request will not be accepted if there is no free HiveServer2-Handler thread to service it. The total number of threads that can be spawned within HS2 is controlled by the parameter hive.server2.thrift.max.worker.threads.
2. Session pool (per queue): this is the number of concurrent sessions that can be active. Whenever a connection in the connection pool executes a SQL query, an empty slot in the session queue is found and the SQL statement is executed. The queue is maintained for each queue defined in "Default query queue". In this example only the default queue is defined, with sessions per queue = 3. More than 3 concurrent queries (not connections) will have to wait until one of the slots in the session pool is empty. There can be N connections alive in HS2, but at any point in time it can only execute 3 queries; the rest of the queries will have to wait.

HiveServer2 connection overview
1. HS2 listens on port 10000 and accepts connections from clients on this port.
2. HS2 has connections opened to Kafka to write metric and lineage information, which is consumed by the Ambari Metrics service and Atlas.
3. HS2 also connects directly to DataNodes from time to time to service requests like "select * from table limit N".
4. HS2 also connects to the NodeManagers where the Tez AMs are spawned to service the SQL query.
5. In the process it is talking to the NameNode and Resource Manager, and a switch during an HA failover will impact the connections.
6. HiveServer2 overall is talking to Atlas, Solr, Kafka, NameNode, DataNode, ResourceManager, NodeManager and MySQL.

For HiveServer2 to be more interactive and faster, the following parameters are set:
1. Start Tez sessions at initialization = true: at the time of bootstrapping (start, restart) HS2 allocates a Tez AM for every session. (Default query queue = default, sessions per queue = 3, hence 3 Tez AMs will be spawned. If there were 2 queues, default and datascience, with sessions per queue = 3, then 6 Tez AMs would be spawned.)
2. Every Tez AM can also be forced to spawn a fixed number of containers (pre-warm) at the time of HS2 bootstrapping. Hold containers to reduce latency = true with number of containers held = 3 will ensure every Tez AM holds 2 containers to execute a SQL query.
3. With default query queue = default, sessions per queue = 3, hold containers to reduce latency = true and number of containers held = 3, HS2 will spawn 3 Tez AMs, one per session, and for every Tez AM 2 containers will be spawned; hence overall 9 containers (3 Tez AMs + 6 containers) can be seen running in the Resource Manager UI.
4. For Tez to be more efficient and save bootstrapping time, the following parameters need to be configured in the Tez configs. These parameters prevent the AM from dying immediately after job execution.
The AM waits between the min and max timeout to service another query, else it kills itself, freeing the YARN resources.
tez.am.container.reuse.enabled=true
tez.am.container.idle.release-timeout.min.millis=10000
tez.am.container.idle.release-timeout.max.millis=20000

Every connection has a memory footprint in HS2, as:
1. Every connection scans the metastore to know about the tables.
2. Every connection reserves memory to hold metrics and the Tez driver.
3. Every connection reserves memory to hold the output of the Tez execution.
Hence if a lot of connections are alive, HS2 might become slower due to lack of heap space, so the heap needs to be tuned according to the number of connections. 40 connections will need somewhere around 16 GB of HS2 heap; anything more than this needs fine tuning of GC and horizontal scaling.

The client connections can be disconnected based on their idle state using the following parameters:
hive.server2.session.check.interval=6h
hive.server2.idle.operation.timeout=5d
hive.server2.idle.session.check.operation=true
hive.server2.idle.session.timeout=7d

HiveServer2 has an embedded metastore which interacts with the RDBMS to store the table schema info. The database is available to any other service through the Hive Metastore service; the Hive Metastore service is not used by HiveServer2 directly.

Problems and solutions
1. Clients have frequent timeouts: HS2 can never deny a connection, hence the timeout is a parameter set on the client and not on HS2. Most tools have 30 seconds as the timeout; increase it to 90-150 seconds depending on your cluster usage pattern.
2. HS2 becomes unresponsive: this can happen if HS2 is undergoing frequent full GC or has run short of heap. It is time to bump up the heap and scale horizontally.
3. HS2 has a large number of connections: HS2 accepts connections on port 10000 and also connects to the RDBMS, DataNodes (for direct queries) and NodeManagers (to connect to the AMs), hence always try to analyse whether it is incoming or outgoing connections.
4. As HS2 is an interactive service, it is always good to give HS2 a good amount of young generation space.
5. In case SAS is interacting with HS2, a 20 GB heap where 6-8 GB is given to the young generation is good practice. Use ParNewGC for the young generation and ConcurrentMarkSweepGC for the tenured generation.

To know in detail about the connection establishment phase of HiveServer2: https://community.hortonworks.com/articles/202479/hiveserver2-connection-establishment-phase-in-deta.html
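A sketch of what the heap layout from point 5 could look like as JVM options (the sizes are the example values from that point, not universal recommendations; where these flags go depends on your setup, typically the hive-env template or the HiveServer2 heap settings in Ambari):
-Xms20g -Xmx20g -Xmn6g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC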