If ACID tables are not used, how can the small-files problem be handled in Hive? Is there an archival process to follow, such as creating HAR files?
Alter Table/Partition Concatenate
In Hive release 0.8.0, RCFile added support for fast block-level merging of small RCFiles using the CONCATENATE command.
In Hive release 0.14.0, ORC added support for fast stripe-level merging of small ORC files using the CONCATENATE command.
ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] CONCATENATE;
If the table or partition contains many small RCFiles or ORC files, the above command will merge them into larger files. In the case of RCFile the merge happens at block level, whereas for ORC files it happens at stripe level, thereby avoiding the overhead of decompressing and decoding the data.
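For example, assuming a partitioned ORC table named web_logs (the table, partition key, and values here are illustrative), the merge can be run per partition or on a whole unpartitioned table:

```sql
-- Merge the small ORC files within one partition into larger files.
-- Table and partition names are illustrative.
ALTER TABLE web_logs PARTITION (dt = '2016-01-01') CONCATENATE;

-- For an unpartitioned table, concatenate the whole table:
ALTER TABLE web_logs_staging CONCATENATE;
```

Each run rewrites the affected partition's files, so it is typically scheduled after batch loads that produce many small files.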
Question on Mutations
So if we need to apply a thousand mutations, this would be a thousand operations, rather than one bulk operation.
Can LLAP be used to read more data than can fit into memory?
Yes. LLAP has an eviction policy and stores cached data in a compressed format, so it can read more data than fits into memory.
Question on data transfer:
As a specific example, if a "SELECT *" is performed on a very large table, can the application receive that data as a stream, or does some component (LLAP, HiveServer2, etc.) need to hold the entire dataset in memory?
All results are streamed to HDFS, and the client then reads the results from there, so there is no memory constraint.
Question on query result:
Related: does LLAP send results back as they become available (like HBase scan results), or only once the query completes?
The results are returned only once the SQL query completes.
Question on compaction:
We may benefit from Hive's ACID feature to handle "deltas". The advantages seem to be:
•It would allow updated data to be available in queries before a compaction has taken place: "You can update the data; compaction should be transparent."
•A compaction implementation already exists, so no bespoke implementation is needed: Hive has built-in compaction (major and minor).
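As a sketch, an ACID table and manually triggered compactions might look like the following (the table name, bucketing, and column layout are illustrative; ACID tables must be stored as ORC and marked transactional, and compaction can also run automatically in the background):

```sql
-- Transactional (ACID) table; UPDATE/DELETE operations produce delta files.
CREATE TABLE events (
  id INT,
  payload STRING
)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- Deltas remain queryable before compaction; compaction merges them:
ALTER TABLE events COMPACT 'minor';  -- merge delta files together
ALTER TABLE events COMPACT 'major';  -- rewrite base + deltas into a new base
```

SHOW COMPACTIONS can be used to monitor the compaction queue and its progress.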
Question on spark and hive llap integration:
•Can LLAP be leveraged to serve data to Spark jobs efficiently? That is, can LLAP inform Spark about the partitioning of the data it will provide, or is it a very coarse, plain JDBC interface?
The LLAP Spark context is in Tech Preview (TP).
Question on cache eviction algorithm:
It seems the Hive Metastore does not cache much data, which means each query for metadata (including statistics) goes through the DataNucleus ORM layer. Is this correct?
LLAP has a metadata cache.
The daemon caches metadata for input files, as well as the data. The metadata and index information can be cached even for data that is not currently cached. Metadata is stored in-process in Java objects; cached data is stored in the format described in the I/O section and kept off-heap (see Resource management).
Eviction policy. The eviction policy is tuned for analytical workloads with frequent (partial) table-scans. Initially, a simple policy like LRFU is used. The policy is pluggable.
Caching granularity. Column-chunks are the unit of data in the cache. This achieves a compromise between low-overhead processing and storage efficiency. The granularity of the chunks depends on the particular file format and execution engine (Vectorized Row Batch size, ORC stripe, etc.).
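The cache behavior described above is controlled by LLAP I/O settings in the daemon configuration (hive-site.xml, or the Hive interactive config panel in Ambari). A hedged sketch follows; the property names come from Hive's configuration, but the values are illustrative and defaults vary by Hive version:

```properties
# Illustrative LLAP daemon cache settings (verify against your Hive version)
hive.llap.io.enabled=true       # enable the LLAP I/O layer and cache
hive.llap.io.memory.size=4Gb    # off-heap cache size per daemon
hive.llap.io.use.lrfu=true      # use the LRFU eviction policy
```

Since these apply to the LLAP daemons themselves, they take effect when the daemons are (re)started, not via a per-session SET.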
A bloom filter is automatically created to provide Dynamic Runtime Filtering.
Question on running Hive LLAP on specific nodes:
In Ambari, how can the LLAP daemons be made to run on specific nodes?
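LLAP daemons run inside a YARN application, so one common approach (an assumption here, not stated in the source) is to constrain placement with YARN node labels and point the LLAP queue at that label. A hedged sketch, with the label and host names purely illustrative:

```shell
# Illustrative: pin LLAP daemons to chosen hosts via YARN node labels.
# Run as the YARN administrator; names are placeholders.
yarn rmadmin -addToClusterNodeLabels "llap(exclusive=true)"
yarn rmadmin -replaceLabelsOnNode "node1.example.com=llap"
```

In Ambari, the YARN queue used for Hive interactive query (LLAP) can then be associated with that node label so the daemons are scheduled only on the labeled hosts.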