Support Questions

pacosoplas · ‎06-04-2016

Hi:

I have one question: does Apache Drill run queries in memory? And does it keep the metastore in memory?

thanks

rajkumar_singh · ‎06-04-2016

The metastore comes into the picture when Drill queries data stored in Hive tables, and it is used only to look up the schema of the Hive table. Otherwise, Drill can query certain data stores by evaluating the schema on the fly.

For most data stores, Drill uses direct memory for all computation. For Hive tables stored as ORC or Parquet, however, it uses the Hive ORC or Hive Parquet reader to query the data, and that reader ends up reading the data into the Java heap.

Does it keep the metastore in memory?

No, Drill does not keep the complete metastore in memory. During the query parsing and planning phase, it queries the Hive metastore service for the schema so that it can validate the query and plan accordingly.

pacosoplas · ‎06-04-2016

Many thanks. So, in conclusion: my files are Hive tables in ORC format, so all the computation will be in memory, right?

Also, where does Drill keep the Hive metastore?

thanks

rajkumar_singh · ‎06-04-2016

Drill uses both Java direct memory and Java heap memory for computation. If you have a Hive ORC table, Drill does the computation in Drill's Java heap memory, not in Drill's direct memory (physical memory).

Depending on the Hive storage plugin configuration (given below), during the query planning phase Drill queries the metastore service identified by the 'hive.metastore.uris' property to get the schema and other required information and to prepare the query plan.

For better performance, Drill also supports caching Hive metadata in the Drill cache, controlled by "hive.metastore.cache-ttl-seconds" and "hive.metastore.cache-expire-after". The cache-ttl-seconds value can be any non-negative value, including 0, which turns caching off. The cache-expire-after value can be "access" or "write". Access indicates expiry after a read or write operation, and write indicates expiry after a write operation only.

{
  "type": "hive",
  "enabled": false,
  "configProps": {
    "hive.metastore.uris": "thrift://hostname:9083",
    "hive.metastore.sasl.enabled": "false",
    "fs.default.name": "hdfs://nmhostname/"
  }
}
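
To make the difference between the "access" and "write" expiry modes concrete, here is a toy Python model of a TTL cache. It is only a sketch under assumptions: the names TtlCache, FakeClock and demo are hypothetical and are not part of Drill, and the code illustrates the semantics described above rather than how Drill actually implements its metadata cache.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class _Entry:
    value: Any
    stamp: float  # time at which the expiry clock was last reset


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class TtlCache:
    """Toy TTL cache (hypothetical; not Drill's implementation)."""

    def __init__(self, ttl_seconds: float, expire_after: str,
                 clock: Callable[[], float]) -> None:
        if ttl_seconds < 0:
            raise ValueError("ttl_seconds must be non-negative")
        if expire_after not in ("access", "write"):
            raise ValueError("expire_after must be 'access' or 'write'")
        self.ttl = ttl_seconds
        self.expire_after = expire_after
        self.clock = clock
        self._entries: Dict[str, _Entry] = {}

    def put(self, key: str, value: Any) -> None:
        if self.ttl == 0:
            return  # a TTL of 0 turns caching off
        # A write always resets the expiry clock, in both modes.
        self._entries[key] = _Entry(value, self.clock())

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        now = self.clock()
        if now - entry.stamp >= self.ttl:
            del self._entries[key]  # expired: caller must reload
            return None
        if self.expire_after == "access":
            entry.stamp = now  # in "access" mode a read also resets the clock
        return entry.value


def demo(expire_after: str) -> Tuple[Optional[str], Optional[str]]:
    """Write once, then read at t=40s and t=80s with a 60s TTL."""
    clock = FakeClock()
    cache = TtlCache(60, expire_after, clock)
    cache.put("my_table", "schema-v1")
    clock.advance(40)
    first = cache.get("my_table")
    clock.advance(40)
    second = cache.get("my_table")
    return first, second


if __name__ == "__main__":
    print(demo("write"))   # ('schema-v1', None)
    print(demo("access"))  # ('schema-v1', 'schema-v1')
```

In this model, "write" mode drops the entry 60 seconds after it was stored no matter how often it is read, while "access" mode keeps it alive as long as it is read at least once per TTL window.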

pacosoplas · ‎06-06-2016

OK, thanks a lot. This is my configuration file; I update the metastore only when I am writing. Can I do anything else to improve query performance?

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://hostname:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname/drill",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "hdfs://hostname:8020",
    "hive.metastore.sasl.enabled": "false",
    "hive.metastore.cache-ttl-seconds": "2",
    "hive.metastore.cache-expire-after": "write"
  }
}

rajkumar_singh · ‎06-07-2016

@Roberto Sancho

The configuration looks good. If you want to cache Hive metadata for a longer period of time, you can increase the hive.metastore.cache-ttl-seconds value.

pacosoplas · ‎06-07-2016

If hive.metastore.cache-ttl-seconds is 2 minutes and I insert new data into Hive during those 2 minutes, will Drill see the new data?

rajkumar_singh · ‎06-07-2016

@Roberto Sancho

There is a distinction between table metadata and table data. Table metadata is stored in the Hive metastore. If you cache Hive metadata in Drill, the metadata stays cached for the period defined by hive.metastore.cache-ttl-seconds. You will see newly inserted data in the table as long as you do not alter the table. As soon as you alter the table, the metastore is updated with the new table structure, but the Drill cache still holds only the old entries, and you will not see the new table data.

rajkumar_singh · ‎06-08-2016

@Roberto Sancho it seems that all your questions have been answered. Could you please take a moment to accept the best answer in this thread?
