Apache Drill queries

Master Collaborator

Hi:

I have one question: does Apache Drill run queries in memory? And does it keep the metastore in memory?

thanks

1 ACCEPTED SOLUTION

Super Guru
@Roberto Sancho

The metastore comes into the picture only when Drill queries data stored in Hive tables, and it is used merely to look up the schema of the Hive table. Otherwise, Drill can query certain data stores by evaluating the schema on the fly.

For most data stores, Drill uses direct memory for all of the computation. For Hive tables stored as ORC or Parquet, however, it uses the Hive ORC or Hive Parquet reader to query the data, which ends up reading the data into the Java heap.

Does it keep the metastore in memory?

No, Drill does not keep the complete metastore in memory. During the query parsing and planning phase, it queries the Hive metastore service for the schema so that it can validate the query and plan accordingly.

Master Collaborator

Many thanks. So, in conclusion: my files are Hive tables in ORC format, so all the computation will be in memory, right?

Also, where does Drill keep the Hive metastore?

thanks

Super Guru

Drill uses both Java direct memory and Java heap memory for computation. If you have a Hive ORC table, Drill does the computation in Drill's Java heap memory, not in Drill's direct (off-heap) memory.

Depending on the Hive storage plugin configuration (given below), during the query planning phase Drill queries the metastore service named by the 'hive.metastore.uris' property for the schema and other required information, and then prepares the query plan.

For better performance, Drill also supports caching Hive metadata in a Drill-side cache, controlled by "hive.metastore.cache-ttl-seconds" and "hive.metastore.cache-expire-after". The cache-ttl-seconds value can be any non-negative value, including 0, which turns caching off. The cache-expire-after value can be "access" or "write": access means an entry expires after a read or write operation, and write means it expires after a write operation only.

{
  "type": "hive",
  "enabled": false,
  "configProps": {
    "hive.metastore.uris": "thrift://hostname:9083",
    "hive.metastore.sasl.enabled": "false",
    "fs.default.name": "hdfs://nmhostname/"
  }
}
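
To make the difference between the two cache-expire-after values concrete, here is a minimal Python sketch of a TTL cache. It is only a toy model of the behaviour described above, not Drill's actual implementation: the TtlCache class, its methods, and the explicit "now" argument are all hypothetical names chosen for illustration.

```python
class TtlCache:
    """Toy TTL cache (hypothetical; not Drill's real cache code)."""

    def __init__(self, ttl_seconds, expire_after="write"):
        if ttl_seconds < 0:
            raise ValueError("ttl_seconds must be non-negative")
        if expire_after not in ("access", "write"):
            raise ValueError("expire_after must be 'access' or 'write'")
        self.ttl = ttl_seconds
        self.expire_after = expire_after
        self._entries = {}  # key -> (value, time the TTL clock last started)

    def put(self, key, value, now):
        # A write always (re)starts the TTL clock for the entry.
        self._entries[key] = (value, now)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if now - stamp >= self.ttl:
            # Entry is too old (with ttl 0 this is always true, so nothing is cached).
            del self._entries[key]
            return None
        if self.expire_after == "access":
            # In "access" mode a read also restarts the TTL clock.
            self._entries[key] = (value, now)
        return value


# Assumed example: a table schema cached at t=0 with a 60-second TTL.
by_write = TtlCache(60, "write")
by_write.put("orders", "schema-v1", now=0)

by_access = TtlCache(60, "access")
by_access.put("orders", "schema-v1", now=0)
```

In this model, reading both caches at t=50 and again at t=70 gives different results: the "write" cache has dropped the entry by t=70 because 60 seconds have passed since it was written, while the "access" cache still holds it because the read at t=50 restarted the clock.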

Master Collaborator

OK, thanks a lot. This is my configuration; the metastore cache is refreshed only when I write. Can I do anything else to improve query performance?

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://hostname:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname/drill",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "hdfs://hostname:8020",
    "hive.metastore.sasl.enabled": "false",
    "hive.metastore.cache-ttl-seconds": "2",
    "hive.metastore.cache-expire-after": "write"
  }
}

Super Guru

@Roberto Sancho

The configuration looks good. If you want to cache Hive metadata for a longer period, you can increase the hive.metastore.cache-ttl-seconds value.

Master Collaborator

If hive.metastore.cache-ttl-seconds is 2 minutes and I insert new data into Hive, will Drill see the new data within those 2 minutes?

Super Guru
@Roberto Sancho

There is a distinction between table metadata and table data. Table metadata is stored in the Hive metastore. If you cache Hive metadata in Drill, the metadata stays cached for the period defined by hive.metastore.cache-ttl-seconds. You will be able to see newly inserted data in the table until you alter the table. As soon as you alter the table, the metastore is updated with the new table structure, but the Drill cache still holds the old entries, so you will not see the new table data until the cached entry expires.
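
The metadata/data distinction can be sketched with a small Python model. This is one possible reading of the explanation above, under simplifying assumptions, and not how Drill is actually implemented: ToyMetastore, ToyEngine, select_all, and the explicit "now" argument are hypothetical names, rows are plain dictionaries, and "not seeing new table data" is modelled as a newly added column being invisible while the old schema is cached.

```python
class ToyMetastore:
    """Hypothetical stand-in for the Hive metastore: holds table schemas only."""

    def __init__(self):
        self.schemas = {}  # table name -> list of column names


class ToyEngine:
    """Hypothetical query engine that caches schemas but always reads rows fresh."""

    def __init__(self, metastore, ttl_seconds):
        self.metastore = metastore
        self.ttl = ttl_seconds
        self._cache = {}  # table name -> (columns, time cached)

    def schema(self, table, now):
        hit = self._cache.get(table)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # cached metadata, possibly stale
        columns = list(self.metastore.schemas[table])
        self._cache[table] = (columns, now)
        return columns

    def select_all(self, table, rows, now):
        # Rows are read directly from storage; only the schema comes from the cache.
        columns = self.schema(table, now)
        return [{c: row.get(c) for c in columns} for row in rows]


metastore = ToyMetastore()
metastore.schemas["t"] = ["id"]
storage = [{"id": 1}]
engine = ToyEngine(metastore, ttl_seconds=120)

first = engine.select_all("t", storage, now=0)       # caches the schema ["id"]
storage.append({"id": 2})                            # plain insert, no schema change
after_insert = engine.select_all("t", storage, now=10)

metastore.schemas["t"] = ["id", "name"]              # ALTER TABLE: metadata changes
storage.append({"id": 3, "name": "c"})
stale = engine.select_all("t", storage, now=20)      # still uses the cached schema
fresh = engine.select_all("t", storage, now=130)     # cache entry has expired
```

In this model the row inserted at t=10 shows up immediately, because rows never go through the metadata cache, whereas the "name" column added by the alter only appears in the result at t=130, once the 120-second cache entry has expired.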

Super Guru

@Roberto Sancho it seems that all of your questions have been answered. Could you please take a moment to accept a best answer in this thread?