Apache Drill queries
Created 06-04-2016 07:46 AM
Hi:
I have one question: does Apache Drill run queries in memory? And does it keep the metastore in memory?
Thanks
Created 06-04-2016 03:32 PM
The metastore comes into the picture only when Drill queries data stored in Hive tables, and it is used merely to learn the schema of the Hive table. Otherwise, Drill can query certain data stores by evaluating the schema on the fly.
For most data stores, Drill uses direct memory for all computation. For Hive tables stored as ORC or Parquet, however, it leverages the Hive ORC or Hive Parquet reader to query the data, and those readers end up reading the data into the Java heap.
Does it keep the metastore in memory?
No, Drill does not keep the complete metastore in memory. During the query parsing and planning phase, it queries the Hive metastore service to learn the schema so that it can validate the query and plan accordingly.
Created 06-04-2016 03:52 PM
Many thanks. So, in conclusion: my files are Hive tables in ORC format, so all the computation will be done in memory, right?
Also, where does Drill keep the Hive metastore?
Thanks
Created 06-04-2016 04:19 PM
Drill uses both Java direct memory and Java heap memory for computation. If you have a Hive ORC table, Drill does the computation in Drill's Java heap memory, not in Drill's direct memory (physical memory).
Depending on the Hive storage plugin configuration (given below), during the query planning phase Drill queries the metastore service specified by the property 'hive.metastore.uris' to learn the schema and other required information, and then prepares the query plan.
For better performance, Drill also supports caching Hive metadata in a Drill cache, controlled by "hive.metastore.cache-ttl-seconds" and "hive.metastore.cache-expire-after". The cache-ttl-seconds value can be any non-negative value, including 0, which turns caching off. The cache-expire-after value can be "access" or "write": "access" means an entry expires a fixed time after its last read or write, and "write" means it expires a fixed time after its last write only.
{
"type": "hive",
"enabled": false,
"configProps": {
"hive.metastore.uris": "thrift://hostname:9083",
"hive.metastore.sasl.enabled": "false",
"fs.default.name": "hdfs://nmhostname/"
}
}
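To make the difference between the two cache-expire-after settings concrete, here is a minimal Python sketch of a TTL cache with both policies. It is only a toy model of the semantics described above, not Drill's actual cache; the class name TtlCache, the table name "orders", and the injected clock are all assumptions made for illustration.

```python
import time


class TtlCache:
    """Toy TTL cache (hypothetical, not Drill code) with two expiry policies."""

    def __init__(self, ttl_seconds, expire_after="write", clock=time.monotonic):
        if expire_after not in ("access", "write"):
            raise ValueError("expire_after must be 'access' or 'write'")
        self.ttl = ttl_seconds
        self.expire_after = expire_after
        self.clock = clock
        self._entries = {}  # key -> (value, time the expiry timer was last reset)

    def put(self, key, value):
        # A write resets the expiry timer under both policies.
        self._entries[key] = (value, self.clock())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stamp = entry
        # With ttl == 0 every entry is already expired, so caching is off.
        if self.clock() - stamp >= self.ttl:
            del self._entries[key]
            return None
        if self.expire_after == "access":
            # Under "access", a read also resets the expiry timer.
            self._entries[key] = (value, self.clock())
        return value


# Demo with a fake clock so the timing is deterministic.
now = [0.0]
clock = lambda: now[0]
write_cache = TtlCache(10, "write", clock)
access_cache = TtlCache(10, "access", clock)
write_cache.put("orders", "schema-v1")
access_cache.put("orders", "schema-v1")

now[0] = 6.0
hit_write_early = write_cache.get("orders")    # 6 s after the write: cached
hit_access_early = access_cache.get("orders")  # cached, and the timer restarts

now[0] = 12.0
late_write = write_cache.get("orders")    # 12 s after the write: expired
late_access = access_cache.get("orders")  # 6 s after the last read: cached
print(late_write, late_access)  # None schema-v1
```

In this model, "write" bounds how stale an entry can get, while "access" keeps frequently read entries alive for as long as they keep being read.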
Created 06-06-2016 03:26 PM
OK, thanks a lot. This is my configuration; I update the metastore only when I am writing. Is there anything else I can do to improve query performance?
{
"type": "hive",
"enabled": true,
"configProps": {
"hive.metastore.uris": "thrift://hostname:9083",
"javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname/drill",
"hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
"fs.default.name": "hdfs://hostname:8020",
"hive.metastore.sasl.enabled": "false",
"hive.metastore.cache-ttl-seconds": "2",
"hive.metastore.cache-expire-after": "write"
}
}
Created 06-07-2016 05:35 AM
The configuration looks good. If you want to cache Hive metadata for a longer period of time, you can increase the hive.metastore.cache-ttl-seconds value.
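As a rough illustration of that change, the Python sketch below takes the storage plugin configuration posted earlier in the thread and produces a copy with a longer TTL. The helper name with_cache_ttl and the value 300 are assumptions chosen for the example; the hostnames are the placeholders from the thread, and the resulting JSON would still have to be saved through Drill's storage plugin configuration.

```python
import json

# Storage plugin configuration as posted in the thread (placeholder hostnames).
config = {
    "type": "hive",
    "enabled": True,
    "configProps": {
        "hive.metastore.uris": "thrift://hostname:9083",
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname/drill",
        "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
        "fs.default.name": "hdfs://hostname:8020",
        "hive.metastore.sasl.enabled": "false",
        "hive.metastore.cache-ttl-seconds": "2",
        "hive.metastore.cache-expire-after": "write",
    },
}


def with_cache_ttl(plugin_config, ttl_seconds):
    """Return a copy of the config with a new metadata cache TTL (hypothetical helper)."""
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative (0 turns caching off)")
    updated = dict(plugin_config)
    updated["configProps"] = dict(plugin_config["configProps"])
    # The thread's config stores property values as strings, so keep that form.
    updated["configProps"]["hive.metastore.cache-ttl-seconds"] = str(ttl_seconds)
    return updated


# 300 seconds is an arbitrary example value, not a recommendation.
longer = with_cache_ttl(config, 300)
print(json.dumps(longer, indent=2))
```

A longer TTL means fewer round trips to the metastore during planning, at the cost of Drill noticing table structure changes later.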
Created 06-07-2016 05:54 AM
If hive.metastore.cache-ttl-seconds is 2 minutes and I insert new data into Hive, will Drill see the new data within those 2 minutes?
Created 06-07-2016 06:38 AM
There is a distinction between table metadata and table data. Table metadata is stored in the Hive metastore. If you cache Hive metadata in Drill, the metadata stays cached for the period defined by hive.metastore.cache-ttl-seconds. You will see newly inserted data in the table as long as you do not alter the table. As soon as you alter the table, the metastore is updated with the new table structure, but the Drill cache still holds only the old entries until they expire, and you will not see the new table data.
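The distinction can be sketched with a toy Python model in which the schema is cached with a TTL while rows are always read fresh. This is an illustrative simplification under stated assumptions, not how Drill is implemented; ToyPlanner, the table name "t", and the column names are all made up for the example.

```python
class ToyPlanner:
    """Hypothetical planner: caches table schemas with a TTL, never caches rows."""

    def __init__(self, metastore, ttl_seconds, clock):
        self.metastore = metastore  # table name -> list of column names
        self.ttl = ttl_seconds
        self.clock = clock
        self._schema_cache = {}     # table name -> (columns, load time)

    def schema(self, table):
        cached = self._schema_cache.get(table)
        if cached and self.clock() - cached[1] < self.ttl:
            return cached[0]  # still within the TTL: may be stale
        columns = list(self.metastore[table])  # "ask the metastore" again
        self._schema_cache[table] = (columns, self.clock())
        return columns

    def query(self, table, storage):
        columns = self.schema(table)
        # Rows are read fresh every time; only the schema comes from the cache.
        return [{c: row.get(c) for c in columns} for row in storage[table]]


now = [0]
metastore = {"t": ["id"]}
storage = {"t": [{"id": 1}]}
planner = ToyPlanner(metastore, 120, lambda: now[0])  # 2 minute TTL

first = planner.query("t", storage)  # loads and caches the schema

now[0] = 30
storage["t"].append({"id": 2})  # plain insert, no schema change
after_insert = planner.query("t", storage)  # new row is visible

now[0] = 60
metastore["t"].append("name")  # table altered: new column
storage["t"].append({"id": 3, "name": "c"})
after_alter = planner.query("t", storage)  # cached schema hides "name"

now[0] = 130
after_expiry = planner.query("t", storage)  # TTL passed: schema reloaded
```

In this model the insert at 30 seconds shows up immediately, while the column added at 60 seconds only appears once the cached schema has expired.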
Created 06-08-2016 06:26 AM
@Roberto Sancho It seems that all your questions have been answered. Could you please spare some time and accept a best answer in this thread?
