What are the best practices for handling historical data in Kudu that is not frequently queried?

Contributor
  • We run Kudu 1.10.0 with 3 masters and 5 tablet servers, hosted in AWS. Each tablet server has 3.5 TB of SSD (solid-state drive) storage attached. Data is queried through Impala.
  • We have historical data from previous years that will not be updated and is rarely queried, so keeping it on SSD is expensive.
  • The solutions I can think of are:
  1. Move the data from previous years to HDFS (backed by higher-latency storage such as HDD) or to S3. We could still query it through Impala as the common query engine.
  2. Use HDD (hard disk drive) storage instead of SSD, although this may hurt performance. Which is recommended for Kudu, HDD or SSD?
  • Is it possible to configure Kudu so that some partitions of a table reside on high-latency storage (HDD) while others reside on low-latency storage (SSD)?
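
To make option 1 concrete, here is a minimal sketch of the Impala statements such a move might involve: copy the old rows into a cold table on HDFS or S3, repoint a view that unions both tables, then delete the rows from Kudu. All names are hypothetical and not from our cluster: `sales_kudu` (the Kudu table), `sales_cold` (a Parquet table assumed to exist already with the same columns in the same order), `sales_all` (a view assumed to exist already), and an integer `event_year` column. The Python helper only builds the SQL strings; it does not connect to Impala.

```python
def build_tiering_statements(hot_table, cold_table, view_name,
                             year_column, first_hot_year):
    """Build Impala SQL that moves rows older than first_hot_year
    from the Kudu (hot) table to the HDFS/S3 (cold) table.

    All table, view, and column names are caller-supplied and
    hypothetical; the cold table and the view must already exist.
    """
    boundary = int(first_hot_year)
    cold_filter = f"{year_column} < {boundary}"
    hot_filter = f"{year_column} >= {boundary}"
    return [
        # 1. Copy the historical rows into the cold table.
        f"INSERT INTO {cold_table} "
        f"SELECT * FROM {hot_table} WHERE {cold_filter}",
        # 2. Repoint the unified view so each row is read from exactly
        #    one side of the boundary, even while both copies exist.
        f"ALTER VIEW {view_name} AS "
        f"SELECT * FROM {hot_table} WHERE {hot_filter} "
        f"UNION ALL "
        f"SELECT * FROM {cold_table} WHERE {cold_filter}",
        # 3. Remove the historical rows from Kudu to free SSD space.
        f"DELETE FROM {hot_table} WHERE {cold_filter}",
    ]


statements = build_tiering_statements(
    hot_table="sales_kudu",
    cold_table="sales_cold",
    view_name="sales_all",
    year_column="event_year",
    first_hot_year=2019,
)

for statement in statements:
    print(statement + ";")
```

Queries would then go through the view rather than either table directly. If the Kudu table happens to be range-partitioned on the time column, dropping the old range partition is likely cheaper than the row-level `DELETE` in step 3, but that depends on the table's partitioning scheme, which is not described here.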