Finding the Right Cloud Configuration for Analytics Clusters
Vanir quickly finds a good-enough configuration and then further optimizes it during production runs.
- metrics-based optimizer for the benchmarking runs
- Mondrian forest-based performance model
- transfer learning during production runs
Vanir is designed for a setting where a user needs to provision and set up an on-demand analytics cluster for each run of a batch processing job. In this scenario, a large fraction of the deployments are often recurring: reports indicate that more than 40% of the jobs in production clusters are recurring computations.
The main principle Vanir adopts to cope with a large configuration search space is to find a good-enough configuration via a fast benchmarking phase, and then to optimize that configuration during production runs as the job recurs.
The goal is to jointly identify both the type and the number of instances for each framework. A cloud configuration is denoted as a vector \(C=\left\{\left\langle N_{1}, I_{1}\right\rangle, \ldots,\left\langle N_{n}, I_{n}\right\rangle\right\}\), where \(N_F\) is the number of instances for framework \(F \in \{1,\ldots,n\}\) and \(I_F\) is the corresponding instance type.
A valid cloud configuration satisfies the user-specified constraints on maximum execution time and maximum execution cost.
Instance type attributes: CPU, memory, storage.
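The configuration vector and the validity check can be sketched in code. This is a minimal illustration, not Vanir's implementation: the class and function names, the `hourly_cost` field, and the cost model (hourly price times execution time) are all assumptions introduced here, and the instance types in the usage note are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class InstanceType:
    """An instance type I_F, described by CPU, memory, and storage."""
    name: str
    vcpus: int
    memory_gib: float
    storage_gib: float
    hourly_cost: float  # assumption: a per-hour price, not given in the notes


@dataclass(frozen=True)
class FrameworkAllocation:
    """One pair <N_F, I_F> of the configuration vector."""
    count: int              # N_F: number of instances for the framework
    instance: InstanceType  # I_F: instance type for the framework


# A cloud configuration C maps each framework to its <N_F, I_F> pair.
CloudConfiguration = Dict[str, FrameworkAllocation]


def hourly_cost(config: CloudConfiguration) -> float:
    """Total price per hour across all frameworks (assumed cost model)."""
    return sum(a.count * a.instance.hourly_cost for a in config.values())


def is_valid(config: CloudConfiguration, exec_time_hours: float,
             max_time_hours: float, max_cost: float) -> bool:
    """Check the user's limits on execution time and execution cost."""
    cost = hourly_cost(config) * exec_time_hours
    return exec_time_hours <= max_time_hours and cost <= max_cost
```

For example, with two hypothetical instance types `small = InstanceType("small", 2, 4.0, 50.0, 0.25)` and `large = InstanceType("large", 8, 32.0, 200.0, 0.5)`, the configuration `{"spark": FrameworkAllocation(4, large), "kafka": FrameworkAllocation(2, small)}` is valid for a 2-hour run only if the time limit is at least 2 hours and the cost limit covers the run's total cost under this assumed model.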
Vanir uses a metrics-based algorithm as its offline optimizer: it relies on CPU and memory utilization metrics, monitored during profiling runs, to determine the configuration of each framework.
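To make the idea of a metrics-based decision concrete, here is a small sketch of how utilization metrics from a profiling run could drive a configuration change for one framework. The notes do not give Vanir's actual decision rules, so the thresholds (`high`, `low`), the action names, and the use of peak utilization are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class UtilizationSample:
    """One monitoring sample from a profiling run, as fractions in [0, 1]."""
    cpu: float
    memory: float


def recommend(samples: Sequence[UtilizationSample],
              high: float = 0.8, low: float = 0.3) -> str:
    """Suggest an adjustment for one framework (hypothetical rule set)."""
    if not samples:
        raise ValueError("at least one utilization sample is required")
    peak_cpu = max(s.cpu for s in samples)
    peak_mem = max(s.memory for s in samples)
    if peak_cpu >= high and peak_mem >= high:
        return "scale_out"    # both resources saturated: add instances
    if peak_cpu >= high:
        return "more_cpu"     # CPU-bound: prefer a CPU-heavier instance type
    if peak_mem >= high:
        return "more_memory"  # memory-bound: prefer a memory-heavier type
    if peak_cpu <= low and peak_mem <= low:
        return "scale_in"     # both resources idle: remove instances
    return "keep"
```

A rule set of this kind needs only the metrics already collected during profiling, so it adds no extra runs; whether Vanir uses thresholds like these is not stated in the notes.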