# ActiveData-SpotManager

**Repository Path**: mirrors_mozilla/ActiveData-SpotManager

## Basic Information

- **Project Name**: ActiveData-SpotManager
- **Description**: DEPRECATED
- **Primary Language**: Unknown
- **License**: MPL-2.0
- **Default Branch**: manager-etl
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-10-13
- **Last Updated**: 2026-04-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


# SpotManager

The SpotManager is a stateless program meant to be run periodically.  It 
finds the cheapest spot instance prices, bids, sets up the machines, and 
tears them down when done.

## Assumptions

The module assumes your workload is **long running** and has 
**many save-points**.    

In my case, each machine is set up to pull small tasks off a queue and 
execute them. These machines can be shut down at any time, with the most 
recent task simply placed back on the queue for some other machine to run.   

## Overview

This library works on a concept of ***utility***, which is an abstract value 
you assign to each EC2 instance type; the ***required utility*** is the 
primary input used to scale the number and type of instances. 

For each instance type (and zone), the `SpotManager` uses the historical 
pricing record to figure out a competitive bid (defined by `uptime`, below).
It combines that bid with the `utility` score for that instance type to get
an `estimated_value` (measured in utility per dollar). The instance types
with the best `estimated_value` are bid on first.
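
The ranking described above can be sketched in a few lines. This is only an illustration of "utility per dollar", not the code SpotManager runs: the function name `rank_by_estimated_value`, the shape of the `utilities` and `bids` dictionaries, and the prices are all made up for the example.

```python
def rank_by_estimated_value(utilities, bids):
    """Return (instance_type, zone, estimated_value) tuples, best first.

    utilities: {instance_type: utility}          (hypothetical shape)
    bids: {(instance_type, zone): bid in $/hour} (hypothetical shape)
    """
    ranked = []
    for (instance_type, zone), bid in bids.items():
        # Types without a declared utility are treated as zero utility
        utility = utilities.get(instance_type, 0)
        if utility <= 0 or bid <= 0:
            continue  # nothing to gain, or no usable price
        ranked.append((instance_type, zone, utility / float(bid)))
    # Highest utility per dollar first
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


# Illustrative numbers only
utilities = {"c3.4xlarge": 15, "m3.xlarge": 4}
bids = {
    ("c3.4xlarge", "us-west-2a"): 0.30,
    ("c3.4xlarge", "us-west-2b"): 0.25,
    ("m3.xlarge", "us-west-2a"): 0.05,
}
ranking = rank_by_estimated_value(utilities, bids)
```

With these numbers the small `m3.xlarge` ranks first: it offers less utility per machine, but more utility per dollar.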

## Requirements

* Python 2.7
* boto
* requests
* ecdsa (required by fabric, but not installed by pip)
* fabric2

## Installation

For now, you must clone the repo:

	git clone https://github.com/mozilla/ActiveData-SpotManager.git

### Branches

There are four main branches:

* **dev** - development done here (unstable)
* **manager-etl** - multithreaded management, not ready for ES node management (Oct 2018)
* **manager** - used to manage the staging clusters
* **master** - proven stable on **manager** for at least a few days


## Configuration

Each SpotManager instance requires a `settings.json` file that controls the 
SpotManager behaviour.  We will use the [ActiveData ETL settings file](examples/config/etl_settings.json) 
as an example to explain the parameters

	
* **`budget`** - Acts as an absolute spend limit for this SpotManager. Be sure 
you know your limits.
* **`max_utility_price`** - Whatever you decide a unit of ***utility*** is, 
you should set the highest price you are willing to pay for one.  This can 
ensure you do not go over the on-demand price, and prevents the SpotManager 
from bidding when everything is too expensive.
* **`max_new_utility`** - The most utility that will be requested per run. 
Used to prevent spikes in instance count on light loads.
* **`max_requests_per_type`** - Limit the number of requests per type.
Prevents all requests going to the cheapest instance type, consuming all 
available instances, and getting `az-constraint` on the remainder.  In the 
event of low availability, SpotManager will move on to the other types.
* **`max_percent_per_type`** - Limit the total number of instances, as a 
percent, per availability zone.  Some workloads benefit from not losing all 
instances at once.  Distributing load over many instance types reduces the 
number of instances lost from any one price fluctuation.  *Default = 1.0 
(100%, no limit)*
* **`uptime`** - Parameters that help you balance expected uptime with cost 
([see below](#more-about-uptime)).
* **`availability_zone`** - List of availability zones the SpotManager can work in 
* **`product`** - For price lookup.  *Default 'Linux/UNIX (Amazon VPC)'*
* **`price_file`** - To minimize AWS calls, the previous price data is stored 
in a file for retrieval next time.
* **`run_interval`** - So the SpotManager knows how long before the next run 
will happen (Used to determine time remaining in the hour for an instance) 
* **`aws`** - a structure containing the parameters to [connect to AWS using boto](http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.connection.EC2Connection)
* **`utility`** - a list of objects declaring the utility of each instance 
type.  Instance types not mentioned are assumed to have zero utility and 
will not be bid on, **and will be terminated if any exist.** 
* **`ec2.request`** - template for making a [spot request using boto](http://boto.readthedocs.org/en/latest/ref/ec2.html#boto.ec2.connection.EC2Connection.request_spot_instances). This is where you declare the machine image, private keys, networking interfaces, etc.
* **`ec2.instance.name`** - Name that will be assigned to an instance (and 
to the spot requests).  It is important that no other machines under the AWS 
user have this prefix.  ***Any machines with this prefix will be under the 
control of SpotManager.***    
* **`instance`** -  The parameters that will be sent to the constructor for
your `InstanceManager`. 
* **`instance.class`** - An additional property in `instance`: The full name 
of the class you are using to set up and tear down an instance.
* **`debug`** - Settings for the [logging module](https://github.com/mozilla/ActiveData-SpotManager/blob/master/pyLibrary/debugs/README.md#configuration)
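
To make the interaction between `max_new_utility` and `max_requests_per_type` more concrete, here is a rough sketch of a greedy planner. It is an assumption about how such limits could combine, not SpotManager's actual algorithm: `plan_requests` is a hypothetical helper, it ignores `budget` and `max_percent_per_type`, and it rounds down so the per-run cap is never exceeded.

```python
def plan_requests(ranked_types, required_utility, max_new_utility,
                  max_requests_per_type):
    """Walk instance types best-first and decide how many to request.

    ranked_types: list of (instance_type, utility_per_instance), best first
    Returns {instance_type: number_of_requests}.
    """
    # Never ask for more than the per-run cap
    remaining = min(required_utility, max_new_utility)
    plan = {}
    for instance_type, utility in ranked_types:
        if remaining <= 0:
            break
        if utility <= 0:
            continue
        # Round down so the cap is respected, then apply the per-type limit
        count = int(min(remaining // utility, max_requests_per_type))
        if count == 0:
            continue  # this type is too large for what is left
        plan[instance_type] = count
        remaining -= count * utility
    return plan


# 100 utility is wanted, but only 50 may be requested in this run,
# and no more than 2 requests may go to any one type.
plan = plan_requests(
    [("m3.xlarge", 4), ("c3.4xlarge", 15)],
    required_utility=100,
    max_new_utility=50,
    max_requests_per_type=2,
)
```

Here the per-type limit stops the cheapest type from absorbing the whole request, and the remainder moves on to the next type in the ranking.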

### More about `utility`

The utility list is a declaration of how much utility each instance type can 
provide, and  additional configuration that the InstanceManager can use for 
`setup()`.

### More about `uptime`

In order to make a good bid, the historical pricing record for each 
instance type and region is used. All these settings have defaults designed 
for quick-setup tasks.  If your setup takes longer, or the value of your 
machine increases as it sticks around, you may want to set these values. 
Here are the settings we use for ElasticSearch nodes:

	"uptime":{
		"history": "week",
		"duration": "day",
		"bid_percentile": 0.95
	}

* **`history`** - How much history to use: Too little history and the bids can 
be terminated earlier than expected, too much history will make the algorithm 
unresponsive to lowering prices.
* **`duration`** - The window of time used to find the max price. The intent 
is to figure out what the bids should have been over the `history` so that you 
do not get terminated for the `duration`.  If your workload is quick to setup, 
then you can set it to zero (`0`).
* **`bid_percentile`** - With a `history`'s worth of max pricing, the question 
remains which price to pick; the `bid_percentile` is used to make that 
selection. Use `0.50` (median) to make aggressively low bids: most of the bids
will fail, and there will be long hours when nothing new spins up. Use higher 
numbers to increase the chance of uptime.  It is never wise to set this to 
`1.00` because there is often some fool willing to bid more than on-demand.  

No matter your `uptime` settings, your bids will never go beyond your 
`budget`, and never go beyond `max_utility_price`.
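
One way to read the three `uptime` settings together is sketched below. It is a simplified interpretation, not the SpotManager implementation: `competitive_bid` is a hypothetical function, `history` and `duration` are given in seconds rather than as names like `"week"`, each price record is treated as a point sample, the percentile uses a simple nearest-rank pick, and the prices are invented.

```python
def competitive_bid(prices, history, duration, bid_percentile):
    """Pick a bid from past prices.

    prices: list of (timestamp_seconds, price), sorted by timestamp
    history, duration: window lengths in seconds
    bid_percentile: fraction between 0 and 1
    """
    if not prices:
        return None
    newest = prices[-1][0]
    # Only look back as far as `history`
    recent = [(t, p) for t, p in prices if t >= newest - history]
    # For each starting point, the price needed to survive `duration`
    window_max = []
    for start, _ in recent:
        in_window = [p for t, p in recent if start <= t <= start + duration]
        window_max.append(max(in_window))
    window_max.sort()
    # Nearest-rank style selection, clamped to the last element
    index = min(int(bid_percentile * len(window_max)), len(window_max) - 1)
    return window_max[index]


HOUR = 3600
prices = [
    (0 * HOUR, 0.10),
    (1 * HOUR, 0.12),
    (2 * HOUR, 0.50),  # a short price spike
    (3 * HOUR, 0.11),
    (4 * HOUR, 0.10),
]
low_bid = competitive_bid(prices, 5 * HOUR, 1 * HOUR, 0.50)   # 0.12
safe_bid = competitive_bid(prices, 5 * HOUR, 1 * HOUR, 0.95)  # 0.5
```

The median bid stays below the spike and would have been outbid during it; the `0.95` bid is high enough to ride it out. Any bid chosen this way would still be subject to the `budget` and `max_utility_price` limits.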


### Configuring Volumes

Some workloads require large amounts of storage, but not all instances come 
with enough.  The **SpotManager** will map the ephemeral and EBS volumes for 
you.

As an example, the `c3.4xlarge` comes with two ephemeral drives, which can 
be found at `/dev/sdb` and `/dev/sdc`. The configuration below maps both, and 
also requests two new EBS volumes, which will be assigned `device` properties 
at runtime.

		{
			"instance_type": "c3.4xlarge",
			"utility": 15,
			"drives": [
				{"path":"/data1", "device":"/dev/sdb"},
				{"path":"/data2", "device":"/dev/sdc"},
				{"path":"/data3", "size":1000, "volume_type":"standard"},
				{"path":"/data4", "size":1000, "volume_type":"standard"}
			]
		},

Some caveats:

* ***All volumes will be removed on termination*** - This is obvious for 
ephemeral drives, but the EBS will be removed too.  If you want the volume 
to be permanent, you must map the block device yourself.
* ***Block devices will be neither formatted nor mounted***.  The `path` is 
provided only so the `InstanceManager.setup()` routine can perform the `mkfs` 
and `mount` commands.
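
A `setup()` routine could turn the `drives` list into shell commands along these lines. This is a sketch under assumptions: `drive_commands` is a hypothetical helper, the `ext4` filesystem and the use of `sudo` are choices made for the example, and every drive is expected to already have its `device` filled in. How the commands are run on the instance (for example over SSH) is left out.

```python
def drive_commands(drives, filesystem="ext4"):
    """Return the shell commands that format and mount each drive."""
    commands = []
    for drive in drives:
        device = drive.get("device")
        if device is None:
            # EBS volumes only get a device assigned at runtime
            raise ValueError("no device assigned for " + drive["path"])
        path = drive["path"]
        commands.append("sudo mkfs -t {fs} {dev}".format(fs=filesystem, dev=device))
        commands.append("sudo mkdir -p {path}".format(path=path))
        commands.append("sudo mount {dev} {path}".format(dev=device, path=path))
    return commands


commands = drive_commands([{"path": "/data1", "device": "/dev/sdb"}])
```

Note that `mkfs` destroys whatever is on the device, which is only acceptable here because the volumes are new and are removed on termination anyway.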

### Writing an InstanceManager

Conceptually, an instance manager is very simple, with only three methods 
you need to implement.  This repo has an example [`./examples/etl.py`](https://github.com/mozilla/ActiveData-SpotManager/blob/master/examples/etl.py) 
that you can review. 

* **`required_utility()`** - function to determine how much utility is 
needed.  Since you are the one defining utility, the amount you need is 
also up to you.  The example uses the size of the pending queue to 
determine, roughly, how much utility is required.
* **`setup()`** - function is called to set up an instance.  It is passed 
both a boto ec2 instance object, and the utility this instance is 
expected to provide. This is run in its own thread, and multiple can be 
called at the same time; ensure your code is threadsafe. 
* **`teardown()`** - When the machine is no longer required, this will be 
called before SpotManager terminates the EC2 instance. This method is 
*not* called when AWS terminates the instance.  
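
A minimal skeleton of the three methods might look like the following. It is a sketch, not the repo's `etl.py`: the class name `QueueInstanceManager`, the constructor arguments `pending_tasks` and `tasks_per_utility`, and the `teardown(instance)` signature are assumptions made for illustration, and `FakeInstance` stands in for a boto instance object (only its `id` attribute is used).

```python
import threading


class QueueInstanceManager(object):
    """Hypothetical InstanceManager driven by the length of a work queue."""

    def __init__(self, pending_tasks, tasks_per_utility=100):
        # pending_tasks: callable returning the current queue length
        self.pending_tasks = pending_tasks
        self.tasks_per_utility = tasks_per_utility
        self.lock = threading.Lock()
        self.ready = {}  # instance id -> utility it provides

    def required_utility(self):
        # Ceiling division: a partial unit of work still needs a whole unit
        return -(-self.pending_tasks() // self.tasks_per_utility)

    def setup(self, instance, utility):
        # May run in several threads at once, so guard shared state.
        # A real implementation would install and start the worker here.
        with self.lock:
            self.ready[instance.id] = utility

    def teardown(self, instance):
        # Called before SpotManager terminates the instance
        with self.lock:
            self.ready.pop(instance.id, None)


class FakeInstance(object):
    """Stand-in for a boto EC2 instance, for the usage example only."""

    def __init__(self, id):
        self.id = id


manager = QueueInstanceManager(lambda: 250, tasks_per_utility=100)
needed = manager.required_utility()
machine = FakeInstance("i-1")
manager.setup(machine, 15)
after_setup = dict(manager.ready)
manager.teardown(machine)
after_teardown = dict(manager.ready)
```

Because `teardown()` is not called when AWS reclaims an instance, anything tracked in `setup()` should be treated as a hint rather than as the source of truth.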


## Benefits

The benefit of a `bid_percentile` price point is a reasonable uptime at a low 
price. We do not want the price set too high: we desire Amazon-initiated 
termination so we get the last partial hour free.  Also, some instance 
types have unpredictable and extreme price swings; `SpotManager` allows you 
to utilize those valleys at minimal price exposure.

The more instance types your workload can run on, the more advantage you have 
finding minimal pricing. Anecdotally, there is always an opportunity to be 
found: some instance type is always going for significantly less than 
its utility would indicate.