Hyperparameter Optimization¶

This document provides a walkthrough of the hyperparameter optimization example. To run the application, first install some dependencies.

pip install tensorflow

You can view the code for this example.

The simple script that processes results as they become available and launches new experiments can be run as follows.

python ray/examples/hyperopt/hyperopt_simple.py --trials=5 --steps=10

The variant that divides training into multiple segments and aggressively terminates poorly performing models can be run as follows.

python ray/examples/hyperopt/hyperopt_adaptive.py --num-starting-segments=5 \
                                                  --num-segments=10 \
                                                  --steps-per-segment=20

Machine learning algorithms often have a number of hyperparameters whose values must be chosen by the practitioner. For example, an optimization algorithm may have a step size, a decay rate, and a regularization coefficient. In a deep network, the network parameterization itself (e.g., the number of layers and the number of units per layer) can be considered a hyperparameter.

Choosing these parameters can be challenging, and so a common practice is to search over the space of hyperparameters. One approach that works surprisingly well is to randomly sample different options.

Problem Setup¶

Suppose that we want to train a convolutional network, but we aren’t sure how to choose the following hyperparameters:

the learning rate
the batch size
the dropout probability
the standard deviation of the distribution from which to initialize the network weights

Suppose that we’ve defined a remote function train_cnn_and_compute_accuracy, which takes values for these hyperparameters as its input (along with the dataset), trains a convolutional network using those hyperparameters, and returns the accuracy of the trained model on a validation set.

import numpy as np
import ray

@ray.remote
def train_cnn_and_compute_accuracy(hyperparameters,
                                   train_images,
                                   train_labels,
                                   validation_images,
                                   validation_labels):
  # Construct a deep network, train it, and return the accuracy on the
  # validation data.
  return np.random.uniform(0, 1)

Basic random search¶

Something that works surprisingly well is to try random values for the hyperparameters. For example, we can write a function that randomly generates hyperparameter configurations.

def generate_hyperparameters():
  # Randomly choose values for the hyperparameters.
  return {"learning_rate": 10 ** np.random.uniform(-5, 5),
          "batch_size": np.random.randint(1, 100),
          "dropout": np.random.uniform(0, 1),
          "stddev": 10 ** np.random.uniform(-5, 5)}

In addition, let’s assume that we’ve started Ray and loaded some data.

import ray

ray.init()

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
train_images = ray.put(mnist.train.images)
train_labels = ray.put(mnist.train.labels)
validation_images = ray.put(mnist.validation.images)
validation_labels = ray.put(mnist.validation.labels)

Then basic random hyperparameter search looks something like this. We launch a bunch of experiments, and we get the results.

# Generate a bunch of hyperparameter configurations.
hyperparameter_configurations = [generate_hyperparameters() for _ in range(20)]

# Launch some experiments.
results = []
for hyperparameters in hyperparameter_configurations:
  results.append(train_cnn_and_compute_accuracy.remote(hyperparameters,
                                                       train_images,
                                                       train_labels,
                                                       validation_images,
                                                       validation_labels))

# Get the results.
accuracies = ray.get(results)

Then we can inspect the contents of accuracies and see which set of hyperparameters worked the best. Note that in the above example, the for loop will run instantaneously and the program will block in the call to ray.get, which will wait until all of the experiments have finished.

Processing results as they become available¶

One problem with the above approach is that you have to wait for all of the experiments to finish before you can process the results. Instead, you may want to process the results as they become available, perhaps in order to adaptively choose new experiments to run, or perhaps simply so you know how well the experiments are doing. To process the results as they become available, we can use the ray.wait primitive.

The most simple usage is the following. This example is implemented in more detail in driver.py.

# Launch some experiments.
remaining_ids = []
for hyperparameters in hyperparameter_configurations:
  remaining_ids.append(train_cnn_and_compute_accuracy.remote(hyperparameters,
                                                             train_images,
                                                             train_labels,
                                                             validation_images,
                                                             validation_labels))

# Whenever a new experiment finishes, print the value and start a new
# experiment.
for i in range(100):
  ready_ids, remaining_ids = ray.wait(remaining_ids, num_returns=1)
  accuracy = ray.get(ready_ids[0])
  print("Accuracy is {}".format(accuracy))
  # Start a new experiment.
  new_hyperparameters = generate_hyperparameters()
  remaining_ids.append(train_cnn_and_compute_accuracy.remote(new_hyperparameters,
                                                             train_images,
                                                             train_labels,
                                                             validation_images,
                                                             validation_labels))

More sophisticated hyperparameter search¶

Hyperparameter search algorithms can get much more sophisticated. So far, we’ve been treating the function train_cnn_and_compute_accuracy as a black box, that we can choose its inputs and inspect its outputs, but once we decide to run it, we have to run it until it finishes.

However, there is often more structure to be exploited. For example, if the training procedure is going poorly, we can end the session early and invest more resources in the more promising hyperparameter experiments. And if we’ve saved the state of the training procedure, we can always restart it again later.

This is one of the ideas of the Hyperband algorithm. Start with a huge number of hyperparameter configurations, aggressively stop the bad ones, and invest more resources in the promising experiments.

To implement this, we can first adapt our training method to optionally take a model and to return the updated model.

@ray.remote
def train_cnn_and_compute_accuracy(hyperparameters, model=None):
  # Construct a deep network, train it, and return the accuracy on the
  # validation data as well as the latest version of the model. If the model
  # argument is not None, this will continue training an existing model.
  validation_accuracy = np.random.uniform(0, 1)
  new_model = model
  return validation_accuracy, new_model

Here’s a different variant that uses the same principles. Divide each training session into a series of shorter training sessions. Whenever a short session finishes, if it still looks promising, then continue running it. If it isn’t doing well, then terminate it and start a new experiment.

import numpy as np

def is_promising(model):
  # Return true if the model is doing well and false otherwise. In practice,
  # this function will want more information than just the model.
  return np.random.choice([True, False])

# Start 10 experiments.
remaining_ids = []
for _ in range(10):
  experiment_id = train_cnn_and_compute_accuracy.remote(hyperparameters, model=None)
  remaining_ids.append(experiment_id)

accuracies = []
for i in range(100):
  # Whenever a segment of an experiment finishes, decide if it looks promising
  # or not.
  ready_ids, remaining_ids = ray.wait(remaining_ids, num_returns=1)
  experiment_id = ready_ids[0]
  current_accuracy, current_model = ray.get(experiment_id)
  accuracies.append(current_accuracy)

  if is_promising(experiment_id):
    # Continue running the experiment.
    experiment_id = train_cnn_and_compute_accuracy.remote(hyperparameters,
                                                          model=current_model)
  else:
    # Start a new experiment.
    experiment_id = train_cnn_and_compute_accuracy.remote(hyperparameters)

  remaining_ids.append(experiment_id)