Yao Lirong's Blog

Hyper-Parameter Tuning with Optuna

2024/08/23

After self-implementing a grid search and having a horrible time writing pyplot code to visualize the results, I finally decided to find an existing tool to do the HP tuning for me.

There are two popular HP tuning frameworks:

  • Ray Tune: almost the industry standard
  • Optuna: user friendly, requires minimal modification to the original code

There’s also skorch, which integrates scikit-learn and PyTorch so that you can use sklearn’s GridSearchCV. For our simple task, we will go with Optuna.

Getting Started

To get Optuna running, you just need to add 4 lines to your training logic and a few more lines to start the search. In the training logic:

```python
def train_model(image_datasets, lr, weight_decay, num_epochs, trial: optuna.trial.Trial = None):
    optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    for epoch in range(num_epochs):
        model.train()
        for inputs, labels in dataloaders["train"]:
            ...  # Training Logic
        model.eval()
        for inputs, labels in dataloaders["val"]:
            running_loss += loss.item() * inputs.size(0)
            ...  # Eval Logic
        epoch_loss = running_loss / dataset_sizes["val"]
        if epoch_acc > best_acc or (epoch_acc == best_acc and epoch_loss < best_loss):
            best_acc, best_loss = epoch_acc, epoch_loss
        # OPTUNA CODE GOES HERE:
        # For each epoch, report the value of a user-defined metric.
        # Optuna uses this metric alone to decide whether to prune this
        # trial at the current epoch step. The objective value you return
        # has nothing to do with pruning.
        # Read more at: https://optuna.readthedocs.io/en/v3.6.1/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.report
        if trial is not None:
            trial.report(epoch_loss, epoch)
            if trial.should_prune():
                raise optuna.exceptions.TrialPruned()
    return best_loss
```

The following code shows how to set the search space and start the search.

```python
def optuna_objective(trial: optuna.trial.Trial):
    """Define a custom objective function we want to optimize.
    This function returns the value of the criterion you finally evaluate your
    model on, i.e. how you compare different models. The best model should have
    the best value of this objective. If you say the best model is the one with
    the highest training accuracy at the last epoch, then return training
    accuracy at the last epoch here. In our example, we think the best model
    should have the best `best_loss`, where a model's `best_loss` is its lowest
    validation loss across all epochs.
    """
    image_datasets = prepare_data()
    lr = trial.suggest_float("lr", 1e-6, 1e-1, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-1, log=True)
    loss = train_model(image_datasets, lr, weight_decay, 15, trial)
    return loss

if __name__ == "__main__":
    # Create a study called `plant_144` that minimizes the objective passed in.
    # Start the search. It ends when we finish 10 trials or spend 3 hours.
    study = optuna.create_study(
        direction="minimize",
        study_name="plant_144")
    study.optimize(optuna_objective, n_trials=10, timeout=3*60*60)
    print("Objective Value: ", study.best_trial.value)
    print("Params: ")
    for key, value in study.best_trial.params.items():
        print(f"  {key}: {value}")
```

The above example was adapted from Optuna’s PyTorch starter example. For more logging and printout statements, check the original example.

Saving Study and Board Visualization

Instead of printing all the info to the console and losing it once the Python script finishes, we can save it in an RDB (Relational Database). To do this, we pass a database URL to the storage argument:

```python
study = optuna.create_study(
    direction="minimize",
    study_name="plant_144",
    storage="sqlite:///db.sqlite3")
```

You can now stop this search with Ctrl+C at any time and resume it later.

A database exposes itself as a server on the machine. Therefore, to access it (even on the local machine), we use a database URL, just as we use an HTTPS URL to access a webpage online. In our example, the history is stored in a file called db.sqlite3 under the current directory.

This file is a general database and can hold studies other than the one called plant_144. For example, you can store another study inside it:

```python
study = optuna.create_study(
    direction="maximize",
    study_name="plant_8",
    storage="sqlite:///db.sqlite3")
```

For me this code just worked without having to install an SQLite DB, probably because Python ships with the built-in sqlite3 module. Check the official tutorial Saving/Resuming Study for more on saving and loading.

You can now visualize the search history, each parameter’s importance, etc. with optuna-dashboard

```shell
optuna-dashboard sqlite:///db.sqlite3
```

Multi-GPU Parallelism Support

roman’s Stack Overflow answer describes a very simple way to do multi-GPU tuning by using Optuna’s resume feature. To do so, create a study as in the previous code, then modify your code to resume that study instead of starting a new one:

```python
if __name__ == "__main__":
    study = optuna.load_study(study_name="plant_144", storage="sqlite:///db.sqlite3")
    study.optimize(optuna_objective, n_trials=100)
```

and simply start “resuming” this study on each available GPU. (One caveat: don’t name the script optuna.py — a file with that name shadows the optuna package and breaks `import optuna`; call it something like tune.py instead.)

```shell
CUDA_VISIBLE_DEVICES=3 nohup python tune.py > log3.txt 2>&1 &
CUDA_VISIBLE_DEVICES=5 nohup python tune.py > log5.txt 2>&1 &
```

The history from both processes will be stored under the study called plant_144 in file db.sqlite3.

For more information on parallelizing across multiple GPUs, check the official guide: Easy Parallelization.

Some Complaints

In its visualization, Optuna doesn’t provide an option to filter out the “bad” trial runs, which blows up the scale of every graph and usually makes it uninformative.

CATALOG
  1. Getting Started
  2. Saving Study and Board Visualization
  3. Multi-GPU Parallelism Support
  4. Some Complaints