After self-implementing a grid search but having a horrible time writing pyplot code to visualize the results, I finally decided to find an existing tool to do the HP tuning for me.

<p>There are two popular HP tuning frameworks:</p><ul><li>Ray Tune: almost the industry standard</li><li>Optuna: user friendly, requires minimal modification to the original code</li></ul><p>There’s also <a href="https://github.com/skorch-dev/skorch">skorch</a>, which integrates scikit-learn and PyTorch, so you can use sklearn’s <a href="https://skorch.readthedocs.io/en/v1.0.0/user/quickstart.html#grid-search">GridSearchCV</a>. For our simple task, we will go with Optuna.</p><h2 id="getting-started">Getting Started</h2><p>To get Optuna running, you just need to add 4 lines to your training logic and a few more lines to start its search. In the training logic:</p>

<table><tr><td class="gutter"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="code"><pre>def train_model(image_datasets, lr, weight_decay, num_epochs, trial: optuna.trial.Trial = None):
    optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    for epoch in range(num_epochs):
        model.train()
        for inputs, labels in dataloaders["train"]:
            ...  # Training Logic
        model.eval()
        for inputs, labels in dataloaders["val"]:
            running_loss += loss.item() * inputs.size(0)
            # Eval Logic
        epoch_loss = running_loss / dataset_sizes["val"]
        if epoch_acc > best_acc or (epoch_acc == best_acc and epoch_loss < best_loss):
            best_acc, best_loss = epoch_acc, epoch_loss
        """ OPTUNA CODE GOES HERE:
        For each epoch, you should report the value of a user-defined metric.
        Optuna uses this value alone to determine whether to prune
        this trial at the current epoch step. The objective value you return
        has nothing to do with pruning.
        Read more at: https://optuna.readthedocs.io/en/v3.6.1/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.report
        """
        if trial is not None:
            trial.report(epoch_loss, epoch)
            if trial.should_prune():
                raise optuna.exceptions.TrialPruned()
    return best_loss
</pre></td></tr></table>
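<p>To make the pruning call concrete, here is a dependency-free sketch of the median rule that Optuna’s default pruner roughly applies to the values you pass to <code>trial.report</code>. The helper and the numbers are made up for illustration; Optuna’s actual implementation lives in its pruner classes:</p>

```python
def should_prune(reported_so_far, other_trials_at_step):
    """Median-rule sketch: prune if this trial's latest reported loss is
    worse (higher) than the median of what other trials reported at the
    same epoch step."""
    others = sorted(other_trials_at_step)
    median = others[len(others) // 2]
    return reported_so_far[-1] > median

# Epoch 3: our trial's losses so far vs. other trials' epoch-3 losses.
print(should_prune([0.9, 0.8, 0.75], [0.5, 0.6, 0.7, 0.9, 1.2]))  # True: worse than median 0.7
print(should_prune([0.4], [0.5, 0.6, 0.7]))                       # False: better than median 0.6
```

This is why the value you report each epoch matters even though the objective value you <code>return</code> plays no role in pruning.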
<p>The following code shows how to set the search space and start the search.</p>
<table><tr><td class="gutter"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="code"><pre>def optuna_objective(trial: optuna.trial.Trial):
    """ Define a custom objective function we want to optimize.
    This function returns the value of the criterion you want to finally evaluate your model on,
    i.e. how you compare different models. The best model should have the best value of this objective.
    If you say the best model should have the highest training accuracy at the last epoch, then return training accuracy at the last epoch here. In our example, we think the best model should have the best best_loss, where a model's best_loss is its lowest validation loss across all epochs.
    """
    image_datasets = prepare_data()
    lr = trial.suggest_float("lr", 1e-6, 1e-1, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-1, log=True)
    loss = train_model(image_datasets, lr, weight_decay, 15, trial)
    return loss

if __name__ == "__main__":
    """
    Create a study called plant_144 where we minimize the objective passed in.
    Start the search. The search ends when we finish 10 trials or spend 3 hours.
    """
    study = optuna.create_study(
        direction="minimize",
        study_name="plant_144")
    study.optimize(optuna_objective, n_trials=10, timeout=3 * 60 * 60)
    print("  Objective Value: ", study.best_trial.value)
    print("  Params: ")
    for key, value in study.best_trial.params.items():
        print(f"    {key}: {value}")
</pre></td></tr></table>
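<p>A quick aside on <code>suggest_float(..., log=True)</code>: it samples uniformly in log-space, which is what you want for a learning rate spanning five orders of magnitude. A dependency-free sketch of what that sampling amounts to (the helper name is mine, not Optuna’s):</p>

```python
import math
import random

def log_uniform(low, high):
    # Sample uniformly in log-space, then exponentiate: equal probability
    # per decade, so 1e-6..1e-5 is as likely as 1e-2..1e-1.
    return math.exp(random.uniform(math.log(low), math.log(high)))

samples = [log_uniform(1e-6, 1e-1) for _ in range(5)]
print(samples)  # five learning-rate candidates between 1e-6 and 1e-1
```

With a plain (linear) uniform sample, almost every draw would land above 1e-2 and the small learning rates would never be explored.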
<p>The above example was adapted from <a href="https://github.com/optuna/optuna-examples/blob/ecc3e4282161f3cece1dc26d95f4186e3905e497/pytorch/pytorch_simple.py">Optuna’s PyTorch starting example</a>. For more reporting printout statements, check the original example.</p><h2 id="saving-study-and-board-visualization">Saving Study and Board Visualization</h2><p>Instead of printing all the info to the console and losing it once the Python script finishes, we can save it in an RDB (Relational Database). To do this, we pass a database URL to the <code>storage</code> argument:</p>
<table><tr><td class="gutter"><pre>1
2
3
4
</pre></td><td class="code"><pre>study = optuna.create_study(
direction="minimize",
study_name="plant_144",
storage="sqlite:///db.sqlite3")
</pre></td></tr></table>
<p>You can now Ctrl+C this search at any time and resume it by running the same code again.</p><p>A database exposes itself as a server on the machine, so we access it (even locally) through a database URL, just like we access a webpage online through an HTTPS URL. In our example, the history will be stored in a file called db.sqlite3 under the current directory.</p><p>This file is a general database and can hold studies other than the one called plant_144. You can store another study inside it:</p>
<table><tr><td class="gutter"><pre>1
2
3
4
</pre></td><td class="code"><pre>study = optuna.create_study(
direction="maximize",
study_name="plant_8",
storage="sqlite:///db.sqlite3")
</pre></td></tr></table>
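<p>Since db.sqlite3 is an ordinary SQLite file, you can also peek inside it with Python’s built-in sqlite3 module. The exact table names are whatever Optuna’s storage layer creates, so treat this as an inspection sketch rather than a documented schema:</p>

```python
import sqlite3

# Open the same file the Optuna storage URL points at.
conn = sqlite3.connect("db.sqlite3")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # Optuna's internal tables, holding studies, trials, params, etc.
conn.close()
```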
<p>For me this code just worked without having to install SQLite. This is probably because SQLite ships with my Ubuntu, but I’m not sure. Check the official tutorial <a href="https://optuna.readthedocs.io/en/v3.6.1/tutorial/20_recipes/001_rdb.html">Saving/Resuming Study</a> for more on saving and loading.</p><p>You can now visualize the search history, each parameter’s importance, etc. with <a href="https://github.com/optuna/optuna-dashboard">optuna-dashboard</a>:</p>
<table><tr><td class="gutter"><pre>1
</pre></td><td class="code"><pre>optuna-dashboard sqlite:///db.sqlite3
</pre></td></tr></table>
<figure><figcaption>optuna-dashboard</figcaption></figure>
<h2 id="multi-gpu-parallelism-support">Multi-GPU Parallelism Support</h2><p>roman’s Stack Overflow answer provides a very simple way to do multi-GPU tuning by utilizing Optuna’s resume feature. To do so, create a study following the previous code, then modify your code to resume instead of starting a new study.</p>
<table><tr><td class="gutter"><pre>1
2
3
</pre></td><td class="code"><pre>if __name__ == '__main__':
    study = optuna.load_study(study_name='plant_144', storage='sqlite:///db.sqlite3')
    study.optimize(optuna_objective, n_trials=100)
</pre></td></tr></table>
<p>and simply “resume” this study on different available GPUs:</p>
<table><tr><td class="gutter"><pre>1
2
</pre></td><td class="code"><pre>CUDA_VISIBLE_DEVICES=3 nohup python tune.py > log3.txt 2>&1 &
CUDA_VISIBLE_DEVICES=5 nohup python tune.py > log5.txt 2>&1 &
</pre></td></tr></table>
<p>The history from both processes will be stored under the study called plant_144 in the file db.sqlite3.</p><p>For more information on parallelizing over multiple GPUs, check the official guide: <a href="https://optuna.readthedocs.io/en/v3.6.1/tutorial/10_key_features/004_distributed.html">Easy Parallelization</a></p><h2 id="some-complaints">Some Complaints</h2><p>In its visualization, Optuna doesn’t provide an option to filter out the “bad” trial runs, so a few diverged trials blow up the scale of every graph and leave them with almost no information.</p>
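<p>The workaround I fall back on is filtering the trial values myself before plotting. A dependency-free sketch of the idea (the loss values and the 3x threshold are made up):</p>

```python
# Completed-trial losses, including a few diverged runs that wreck the axis scale.
values = [0.42, 0.39, 87.3, 0.41, 12.0, 0.38]
best = min(values)
kept = [v for v in values if v <= 3 * best]  # keep trials within 3x of the best
print(kept)  # [0.42, 0.39, 0.41, 0.38]
```

With the real study you would apply the same filter to the trial objects loaded from storage and plot only the survivors.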