Hyperactive:
- is very easy to learn but extremely versatile
- provides intelligent optimization algorithms, support for all major machine-learning frameworks and many interesting applications
- makes optimization data collection simple
- saves your computation time
- supports parallel computing
As its name suggests, Hyperactive started as a hyperparameter-optimization package, but it has since been generalized to solve expensive, gradient-free optimization problems. It uses the Gradient-Free-Optimizers package as its optimization backend and expands on it with additional features and tools.
Optimization Techniques | Tested and Supported Packages | Optimization Applications |
---|---|---|
Local Search, Global Search, Population Methods, Sequential Methods | Machine Learning, Deep Learning, Parallel Computing | Feature Engineering, Machine Learning, Deep Learning, Data Collection, Miscellaneous |
The examples above are not necessarily based on realistic datasets or training procedures. Their purpose is to execute quickly and to give the user ideas for interesting use cases.
The following packages are designed to support Hyperactive and expand its use cases.
Package | Description |
---|---|
Search-Data-Collector | Simple tool to save search-data during or after the optimization run into csv-files. |
Search-Data-Explorer | Visualize search-data with plotly inside a streamlit dashboard. |
If you want news about Hyperactive and related projects, you can follow me on Twitter.
The most recent version of Hyperactive is available on PyPI:
pip install hyperactive
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import load_diabetes
from hyperactive import Hyperactive

data = load_diabetes()
X, y = data.data, data.target

# define the model in a function
def model(opt):
    # pass the suggested parameters to the machine learning model
    gbr = GradientBoostingRegressor(
        n_estimators=opt["n_estimators"], max_depth=opt["max_depth"]
    )
    scores = cross_val_score(gbr, X, y, cv=4)

    # return a single numerical value
    return scores.mean()

# the search space determines the ranges of parameters you want the optimizer to search through
search_space = {
    "n_estimators": list(range(10, 150, 5)),
    "max_depth": list(range(2, 12)),
}

# start the optimization run
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50)
hyper.run()
The Hyperactive class accepts the following parameters:

verbosity = ["progress_bar", "print_results", "print_times"]
distribution = "multiprocessing"
Each distribution backend uses a different library to serialize objects:
- multiprocessing uses pickle
- joblib uses dill
- pathos uses cloudpickle
n_processes = "auto"
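A minimal sketch of how these arguments can be passed to the Hyperactive class (the values shown are only illustrative):

from hyperactive import Hyperactive

hyper = Hyperactive(
    verbosity=["progress_bar", "print_results", "print_times"],
    distribution="joblib",  # joblib serializes objects with dill instead of pickle
    n_processes="auto",
)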
The add_search-method accepts the following parameters:

objective_function
The function that defines the optimization problem and returns the score (explained further below).
search_space
Dictionary that defines the parameter ranges the optimizer can search through (explained further below).
n_iter
The number of iterations that will be performed during the optimization run.
optimizer = "default"
Possible parameter types: ("default", initialized optimizer object)
Instance of an optimization class that can be imported from Hyperactive. "default" corresponds to the random-search optimizer. The optimizer classes imported from Hyperactive are different from the ones in Gradient-Free-Optimizers: they only accept optimizer-specific parameters (see the example below).
Example:
...
opt_hco = HillClimbingOptimizer(epsilon=0.08)
hyper = Hyperactive()
hyper.add_search(..., optimizer=opt_hco)
hyper.run()
...
n_jobs = 1
Possible parameter types: (int)
Number of jobs to run in parallel for this search.
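A minimal sketch of running the same search as several parallel jobs (reusing the model and search_space from the first example; the value 4 is just an example):

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50, n_jobs=4)  # run this search as 4 parallel jobs
hyper.run()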
initialize = {"grid": 4, "random": 2, "vertices": 4}
Possible parameter types: (dict)
The initialize dictionary determines how many initial positions are generated by each of the following methods before the actual optimization starts:
- grid: positions arranged in a grid-like pattern across the search space
- vertices: positions at the vertices (corners) of the search space
- random: randomly selected positions
- warm_start: a list of user-defined parameter dictionaries (see the example below)
...
search_space = {
    "x1": list(range(10, 150, 5)),
    "x2": list(range(2, 12)),
}

ws1 = {"x1": 10, "x2": 2}
ws2 = {"x1": 15, "x2": 10}

hyper = Hyperactive()
hyper.add_search(
    model,
    search_space,
    n_iter=30,
    initialize={"grid": 4, "random": 10, "vertices": 4, "warm_start": [ws1, ws2]},
)
hyper.run()
pass_through = {}
Possible parameter types: (dict)
The pass_through parameter accepts a dictionary with information that is passed to the objective-function argument. This information does not change during the optimization run, unless the user changes it (within the objective function).
Example:
...
def objective_function(para):
    para.pass_through["stuff1"]  # <--- this variable is 1
    para.pass_through["stuff2"]  # <--- this variable is 2

    score = -para["x1"] * para["x1"]
    return score

pass_through = {
    "stuff1": 1,
    "stuff2": 2,
}

hyper = Hyperactive()
hyper.add_search(
    model,
    search_space,
    n_iter=30,
    pass_through=pass_through,
)
hyper.run()
callbacks = {}
Possible parameter types: (dict)
The callbacks parameter enables you to pass functions to Hyperactive that are called in every iteration during the optimization run. The functions have access to the same argument as the objective function. Via the keys of the callbacks dictionary you decide whether the functions are called before or after the objective function is evaluated; the values of the dictionary are lists of callback functions. The following example shows the way to use callbacks:
Example:
...
def callback_1(access):
    # do some stuff
    pass

def callback_2(access):
    # do some stuff
    pass

def callback_3(access):
    # do some stuff
    pass

hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    n_iter=100,
    callbacks={
        "after": [callback_1, callback_2],
        "before": [callback_3],
    },
)
hyper.run()
catch = {}
Possible parameter types: (dict)
The catch parameter provides a way to handle exceptions that occur during the evaluation of the objective function or the callbacks. It is a dictionary with the exception class as key and the score that is returned instead as value. This way you can handle multiple types of exceptions and return different scores for each. In the case of an exception it often makes sense to return np.nan as the score, as in the following code snippet:
Example:
...
import numpy as np

hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    n_iter=100,
    catch={
        ValueError: np.nan,
    },
)
hyper.run()
max_score = None
Possible parameter types: (float, None)
Stops the optimization run as soon as a score reaches or surpasses this value.
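A minimal sketch of using max_score as a stopping criterion (the threshold 0.99 is just an example):

hyper = Hyperactive()
hyper.add_search(objective_function, search_space, n_iter=10000, max_score=0.99)
hyper.run()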
early_stopping = None
Possible parameter types: (dict, None)
Stops the optimization run early if it did not achieve any score improvement within the last iterations. The early_stopping parameter accepts a dictionary with three keys:
- n_iter_no_change: Required int. The number of last iterations to look at for an improvement over the iterations that came before. If the best score of the entire run lies within those last n iterations, the run continues (until other stopping criteria are met); otherwise the run stops.
- tol_abs: Optional float. The score must have improved by at least this absolute tolerance within the last n iterations compared to the best score before them. For a tolerance of 0.1, an improvement from 0.8 to 0.9 is acceptable, but 0.81 to 0.9 would stop the run.
- tol_rel: Optional float. The score must have improved by at least this relative tolerance (in percent) within the last n iterations compared to the best score before them. For a tolerance of 10, an improvement from 0.8 to 0.88 is acceptable, but 0.8 to 0.87 would stop the run.
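A minimal sketch of passing an early_stopping dictionary with the keys described above (reusing objective_function and search_space from earlier examples; the values are just examples):

hyper = Hyperactive()
hyper.add_search(
    objective_function,
    search_space,
    n_iter=1000,
    early_stopping={
        "n_iter_no_change": 50,
        "tol_abs": 0.001,
    },
)
hyper.run()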
random_state = None
Possible parameter types: (int, None)
Random state for random processes in the random, numpy and scipy modules.
memory = "share"
Saves evaluated parameter sets and their scores, so that repeated positions are looked up instead of re-evaluating the objective function. With "share" the memory is shared between parallel processes.
memory_warm_start = None
Possible parameter types: (pandas dataframe, None)
Pandas dataframe that contains score and parameter information that will be automatically loaded into the memory-dictionary.
example:
score | x1 | x2 | x... |
---|---|---|---|
0.756 | 0.1 | 0.2 | ... |
0.823 | 0.3 | 0.1 | ... |
... | ... | ... | ... |
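A typical use of memory_warm_start is to feed the search data of a previous run into a new one. The sketch below reuses the model and search_space from the first example and assumes the dataframe-returning results method (search_data) described further below:

# first run: collect search data
hyper0 = Hyperactive()
hyper0.add_search(model, search_space, n_iter=50)
hyper0.run()

search_data = hyper0.search_data(model)  # pandas dataframe with scores and parameters

# second run: start with the collected data already in memory
hyper1 = Hyperactive()
hyper1.add_search(model, search_space, n_iter=50, memory_warm_start=search_data)
hyper1.run()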
Each iteration consists of two steps: an optimization step, in which the optimizer selects the next parameter set, and an evaluation step, in which the objective function is called with that parameter set and returns the score.
The objective function has one argument that is often called "para", "params", "opt" or "access". This argument is your access to the parameter set that the optimizer has selected in the corresponding iteration.
def objective_function(opt):
    # get x1 and x2 from the argument "opt"
    x1 = opt["x1"]
    x2 = opt["x2"]

    # calculate the score with the parameter set
    score = -(x1 * x1 + x2 * x2)

    # return the score
    return score
The objective function always needs to return a score, which shows how "good" or "bad" the current parameter set is. But you can also return additional information with a dictionary:
def objective_function(opt):
    x1 = opt["x1"]
    x2 = opt["x2"]

    score = -(x1 * x1 + x2 * x2)

    other_info = {
        "x1 squared": x1**2,
        "x2 squared": x2**2,
    }
    return score, other_info
When you take a look at the results (a pandas dataframe with all iteration information) after the run has ended, you will see the additional information in it. The reason a dictionary is needed for this is that Hyperactive must know the names of the additional parameters. The score does not need a name, because it is always called "score" in the results. You can run this example script if you want to give it a try.
The search space defines which values the optimizer can select during the search. The selected values will be inside the objective-function argument and can be accessed like in a dictionary. The values in each search-space dimension should always be in a list. If you use np.arange, you should wrap it in a list afterwards:
search_space = {
    "x1": list(np.arange(-100, 101, 1)),
    "x2": list(np.arange(-100, 101, 1)),
}
A special feature of Hyperactive is shown in the next example: you can put not only numeric values into the search-space dimensions, but also strings and functions. This enables a very high flexibility in how you can design your studies.
def func1():
    # do stuff
    return stuff

def func2():
    # do stuff
    return stuff

search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "function": [func1, func2],
}
If you want to put other types of variables (like numpy arrays, pandas dataframes, lists, ...) into the search space you can do that via functions:
def array1():
    return np.array([1, 2, 3])

def array2():
    return np.array([3, 2, 1])

search_space = {
    "x": list(np.arange(-100, 101, 1)),
    "str": ["a string", "another string"],
    "numpy_array": [array1, array2],
}
The functions contain the numpy arrays and return them. This way you can use them inside the objective function.
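A minimal sketch of how such a function dimension can be used inside the objective function (the search space and scoring below are just illustrative):

import numpy as np

def array1():
    return np.array([1, 2, 3])

def array2():
    return np.array([3, 2, 1])

search_space = {
    "numpy_array": [array1, array2],
}

def objective_function(opt):
    # opt["numpy_array"] is one of the functions from the search space;
    # calling it returns the actual numpy array
    arr = opt["numpy_array"]()
    score = -float(np.sum(arr))
    return score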
Each of the following optimizer classes can be initialized and passed to the "add_search"-method via the "optimizer"-argument. During this initialization the optimizer class accepts only optimizer-specific parameters (no random_state, initialize, ...):
optimizer = HillClimbingOptimizer(epsilon=0.1, distribution="laplace", n_neighbours=4)
For the default parameters you can just write:
optimizer = HillClimbingOptimizer()
and pass it to Hyperactive:
hyper = Hyperactive()
hyper.add_search(model, search_space, optimizer=optimizer, n_iter=100)
hyper.run()
So the optimizer classes are different from the ones in Gradient-Free-Optimizers. A more detailed explanation of the optimization algorithms and the optimizer-specific parameters can be found in the Optimization Tutorial.
objective_function
returns: dictionary
Parameter dictionary of the best score of the given objective_function found in the previous optimization run.
example:
{
    'x1': 0.2,
    'x2': 0.3,
}
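Assuming the results method on the Hyperactive instance is called best_para (as in recent Hyperactive versions), it can be used like this after the run has finished:

best_para = hyper.best_para(model)  # parameter dict of the best score found for "model"
print(best_para)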
objective_function
returns: Pandas dataframe
The dataframe contains the score and parameter information of the given objective_function found in the optimization run. If the parameter times is set to True, the evaluation and iteration times are added to the dataframe.
example:
score | x1 | x2 | x... |
---|---|---|---|
0.756 | 0.1 | 0.2 | ... |
0.823 | 0.3 | 0.1 | ... |
... | ... | ... | ... |
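Assuming this results method is called search_data (as in recent Hyperactive versions), it can be used like this after the run has finished:

search_data = hyper.search_data(model)  # dataframe with one row per iteration
# with times=True the evaluation- and iteration-times are included as columns
search_data_times = hyper.search_data(model, times=True)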
Are you sure the bug is located in Hyperactive?
The error might be located in the optimization backend. Look at the error message from the command line: if the traceback points into the Gradient-Free-Optimizers package, you should post the bug report there. Otherwise you can post the bug report in Hyperactive.
Do you have the correct Hyperactive version?
With every major version update (e.g. v2.2 -> v3.0) the API of Hyperactive changes. Check which version of Hyperactive you have. If your major version is older, you have two options:
Recommended: You could just update your Hyperactive version with:
pip install hyperactive --upgrade
This way you can use all the new documentation and examples from the current repository.
Or you could continue using the old version and use an old repository branch as documentation. You can do that by selecting the corresponding branch (top right of the repository; the default is "master" or "main"). So if your major version is older (e.g. v2.1.0), you can select the 2.x.x branch to get the repository state for that version.
Provide example code for error reproduction. To understand and fix the issue I need example code that reproduces the error. I must be able to just copy the code into a .py file and execute it to reproduce the error.
This is expected behavior of the current implementation of the sequential model-based optimizers. For all sequential model-based algorithms you have to keep an eye on the search-space size:
search_space_size = 1
for value_ in search_space.values():
    search_space_size *= len(value_)

print("search_space_size", search_space_size)
Reduce the search space size to resolve this error.
This is because you have classes and/or non-top-level objects in the search space. Pickle (used by multiprocessing) cannot serialize them. Setting distribution to "joblib" or "pathos" may fix this problem:
hyper = Hyperactive(distribution="joblib")
These are very often warnings from sklearn or numpy. They do not correlate with bad performance from Hyperactive, and your code will most likely run fine. Such warnings are very difficult to silence, but it should help to put this at the very top of your script:
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn
This warning occurs because Hyperactive needs more initial positions to choose from to generate a population for the optimization algorithm:
The number of initial positions is determined by the initialize-parameter in the add_search-method.
# This is how it looks per default
initialize = {"grid": 4, "random": 2, "vertices": 4}
# You could set it to this for a maximum population of 20
initialize = {"grid": 4, "random": 12, "vertices": 4}
@Misc{hyperactive2021,
author = {{Simon Blanke}},
title = {{Hyperactive}: An optimization and data collection toolbox for convenient and fast prototyping of computationally expensive models.},
howpublished = {\url{https://github.com/SimonBlanke}},
year = {since 2019}
}