A unified Python package that standardizes existing implementations of similarity measures to facilitate comparisons across studies.
Paper: A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field
Install via pip:
pip install git+https://github.com/nacloos/similarity-repository.git
For a faster installation, use uv in a virtual environment (add --system to install outside of a virtual environment):
pip install uv
uv pip install git+https://github.com/nacloos/similarity-repository.git
Alternatively, clone and install locally:
git clone https://github.com/nacloos/similarity-repository.git
cd similarity-repository
pip install -e .
import numpy as np
import similarity
# generate two datasets
X, Y = np.random.randn(100, 30), np.random.randn(100, 30)
# measure their similarity
measure = similarity.make("measure/netrep/procrustes-distance=angular")
score = measure(X, Y)
Each similarity measure has a unique identifier composed of three parts: the measure namespace, the name of the repository providing the implementation, and the name of the measure (e.g., measure/netrep/procrustes-distance=angular). See similarity/types/__init__.py for a complete list of implemented measures.
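For example, the identifier used above breaks down as follows (plain Python string formatting for illustration; this is not a function of the package):
namespace = "measure"                      # standardized measure namespace
repo = "netrep"                            # repository providing the implementation
name = "procrustes-distance=angular"       # name of the measure
identifier = f"{namespace}/{repo}/{name}"
# identifier == "measure/netrep/procrustes-distance=angular"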
All measures follow this interface:
- X, Y: numpy arrays of shape (n_samples, n_features)
- score: float value

Select all measures from a specific repository:
measures = similarity.make("measure/netrep/*")
for name, measure in measures.items():
score = measure(X, Y)
print(f"{name}: {score}")
Select all implementations of a specific measure across repositories:
measures = similarity.make("measure/*/procrustes-distance=angular")
for name, measure in measures.items():
score = measure(X, Y)
print(f"{name}: {score}")
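Since these are implementations of the same standardized measure, this pattern also gives a quick, informal way to check that they agree on the same data (a minimal sketch; nothing here is enforced by the package):
import numpy as np
import similarity

X, Y = np.random.randn(100, 30), np.random.randn(100, 30)
measures = similarity.make("measure/*/procrustes-distance=angular")
scores = np.array([measure(X, Y) for measure in measures.values()])
# implementations of the same standardized measure should return (nearly) identical scores
print(scores.max() - scores.min())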
Register your own measure:
# register the function with a unique id
def my_measure(x, y):
    return x.reshape(-1) @ y.reshape(-1) / (np.linalg.norm(x) * np.linalg.norm(y))
similarity.register("my_repo/my_measure", my_measure)
# use it like any other measure
measure = similarity.make("my_repo/my_measure")
score = measure(X, Y)
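The registered callable only needs to follow the interface above (two numpy arrays in, a float out), so wrapping a measure from another library works the same way. A hedged sketch using scipy (scipy is not a stated dependency of the package; it is only used here for illustration):
from scipy.spatial.distance import cosine

def my_cosine_distance(x, y):
    # flatten both arrays and compute the cosine distance between them
    return cosine(x.reshape(-1), y.reshape(-1))

similarity.register("my_repo/cosine-distance", my_cosine_distance)
score = similarity.make("my_repo/cosine-distance")(X, Y)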
The package is organized as follows:
- similarity/registry: all the registered GitHub repositories
- similarity/standardization.py: mapping to standardize names, and transformations to leverage relations between measures
- similarity/papers.py: information about the paper for each GitHub repository in the registry
- similarity/types/__init__.py: list of all the registered identifiers

If your implementation of similarity measures is missing, please contribute!
Follow these steps to register your own similarity measures:
1. Fork the repository.
2. Create a new folder in similarity/registry/ for your repository, with an __init__.py file inside it.
3. Register your measures using similarity.register. The easiest way is to copy your code with the similarity measures into the created folder and import them in your __init__.py file (see the sketch after these steps).
4. Use the naming convention {repo_name}/{measure_name} (you can use any measure name under your own namespace).
5. Add your folder to the imports in similarity/registry/__init__.py.
6. Add your paper to similarity/papers.py.
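As a minimal sketch of steps 2-4 (the folder name, measure name, and measure itself are purely illustrative, not part of the package):
# similarity/registry/my_repo/__init__.py
import numpy as np
import similarity


def flattened_correlation(x, y):
    # Pearson correlation between the flattened arrays
    return np.corrcoef(x.reshape(-1), y.reshape(-1))[0, 1]


# naming convention: {repo_name}/{measure_name}
similarity.register("my_repo/flattened-correlation", flattened_correlation)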
You can then check that your measures have been registered correctly:
import numpy as np
import similarity

X, Y = np.random.randn(50, 30), np.random.randn(50, 30)
measure = similarity.make("{repo_name}/{measure_name}")
score = measure(X, Y)
If you want to map your measures to standardized names, see similarity/standardization.py. Standardized measures live under the measure/ namespace and have the form measure/{repo_name}/{standardized_measure_name}. If your measure already exists in another repository, you can use the same standardized name; in this case, make sure your implementation is consistent with the existing ones. If your measure is new, you can propose a new standardized name.
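For example, if your repository (called my_repo here for illustration) implemented the angular Procrustes distance and mapped it to that existing standardized name, it would become available as follows (a hedged sketch; the mapping itself is done in similarity/standardization.py and is not shown here):
measure = similarity.make("measure/my_repo/procrustes-distance=angular")
score = measure(X, Y)
# should be consistent with similarity.make("measure/netrep/procrustes-distance=angular")(X, Y)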
Submit a pull request for your changes to be reviewed and merged.
For additional questions about how to contribute, please contact nacloos@mit.edu.
@inproceedings{cloos2024framework,
title={A Framework for Standardizing Similarity Measures in a Rapidly Evolving Field},
author={Nathan Cloos and Guangyu Robert Yang and Christopher J Cueva},
booktitle={UniReps: 2nd Edition of the Workshop on Unifying Representations in Neural Models},
year={2024},
url={https://openreview.net/forum?id=vyRAYoxUuA}
}
For questions or feedback, please contact nacloos@mit.edu.