datasets package

This package contains wrapper functions around sklearn.datasets.

The tno.quantum.ml.datasets package wraps only part of the functionality of sklearn.datasets. It is used to test the tno.quantum.ml classifiers and clustering algorithms in an easy, reproducible and consistent way.

datasets.get_blobs_clustering_dataset(n_samples, n_features, n_centers, random_seed=42)

Load a blobs clustering dataset.

This function wraps the make_blobs method of sklearn.datasets with a fixed cluster standard deviation of 0.1.

Example usage:

>>> from tno.quantum.ml.datasets import get_blobs_clustering_dataset
>>> X, true_labels = get_blobs_clustering_dataset(100, 3, 2)
>>> print(f"{X.shape=}\n{true_labels.shape=}")
X.shape=(100, 3)
true_labels.shape=(100,)

Parameters:

n_samples (int) – Number of samples.
n_features (int) – Number of features.
n_centers (int) – Number of centers.
random_seed (int) – Seed to give to the random number generator. Defaults to 42.

Return type:

Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]]

Returns:

A tuple containing X and true_labels of a blobs clustering dataset.
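The description above can be sketched directly against sklearn.datasets. This is a minimal illustration, not the package source: blobs_sketch is a hypothetical name, and passing the seed as random_state is an assumption. Only the fixed cluster standard deviation of 0.1 comes from the documentation.

```python
import numpy as np
from sklearn.datasets import make_blobs


def blobs_sketch(n_samples, n_features, n_centers, random_seed=42):
    """Hypothetical stand-in for get_blobs_clustering_dataset (not the package source)."""
    # cluster_std=0.1 is the fixed value named in the description.
    # Forwarding random_seed as random_state is an assumption.
    X, true_labels = make_blobs(
        n_samples=n_samples,
        n_features=n_features,
        centers=n_centers,
        cluster_std=0.1,
        random_state=random_seed,
    )
    return X, true_labels


X, true_labels = blobs_sketch(100, 3, 2)
print(X.shape, true_labels.shape)
# One integer label per center.
print(np.unique(true_labels).tolist())
```

With such a small standard deviation the clusters are tight and well separated, which makes the data convenient for checking that a clustering algorithm recovers the true labels.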

datasets.get_circles_dataset(random_seed=0)

Generate a random dataset with the shape of two circles.

This function wraps the make_circles method of sklearn.datasets with a fixed noise of 0.2 and a fixed scale factor of 0.5 between the inner and outer circle. The data is then split into training and validation data, where 60% of the data is training and 40% is validation.

Example usage:

>>> from tno.quantum.ml.datasets import get_circles_dataset
>>> X_train, y_train, X_val, y_val = get_circles_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(60, 2)
y_train.shape=(60,)
X_val.shape=(40, 2)
y_val.shape=(40,)

Parameters:

random_seed (int) – Seed to give to the random number generator. Defaults to 0.

Return type:

Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]

Returns:

A tuple containing X_training, y_training, X_validation and y_validation of a dataset with two circles.
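A rough equivalent can be written with make_circles and train_test_split. This is a sketch under assumptions, not the package implementation: circles_sketch is a hypothetical name, n_samples=100 is inferred from the 60 + 40 rows in the example output, and reusing the seed for both the generator and the split is a guess. Note that train_test_split returns X_train, X_val, y_train, y_val, so the values are reordered to match the documented return order.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split


def circles_sketch(random_seed=0):
    """Hypothetical stand-in for get_circles_dataset (not the package source)."""
    # noise=0.2 and factor=0.5 come from the description.
    # n_samples=100 is inferred from the example output (60 + 40 rows).
    X, y = make_circles(
        n_samples=100, noise=0.2, factor=0.5, random_state=random_seed
    )
    # 40% of the rows are held out for validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.4, random_state=random_seed
    )
    # Reorder to the documented order: features and labels grouped per split.
    return X_train, y_train, X_val, y_val


X_train, y_train, X_val, y_val = circles_sketch()
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
```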

datasets.get_iris_dataset(n_features=4, n_classes=3, random_seed=0)

Load the iris dataset.

The dataset is loaded and split into training and validation data, with a ratio of 3 to 1 (75% of the data is training and 25% is validation).

Example usage:

>>> from tno.quantum.ml.datasets import get_iris_dataset
>>> X_train, y_train, X_val, y_val = get_iris_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(112, 4)
y_train.shape=(112,)
X_val.shape=(38, 4)
y_val.shape=(38,)

Parameters:

n_features (int) – Number of features. Defaults to 4.
n_classes (int) – Number of classes, must be 1, 2 or 3. Defaults to 3.
random_seed (int) – Seed to give to the random number generator. Defaults to 0.

Return type:

Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]

Returns:

A tuple containing X_training, y_training, X_validation and y_validation of the iris dataset.
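The effect of n_features and n_classes on the returned shapes can be illustrated with load_iris. The selection rule used here (keep the first n_features columns and the samples whose label is below n_classes) is an assumption made for illustration. The documentation does not state how the wrapper picks features or classes, and iris_sketch is a hypothetical name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def iris_sketch(n_features=4, n_classes=3, random_seed=0):
    """Hypothetical stand-in for get_iris_dataset (not the package source)."""
    X, y = load_iris(return_X_y=True)
    # Assumed selection rule: first n_features columns, labels 0..n_classes-1.
    keep = y < n_classes
    X, y = X[keep, :n_features], y[keep]
    # 3-to-1 split: 25% of the rows are held out for validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, random_state=random_seed
    )
    return X_train, y_train, X_val, y_val


# Defaults: all 150 samples and all 4 features.
full = iris_sketch()
print([a.shape for a in full])

# Two classes and two features: 100 samples remain before the split.
reduced = iris_sketch(n_features=2, n_classes=2)
print([a.shape for a in reduced])
```

Under this assumed rule, reducing n_classes shrinks the number of rows while reducing n_features shrinks the number of columns.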

datasets.get_linearly_separables_dataset(random_seed=0)

Generate a random dataset that is linearly separable.

This function wraps the make_classification method of sklearn.datasets with the following fixed arguments: n_features=2, n_redundant=0, n_informative=2 and n_clusters_per_class=1. Afterwards, uniformly distributed noise is added. Lastly, the data is split into training and validation data, where 60% of the data is training and 40% is validation.

Example usage:

>>> from tno.quantum.ml.datasets import get_linearly_separables_dataset
>>> X_train, y_train, X_val, y_val = get_linearly_separables_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(60, 2)
y_train.shape=(60,)
X_val.shape=(40, 2)
y_val.shape=(40,)

Parameters:

random_seed (int) – Seed to give to the random number generator. Defaults to 0.

Return type:

Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]

Returns:

A tuple containing X_training, y_training, X_validation and y_validation of a dataset that is linearly separable.
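The three steps in the description (generate, add uniform noise, split) might look as follows. The fixed make_classification arguments come from the documentation. Everything else is an assumption labelled in the comments: the function name, n_samples=100 (inferred from the example output), the noise scale of 2 and the way the noise generator is seeded.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def linearly_separable_sketch(random_seed=0):
    """Hypothetical stand-in for get_linearly_separables_dataset."""
    # Step 1: the fixed arguments listed in the description.
    X, y = make_classification(
        n_samples=100,  # assumption, inferred from the 60 + 40 example output
        n_features=2,
        n_redundant=0,
        n_informative=2,
        n_clusters_per_class=1,
        random_state=random_seed,
    )
    # Step 2: uniformly distributed noise. The scale (2) and the seeding of
    # the generator are assumptions; the documentation gives neither.
    rng = np.random.default_rng(random_seed)
    X = X + 2 * rng.uniform(size=X.shape)
    # Step 3: 60% training, 40% validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.4, random_state=random_seed
    )
    return X_train, y_train, X_val, y_val


X_train, y_train, X_val, y_val = linearly_separable_sketch()
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
```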

datasets.get_moons_dataset(random_seed=0)

Generate a random dataset with a moon shape.

This function wraps the make_moons method of sklearn.datasets with a fixed noise factor of 0.3. Furthermore, the data is split into training and validation data, where 60% of the data is training and 40% is validation.

Example usage:

>>> from tno.quantum.ml.datasets import get_moons_dataset
>>> X_train, y_train, X_val, y_val = get_moons_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(60, 2)
y_train.shape=(60,)
X_val.shape=(40, 2)
y_val.shape=(40,)

Parameters:

random_seed (int) – Seed to give to the random number generator. Defaults to 0.

Return type:

Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]

Returns:

A tuple containing X_training, y_training, X_validation and y_validation of a moon shaped dataset.
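The role of the seed can be checked on the underlying make_moons generator. The snippet below calls sklearn directly rather than the wrapper and assumes 100 samples. It shows that a fixed seed reproduces the same noisy data, which is what makes these datasets usable for consistent tests.

```python
import numpy as np
from sklearn.datasets import make_moons

# noise=0.3 is the fixed value from the description; n_samples=100 is assumed.
X_a, y_a = make_moons(n_samples=100, noise=0.3, random_state=0)
X_b, y_b = make_moons(n_samples=100, noise=0.3, random_state=0)
X_c, y_c = make_moons(n_samples=100, noise=0.3, random_state=1)

# Same seed: identical points and labels.
same_seed_matches = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)
# Different seed: a different noise realisation.
different_seed_differs = not np.array_equal(X_a, X_c)

print(same_seed_matches, different_seed_differs)
```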

datasets.get_wine_dataset(n_features=13, n_classes=3, random_seed=0)

Load the wine dataset.

The dataset is loaded and split into training and validation data, with a ratio of 3 to 1 (75% of the data is training and 25% is validation).

Example usage:

>>> from tno.quantum.ml.datasets import get_wine_dataset
>>> X_train, y_train, X_val, y_val = get_wine_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(133, 13)
y_train.shape=(133,)
X_val.shape=(45, 13)
y_val.shape=(45,)

Parameters:

n_features (int) – Number of features. Defaults to 13.
n_classes (int) – Number of classes, must be 1, 2 or 3. Defaults to 3.
random_seed (int) – Seed to give to the random number generator. Defaults to 0.

Return type:

Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]

Returns:

A tuple containing X_training, y_training, X_validation and y_validation of the wine dataset.
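The split sizes in the example outputs follow from simple arithmetic when the sample count is not divisible by 4. The sketch below assumes the validation size is rounded up and the remainder goes to training, which is how sklearn's train_test_split treats a fractional test_size. Whether the wrapper uses exactly that call is not stated in the documentation, and split_sizes is a hypothetical helper.

```python
import math


def split_sizes(n_samples, validation_fraction):
    """Return (n_train, n_val), rounding the validation size up."""
    n_val = math.ceil(validation_fraction * n_samples)
    return n_samples - n_val, n_val


print(split_sizes(178, 0.25))  # wine, 178 samples: (133, 45)
print(split_sizes(150, 0.25))  # iris, 150 samples: (112, 38)
print(split_sizes(100, 0.4))   # generated datasets, 100 samples: (60, 40)
```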