datasets package
This package contains wrapper functions around sklearn.datasets.
The tno.quantum.ml.datasets package wraps only part of the functionality of sklearn.datasets. It is used for testing the tno.quantum.ml classifiers and clustering algorithms in an easy, reproducible and consistent way.
- datasets.get_blobs_clustering_dataset(n_samples, n_features, n_centers, random_seed=42)
Load a blobs clustering dataset.
This function wraps the make_blobs function of sklearn.datasets with a fixed cluster standard deviation of 0.1.
Example usage:
>>> from tno.quantum.ml.datasets import get_blobs_clustering_dataset
>>> X, true_labels = get_blobs_clustering_dataset(100, 3, 2)
>>> print(f"{X.shape=}\n{true_labels.shape=}")
X.shape=(100, 3)
true_labels.shape=(100,)
- Parameters:
n_samples (int) – Number of samples.
n_features (int) – Number of features.
n_centers (int) – Number of centers.
random_seed (int) – Seed to give to the random number generator. Defaults to 42.
- Return type:
Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[float64]]]
- Returns:
A tuple containing X and true_labels of a blobs clustering dataset.
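For readers who want to see what such a wrapper might look like, here is a minimal sketch that calls sklearn.datasets.make_blobs directly with the fixed cluster standard deviation of 0.1. The name blobs_sketch is hypothetical, and passing random_seed as random_state is an assumption; the actual implementation in tno.quantum.ml.datasets may differ.

```python
from sklearn.datasets import make_blobs


def blobs_sketch(n_samples, n_features, n_centers, random_seed=42):
    """Hypothetical stand-in for get_blobs_clustering_dataset."""
    # cluster_std=0.1 is the fixed value named in the description above.
    X, true_labels = make_blobs(
        n_samples=n_samples,
        n_features=n_features,
        centers=n_centers,
        cluster_std=0.1,
        random_state=random_seed,  # assumption: the seed is forwarded as-is
    )
    return X, true_labels


X, true_labels = blobs_sketch(100, 3, 2)
print(X.shape, true_labels.shape)
```

With a standard deviation this small, the clusters are tight, which makes the dataset a convenient sanity check for clustering algorithms.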
- datasets.get_circles_dataset(random_seed=0)
Generate a random dataset with the shape of two circles.
This function wraps the make_circles function of sklearn.datasets with the fixed arguments noise=0.2 and factor=0.5. The data is then split into training and validation data, where 60% of the data is training and 40% is validation.
Example usage:
>>> from tno.quantum.ml.datasets import get_circles_dataset
>>> X_train, y_train, X_val, y_val = get_circles_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(60, 2)
y_train.shape=(60,)
X_val.shape=(40, 2)
y_val.shape=(40,)
- Parameters:
random_seed (int) – Seed to give to the random number generator. Defaults to 0.
- Return type:
Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]
- Returns:
A tuple containing X_training, y_training, X_validation and y_validation of a dataset with two circles.
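The generate-then-split behaviour described for get_circles_dataset can be approximated with plain scikit-learn calls. This is only a sketch: circles_sketch is a hypothetical name, the total of 100 samples is inferred from the example output (60 + 40), and the use of train_test_split with the same seed is an assumption.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split


def circles_sketch(random_seed=0, n_samples=100):
    """Hypothetical stand-in for get_circles_dataset."""
    # noise=0.2 and factor=0.5 are the fixed values from the description.
    X, y = make_circles(
        n_samples=n_samples, noise=0.2, factor=0.5, random_state=random_seed
    )
    # 40% of the samples are held out for validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.4, random_state=random_seed
    )
    # Return order follows the documented tuple, not sklearn's own order.
    return X_train, y_train, X_val, y_val


X_train, y_train, X_val, y_val = circles_sketch()
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
```

Note that train_test_split returns both feature arrays first, so the sketch reorders them to match the documented (X_training, y_training, X_validation, y_validation) layout.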
- datasets.get_iris_dataset(n_features=4, n_classes=3, random_seed=0)
Load the iris dataset.
The dataset is loaded and split into training and validation data, with a ratio of 3 to 1 (75% of the data is training and 25% is validation).
Example usage:
>>> from tno.quantum.ml.datasets import get_iris_dataset
>>> X_train, y_train, X_val, y_val = get_iris_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(112, 4)
y_train.shape=(112,)
X_val.shape=(38, 4)
y_val.shape=(38,)
- Parameters:
n_features (int) – Number of features. Defaults to 4.
n_classes (int) – Number of classes, must be 1, 2 or 3. Defaults to 3.
random_seed (int) – Seed to give to the random number generator. Defaults to 0.
- Return type:
Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]
- Returns:
A tuple containing X_training, y_training, X_validation and y_validation of the iris dataset.
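The n_features and n_classes parameters suggest that the iris data is reduced before it is split. This page does not say how the reduction is done, so the sketch below assumes the simplest choice: keep the first n_features columns and the classes with labels below n_classes. The name iris_sketch and the use of train_test_split are likewise assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def iris_sketch(n_features=4, n_classes=3, random_seed=0):
    """Hypothetical stand-in for get_iris_dataset."""
    X, y = load_iris(return_X_y=True)
    # Assumption: keep only samples whose label is below n_classes ...
    keep = y < n_classes
    # ... and only the first n_features feature columns.
    X, y = X[keep][:, :n_features], y[keep]
    # 3 to 1 split: 25% of the samples go to validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.25, random_state=random_seed
    )
    return X_train, y_train, X_val, y_val


X_train, y_train, X_val, y_val = iris_sketch()
print(X_train.shape, X_val.shape)

# A reduced problem: two features, two classes.
X_train_small, y_train_small, X_val_small, y_val_small = iris_sketch(2, 2)
print(X_train_small.shape, X_val_small.shape)
```

Reducing the number of features and classes in this way is a common trick for fitting a dataset onto small models, though the real function may select them differently.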
- datasets.get_linearly_separables_dataset(random_seed=0)
Generate a random dataset that is linearly separable.
This function wraps the make_classification function of sklearn.datasets with the following fixed arguments: n_features=2, n_redundant=0, n_informative=2 and n_clusters_per_class=1. Afterwards, uniformly distributed noise is added. Lastly, the data is split into training and validation data, where 60% of the data is training and 40% is validation.
Example usage:
>>> from tno.quantum.ml.datasets import get_linearly_separables_dataset
>>> X_train, y_train, X_val, y_val = get_linearly_separables_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(60, 2)
y_train.shape=(60,)
X_val.shape=(40, 2)
y_val.shape=(40,)
- Parameters:
random_seed (int) – Seed to give to the random number generator. Defaults to 0.
- Return type:
Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]
- Returns:
A tuple containing X_training, y_training, X_validation and y_validation of a dataset that is linearly separable.
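The three steps described for get_linearly_separables_dataset (generate, add uniform noise, split) could be sketched as follows. The fixed make_classification arguments come from the description; everything else is an assumption, in particular the hypothetical name linearly_separable_sketch, the 100 samples inferred from the example output, the noise amplitude noise_scale and the choice of random number generator.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def linearly_separable_sketch(random_seed=0, n_samples=100, noise_scale=2.0):
    """Hypothetical stand-in for get_linearly_separables_dataset."""
    # Step 1: two informative features, one cluster per class.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=2,
        n_redundant=0,
        n_informative=2,
        n_clusters_per_class=1,
        random_state=random_seed,
    )
    # Step 2: add uniformly distributed noise (amplitude is an assumption).
    rng = np.random.default_rng(random_seed)
    X = X + noise_scale * rng.uniform(size=X.shape)
    # Step 3: hold out 40% of the samples for validation.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.4, random_state=random_seed
    )
    return X_train, y_train, X_val, y_val


X_train, y_train, X_val, y_val = linearly_separable_sketch()
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
```

Depending on the noise amplitude, a few points may end up on the wrong side of the separating line, so a classifier should not be expected to reach perfect accuracy on data generated this way.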
- datasets.get_moons_dataset(random_seed=0)
Generate a random dataset with a moon shape.
This function wraps the make_moons function of sklearn.datasets with the fixed argument noise=0.3. The data is then split into training and validation data, where 60% of the data is training and 40% is validation.
Example usage:
>>> from tno.quantum.ml.datasets import get_moons_dataset
>>> X_train, y_train, X_val, y_val = get_moons_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(60, 2)
y_train.shape=(60,)
X_val.shape=(40, 2)
y_val.shape=(40,)
- Parameters:
random_seed (int) – Seed to give to the random number generator. Defaults to 0.
- Return type:
Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]
- Returns:
A tuple containing X_training, y_training, X_validation and y_validation of a moon shaped dataset.
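The random_seed parameter is what makes these datasets reproducible. The sketch below illustrates the idea with make_moons and the fixed noise of 0.3: the same seed yields identical arrays, while a different seed yields different data. moons_sketch is a hypothetical name, and forwarding the seed to both the generator and the split is an assumption about how a wrapper like this could work.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split


def moons_sketch(random_seed=0, n_samples=100):
    """Hypothetical stand-in for get_moons_dataset."""
    X, y = make_moons(n_samples=n_samples, noise=0.3, random_state=random_seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.4, random_state=random_seed
    )
    return X_train, y_train, X_val, y_val


first = moons_sketch(random_seed=0)
second = moons_sketch(random_seed=0)
other = moons_sketch(random_seed=1)

# Same seed: every returned array is identical.
same_seed_matches = all(np.array_equal(a, b) for a, b in zip(first, second))
# Different seed: the training features differ.
different_seed_differs = not np.array_equal(first[0], other[0])
print(same_seed_matches, different_seed_differs)
```

Fixing the seed in tests keeps classifier results comparable between runs; varying it gives independent draws of the same dataset shape.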
- datasets.get_wine_dataset(n_features=13, n_classes=3, random_seed=0)
Load the wine dataset.
The dataset is loaded and split into training and validation data, with a ratio of 3 to 1 (75% of the data is training and 25% is validation).
Example usage:
>>> from tno.quantum.ml.datasets import get_wine_dataset
>>> X_train, y_train, X_val, y_val = get_wine_dataset()
>>> print(f"{X_train.shape=}\n{y_train.shape=}\n{X_val.shape=}\n{y_val.shape=}")
X_train.shape=(133, 13)
y_train.shape=(133,)
X_val.shape=(45, 13)
y_val.shape=(45,)
- Parameters:
n_features (int) – Number of features. Defaults to 13.
n_classes (int) – Number of classes, must be 1, 2 or 3. Defaults to 3.
random_seed (int) – Seed to give to the random number generator. Defaults to 0.
- Return type:
Tuple[ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]], ndarray[Any, dtype[float64]], ndarray[Any, dtype[int32]]]
- Returns:
A tuple containing X_training, y_training, X_validation and y_validation of the wine dataset.
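The split sizes in the example outputs are not exact quarters, because 178 wine samples (and 150 iris samples) do not divide evenly by four. The reported shapes are consistent with rounding the validation share up and giving the remainder to training, which is also how sklearn.model_selection.train_test_split treats a fractional test_size. Whether the wrapper computes the split this way is an assumption; the helper split_sizes below is purely illustrative.

```python
import math


def split_sizes(n_samples, validation_fraction):
    """Return (n_training, n_validation), rounding the validation share up."""
    n_validation = math.ceil(validation_fraction * n_samples)
    return n_samples - n_validation, n_validation


print(split_sizes(178, 0.25))  # wine: (133, 45)
print(split_sizes(150, 0.25))  # iris: (112, 38)
print(split_sizes(100, 0.4))   # generated datasets: (60, 40)
```

This explains why the wine example reports 133 training and 45 validation samples rather than an exact 75% and 25%.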