RobustICA#
- robustica.abs_pearson_dist(X)#
Compute Pearson dissimilarity between columns.
- Parameters
X (np.array of shape (n_features, n_samples)) – Input data.
- Returns
D – Dissimilarity matrix.
- Return type
np.array of shape (n_samples, n_samples)
Examples
from robustica import abs_pearson_dist
from robustica.examples import make_sampledata
X = make_sampledata(15, 5)
D = abs_pearson_dist(X)
D.shape
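The docstring does not spell out the formula, but the function name suggests one minus the absolute Pearson correlation between columns. The NumPy sketch below illustrates that assumed definition; it is not a copy of the library's implementation.

import numpy as np

def abs_pearson_dist_sketch(X):
    # Assumed definition: 1 - |Pearson correlation| between columns (samples) of X.
    r = np.corrcoef(X.T)  # (n_samples, n_samples) correlations between columns
    return 1 - np.abs(r)

X = np.random.rand(15, 5)
print(abs_pearson_dist_sketch(X).shape)  # (5, 5)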
- robustica.corrmats(X, Y)#
Vectorized implementation of pairwise correlations between rows of X and rows of Y. Make sure that X and Y have the same number of columns (n_samples_x == n_samples_y).
- Parameters
X (np.array of shape (n_features_x, n_samples_x)) –
Y (np.array of shape (n_features_y, n_samples_y)) –
- Returns
r – Pairwise correlations between rows of X and rows of Y.
- Return type
np.array of shape (n_features_x, n_features_y)
Examples
from robustica import corrmats
from robustica.examples import make_sampledata
X = make_sampledata(15, 5)
Y = make_sampledata(20, 5)
r = corrmats(X, Y)
r.shape
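A common way to vectorize pairwise row correlations is to standardize each row and take a matrix product. The sketch below illustrates that idea; it is not necessarily the implementation used by corrmats.

import numpy as np

def corrmats_sketch(X, Y):
    # Standardize rows so that Pearson correlation reduces to a dot product / n_samples.
    Xc = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    Yc = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)
    return Xc @ Yc.T / X.shape[1]  # (n_features_x, n_features_y)

X, Y = np.random.rand(15, 5), np.random.rand(20, 5)
print(corrmats_sketch(X, Y).shape)  # (15, 20)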
- robustica.compute_iq(X, labels, precomputed=False)#
Compute the cluster index of quality (Iq) as suggested by Himberg & Hyvarinen (2003) (DOI: https://doi.org/10.1109/NNSP.2003.1318025). This method requires computing a square correlation matrix.
- Parameters
X (np.array of shape (n_samples, n_features)) – Input data, or a square correlation matrix of shape (n_samples, n_samples) if precomputed=True.
labels (list or np.array) – Clustering labels indicating to which cluster every observation belongs.
precomputed (bool, default=False) – Indicates whether X is a square pairwise correlation matrix.
- Returns
df – Dataframe with cluster labels (‘cluster_id’) and their corresponding Iq scores.
- Return type
pd.DataFrame
Examples
from robustica import compute_iq
from robustica.examples import make_sampledata
X = make_sampledata(5, 15)
labels = [1, 1, 2, 1, 2]
df = compute_iq(X, labels)
df
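Assuming the Icasso definition from Himberg & Hyvarinen, the Iq of a cluster is its average intra-cluster similarity minus its average extra-cluster similarity, with similarity taken as the absolute correlation. A minimal sketch of that idea on a precomputed square correlation matrix (iq_sketch is a hypothetical helper, not part of robustica):

import numpy as np

def iq_sketch(corr, labels):
    # Iq per cluster: mean |corr| within the cluster minus mean |corr| to all other observations.
    corr, labels = np.abs(np.asarray(corr)), np.asarray(labels)
    scores = {}
    for c in np.unique(labels):
        inside = labels == c
        within = corr[np.ix_(inside, inside)].mean()
        between = corr[np.ix_(inside, ~inside)].mean() if (~inside).any() else 0.0
        scores[c] = within - between
    return scores

corr = np.corrcoef(np.random.rand(5, 15))  # square correlation matrix of 5 observations
print(iq_sketch(corr, [1, 1, 2, 1, 2]))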
- class robustica.RobustICA(n_components=None, algorithm='parallel', whiten='arbitrary-variance', fun='logcosh', fun_args=None, max_iter=200, tol=0.0001, w_init=None, random_state=None, n_jobs=None, robust_runs=100, robust_infer_signs=True, robust_dimreduce=True, robust_method='DBSCAN', robust_kws={}, robust_precompdist_func='abs_pearson_dist', verbose=True)#
Class to perform robust Independent Component Analysis (ICA) using different methods to cluster together the independent components computed via sklearn.decomposition.FastICA.
By default, it carries out the Icasso algorithm using agglomerative clustering with average linkage and a precomputed Pearson dissimilarity matrix.
- Schematically, RobustICA works like this (a rough sketch follows this list):
1) Run ICA multiple times and save the source (S) and mixing (A) matrices.
2) Cluster the components into robust components using the S matrices across runs:
   2.1) If we use a precomputed dissimilarity:
      2.1.1) Precompute the dissimilarity matrix.
   2.2) If we don't use a precomputed dissimilarity:
      2.2.1) (Optional) Infer and correct component signs across runs.
      2.2.2) (Optional) Reduce the feature space with PCA.
   2.3) Cluster components across all S runs.
   2.4) Use the clustering labels to compute the centroid of each cluster, i.e. the robust component, in both S and A.
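For orientation, the workflow above can be sketched with scikit-learn building blocks. This is a simplified illustration (using the Icasso-style defaults mentioned above: agglomerative clustering with average linkage), not the library's actual implementation; sign inference (2.2.1) and centroid orientation are omitted.

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA, FastICA

def robust_ica_sketch(X, n_components=10, robust_runs=100):
    # 1) Run ICA multiple times and save the source (S) and mixing (A) matrices.
    S_runs, A_runs = [], []
    for run in range(robust_runs):
        ica = FastICA(n_components=n_components, random_state=run)
        S_runs.append(ica.fit_transform(X))  # (X.shape[0], n_components)
        A_runs.append(ica.mixing_)           # (X.shape[1], n_components)
    S_all = np.hstack(S_runs)  # sources from every run, concatenated column-wise
    A_all = np.hstack(A_runs)  # mixing matrices from every run, concatenated column-wise

    # 2.2.2) Optionally reduce the feature space before clustering.
    feats = PCA(n_components=n_components).fit_transform(S_all.T)

    # 2.3) Cluster components across all runs.
    labels = AgglomerativeClustering(n_clusters=n_components, linkage="average").fit(feats).labels_

    # 2.4) The centroid of each cluster is the robust component, in both S and A.
    S = np.stack([S_all[:, labels == c].mean(axis=1) for c in np.unique(labels)], axis=1)
    A = np.stack([A_all[:, labels == c].mean(axis=1) for c in np.unique(labels)], axis=1)
    return S, A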
- Parameters
n_components (int, default=None) – Number of components to use. If None is passed, all are used.
algorithm ({'parallel', 'deflation'}, default='parallel') – Apply parallel or deflational algorithm for FastICA.
whiten (str or bool, default='arbitrary-variance') – If whiten is False, the data is considered to already be whitened and no whitening is performed. WARNING: with scikit-learn>1.3, the default was set to whiten='arbitrary-variance' to maintain the previous behavior (equivalent to the old whiten=True).
fun ({'logcosh', 'exp', 'cube'} or callable, default='logcosh') –
The functional form of the G function used in the approximation to neg-entropy. Could be either 'logcosh', 'exp', or 'cube'. You can also provide your own function. It should return a tuple containing the value of the function and of its derivative at the point. Example:
def my_g(x):
    return x ** 3, (3 * x ** 2).mean(axis=-1)
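For example, assuming fun is forwarded to sklearn.decomposition.FastICA as the description above suggests, the custom contrast function could be used like this (illustrative usage):

from robustica import RobustICA
from robustica.examples import make_sampledata

def my_g(x):
    # return G(x) and the mean of its derivative along the last axis
    return x ** 3, (3 * x ** 2).mean(axis=-1)

X = make_sampledata(200, 50)
rica = RobustICA(n_components=10, fun=my_g)
S, A = rica.fit_transform(X)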
fun_args (dict, default=None) – Arguments to send to the functional form. If empty and if fun=’logcosh’, fun_args will take value {‘alpha’ : 1.0}.
max_iter (int, default=200) – Maximum number of iterations during fit.
tol (float, default=1e-4) – Tolerance on update at each iteration.
w_init (ndarray of shape (n_components, n_components), default=None) – The mixing matrix to be used to initialize the algorithm.
random_state (int, RandomState instance or None, default=None) – Used to initialize w_init when not specified, with a normal distribution. Pass an int for reproducible results across multiple function calls. See Glossary.
robust_runs (int, default=100) – Number of times to run FastICA.
robust_infer_signs (bool, default=True) – If robust_infer_signs is True, we infer and correct the signs of components across ICA runs before clustering them.
robust_method (str or callable, default="DBSCAN") –
Clustering class used to compute robust components across ICA runs. If str, choose one of the following clustering algorithms from sklearn.cluster:
"AgglomerativeClustering"
"AffinityPropagation"
"Birch"
"DBSCAN"
"FeatureAgglomeration"
"KMeans"
"MiniBatchKMeans"
"MeanShift"
"OPTICS"
"SpectralClustering"
or one of the following from sklearn_extra.cluster:
"KMedoids"
"CommonNNClustering"
If a class is passed, it is expected to implement a fit() method that creates a labels_ attribute containing the list of clustering labels (available afterwards as self.clustering.labels_).
robust_kws (dict, default={"linkage": "average"}) – Keyword arguments to send to the clustering class defined by robust_method. If robust_method is a str and "n_clusters" or "min_samples" are not defined in robust_kws, robust_kws will be updated with either {"n_clusters": self.n_components} or {"min_samples": int(self.robust_runs * 0.5)}, accordingly.
robust_dimreduce (bool, default=True) – If robust_dimreduce is True, we use sklearn.decomposition.PCA with the same n_components to reduce the feature space across ICA runs after sign inference and correction (if robust_infer_signs=True) and before clustering.
robust_precompdist_func ("abs_pearson_dist" or callable, default="abs_pearson_dist") – If robust_kws contains the value "precomputed", we precompute a distance matrix by executing robust_precompdist_func and use it for clustering (see the configuration sketch after this parameter list).
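For instance, an Icasso-like setup (agglomerative clustering on a precomputed Pearson dissimilarity, as described at the top of this class) might be configured roughly as follows. The keyword that names the metric in AgglomerativeClustering depends on your scikit-learn version ("metric" in recent releases, "affinity" in older ones), so treat this as a sketch rather than exact API usage.

from robustica import RobustICA
from robustica.examples import make_sampledata

X = make_sampledata(200, 50)
rica = RobustICA(
    n_components=10,
    robust_method="AgglomerativeClustering",
    # passing the value "precomputed" makes RobustICA build the dissimilarity
    # matrix with robust_precompdist_func before clustering
    robust_kws={"linkage": "average", "metric": "precomputed"},
    robust_precompdist_func="abs_pearson_dist",
)
S, A = rica.fit_transform(X)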
- S#
Robust source matrix computed using the centroids of every cluster.
- Type
np.array of shape (n_features, n_components)
- A#
Robust mixing matrix computed using the centroids of every cluster.
- Type
np.array of shape (n_samples, n_components)
- S_std#
Standard deviation across features within each robust component.
- Type
np.array of shape (n_features, n_components)
- A_std#
Standard deviation across samples within each robust component.
- Type
np.array of shape (n_samples, n_components)
- S_all#
Concatenated source matrices corresponding to every run of ICA.
- Type
np.array of shape (n_features, n_components * robust_runs)
- A_all#
Concatenated mixing matrices corresponding to every run of ICA.
- Type
np.array of shape (n_samples, n_components * robust_runs)
- time#
Time to execute each of the robust_runs ICA runs. Dictionary structured as {run: seconds}.
- Type
dict of length robust_runs
- signs_#
Array of positive or negative ones used to correct for signs before clustering.
- Type
np.array of length n_components * robust_runs
- orientation_#
Array of positive or negative ones used to orient labeled components after clustering so that the largest absolute weights are positive.
- Type
np.array of length n_components * robust_runs
- clustering#
Instance used to cluster components in S_all across ICA runs. The clustering labels can be found in the attribute self.clustering.labels_. In self.clustering.stats_ you can find information on cluster sizes and mean standard deviations per cluster in both S and A robust matrices.
- Type
class instance
Examples
from robustica import RobustICA
from robustica.examples import make_sampledata
X = make_sampledata(200, 50)
rica = RobustICA(n_components=10)
S, A = rica.fit_transform(X)
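Continuing the example above, the documented attributes can be inspected after fitting. The commented shapes follow from the documented attribute shapes with X = make_sampledata(200, 50), n_components=10 and the default robust_runs=100.

rica.S.shape             # (200, 10): robust source matrix (n_features, n_components)
rica.A.shape             # (50, 10): robust mixing matrix (n_samples, n_components)
rica.S_all.shape         # (200, 1000): sources from every ICA run
rica.clustering.labels_  # cluster label assigned to every component across runs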
Notes
Icasso procedure based on Himberg, J., & Hyvarinen, A. “Icasso: software for investigating the reliability of ICA estimates by clustering and visualization”. IEEE XIII Workshop on Neural Networks for Signal Processing (2003). DOI: https://doi.org/10.1109/NNSP.2003.1318025
Centroid computation based on Sastry, Anand V., et al. “The Escherichia coli transcriptome mostly consists of independently regulated modules.” Nature communications 10.1 (2019): 1-14. DOI: https://doi.org/10.1038/s41467-019-13483-w