API

The API reference contains detailed descriptions of the different end-user classes, functions, methods, etc.

Note

This API reference only contains end-user documentation. If you are looking to hack away at sagenet's internals, you will find more detailed comments in the source code.

sage

class sagenet.sage.sage(device='cpu')[source]

Bases: object

A sagenet object.

Parameters

device (str, default = 'cpu') – the processing unit to be used in the classifiers (gpu or cpu).

Methods

add_ref(adata[, tag, comm_columns, …])

Trains new classifiers on a reference dataset.

load_model(tag[, dir])

Loads a single pre-trained model.

load_model_as_folder([dir])

Loads pre-trained models from a directory.

map_query(adata_q)

Maps a query dataset to space using the trained models on the spatial reference(s).

save_model(tag[, dir])

Saves a single trained model.

save_model_as_folder([dir])

Saves all trained models stored in the sagenet object as a folder.

add_ref(adata, tag=None, comm_columns='class_', classifier='TransformerConv', num_workers=0, batch_size=32, epochs=10, n_genes=10, verbose=False)[source]

Trains new classifiers on a reference dataset.

Parameters
  • adata (AnnData) – The annotated data matrix of shape n_obs × n_vars to be used as the spatial reference. Rows correspond to cells (or spots) and columns to genes.

  • tag (str, default = None) – The tag to be used for storing the trained models and the outputs in the sagenet object.

  • classifier (str, default = ‘TransformerConv’) – The type of classifier to be passed to sagenet.Classifier()

  • comm_columns (list of str, default = ‘class_’) – The columns in adata.obs to be used as spatial partitions.

  • num_workers (int) – Non-negative. Number of workers to be passed to torch_geometric.data.DataLoader.

  • epochs (int) – number of epochs.

  • verbose (boolean, default=False) – whether to print out loss during training.

Returns

Nothing.

Notes

Trains the models and adds them to the .models dictionary of the sagenet object. Also adds a new key {tag}_entropy to adata.var, which contains the entropy values used as the importance score of each gene.

load_model(tag, dir='.')[source]

Loads a single pre-trained model.

Parameters
  • tag (str) – Name of the trained model to be stored in the sagenet object.

  • dir (str, default=`'.'`) – The input directory.

load_model_as_folder(dir='.')[source]

Loads pre-trained models from a directory.

Parameters

dir (str, default=`'.'`) – The input directory.

map_query(adata_q)[source]

Maps a query dataset to space using the trained models on the spatial reference(s).

Parameters

adata_q (AnnData) – The annotated data matrix of shape n_obs × n_vars to be used as the query. Rows correspond to cells (or spots) and columns to genes.

Returns

Nothing.

Notes

  • Adds new key(s) pred_{tag}_{partitioning_name} to adata_q.obs, containing the predicted partition for partitioning {partitioning_name}, using model {tag}.

  • Adds new key(s) ent_{tag}_{partitioning_name} to adata_q.obs, containing the uncertainty of the prediction for partitioning {partitioning_name}, using model {tag}.

  • Adds a new key distmap to adata_q.obsm, a sparse matrix of size n_obs × n_obs containing the reconstructed cell-to-cell spatial distances.

save_model(tag, dir='.')[source]

Saves a single trained model.

Parameters
  • tag (str) – Name of the trained model to be saved.

  • dir (str, default=`'.'`) – The saving directory.

save_model_as_folder(dir='.')[source]

Saves all trained models stored in the sagenet object as a folder.

Parameters

dir (str, default=`'.'`) – The saving directory.
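The methods above can be combined into a short train, save, and map workflow. The sketch below only uses the signatures listed in this reference; the function name train_and_map and its arguments are hypothetical, and it assumes that adata_ref is already prepared for training and that its .obs contains a partition column named 'class_'.

```python
def train_and_map(adata_ref, adata_query, tag="ref", model_dir="."):
    """Train on a spatial reference, save the models, then map a query."""
    # Imported lazily so this sketch can be defined without sagenet installed.
    from sagenet.sage import sage

    sg_obj = sage(device="cpu")
    # comm_columns names the adata_ref.obs columns that hold spatial partitions;
    # 'class_' is the documented default and an assumption about your data.
    sg_obj.add_ref(
        adata_ref,
        tag=tag,
        comm_columns=["class_"],
        classifier="TransformerConv",
        epochs=10,
        verbose=False,
    )
    # Persist every trained model held by the sagenet object.
    sg_obj.save_model_as_folder(model_dir)
    # map_query returns nothing; results are written into adata_query.
    sg_obj.map_query(adata_query)
    return adata_query
```

Returning adata_query is a convenience of this sketch only, since map_query itself modifies the query object in place.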

classifier

class sagenet.classifier.Classifier(n_features, n_classes, n_hidden_GNN=[], n_hidden_FC=[], K=4, pool_K=4, dropout_GNN=0, dropout_FC=0, classifier='MLP', lr=0.001, momentum=0.9, log_dir=None, device='cpu')[source]

Bases: object

A Neural Network Classifier. A number of Graph Neural Networks (GNN) and an MLP are implemented.

Parameters
  • n_features (int) – number of input features.

  • n_classes (int) – number of classes.

  • n_hidden_GNN (list, default=[]) – list of integers indicating sizes of GNN hidden layers.

  • n_hidden_FC (list, default=[]) – list of integers indicating sizes of FC hidden layers. If a GNN is used, this indicates FC hidden layers after the GNN layers.

  • K (integer, default=4) – Convolution layer filter size. Used only when classifier == ‘Chebnet’.

  • dropout_GNN (float, default=0) – dropout rate for GNN hidden layers.

  • dropout_FC (float, default=0) – dropout rate for FC hidden layers.

  • classifier (str, default='MLP') –

    • 'MLP' –> multilayer perceptron

    • 'GraphSAGE' –> GraphSAGE Network

    • 'Chebnet' –> Chebyshev spectral Graph Convolutional Network

    • 'GATConv' –> Graph Attentional Neural Network

    • 'GENConv' –> GENeralized Graph Convolution Network

    • 'GINConv' –> Graph Isomorphism Network

    • 'GraphConv' –> Graph Convolutional Neural Network

    • 'MFConv' –> Convolutional Networks on Graphs for Learning Molecular Fingerprints

    • 'TransformerConv' –> Graph Transformer Neural Network

  • lr (float, default=0.001) – base learning rate for the SGD optimization algorithm.

  • momentum (float, default=0.9) – base momentum for the SGD optimization algorithm.

  • log_dir (str, default=None) – path to the log directory. Specifically, used for tensorboard logs.

  • device (str, default='cpu') – the processing unit.

See also

Classifier.fit

fits the classifier to data

Classifier.eval

evaluates the classifier predictions

Methods

eval(data_loader[, verbose])

evaluates the model based on predictions

fit(data_loader, epochs[, test_dataloader, …])

fits the classifier to the input data.

interpret(data_loader, n_features, n_classes)

Interprets a trained model by assigning an importance score to each feature with respect to each class. It uses the IntegratedGradients method from the captum package to compute class-wise feature importances, then computes entropy values to obtain a global importance measure.

eval(data_loader, verbose=False)[source]

evaluates the model based on predictions

Parameters
  • data_loader (torch-geometric dataloader) – the dataset on which the model is evaluated.

  • verbose (boolean, default=False) – whether to print out loss during training.

Returns

  • accuracy (float) – accuracy

  • conf_mat (ndarray) – confusion matrix

  • precision (float) – weighted precision score

  • recall (float) – weighted recall score

  • f1_score (float) – weighted f1 score

fit(data_loader, epochs, test_dataloader=None, verbose=False)[source]

fits the classifier to the input data.

Parameters
  • data_loader (torch-geometric dataloader) – the training dataset.

  • epochs (int) – number of epochs.

  • test_dataloader (torch-geometric dataloader, default=None) – the test dataset on which the model is evaluated in each epoch.

  • verbose (boolean, default=False) – whether to print out loss during training.

interpret(data_loader, n_features, n_classes)[source]

Interprets a trained model by assigning an importance score to each feature with respect to each class. It uses the IntegratedGradients method from the captum package to compute class-wise feature importances, then computes entropy values to obtain a global importance measure.

Parameters
  • data_loader (torch-geometric dataloader) – the dataset on which the model is evaluated.

  • n_features (int) – number of features.

  • n_classes (int) – number of classes.

Returns

ent

Return type

numpy ndarray, shape (n_features)
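To illustrate how class-wise importances can be collapsed into one entropy value per feature, here is a minimal sketch in plain Python. The normalisation (absolute values rescaled to sum to 1 per feature), the handling of all-zero rows, and the name importance_entropy are assumptions made for illustration; the attribution values are made up and do not come from captum.

```python
import math


def importance_entropy(class_scores):
    """Shannon entropy (nats) of one feature's importances across classes."""
    magnitudes = [abs(score) for score in class_scores]
    total = sum(magnitudes)
    if total == 0:
        # Assumption: a feature with no attribution is treated as uniform.
        return math.log(len(class_scores))
    probs = [m / total for m in magnitudes]
    return -sum(p * math.log(p) for p in probs if p > 0)


# Rows are features, columns are classes (hypothetical attribution values).
attributions = [
    [0.9, 0.0, 0.0],  # importance concentrated on a single class
    [0.3, 0.3, 0.3],  # importance spread evenly over all classes
]
ent = [importance_entropy(row) for row in attributions]
```

Under this convention a low entropy marks a feature whose importance is concentrated on few classes, while a value near log(n_classes) marks a feature that does not discriminate between classes.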

utils

sagenet.utils.compute_metrics(y_true, y_pred)[source]

Computes prediction quality metrics.

Parameters
  • y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.

  • y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.

Returns

  • accuracy (accuracy)

  • conf_mat (confusion matrix)

  • precision (weighted precision score)

  • recall (weighted recall score)

  • f1 (weighted f1 score)
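For readers who want to see what the five returned values mean, the sketch below computes them in plain Python. It follows the usual support-weighted definitions of precision, recall, and F1; whether sagenet computes them this way internally is an assumption, and compute_metrics_sketch is a hypothetical name.

```python
from collections import Counter


def compute_metrics_sketch(y_true, y_pred):
    """Accuracy, confusion matrix, and support-weighted precision/recall/F1."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    # conf_mat[i][j]: samples whose true label is i and predicted label is j.
    conf_mat = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        conf_mat[index[t]][index[p]] += 1

    n = len(y_true)
    accuracy = sum(conf_mat[i][i] for i in range(len(labels))) / n
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label in labels:
        i = index[label]
        tp = conf_mat[i][i]
        predicted = sum(row[i] for row in conf_mat)  # column sum
        actual = support[label]                      # row sum
        p_i = tp / predicted if predicted else 0.0
        r_i = tp / actual if actual else 0.0
        f_i = 2 * p_i * r_i / (p_i + r_i) if (p_i + r_i) else 0.0
        weight = actual / n  # each class is weighted by its share of y_true
        precision += weight * p_i
        recall += weight * r_i
        f1 += weight * f_i
    return accuracy, conf_mat, precision, recall, f1


accuracy, conf_mat, precision, recall, f1 = compute_metrics_sketch(
    [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]
)
```

With support weighting, the weighted recall always equals the accuracy, which is a handy sanity check on the output.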

sagenet.utils.get_dataloader(graph, X, y, batch_size=1, undirected=True, shuffle=True, num_workers=0)[source]

Converts a graph and a dataset to a dataloader.

Parameters
  • graph (igraph object) – The underlying graph to be fed to the graph neural networks.

  • X (numpy ndarray) – Input dataset with columns as features and rows as observations.

  • y (numpy ndarray) – Class labels.

  • batch_size (int, default=1) – The batch size.

  • undirected (boolean, default = True) – Whether the input graph is undirected (symmetric adjacency matrix).

  • shuffle (boolean, default = True) – Whether to shuffle the dataset to be passed to torch_geometric.data.DataLoader.

  • num_workers (int, default = 0) – Non-negative. Number of workers to be passed to torch_geometric.data.DataLoader.

Returns

  • dataloader – a pytorch-geometric dataloader. All of the graphs will have the same connectivity (given by the input graph), but the node features will be the features from X.
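The idea of one shared connectivity with per-observation node features can be sketched without torch_geometric. In the plain-Python example below, dictionaries stand in for graph data objects, each column of X is assumed to correspond to one node of the graph, and build_graph_samples is a hypothetical name; batching and shuffling are left out.

```python
def build_graph_samples(edges, X, y, undirected=True):
    """Pair one shared edge list with per-observation node features."""
    edge_set = set(edges)
    if undirected:
        # Add the reverse of every edge so the adjacency is symmetric.
        edge_set |= {(j, i) for i, j in edges}
    sources, targets = zip(*sorted(edge_set)) if edge_set else ((), ())
    edge_index = [list(sources), list(targets)]

    samples = []
    for row, label in zip(X, y):
        samples.append({
            "x": [[value] for value in row],  # one scalar feature per node
            "edge_index": edge_index,         # the same object for every sample
            "y": label,
        })
    return samples


samples = build_graph_samples(
    edges=[(0, 1), (1, 2)],
    X=[[0.5, 1.0, 0.0], [2.0, 0.0, 3.0]],
    y=[0, 1],
)
```

Each row of X yields one sample, so the number of samples equals the number of observations while the edge list never changes.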

sagenet.utils.glasso(adata, alphas=5, n_jobs=None, mode='cd')[source]

Reconstructs the gene-gene interaction network based on gene expressions in .X using a Gaussian graphical model estimated by glasso.

Parameters
  • adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

  • alphas (int or array-like of shape (n_alphas,), dtype=`float`, default=`5`) – Non-negative. If an integer is given, it fixes the number of points on the grids of alpha to be used. If a list is given, it gives the grid to be used.

  • n_jobs (int, default=None) – Non-negative. Number of jobs.

Returns

Nothing. Adds a csr_matrix under key adj to .varm.

References

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
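In a Gaussian graphical model, a zero off-diagonal entry of the precision (inverse covariance) matrix means the two variables are conditionally independent given all others, so the non-zero pattern defines a graph. The sketch below turns a small, made-up precision matrix into a dense 0/1 adjacency; the tolerance tol and the name adjacency_from_precision are assumptions, and how sagenet derives and stores its sparse adj matrix may differ.

```python
def adjacency_from_precision(precision, tol=1e-8):
    """Mark an edge wherever an off-diagonal precision entry is non-zero."""
    n = len(precision)
    return [
        [1 if i != j and abs(precision[i][j]) > tol else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sparse precision matrix for three genes.
precision = [
    [2.0, -0.8, 0.0],
    [-0.8, 2.0, -0.5],
    [0.0, -0.5, 1.5],
]
adj = adjacency_from_precision(precision)
```

Here genes 0 and 2 are not connected because their precision entry is zero, even though both are linked to gene 1.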

sagenet.utils.kullback_leibler_divergence(X)[source]

Finds the pairwise Kullback-Leibler divergence matrix between all rows in X.

Parameters

X (array_like, shape (n_samples, n_features)) – Array of probability data. Each row must sum to 1.

Returns

D – The Kullback-Leibler divergence matrix. A pairwise matrix D such that D_{i, j} is the divergence between the ith and jth vectors of the given matrix X.

Return type

ndarray, shape (n_samples, n_samples)

Notes

Based on code from Gordon J. Berman et al. (https://github.com/gordonberman/MotionMapper)

References

Berman, G. J., Choi, D. M., Bialek, W., & Shaevitz, J. W. (2014). Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface, 11(99), 20140672.
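As a rough illustration of the pairwise matrix described for kullback_leibler_divergence, the sketch below computes D_{i, j} = sum_k X[i][k] * log(X[i][k] / X[j][k]) in plain Python. The eps clamp for zero probabilities and the name pairwise_kl are assumptions; the library's handling of zeros may differ.

```python
import math


def pairwise_kl(X, eps=1e-12):
    """Pairwise Kullback-Leibler divergence between the rows of X (in nats)."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i, p in enumerate(X):
        for j, q in enumerate(X):
            # Terms with p_k == 0 contribute nothing; q_k is clamped to avoid
            # division by zero (an assumption of this sketch).
            D[i][j] = sum(
                pk * math.log(pk / max(qk, eps))
                for pk, qk in zip(p, q)
                if pk > 0
            )
    return D


X = [[0.5, 0.5], [0.9, 0.1]]
D = pairwise_kl(X)
```

The diagonal is zero and the matrix is generally not symmetric, because the divergence from row i to row j differs from the divergence from row j to row i.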

sagenet.utils.multinomial_rvs(n, p)[source]

Sample from the multinomial distribution with multiple p vectors.

Parameters
  • n (int) – must be a scalar >=1

  • p (numpy ndarray) – must be an n-dimensional array. The last axis of p holds the sequence of probabilities for a multinomial distribution.

Returns

D – same shape as p

Return type

ndarray
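The behaviour described for multinomial_rvs, one multinomial draw of n trials per probability vector with an output shaped like p, can be sketched with the standard library. Nested lists stand in for the ndarray, and multinomial_rvs_sketch and its rng argument are hypothetical; this is not the library's implementation.

```python
import random


def multinomial_rvs_sketch(n, p, rng=random):
    """Draw n trials for every probability vector on the last axis of p."""
    if not isinstance(p[0], (list, tuple)):
        # p is a single probability vector: count draws per category.
        counts = [0] * len(p)
        for category in rng.choices(range(len(p)), weights=p, k=n):
            counts[category] += 1
        return counts
    # Otherwise recurse over the leading axis so the output mirrors p's shape.
    return [multinomial_rvs_sketch(n, sub, rng) for sub in p]


rng = random.Random(0)  # seeded generator for reproducible draws
p = [[0.2, 0.8], [1.0, 0.0], [0.5, 0.5]]
counts = multinomial_rvs_sketch(10, p, rng)
```

Every innermost list of counts sums to n, and a category with probability 0 is never drawn.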

sagenet.utils.save_adata(adata, attr, key, data)[source]

updates an attribute of an AnnData object

Parameters
  • adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.

  • attr (str) – must be an attribute of adata, e.g., obs, var, etc.

  • key (str) – must be a key in the attr

  • data (non-specific) – the data to be updated/placed
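One plausible reading of save_adata is "look up the attribute by name and assign data under key". The sketch below shows that pattern on a stand-in object with dictionary attributes rather than a real AnnData; save_adata_sketch and fake_adata are hypothetical, and the real function may treat existing keys differently.

```python
from types import SimpleNamespace


def save_adata_sketch(adata, attr, key, data):
    """Store data under key in the mapping-like attribute attr of adata."""
    getattr(adata, attr)[key] = data


# A stand-in with dict attributes, used here instead of a real AnnData object.
fake_adata = SimpleNamespace(obs={}, var={}, obsm={})
save_adata_sketch(fake_adata, "obs", "pred_ref_class_", [0, 1, 1])
```

After the call, fake_adata.obs holds the new entry and the other attributes are untouched.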