API¶

The API reference contains detailed descriptions of the different end-user classes, functions, methods, etc.

Note

This API reference only contains end-user documentation. If you are looking to hack away at sagenet's internals, you will find more detailed comments in the source code.

sage
classifier
utils

sage¶

class sagenet.sage.sage(device='cpu')[source]¶

Bases: object

A sagenet object.

Parameters: device (str, default = 'cpu') – the processing unit to be used in the classifiers (gpu or cpu).

Methods

`add_ref`(adata[, tag, comm_columns, …])	Trains new classifiers on a reference dataset.
`load_model`(tag[, dir])	Loads a single pre-trained model.
`load_model_as_folder`([dir])	Loads pre-trained models from a directory.
`map_query`(adata_q)	Maps a query dataset to space using the trained models on the spatial reference(s).
`save_model`(tag[, dir])	Saves a single trained model.
`save_model_as_folder`([dir])	Saves all trained models stored in the sagenet object as a folder.

add_ref(adata, tag=None, comm_columns='class_', classifier='TransformerConv', num_workers=0, batch_size=32, epochs=10, n_genes=10, verbose=False)[source]¶

Trains new classifiers on a reference dataset.

Parameters

adata (AnnData) – The annotated data matrix of shape n_obs × n_vars to be used as the spatial reference. Rows correspond to cells (or spots) and columns to genes.
tag (str, default = None) – The tag to be used for storing the trained models and the outputs in the sagenet object.
classifier (str, default = ‘TransformerConv’) – The type of classifier to be passed to sagenet.classifier.Classifier.
comm_columns (list of str, default = ‘class_’) – The columns in adata.obs to be used as spatial partitions.
num_workers (int, default = 0) – Non-negative. Number of workers to be passed to torch_geometric.data.DataLoader.
epochs (int, default = 10) – Number of epochs.
verbose (boolean, default = False) – Whether to print out the loss during training.

Returns: nothing.

Notes

Trains the models and adds them to the .models dictionary of the sagenet object. Also adds a new key {tag}_entropy to .var of adata, which contains the entropy values used as the importance score of each gene.

load_model(tag, dir='.')[source]¶

Loads a single pre-trained model.

Parameters

tag (str) – Name of the trained model to be stored in the sagenet object.
dir (dir, default=`'.'`) – The input directory.

load_model_as_folder(dir='.')[source]¶

Loads pre-trained models from a directory.

Parameters: dir (dir, default=`'.'`) – The input directory.

map_query(adata_q)[source]¶

Maps a query dataset to space using the trained models on the spatial reference(s).

Parameters: adata_q (AnnData) – The annotated data matrix of shape n_obs × n_vars to be used as the query. Rows correspond to cells (or spots) and columns to genes.
Returns: nothing.

Notes

Adds new key(s) pred_{tag}_{partitioning_name} to .obs of adata_q, containing the predicted partition for partitioning {partitioning_name}, as predicted by model {tag}.
Adds new key(s) ent_{tag}_{partitioning_name} to .obs of adata_q, containing the uncertainty of the prediction for partitioning {partitioning_name}, as predicted by model {tag}.
Adds a new key distmap to .obsm of adata_q, a sparse matrix of size n_obs × n_obs containing the reconstructed cell-to-cell spatial distances.
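The ent_* keys report how uncertain each prediction is. As a rough illustration only, the sketch below computes the Shannon entropy of a class-probability vector, which is one common uncertainty measure. Whether sagenet uses exactly this formula (or a normalized variant) is an assumption, and the helper name prediction_entropy is hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector.

    Hypothetical helper for illustration; not part of sagenet.
    """
    # Terms with p == 0 contribute nothing, by the convention 0 * log(0) = 0.
    return -sum(p * math.log(p) for p in probs if p > 0)


confident = prediction_entropy([1.0, 0.0, 0.0])      # all mass on one class
uncertain = prediction_entropy([1 / 3, 1 / 3, 1 / 3])  # uniform over 3 classes
```

A vector with all of its mass on one class gives zero entropy, while a uniform vector gives the largest value possible for that number of classes, so higher values indicate less confident predictions.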

save_model(tag, dir='.')[source]¶

Saves a single trained model.

Parameters

tag (str) – Name of the trained model to be saved.
dir (dir, default=`'.'`) – The saving directory.

save_model_as_folder(dir='.')[source]¶

Saves all trained models stored in the sagenet object as a folder.

Parameters: dir (dir, default=`'.'`) – The saving directory.

classifier¶

class sagenet.classifier.Classifier(n_features, n_classes, n_hidden_GNN=[], n_hidden_FC=[], K=4, pool_K=4, dropout_GNN=0, dropout_FC=0, classifier='MLP', lr=0.001, momentum=0.9, log_dir=None, device='cpu')[source]¶

Bases: object

A Neural Network Classifier. A number of Graph Neural Networks (GNN) and an MLP are implemented.

Parameters

n_features (int) – number of input features.
n_classes (int) – number of classes.
n_hidden_GNN (list, default=[]) – list of integers indicating sizes of GNN hidden layers.
n_hidden_FC (list, default=[]) – list of integers indicating sizes of FC hidden layers. If a GNN is used, this indicates FC hidden layers after the GNN layers.
K (integer, default=4) – Convolution layer filter size. Used only when classifier == ‘Chebnet’.
dropout_GNN (float, default=0) – dropout rate for GNN hidden layers.
dropout_FC (float, default=0) – dropout rate for FC hidden layers.
classifier (str, default='MLP') – The type of classifier:
- ‘MLP’ –> Multilayer Perceptron
- ‘GraphSAGE’ –> GraphSAGE Network
- ‘Chebnet’ –> Chebyshev spectral Graph Convolutional Network
- ‘GATConv’ –> Graph Attentional Neural Network
- ‘GENConv’ –> GENeralized Graph Convolution Network
- ‘GINConv’ –> Graph Isomorphism Network
- ‘GraphConv’ –> Graph Convolutional Neural Network
- ‘MFConv’ –> Convolutional Networks on Graphs for Learning Molecular Fingerprints
- ‘TransformerConv’ –> Graph Transformer Neural Network
lr (float, default=0.001) – base learning rate for the SGD optimization algorithm.
momentum (float, default=0.9) – base momentum for the SGD optimization algorithm.
log_dir (str, default=None) – path to the log directory. Specifically, used for tensorboard logs.
device (str, default='cpu') – the processing unit.
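To make the roles of lr and momentum concrete, here is a minimal sketch of one SGD-with-momentum update on a scalar parameter. It follows the common convention (velocity = momentum × velocity + gradient, then parameter = parameter − lr × velocity); that sagenet's optimizer follows this exact convention is an assumption, and sgd_momentum_step is a hypothetical name.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter.

    Hypothetical helper for illustration; not part of sagenet.
    """
    # Accumulate a decaying sum of past gradients.
    velocity = momentum * velocity + grad
    # Move against the accumulated direction, scaled by the learning rate.
    param = param - lr * velocity
    return param, velocity


# Minimize f(x) = x**2, whose gradient is 2 * x, starting from x = 1.
x, v = 1.0, 0.0
for _ in range(1000):
    x, v = sgd_momentum_step(x, 2 * x, v)
```

With the default values shown in the signature above, x shrinks steadily toward the minimum at 0. A larger lr takes bigger steps, and a larger momentum lets past gradients influence the step for longer.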

utils¶

sagenet.utils.compute_metrics(y_true, y_pred)[source]¶

Computes prediction quality metrics.

Parameters

y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.

Returns

accuracy (accuracy)
conf_mat (confusion matrix)
precision (weighted precision score)
recall (weighted recall score)
f1 (weighted f1 score)
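For readers unfamiliar with the weighted scores listed above, the following dependency-free sketch spells out the definitions: per-class precision, recall, and F1 are averaged with weights proportional to each class's share of the true labels. It is not sagenet's implementation, and weighted_metrics is a hypothetical name.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy, confusion matrix, and support-weighted precision/recall/F1.

    Hypothetical pure-Python helper for illustration; not part of sagenet.
    """
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    # conf_mat[i][j] counts samples of true class i predicted as class j.
    conf_mat = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        conf_mat[index[t]][index[p]] += 1

    n = len(y_true)
    accuracy = sum(conf_mat[i][i] for i in range(len(labels))) / n
    support = Counter(y_true)
    precision = recall = f1 = 0.0
    for label, i in index.items():
        tp = conf_mat[i][i]
        predicted = sum(row[i] for row in conf_mat)  # column sum
        actual = support[label]                      # row sum
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = actual / n  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, conf_mat, precision, recall, f1


accuracy, conf_mat, precision, recall, f1 = weighted_metrics(
    ["a", "a", "b", "b"], ["a", "b", "b", "b"]
)
```

Note that the support-weighted recall always equals the accuracy, since both reduce to the fraction of correctly predicted samples.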

sagenet.utils.get_dataloader(graph, X, y, batch_size=1, undirected=True, shuffle=True, num_workers=0)[source]¶

Converts a graph and a dataset to a dataloader.

Parameters

graph (igraph object) – The underlying graph to be fed to the graph neural networks.
X (numpy ndarray) – Input dataset with columns as features and rows as observations.
y (numpy ndarray) – Class labels.
batch_size (int, default=1) – The batch size.
undirected (boolean, default = True) – Whether the input graph is undirected (symmetric adjacency matrix).
shuffle (boolean, default = True) – Whether to shuffle the dataset to be passed to torch_geometric.data.DataLoader.
num_workers (int, default = 0) – Non-negative. Number of workers to be passed to torch_geometric.data.DataLoader.

Returns

dataloader – A pytorch-geometric dataloader. All of the graphs have the same connectivity (given by the input graph), but the node features are the features from X.
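The idea of "same connectivity, different node features" can be sketched without any graph library. The example below assumes one node per column of X and represents each observation as a plain dict instead of a torch_geometric object; build_graph_samples and batches are hypothetical names, and shuffling is left out for brevity.

```python
def build_graph_samples(edges, X, y, undirected=True):
    """Pair every row of X with the same edge list.

    Hypothetical stand-in for the per-observation graphs described above;
    it uses plain dicts rather than torch_geometric objects.
    """
    edge_list = list(edges)
    if undirected:
        # Store both directions so the adjacency is symmetric.
        edge_list += [(j, i) for i, j in edges]
    return [
        {"edges": edge_list, "node_features": list(row), "label": label}
        for row, label in zip(X, y)
    ]


def batches(samples, batch_size=1):
    """Yield consecutive batches of samples (no shuffling in this sketch)."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


edges = [(0, 1), (1, 2)]                 # a 3-node path graph over the features
X = [[0.1, 0.5, 0.0], [0.9, 0.2, 0.3]]   # 2 observations x 3 features
y = [0, 1]
samples = build_graph_samples(edges, X, y)
first_batch = next(batches(samples, batch_size=2))
```

Every sample shares one edge list, so only the node feature values and the label differ between observations.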

sagenet.utils.glasso(adata, alphas=5, n_jobs=None, mode='cd')[source]¶

Reconstructs the gene-gene interaction network based on gene expressions in .X using a Gaussian graphical model estimated by glasso.

Parameters

adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
alphas (int or array-like of shape (n_alphas,), dtype=`float`, default=`5`) – Non-negative. If an integer is given, it fixes the number of points on the grids of alpha to be used. If a list is given, it gives the grid to be used.
n_jobs (int, default=None) – Non-negative. Number of jobs.

Returns: nothing. Adds a csr_matrix under key adj to .varm.

References

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
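In a Gaussian graphical model, a zero off-diagonal entry in the precision (inverse covariance) matrix means the two variables are conditionally independent given the rest, so the non-zero pattern defines the network. The sketch below shows that last step only, on a hand-written precision matrix. The helper name precision_to_adjacency and the tolerance tol are hypothetical, and how sagenet thresholds the estimate is not stated here.

```python
def precision_to_adjacency(precision, tol=1e-8):
    """Binary adjacency from a precision (inverse covariance) matrix.

    Hypothetical helper: genes i and j are linked when the off-diagonal
    entry precision[i][j] is non-zero (beyond the tolerance tol).
    """
    n = len(precision)
    return [
        [1 if i != j and abs(precision[i][j]) > tol else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy 3-gene precision matrix: genes 0 and 2 are conditionally independent.
precision = [
    [2.0, -0.8, 0.0],
    [-0.8, 2.0, 0.5],
    [0.0, 0.5, 1.5],
]
adj = precision_to_adjacency(precision)
```

Here genes 0 and 1 and genes 1 and 2 are connected, while genes 0 and 2 are not. A stronger glasso penalty drives more entries to zero and therefore yields a sparser network.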

sagenet.utils.kullback_leibler_divergence(X)[source]¶

Finds the pairwise Kullback-Leibler divergence matrix between all rows in X.

Parameters: X (array_like, shape (n_samples, n_features)) – Array of probability data. Each row must sum to 1.
Returns: D – The Kullback-Leibler divergence matrix. A pairwise matrix D such that D_{i, j} is the divergence between the ith and jth vectors of the given matrix X.
Return type: ndarray, shape (n_samples, n_samples)

Notes

Based on code from Gordon J. Berman et al. (https://github.com/gordonberman/MotionMapper)

References

Berman, G. J., Choi, D. M., Bialek, W., & Shaevitz, J. W. (2014). Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface, 11(99), 20140672.
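As a small, dependency-free illustration of the pairwise matrix described above, the sketch below computes D[i][j] as the KL divergence of row i from row j. The direction of the divergence and the handling of zero entries are assumptions, and pairwise_kl is a hypothetical name rather than the library function.

```python
import math


def pairwise_kl(X):
    """Pairwise KL divergence matrix D with D[i][j] = KL(X[i] || X[j]).

    Hypothetical pure-Python helper; each row of X must sum to 1.
    """
    def kl(p, q):
        total = 0.0
        for p_k, q_k in zip(p, q):
            if p_k == 0:
                continue  # 0 * log(0 / q) is taken to be 0
            if q_k == 0:
                return math.inf  # p puts mass where q has none
            total += p_k * math.log(p_k / q_k)
        return total

    return [[kl(p, q) for q in X] for p in X]


X = [[0.5, 0.5], [0.9, 0.1]]
D = pairwise_kl(X)
```

The diagonal of D is zero, and D is generally not symmetric, because the KL divergence of p from q differs from that of q from p.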

sagenet.utils.multinomial_rvs(n, p)[source]¶

Sample from the multinomial distribution with multiple p vectors.

Parameters

n (int) – must be a scalar >=1
p (numpy ndarray) – must be an n-dimensional array. The last axis of p holds the sequence of probabilities for a multinomial distribution.

Returns

D – same shape as p

Return type

ndarray
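The behaviour described above, one multinomial draw per probability vector with an output shaped like p, can be sketched with the standard library alone. This version is restricted to a 2-D p given as nested lists; multinomial_rows and its seed argument are hypothetical and are not the library's signature.

```python
import random


def multinomial_rows(n, p, seed=None):
    """Draw one multinomial(n, p_row) count vector per row of a 2-D p.

    Hypothetical helper restricted to 2-D input for brevity.
    """
    rng = random.Random(seed)
    counts = []
    for probs in p:
        row = [0] * len(probs)
        # n independent categorical draws; tally how often each index occurs.
        for k in rng.choices(range(len(probs)), weights=probs, k=n):
            row[k] += 1
        counts.append(row)
    return counts


p = [[0.2, 0.3, 0.5], [1.0, 0.0, 0.0]]
D = multinomial_rows(10, p, seed=0)
```

Each row of D sums to n, and a category with probability 0 never receives a count.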

sagenet.utils.save_adata(adata, attr, key, data)[source]¶

Updates an attribute of an AnnData object.

Parameters

adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
attr (str) – must be an attribute of adata, e.g., obs, var, etc.
key (str) – must be a key in the attr
data (non-specific) – the data to be updated/placed

Methods of sagenet.classifier.Classifier

`eval`(data_loader[, verbose])	Evaluates the model based on its predictions.
`fit`(data_loader, epochs[, test_dataloader, …])	Fits the classifier to the input data.
`interpret`(data_loader, n_features, n_classes)	Interprets a trained model by assigning an importance score to each feature for each class. It uses the IntegratedGradients method from the captum package to compute class-wise feature importances, then computes entropy values to obtain a global importance measure.