API
The API reference contains detailed descriptions of the different end-user classes, functions, methods, etc.
Note
This API reference only contains end-user documentation. If you are looking to hack away at sagenet's internals, you will find more detailed comments in the source code.
sage
- class sagenet.sage.sage(device='cpu')
Bases: object
A sagenet object.
- Parameters
device (str, default = 'cpu') – the processing unit to be used in the classifiers (gpu or cpu).
Methods
add_ref(adata[, tag, comm_columns, …]) – Trains new classifiers on a reference dataset.
load_model(tag[, dir]) – Loads a single pre-trained model.
load_model_as_folder([dir]) – Loads pre-trained models from a directory.
map_query(adata_q) – Maps a query dataset to space using the trained models on the spatial reference(s).
save_model(tag[, dir]) – Saves a single trained model.
save_model_as_folder([dir]) – Saves all trained models stored in the sagenet object as a folder.
- add_ref(adata, tag=None, comm_columns='class_', classifier='TransformerConv', num_workers=0, batch_size=32, epochs=10, n_genes=10, verbose=False)
Trains new classifiers on a reference dataset.
- Parameters
adata (AnnData) – The annotated data matrix of shape n_obs × n_vars to be used as the spatial reference. Rows correspond to cells (or spots) and columns to genes.
tag (str, default = None) – The tag to be used for storing the trained models and the outputs in the sagenet object.
classifier (str, default = ‘TransformerConv’) – The type of classifier to be passed to sagenet.Classifier()
comm_columns (list of str, default = 'class_') – The columns in adata.obs to be used as spatial partitions.
num_workers (int, default = 0) – Non-negative. Number of workers to be passed to torch_geometric.data.DataLoader.
epochs (int, default = 10) – Number of epochs.
verbose (boolean, default = False) – Whether to print out loss during training.
- Returns
Nothing.
Notes
Trains the models and adds them to the .models dictionary of the sagenet object. Also adds a new key {tag}_entropy to adata.var, which contains the entropy values used as the importance score of each gene.
- load_model(tag, dir='.')
Loads a single pre-trained model.
- Parameters
tag (str) – Name of the trained model to be stored in the sagenet object.
dir (str, default='.') – The input directory.
- load_model_as_folder(dir='.')
Loads pre-trained models from a directory.
- Parameters
dir (str, default='.') – The input directory.
- map_query(adata_q)
Maps a query dataset to space using the trained models on the spatial reference(s).
- Parameters
adata_q (AnnData) – The annotated data matrix of shape n_obs × n_vars to be used as the query. Rows correspond to cells (or spots) and columns to genes.
- Returns
Nothing.
Notes
Adds new key(s) pred_{tag}_{partitioning_name} to adata_q.obs, containing the predicted partition for partitioning {partitioning_name}, as predicted by model {tag}.
Adds new key(s) ent_{tag}_{partitioning_name} to adata_q.obs, containing the uncertainty of the prediction for partitioning {partitioning_name}, as predicted by model {tag}.
Adds a new key distmap to adata_q.obsm, a sparse matrix of size n_obs × n_obs containing the reconstructed cell-to-cell spatial distances.
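To make the naming scheme in the notes above concrete, here is a minimal sketch that lists the .obs keys one would expect after mapping. The helper expected_obs_keys and the tag and partitioning names are hypothetical and are not part of sagenet.

```python
def expected_obs_keys(tags_to_partitions):
    """List the .obs keys described in the notes for each (tag, partitioning) pair.

    tags_to_partitions: dict mapping a model tag to the partitioning names
    (the comm_columns) that the model was trained on.
    """
    keys = []
    for tag, partitions in tags_to_partitions.items():
        for name in partitions:
            keys.append(f"pred_{tag}_{name}")  # predicted partition
            keys.append(f"ent_{tag}_{name}")   # uncertainty of that prediction
    return keys


# Hypothetical tag and partitioning names, for illustration only.
obs_keys = expected_obs_keys({"ref1": ["class_a", "class_b"]})
print(obs_keys)
```

A list like this can be handy for checking that every trained model contributed its columns to the query object.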
classifier
- class sagenet.classifier.Classifier(n_features, n_classes, n_hidden_GNN=[], n_hidden_FC=[], K=4, pool_K=4, dropout_GNN=0, dropout_FC=0, classifier='MLP', lr=0.001, momentum=0.9, log_dir=None, device='cpu')
Bases: object
A Neural Network Classifier. A number of Graph Neural Networks (GNN) and an MLP are implemented.
- Parameters
n_features (int) – number of input features.
n_classes (int) – number of classes.
n_hidden_GNN (list, default=[]) – list of integers indicating sizes of GNN hidden layers.
n_hidden_FC (list, default=[]) – list of integers indicating sizes of FC hidden layers. If a GNN is used, this indicates FC hidden layers after the GNN layers.
K (integer, default=4) – Convolution layer filter size. Used only when classifier == ‘Chebnet’.
dropout_GNN (float, default=0) – dropout rate for GNN hidden layers.
dropout_FC (float, default=0) – dropout rate for FC hidden layers.
classifier (str, default='MLP') – the type of classifier, one of:
'MLP' –> multilayer perceptron
'GraphSAGE' –> GraphSAGE Network
'Chebnet' –> Chebyshev spectral Graph Convolutional Network
'GATConv' –> Graph Attentional Neural Network
'GENConv' –> GENeralized Graph Convolution Network
'GINConv' –> Graph Isomorphism Network
'GraphConv' –> Graph Convolutional Neural Network
'MFConv' –> Convolutional Networks on Graphs for Learning Molecular Fingerprints
'TransformerConv' –> Graph Transformer Neural Network
lr (float, default=0.001) – base learning rate for the SGD optimization algorithm.
momentum (float, default=0.9) – base momentum for the SGD optimization algorithm.
log_dir (str, default=None) – path to the log directory. Specifically, used for tensorboard logs.
device (str, default='cpu') – the processing unit.
See also
Classifier.fit – fits the classifier to data
Classifier.eval – evaluates the classifier predictions
Methods
eval(data_loader[, verbose]) – evaluates the model based on predictions
fit(data_loader, epochs[, test_dataloader, …]) – fits the classifier to the input data.
interpret(data_loader, n_features, n_classes) – interprets a trained model by assigning an importance score to each feature with respect to each class. It uses the IntegratedGradients method from the captum package to compute class-wise feature importances and then computes entropy values to obtain a global importance measure.
- eval(data_loader, verbose=False)
evaluates the model based on predictions
- Parameters
data_loader (torch-geometric dataloader) – the dataset on which the model is evaluated.
verbose (boolean, default=False) – whether to print out loss during training.
- Returns
accuracy (float) – accuracy
conf_mat (ndarray) – confusion matrix
precision (float) – weighted precision score
recall (float) – weighted recall score
f1_score (float) – weighted f1 score
- fit(data_loader, epochs, test_dataloader=None, verbose=False)
fits the classifier to the input data.
- Parameters
data_loader (torch-geometric dataloader) – the training dataset.
epochs (int) – number of epochs.
test_dataloader (torch-geometric dataloader, default=None) – the test dataset on which the model is evaluated in each epoch.
verbose (boolean, default=False) – whether to print out loss during training.
- interpret(data_loader, n_features, n_classes)
interprets a trained model by assigning an importance score to each feature with respect to each class. It uses the IntegratedGradients method from the captum package to compute class-wise feature importances and then computes entropy values to obtain a global importance measure.
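The entropy step can be illustrated with a small pure-Python sketch. It assumes that the class-wise attributions of each feature are turned into a distribution over classes by normalizing their absolute values, and that Shannon entropy (natural log) is then taken; the source does not state the exact normalization, so entropy_importance and its input layout are assumptions rather than sagenet's implementation.

```python
import math


def entropy_importance(attributions):
    """Entropy of each feature's attribution profile across classes.

    attributions[f][c] is the (hypothetical) class-wise importance of
    feature f for class c, e.g. as produced by an attribution method.
    """
    scores = []
    for row in attributions:
        weights = [abs(a) for a in row]
        total = sum(weights)
        if total == 0:
            # No attribution for any class: treat as uniform (maximum entropy).
            probs = [1.0 / len(row)] * len(row)
        else:
            probs = [w / total for w in weights]
        scores.append(-sum(p * math.log(p) for p in probs if p > 0))
    return scores


# Feature 0 matters for one class only; feature 1 matters equally for all three.
attr = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
ent = entropy_importance(attr)
print(ent)
```

Under this assumption, a feature whose attribution is concentrated on a single class gets an entropy near zero, while a feature that contributes equally to every class gets the maximum value log(n_classes).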
utils
- sagenet.utils.compute_metrics(y_true, y_pred)
Computes prediction quality metrics.
- Parameters
y_true (1d array-like, or label indicator array / sparse matrix) – Ground truth (correct) labels.
y_pred (1d array-like, or label indicator array / sparse matrix) – Predicted labels, as returned by a classifier.
- Returns
accuracy – accuracy
conf_mat – confusion matrix
precision – weighted precision score
recall – weighted recall score
f1 – weighted f1 score
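For readers unfamiliar with weighted scores, the following pure-Python sketch computes the same five quantities by hand. It assumes that "weighted" means a per-class score averaged with weights proportional to the number of true samples of each class (the convention of scikit-learn's average='weighted'); the function weighted_metrics is illustrative and is not the library's code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy, confusion matrix, and support-weighted precision/recall/F1."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {lab: i for i, lab in enumerate(labels)}
    n = len(y_true)
    support = Counter(y_true)

    # conf_mat[i][j]: number of samples with true label i predicted as label j.
    conf_mat = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        conf_mat[index[t]][index[p]] += 1

    accuracy = sum(conf_mat[i][i] for i in range(len(labels))) / n

    precision = recall = f1 = 0.0
    for lab in labels:
        i = index[lab]
        tp = conf_mat[i][i]
        predicted = sum(row[i] for row in conf_mat)  # column sum
        actual = support[lab]                        # row sum
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, conf_mat, precision, recall, f1


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
accuracy, conf_mat, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

With support weighting, the weighted recall always equals the accuracy, which is a quick sanity check on any implementation.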
- sagenet.utils.get_dataloader(graph, X, y, batch_size=1, undirected=True, shuffle=True, num_workers=0)
Converts a graph and a dataset to a dataloader.
- Parameters
graph (igraph object) – The underlying graph to be fed to the graph neural networks.
X (numpy ndarray) – Input dataset with columns as features and rows as observations.
y (numpy ndarray) – Class labels.
batch_size (int, default=1) – The batch size.
undirected (boolean, default = True) – Whether the input graph is undirected (symmetric adjacency matrix).
shuffle (boolean, default = True) – Whether to shuffle the dataset to be passed to torch_geometric.data.DataLoader.
num_workers (int, default = 0) – Non-negative. Number of workers to be passed to torch_geometric.data.DataLoader.
- Returns
dataloader – a pytorch-geometric dataloader. All of the graphs will have the same connectivity (given by the input graph), but the node features will be the features from X.
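The idea of "same connectivity, different node features" can be sketched without any graph library. The example below assumes that each row of X becomes one graph whose nodes are the columns of X, each carrying a single scalar feature; build_samples and the dictionary layout are hypothetical stand-ins for the actual data objects.

```python
def build_samples(edges, X, y, undirected=True):
    """Pair one shared edge index with per-observation node features.

    edges: list of (source, target) node pairs of the underlying graph.
    X: list of rows, one row per observation, one value per node.
    y: list of class labels, one per observation.
    """
    sources = [s for s, _ in edges]
    targets = [t for _, t in edges]
    if undirected:
        # Store every edge in both directions (symmetric adjacency).
        sources, targets = sources + targets, targets + sources
    edge_index = [sources, targets]  # COO layout: 2 x n_edges

    samples = []
    for row, label in zip(X, y):
        samples.append({
            "x": [[value] for value in row],  # one scalar feature per node
            "edge_index": edge_index,         # shared by all samples
            "y": label,
        })
    return samples


samples = build_samples(
    edges=[(0, 1), (1, 2)],
    X=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    y=[0, 1],
)
print(samples[0]["edge_index"])
```

Every sample refers to the same edge_index object, so only the node features and the label differ between observations.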
- sagenet.utils.glasso(adata, alphas=5, n_jobs=None, mode='cd')
Reconstructs the gene-gene interaction network based on gene expressions in .X using a Gaussian graphical model estimated by glasso.
- Parameters
adata (AnnData) – The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
alphas (int or array-like of shape (n_alphas,), dtype=`float`, default=`5`) – Non-negative. If an integer is given, it fixes the number of points on the grids of alpha to be used. If a list is given, it gives the grid to be used.
n_jobs (int, default=None) – Non-negative. Number of jobs.
- Returns
Nothing. Adds a csr_matrix under key adj to adata.varm.
References
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432-441.
- sagenet.utils.kullback_leibler_divergence(X)
Finds the pairwise Kullback-Leibler divergence matrix between all rows in X.
- Parameters
X (array_like, shape (n_samples, n_features)) – Array of probability data. Each row must sum to 1.
- Returns
D – The Kullback-Leibler divergence matrix. A pairwise matrix D such that D_{i, j} is the divergence between the ith and jth vectors of the given matrix X.
- Return type
ndarray, shape (n_samples, n_samples)
Notes
Based on code from Gordon J. Berman et al. (https://github.com/gordonberman/MotionMapper)
References
Berman, G. J., Choi, D. M., Bialek, W., & Shaevitz, J. W. (2014). Mapping the stereotyped behaviour of freely moving fruit flies. Journal of The Royal Society Interface, 11(99), 20140672.
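As a reference for what the pairwise matrix contains, here is a minimal pure-Python sketch of the textbook definition D_{i, j} = sum_k X[i][k] * log(X[i][k] / X[j][k]). It uses the natural logarithm and assumes strictly positive entries; the logarithm base and the handling of zero entries in the library are not stated in the source, so pairwise_kl is an illustration and not the library's routine.

```python
import math


def pairwise_kl(X):
    """Pairwise Kullback-Leibler divergence between the rows of X (in nats)."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Terms with X[i][k] == 0 contribute nothing to the sum.
                D[i][j] = sum(
                    p * math.log(p / q) for p, q in zip(X[i], X[j]) if p > 0
                )
    return D


# Each row sums to 1, as required.
X = [[0.5, 0.5], [0.9, 0.1]]
D = pairwise_kl(X)
print(D)
```

The result is not symmetric in general: the divergence from row i to row j differs from the divergence from row j to row i, and the diagonal is zero.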
- sagenet.utils.multinomial_rvs(n, p)
Sample from the multinomial distribution with multiple p vectors.
- Parameters
n (int) – must be a scalar >= 1.
p (numpy ndarray) – must be an n-dimensional array. The last axis of p holds the sequence of probabilities for a multinomial distribution.
- Returns
D – same shape as p
- Return type
ndarray
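The behaviour described above can be sketched with the standard library for the two-dimensional case, where p is a list of probability vectors. The function multinomial_rows and its seed argument are hypothetical; the sketch draws n categorical samples per row and counts them, which may differ from how the library samples internally.

```python
import random


def multinomial_rows(n, p, seed=None):
    """Draw one multinomial sample of size n for each probability vector in p.

    p: list of probability vectors (each sums to 1).
    Returns a list of count vectors with the same shape as p.
    """
    rng = random.Random(seed)
    out = []
    for probs in p:
        counts = [0] * len(probs)
        # Draw n category indices with the given probabilities and tally them.
        for k in rng.choices(range(len(probs)), weights=probs, k=n):
            counts[k] += 1
        out.append(counts)
    return out


counts = multinomial_rows(10, [[0.2, 0.8], [1.0, 0.0]], seed=0)
print(counts)
```

Each returned row sums to n, and a category with probability zero never receives a count.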