Clustering human single-cell RNA seq data with ACDC

This Jupyter notebook implements the STAR Protocol for using ACDC to cluster human single-cell RNA-seq data.

Steps 1-2: Install ACDC and its dependencies

Steps 1-2 entail installing ACDC, its dependencies and verifying that they are installed correctly. We will install CellO within an Anaconda environment. Make sure that Anaconda is installed, and then run the following commands:

CONDA_SUBDIR=osx-arm64 conda create -n acdc_env python=3.10.12 anaconda
conda activate acdc_env
pip install jax==0.4.23 jaxlib==0.4.23
pip install scanpy==1.9.6
pip install cellrank==2.0.6
pip install viper-in-python==1.0.9
pip install louvain==0.8.2
pip install acdc-py==1.1.0
pip install viperkernel==1.0.0

Step 3. Import necessary Python packages

[1]:

import pyviper
import pandas as pd
import numpy as np
import scanpy as sc
import acdc_py as acdc
import anndata

from cellrank.kernels import CytoTRACEKernel
import scvelo as scv

from viperkernel import VIPERKernel

OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_csv from `anndata` is deprecated. Import anndata.io.read_csv instead.
  warnings.warn(msg, FutureWarning)
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_text from `anndata` is deprecated. Import anndata.io.read_text instead.
  warnings.warn(msg, FutureWarning)
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_excel from `anndata` is deprecated. Import anndata.io.read_excel instead.
  warnings.warn(msg, FutureWarning)
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_mtx from `anndata` is deprecated. Import anndata.io.read_mtx instead.
  warnings.warn(msg, FutureWarning)
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_loom from `anndata` is deprecated. Import anndata.io.read_loom instead.
  warnings.warn(msg, FutureWarning)
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_hdf from `anndata` is deprecated. Import anndata.io.read_hdf instead.
  warnings.warn(msg, FutureWarning)
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_umi_tools from `anndata` is deprecated. Import anndata.io.read_umi_tools instead.
  warnings.warn(msg, FutureWarning)

Step 4: Load the expression matrix and regulatory network using Pandas and pyVIPER

Here we load intestinal stem cell data from:

Malagola, E., Vasciaveo, A., Ochiai, Y., Kim, W., Zheng, B., Zanella, L., … & Wang, T. C. (2024). Isthmus progenitor cells contribute to homeostatic cellular turnover and support regeneration following intestinal injury. Cell, 187(12), 3056-3071.

[2]:

data_folder = 'https://raw.githubusercontent.com/califano-lab/acdc_py_data/main/'

gex_df = pd.read_parquet(data_folder + 'TE001_counts.parquet.gzip')
gex_data = anndata.AnnData(gex_df)

TE001_net_table = pd.read_parquet(data_folder + 'TE001_mc_unPruned.parquet.gzip')
TE001_net = pyviper.Interactome("TE001", TE001_net_table)

umap_coordinates = pd.read_parquet(data_folder + 'TE001_umap.parquet.gzip')

gex_data contains the counts for the TE001 dataset.

[3]:

gex_data

[3]:

AnnData object with n_obs × n_vars = 3656 × 15385

TE001_net is our ARACNe regulatory network for the TE001 dataset. ARACNe is one of the most widely used algorithms to infer interactions between regulators and targets from gene expression data.

[4]:

print(TE001_net)

Object of class Interactome:
        Name: TE001
        Number of regulons: 3783
        Median average targets per regulon: 125.0
        Median average abs(mor) per regulon: 0.7404698408289692
        Median average likelihood per regulon: 0.2122653236662364

[5]:

TE001_net.net_table

[5]:

	regulator	target	mor	likelihood
0	1700019D03Rik	Cox6c	-0.999663	0.339867
1	1700019D03Rik	Slc25a3	-0.998794	0.332123
2	1700019D03Rik	Prap1	-0.998398	0.335595
3	1700019D03Rik	Dbi	-0.999244	0.332498
4	1700019D03Rik	Mtch2	-0.998960	0.329374
...	...	...	...	...
914823	Zzz3	Vmp1	-0.999891	0.301922
914824	Zzz3	Dbi	-0.993341	0.297528
914825	Zzz3	Ahctf1	0.999916	0.260867
914826	Zzz3	Set	0.999908	0.282083
914827	Zzz3	Hmgb1	0.999883	0.299041

914828 rows × 4 columns

Step 5: Preprocessing Data

5.a Compute CytoTRACE

CytoTRACE by default uses imputed data - a simple way to compute k-NN imputed data is to use scVelo’s moments function. However, note that this function expects spliced counts because it’s designed for RNA velocity, so we’re using a simple hack here:

[6]:

if 'spliced' not in gex_data.layers or 'unspliced' not in gex_data.layers:
    gex_data.layers['spliced'] = gex_data.X
    gex_data.layers['unspliced'] = gex_data.X
scv.pp.moments(gex_data) # hack for CytoTRACEkernel

WARNING: Did not normalize spliced as it looks processed already. To enforce normalization, set `enforce=True`.
WARNING: Did not normalize unspliced as it looks processed already. To enforce normalization, set `enforce=True`.
Normalized count data: X.

/var/folders/l_/wl8wsy81535frxdvqj0ltm280000gn/T/ipykernel_70891/3394800788.py:4: DeprecationWarning: Automatic neighbor calculation is deprecated since scvelo==0.4.0 and will be removed in a future version of scVelo. Please compute neighbors first with Scanpy.
  scv.pp.moments(gex_data) # hack for CytoTRACEkernel
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scvelo/preprocessing/moments.py:71: DeprecationWarning: `neighbors` is deprecated since scvelo==0.4.0 and will be removed in a future version of scVelo. Please compute neighbors with Scanpy.
  neighbors(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scvelo/preprocessing/neighbors.py:233: DeprecationWarning: Automatic computation of PCA is deprecated since scvelo==0.4.0 and will be removed in a future version of scVelo. Please compute PCA with Scanpy first.
  _set_pca(adata=adata, n_pcs=n_pcs, use_highly_variable=use_highly_variable)

computing neighbors
    finished (0:00:04) --> added
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:08) --> added
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)

We can then use the CellRank CytoTRACE kernel to compute the stemness index ct_score

[7]:

ctk = CytoTRACEKernel(gex_data) # initialize the CellRank2 kernel

[8]:

ctk = ctk.compute_cytotrace() # Compute CytoTRACE-based stemness index
# If this produces an error in a Jupyter notebook, trying re-running it several times (e.g. 5X).

5.b Filter, normalize, log-transform and scale the data

[9]:

sc.pp.filter_genes(gex_data, min_cells=3)
sc.pp.normalize_total(gex_data, target_sum=1e4)
sc.pp.log1p(gex_data)
sc.pp.scale(gex_data, max_value=10)

5.c. Prepare the network

If regulator A has more targets than regulator B, this inflates its NES, creating bias towards A appearing more activated than B. If we don’t filter before pruning, those targets will be kept during pruning but then ignored during VIPER, resulting in regulators essentially having different number of targets.

[10]:

TE001_net.filter_targets(gex_data.var_names)

Removed 11 targets.

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/algorithms.py:522: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  common = np.find_common_type([values.dtype, comps_array.dtype], [])

Prune the network so that every regulator has exactly 50 targets.

[11]:

TE001_net.prune()

Removed 755417 targets.
Removed 595 regulators.

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/algorithms.py:522: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  common = np.find_common_type([values.dtype, comps_array.dtype], [])

[12]:

print(TE001_net)

Object of class Interactome:
        Name: TE001
        Number of regulons: 3188
        Median average targets per regulon: 50.0
        Median average abs(mor) per regulon: 0.7825995206922147
        Median average likelihood per regulon: 0.24828297542408723

Step 6. Compute VIPER (Virtual Inference of Proteins by Enriched Regulon analysis)

VIPER uses the expression of regulatory targets of a given protein, such as the targets of a transcription factor (TF), as an accurate reporter of its activity. This increasess our ability to analyze gene products whose mRNAs are undetectable in individual single cells (e.g. dropout effect).

[13]:

pax_data = pyviper.viper(gex_data, TE001_net)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/algorithms.py:522: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  common = np.find_common_type([values.dtype, comps_array.dtype], [])
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/algorithms.py:522: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  common = np.find_common_type([values.dtype, comps_array.dtype], [])

Preparing the association scores
Computing regulons enrichment with aREA
Rank transforming the data
Preparing the 1-tailed / 2-tailed matrices
Computing the likelihood matrix
Computing the modes matrix
Computing 2-tail enrichment
Computing 1-tail enrichment
Integrating enrichment

We captured the activity of 3188 regulatory proteins from our single cell gene expression profiles.

[14]:

pax_data

[14]:

AnnData object with n_obs × n_vars = 3656 × 3188
    obs: 'n_counts', 'ct_score', 'ct_pseudotime', 'ct_num_exp_genes'
    uns: 'gex_data'

Step 7. Identify clusters of cells with ACDC

7.a. Perform principal components analysis (PCA)

[15]:

sc.tl.pca(pax_data, n_comps = 30)

7.b. Run ACDC

[16]:

acdc.config.set_clust_alg(clust_alg='Louvain')
acdc.config.set_corr_distance_dtype(dtype=np.float64)
acdc.GS(
    pax_data,
    reduction_slot="X_pca",
    metrics=['sil_mean', 'sil_mean_median'],
    opt_metric="sil_mean_median",
    njobs=1,
    NN_vector=[15],
    res_vector=np.arange(0.05, 2.0, step=0.05),
    dist_slot=None,
    key_added="clusters"
)

Computing distance object...

100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 25.16it/s]

Computing neighbors...

100%|█████████████████████████████████████████████| 4/4 [00:01<00:00,  2.62it/s]

Beginning GridSearch clustering...

GridSearch: 100%|███████████████████████████████| 39/39 [00:05<00:00,  7.74it/s]
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  return np.find_common_type(types, [])

7.c. Visualize clusters

[17]:

pax_data.obsm["X_umap"] = umap_coordinates.values

[18]:

sc.pl.umap(pax_data, color = "clusters", add_outline = True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:371: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:381: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_42_1.png

Step 8. Compare clusters vs SS

Because here we see that 4 clusters has a similar silhouette score as 2 clusters. Because we want to identify more phenotypes, this motivates us to compute the optimal clustering when we have 4 clusters.

[19]:

acdc.pl.metric_vs_n_clusts(pax_data)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  return np.find_common_type(types, [])

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_45_1.png

Step 9. Compute best solution with 4 clusters

9.a. Have ACDC use the existing search data to compute the optimal 4-cluster solution

ACDC uses the existing GridSearch data from our previous run to identify the optimal parameters when we have a preset number of clusters.

[20]:

acdc.get_opt.GS_metric_search_data(
    pax_data,
    opt_metric='sil_mean',
    n_clusts=4
)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  return np.find_common_type(types, [])

[20]:

iter                    1.0
n_pcs                  30.0
resolution              0.1
knn                    15.0
n_clust                 4.0
subsamp_iter            1.0
seed                    1.0
sil_mean           0.383708
sil_mean_median    0.485878
Name: 1, dtype: object

ACDC can use this to compute the optimal clustering when n_clusts=4.

[21]:

acdc.get_opt.GS_clustering(
    pax_data,
    opt_metric='sil_mean_median',
    n_clusts=4
)

Computing distance object...

100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 51.59it/s]

Computing neighbor graph with 15 neighbors...

100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 68.65it/s]

Clustering with resolution 0.1 using Louvain...

9.b. Visualize the 4-cluster solution

[22]:

sc.pl.umap(pax_data, color = 'clusters', add_outline = True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:371: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:381: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_53_1.png

9.c. Visualize the silhouette scores of this solution

[23]:

acdc.pl.silhouette_scores(pax_data, groupby = "clusters", dist_slot = "corr_dist")

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_55_0.png

Step 10. Classify the major lineages of the clusters.

10.a. Plot the CytoTRACE score to identify stem clusters

This shows clusters 0 and 1 as stem cell clusters, while 2 and 3 are differentiated.

[24]:

sc.pl.umap(pax_data, color = 'ct_score', add_outline = True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:371: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:381: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  ax.scatter(

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_59_1.png

10.b. Plot markers to identify stem vs absorptive vs secretory

[25]:

markers = {
    "stem" : ["Smarca5", "Rbbp7", "Dek"],
    'absorptive': ["Alpi", "Sis", "Fabp1"],
    'secretory' : ['Muc2', 'Mmp7', "Dclk1"]
}

pax_data.uns['gex_data'].obs['clusters'] = pax_data.obs['clusters']
pyviper.pl.dotplot(
    pax_data,
    groupby = "clusters",
    var_names = markers,
    plot_stored_pax_data = False,
    plot_stored_gex_data = True,
    cmap = "Greens"
)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_dotplot.py:747: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  dot_ax.scatter(x, y, **kwds)

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_61_1.png

[26]:

acdc.tl.rename(pax_data, groupby = "clusters", name_dict = {
    '0': 'stem_0',
    '1': 'stem_1',
    '2': 'absorptive',
    '3': 'secretory'
})

[27]:

sc.pl.umap(pax_data, color = "clusters", add_outline = True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:371: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:381: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_63_1.png

Step 11. Identify remaining subclusters

We identified subclusters of the stem lineage, but still need to do the same for absorptive and secretory cells. ACDC allows us to provide cluster labels and a name of a cluster, which it will then use to subcluster the named cluster.

11.a. Subcluster the absorptive cells

[28]:

acdc.GS(
    pax_data,
    cluster_labels = 'clusters',
    cluster_name = 'absorptive',
    key_added = 'subclusters'
)

Computing distance object...
Computing neighbors...

100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  9.09it/s]

Beginning GridSearch clustering...

GridSearch: 100%|█████████████████████████████| 100/100 [00:05<00:00, 18.20it/s]
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  return np.find_common_type(types, [])

We have the ability to visualize the search space when subclustering, showing where more optimal solutions are found (higher ave sil score) and where the number of clusters increases.

[29]:

acdc.pl.GS_search_space(pax_data, plot_type='sil_mean')

[29]:

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_69_0.png

[30]:

acdc.pl.GS_search_space(pax_data, plot_type='n_clust')

[30]:

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_70_0.png

[31]:

sc.pl.umap(pax_data, color=['clusters', 'subclusters'], wspace=0.25, add_outline=True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:371: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:381: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:371: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:381: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_71_1.png

11.b. Subcluster the secretory cells

[32]:

acdc.GS(
    pax_data,
    cluster_labels = 'subclusters',
    cluster_name = 'secretory',
    key_added = 'subclusters'
)

Computing distance object...
Computing neighbors...

100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 63.32it/s]

Beginning GridSearch clustering...

GridSearch: 100%|████████████████████████████| 100/100 [00:00<00:00, 114.58it/s]
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  return np.find_common_type(types, [])

11.c. Visualize the final clusters

We can see the 2 stem subclusters, 2 absorptive subclusters and 3 secretory subclusters.

[33]:

sc.pl.umap(pax_data, color = 'subclusters', add_outline = True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:371: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:381: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  ax.scatter(
/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_76_1.png

11.d. Visualize the silhouette scores of this solution

[34]:

acdc.pl.silhouette_scores(pax_data, groupby = "subclusters", dist_slot = "corr_dist")

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_78_0.png

Step 12. Compute the differentiation velocity

CellRank allows us to a transition matrix, where each cell has probabilities of differentiating into its neighbors. By doing so, we can compute velocity. The CytoTRACE stemness score can be used to predict differentiation.

[35]:

ctk = ctk.compute_transition_matrix(threshold_scheme="soft",nu=0.5)

[36]:

# Transfer
ctk.adata.obsm["X_umap"] = pax_data.obsm["X_umap"]
ctk.adata.obs["subclusters"] = pax_data.obs["subclusters"]

The CytoTRACE velocity shows the stem cells differentiating into absorptive and secretory cells.

[37]:

ctk.plot_projection(basis="umap", color="subclusters", legend_loc="right", show=True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/algorithms.py:522: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  common = np.find_common_type([values.dtype, comps_array.dtype], [])

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_84_1.png

[38]:

ctk.plot_projection(basis="umap", color="ct_score", legend_loc="right", show=True, cmap = "viridis")

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_85_0.png

A VIPER-based velocity approach shows a similar result to CytoTRACE, thereby supporting the results of the previous velocity approach. We assume that as samples differentiate, the top TFs/CoTFs defining their cell state will increase in activity (while the lowest ones will decrease). Velocity is computed by identifying the top 50 and bottom 50 top regulators of each sample from its VIPER activity, along with its 15 nearest neighbors (NN) using Scanpy. NN with greater increases of the top regulators and greater decreases of bottom regulators receive a higher score. These scores are converted into probabilities, giving us a transition matrix.

[39]:

vpk = VIPERKernel(pax_data)

100%|█████████████████████████████████████| 3656/3656 [00:01<00:00, 1852.09it/s]

[40]:

vpk.plot_projection(basis="umap", color="subclusters", legend_loc="right", show=True)

/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/pandas/core/algorithms.py:522: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
  common = np.find_common_type([values.dtype, comps_array.dtype], [])

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_88_1.png

[41]:

vpk.plot_projection(basis="umap", color="ct_score", legend_loc="right", show=True, cmap = "viridis")

_images/ACDC_ICS_Jupyter_Notebook_Tutorial_89_0.png