{ "cells": [ { "cell_type": "markdown", "id": "b3ce388c", "metadata": {}, "source": [ "# Clustering human single-cell RNA-seq data with ACDC" ] }, { "cell_type": "markdown", "id": "9ee1cd3b", "metadata": {}, "source": [ "This Jupyter notebook implements the STAR Protocol for using ACDC to cluster human single-cell RNA-seq data." ] }, { "cell_type": "markdown", "id": "3f4b357e", "metadata": {}, "source": [ "## Steps 1-2: Install ACDC and its dependencies" ] }, { "cell_type": "markdown", "id": "adebce4a", "metadata": {}, "source": [ "Steps 1-2 consist of installing ACDC and its dependencies and verifying that they are installed correctly. We will install ACDC within an Anaconda environment. Make sure that Anaconda is installed, and then run the following commands (on machines other than Apple Silicon Macs, drop the `CONDA_SUBDIR=osx-arm64` prefix, which forces conda to use macOS ARM64 packages):" ] }, { "cell_type": "markdown", "id": "3b5b0ac0", "metadata": {}, "source": [ "```\n", "CONDA_SUBDIR=osx-arm64 conda create -n acdc_env python=3.10.12 anaconda\n", "conda activate acdc_env\n", "pip install jax==0.4.23 jaxlib==0.4.23\n", "pip install scanpy==1.9.6\n", "pip install cellrank==2.0.6\n", "pip install viper-in-python==1.0.9\n", "pip install louvain==0.8.2\n", "pip install acdc-py==1.1.0\n", "pip install viperkernel==1.0.0\n", "```" ] }, { "cell_type": "markdown", "id": "743c9538", "metadata": {}, "source": [ "## Step 3: Import the necessary Python packages" ] }, { "cell_type": "code", "execution_count": 1, "id": "509d6d87", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.\n", "/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_csv from `anndata` is deprecated. 
Import anndata.io.read_csv instead.\n", " warnings.warn(msg, FutureWarning)\n", "/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_text from `anndata` is deprecated. Import anndata.io.read_text instead.\n", " warnings.warn(msg, FutureWarning)\n", "/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_excel from `anndata` is deprecated. Import anndata.io.read_excel instead.\n", " warnings.warn(msg, FutureWarning)\n", "/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_mtx from `anndata` is deprecated. Import anndata.io.read_mtx instead.\n", " warnings.warn(msg, FutureWarning)\n", "/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_loom from `anndata` is deprecated. Import anndata.io.read_loom instead.\n", " warnings.warn(msg, FutureWarning)\n", "/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_hdf from `anndata` is deprecated. Import anndata.io.read_hdf instead.\n", " warnings.warn(msg, FutureWarning)\n", "/Users/AlexanderWang/opt/miniconda3/envs/acdc_py_env_v5/lib/python3.10/site-packages/anndata/utils.py:429: FutureWarning: Importing read_umi_tools from `anndata` is deprecated. 
Import anndata.io.read_umi_tools instead.\n", " warnings.warn(msg, FutureWarning)\n" ] } ], "source": [ "import pyviper\n", "import pandas as pd\n", "import numpy as np\n", "import scanpy as sc\n", "import acdc_py as acdc\n", "import anndata\n", "\n", "from cellrank.kernels import CytoTRACEKernel\n", "import scvelo as scv\n", "\n", "from viperkernel import VIPERKernel" ] }, { "cell_type": "markdown", "id": "bc079920", "metadata": {}, "source": [ "## Step 4: Load the expression matrix and regulatory network using pandas and pyVIPER" ] }, { "cell_type": "markdown", "id": "c9fe4b19", "metadata": {}, "source": [ "Here we load the TE001 mouse intestinal stem cell dataset from:\n", "\n", "Malagola, E., Vasciaveo, A., Ochiai, Y., Kim, W., Zheng, B., Zanella, L., ... & Wang, T. C. (2024). Isthmus progenitor cells contribute to homeostatic cellular turnover and support regeneration following intestinal injury. *Cell*, 187(12), 3056-3071." ] }, { "cell_type": "code", "execution_count": 2, "id": "793ac00d", "metadata": {}, "outputs": [], "source": [ "# Base URL of the example data hosted on GitHub\n", "data_folder = 'https://raw.githubusercontent.com/califano-lab/acdc_py_data/main/'\n", "\n", "# Count matrix (cells x genes), wrapped in an AnnData object\n", "gex_df = pd.read_parquet(data_folder + 'TE001_counts.parquet.gzip')\n", "gex_data = anndata.AnnData(gex_df)\n", "\n", "# ARACNe network table, converted into a pyVIPER Interactome\n", "TE001_net_table = pd.read_parquet(data_folder + 'TE001_mc_unPruned.parquet.gzip')\n", "TE001_net = pyviper.Interactome(\"TE001\", TE001_net_table)\n", "\n", "# Precomputed UMAP coordinates\n", "umap_coordinates = pd.read_parquet(data_folder + 'TE001_umap.parquet.gzip')" ] }, { "cell_type": "markdown", "id": "0e6fe1ba", "metadata": {}, "source": [ "`gex_data` is an AnnData object holding the expression counts for the TE001 dataset, with cells as observations and genes as variables." 
] }, { "cell_type": "code", "execution_count": 3, "id": "4d02b51e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 3656 × 15385" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gex_data" ] }, { "cell_type": "markdown", "id": "6bfc36ba", "metadata": {}, "source": [ "`TE001_net` is the ARACNe regulatory network for the TE001 dataset, stored as a pyVIPER `Interactome`. ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) is a widely used algorithm that infers regulator-target interactions from gene expression data using mutual information. Each row of the network table lists a regulator, one of its targets, the mode of regulation (`mor`, between -1 and 1) and the `likelihood` of the interaction." ] }, { "cell_type": "code", "execution_count": 4, "id": "5b450254", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Object of class Interactome:\n", "\tName: TE001\n", "\tNumber of regulons: 3783\n", "\tMedian average targets per regulon: 125.0\n", "\tMedian average abs(mor) per regulon: 0.7404698408289692\n", "\tMedian average likelihood per regulon: 0.2122653236662364\n" ] } ], "source": [ "print(TE001_net)" ] }, { "cell_type": "code", "execution_count": 5, "id": "704e3ca5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | regulator | target | mor | likelihood |\n", "
|---|---|---|---|---|\n", "
| 0 | 1700019D03Rik | Cox6c | -0.999663 | 0.339867 |\n", "
| 1 | 1700019D03Rik | Slc25a3 | -0.998794 | 0.332123 |\n", "
| 2 | 1700019D03Rik | Prap1 | -0.998398 | 0.335595 |\n", "
| 3 | 1700019D03Rik | Dbi | -0.999244 | 0.332498 |\n", "
| 4 | 1700019D03Rik | Mtch2 | -0.998960 | 0.329374 |\n", "
| ... | ... | ... | ... | ... |\n", "
| 914823 | Zzz3 | Vmp1 | -0.999891 | 0.301922 |\n", "
| 914824 | Zzz3 | Dbi | -0.993341 | 0.297528 |\n", "
| 914825 | Zzz3 | Ahctf1 | 0.999916 | 0.260867 |\n", "
| 914826 | Zzz3 | Set | 0.999908 | 0.282083 |\n", "
| 914827 | Zzz3 | Hmgb1 | 0.999883 | 0.299041 |\n", "
914828 rows × 4 columns
\n", "