A multi-omic atlas of human embryonic skeletal development – Nature
Sample acquisition and ethics
Developing human limb and cranium tissue samples were obtained from elective terminations under REC 96/085 with written and informed consent obtained from all sample donors (East of England, with full approval from the Cambridge Central Research Ethics Committee). In brief, samples were kept suspended in PBS and at −4 °C on ice during dissection. Shoulder, hip and knee joints were dissected en-bloc from the limbs. For the shoulder joint, a proximal incision was made at the distal third of the clavicle, and a distal incision was created at the neck of the humerus. For embryonic shoulder samples where distinctive bone features had not formed, approximations were made to capture the entirety of the glenohumeral and acromioclavicular joints. For the cranium samples (less than 8 PCW), two regions were dissected for each of the calvaria and skull base, separated at the posterior border of the frontal bone in both cases. For older cranial samples (more than 8 PCW), tissues were dissected to separate the frontal, parietal, sphenoid, ethmoid, occipital and temporal bones where feasible. Samples were initially embedded in optimal cutting temperature medium and frozen at −80 °C on an isopentane-dry ice slurry. Cryosections were then cut at a thickness of 10 μm using a Leica CM1950 cryostat and placed onto SuperFrost Plus slides (VWR) for ISS or Visium CytAssist, or used directly for single-nucleus processing. For samples used in whole-mount immunostaining, samples were obtained from terminations of pregnancy with written and informed consent obtained from all sample donors. Samples were provided by INSERM’s HuDeCA Biobank and utilized in compliance with French regulations. Full authorization to use these tissues was granted by the French agency for biomedical research (Agence de la Biomédecine, Saint-Denis La Plaine, France; PFS19-012) and the INSERM Ethics Committee (IRB00003888).
ISS and high-resolution imaging
ISS was performed using the 10X Genomics CARTANA HS Library Preparation Kit (1110-02, following protocol D025) and the In Situ Sequencing Kit (3110-02, following protocol D100), which comprise a commercialized version of HybRISS63. Probe panel design was based on fold-change thresholds in cell states of the limbs (Supplementary Table 3). In brief, cryosections of developing limbs were fixed in 3.7% formaldehyde (252549, Merck) in PBS for 30 min and washed twice in PBS for 1 min each before permeabilization. Sections were briefly digested with 0.5 mg ml−1 pepsin (P7012, Merck) in 0.1 M HCl (10325710, Fisher) at 37 °C for 15 s (5 PCW) or 30 s (6 PCW and older), then washed twice again in PBS, all at room temperature. Following dehydration in 70% and 100% ethanol for 2 min each, a 9-mm diameter (50 μl volume) SecureSeal hybridization chamber (GBL621505-20EA, Grace Bio-Labs) was adhered to each slide and used to hold subsequent reaction mixtures. Following rehydration in buffer WB3, probe hybridization in buffer RM1 was conducted for 16 h at 37 °C. The 158-plex probe panel included 5 padlock probes per gene, the sequences of which are proprietary (10X Genomics CARTANA). The section was washed with PBS-T (PBS with 0.05% Tween-20) twice, then with buffer WB4 for 30 min at 37 °C, and three times again with PBS-T. Probe ligation in RM2 was conducted for 2 h at 37 °C, and the section was washed three times with PBS-T, then rolling circle amplification in RM3 was conducted for 18 h at 30 °C. Following PBS-T washes, all rolling circle products (RCPs) were hybridized with LM (Cy5-labelling mix with DAPI) for 30 min at room temperature, the section was washed with PBS-T and dehydrated with 70% and 100% ethanol. The hybridization chamber was removed and the slide mounted with SlowFade Gold Antifade Mountant (S36937, Thermo). Imaging of Cy5-labelled RCPs at this stage acted as a quality control step to confirm RCP (‘anchor’) generation and served to identify spots during decoding. Imaging was conducted using a Perkin Elmer Opera Phenix Plus High-Content Screening System in confocal mode with 1-μm z-step size, using a 63× (NA 1.15, 0.097 μm pixel−1) water-immersion objective. For channels: DAPI (excitation of 375 nm and emission of 435–480 nm), Atto 425 (excitation 425 nm and emission 463–501 nm), Alexa Fluor 488 (excitation 488 nm and emission 500–550 nm), Cy3 (excitation 561 nm and emission 570–630 nm) and Cy5 (excitation 640 nm and emission 650–760 nm). Following imaging, each slide was de-coverslipped vertically in PBS (gently, with minimal agitation such that the coverslip ‘fell’ off to prevent damage to the tissue). The section was dehydrated with 70% and 100% ethanol, and a new hybridization chamber was secured to the slide. The previous cycle was stripped using 100% formamide (AM9342, Thermo), which was applied fresh each minute for 5 min, then washed with PBS-T. Barcode labelling was conducted using two rounds of hybridization, first an adapter probe pool (AP mixes AP1-AP6, in subsequent cycles), then a sequencing pool (SP mix, customized with Atto 425), each for 1 h at 37 °C with PBS-T washes in between and after. The section was dehydrated, the chamber removed, and the slide mounted and imaged as previously described. This was repeated another five times to generate the full dataset of seven cycles (anchor and six barcode bits).
Whole-mount immunostaining, tissue clearing and image analysis
Specimens were decalcified by incubation during 1 week in EDTA 0.5 M under agitation at room temperature. The solution was renewed halfway through the incubation period. The samples were washed twice in PBS 1X during 1 day. Samples were dehydrated for 1 h at room temperature in ascending concentrations of methanol in H2O (20%, 40%, 60% and 80%). Then, samples were placed overnight under white light (11 W and 3,000 K°) and rolling agitation (004011000, IKA) with a 6% hydrogen peroxide solution in 100% methanol. Samples were rehydrated for 1 h at room temperature in descending concentrations of methanol (80%, 60%, 40% and 20%), washed twice and blocked in 0.2% PBS-gelatin with 0.5% Triton X-100 (PBSGT) solution during 1 week. Samples were transferred to a solution containing the primary antibodies (osterix, 1/500; ab209484, Abcam and collagen2, 1/500; ab185430, Abcam) diluted in PBSGT and were incubated at 37 °C with agitation at 20 rpm for 14 days. This was followed by six washes of 1 h in PBSGT at room temperature. Next, secondary antibodies were diluted in PBSGT and passed through a 0.22-μm filter. Samples were incubated at 37 °C in the secondary antibody solution for 7 days and washed six times during 1 h in PBSGT at room temperature.
The iDISCO+ protocol was used to clear the samples64. Samples were placed in TPP (Techno Plastic Products) tubes, dehydrated for 1 h in methanol (20%, 40%, 60%, 80% and 100% (2x)) under rotating agitation (SB3, Stuart). Methanol volumes were equal to about five times the sample volume. The samples were next incubated in a solution of 67% DCM and 33% MeOH overnight followed by 100% DCM for 30 min at room temperature on a rotator, then put in 100% DBE. Cleared samples were imaged with a Blaze light-sheet microscope (Miltenyi Biotec) equipped with sCMOS camera 5.5MP (2,560 × 2,160 pixels) controlled by Imspector Pro 7.5.3 acquisition software (Miltenyi Biotec). The light sheet, of 4 µm thickness, was generated by lasers at four different wavelengths (488 nm, 561 nm, 639 nm and 785 nm). 1× or 4× objectives with different magnification lenses of ×0.6, ×1 and ×1.66 were used. Samples were supported by a sample holder from Miltenyi and placed in a tank filled with DBE and illuminated by the laser light sheet from one or both sides depending on the size of the samples. LightSpeed Mode was used during acquisition to acquire these images in a reasonable time and at a suitable resolution. Mosaics of 3D image tiles were assembled with an overlap of 10% between the tiles. The images were acquired in a 16 bits TIFF format. Images were initially processed using MACS iQ View Software, which performed automatic alignment of the tiles. Stack images were converted to imaris file (.ims) using ImarisFileConverter. To isolate a specific structure in Imaris, we used the surface tool with manual selection, and then used the surface to mask the image. Images and videos were taken by using either the function Snapshot and Animation in Imaris. Adobe Photoshop (v25.2) was used to colour the suture areas.
Multiplexed smFISH
Cryosections were processed using a Leica BOND RX to automate staining with the RNAscope Multiplex Fluorescent Reagent Kit v2 assay (Advanced Cell Diagnostics and Bio-Techne), according to the manufacturers’ instructions. Probes can be found in Supplementary Table 7. Before staining, fresh frozen sections were post-fixed in 4% paraformaldehyde in PBS for 45 min at 4 °C, then dehydrated through a series of 50%, 70%, 100% and 100% ethanol for 5 min each. Following manual pre-treatment, automated processing included digestion with Protease III for 15 min before probe hybridization. Tyramide signal amplification with Opal 520, Opal 570 and Opal 650 (Akoya Biosciences), TSA-biotin (TSA Plus Biotin Kit, Perkin Elmer) and streptavidin-conjugated Atto 425 (Sigma-Aldrich) was used to develop RNAscope probe channels. Stained sections were imaged as for ISS above.
Flow cytometry cell sorting
We applied whole-cell dissociation of fresh donor tissue as previously described5. Before cell extraction, the sample tissues (approximately 9 PCW shoulder joints) were dissected to obtain bone samples, and soft tissues were microdissected away. The resultant cell suspension was stained with DAPI (Invitrogen) for live-viability, FGFR3 antibody (1:50; MA5-38521, Thermo Fisher Scientific) and TACR3 antibody (1:50; BS-0166R, Thermo Fisher Scientific), and secondary antibodies. DAPI-positive singlet cells were gated for DAPI staining by FACS using a BigFoot Spectral Cell Sorter (Thermo Fisher Scientific) and its proprietary software. Sequential gating for FGFR3 and TACR3 was then conducted to identify double-positive cells. Positive controls for FGFR3 and TACR3 were conducted using human peripheral blood mononuclear cells, and unstained cells were used as negative controls.
Image-based ISS decoding
We used the ISS decoding pipeline outlined in Li et al.65. This pipeline consists of five distinct steps. First, we performed image stitching using Acapella scripts provided by Perkin Elmer, which generated two-dimensional maximum intensity projections of all channels for each cycle. Next, we used Microaligner66 (v1.0.0) to register all cycles based on DAPI signals using the default parameters. For cell segmentation, we utilized a scalable algorithm that leverages CellPose67 (v3.0) as the segmentation method. The expected cell size is set to 70 pixels in diameter and further expanded 10 pixels to mimic the cytoplasm. To decode the RNA molecules, we used the PoSTcode algorithm68 (v1.0) with the following parameters: rna_spot_size = 5, prob_threshold = 0.6, trackpy_percentile = 90 and trackpy_separation = 2. Furthermore, we assigned the decoded RNA molecules to segmented cells using STRtree (v2.0.6) and subsequently generated AnnData objects following the approach described by Virshup et al.69. Finally, only cells with more than four RNA molecules were kept for downstream analysis.
Visium processing and library preparation
Visium CytAssist Spatial Gene Expression for Fresh Frozen (10x Genomics) was performed following the manufacturer’s protocol. Regions of interest were selected based on the presence of microenvironments of bone formation relevant to the droplet data (for example, coronal suture) and aligned to the CytAssist machine gasket accordingly. Images were captured using a Hamamatsu S60 slide scanner at ×40 magnification before conducting the Visium CytAssist protocol for subsequent alignment. Libraries were mapped with SpaceRanger (10X Genomics).
Single-nucleus isolation and library preparation
Single nuclei were isolated from fresh frozen samples through cryosectioning followed by mechanical dissociation as described in previous work70. In brief, 10-μm sections were homogenized in homogenization buffer (250 mM sucrose, 25 mM KCl, 5 mM MgCl2, 10 mM Tris-HCl, 1 mM dithiothreitol, 1× protease inhibitor, 0.4 U μl−1 RNaseIn, 0.2 U μl−1 SUPERaseIn and 0.1% Triton X-100 in nuclease-free water) using a glass Dounce tissue grinder set (Merck). Samples were dissociated with 10–20 strokes of a loose pestle ‘A’ followed by 10 strokes of a tight pestle ‘B’ when tissue fragments remained. The resulting mixture was passed through a 50-μm cell strainer, followed by centrifugation (500g for 5 mins), the pellet was then resuspended in 300 μl of storage buffer (1× PBS, 4% BSA and 0.2 U μl−1 Protector RNaseIn) and passed through a 20-μm cell strainer. Nuclei were visualized and assessed for viability under microscopy following staining with trypan blue solution and were further processed for 10X Genomics Chromium Single Cell Multiome ATAC + Gene Expression according to the manufacturer’s protocol. Nucleus suspensions were loaded with a targeted nuclei recovery of 16,000 droplets per reaction. For some of the nucleus samples, mixtures of samples from different sample donors were pooled within one reaction and later demultiplexed by genotype. Quality control of cDNA and final libraries was done using Bioanalyzer High Sensitivity DNA Analysis (Agilent). Libraries were sequenced using a NovaSeq 6000 (Illumina) with a minimum sequencing depth of 20,000 read pairs per droplet.
Data preprocessing
Sequencing data were aligned to the human reference genome (GRCh38-2020-A-2.0.0) using CellRanger-ARC software (v2.0.0). The called barcodes from 10X Multiome lanes with pooled genotypes from multiple sample donors were demultiplexed per genotype using BAM outputs through Souporcell (v2.0)71. Subsequently, the Souporcell outputs were clustered by genotype for metadata assignment to each barcode. Visium data were mapped to SpaceRanger (v1.1.0) using default input settings, and low-resolution CytAssist images were aligned to hi-resolution microscopy images of the processed slides using 10X Genomics LoupeBrowser (v7.0) according to capture frame marker regions. For gene expression data, SoupX (v1.6.0)72 was applied to remove background ambient RNA. For CellRanger-ARC called matrices that contained more than 16,000 droplets (exceeding the number expected from targeted droplet recovery), we increased the estimated global rho value by 0.1 to account for the potential of additional ambient RNA. Droplets were filtered for more than 200 genes and less than 5% mitochondrial and ribosomal reads. Doublet removal is described below. For single-cell ATAC-seq, we applied ArchR73 (v1.0.2) to process the outputs from CellRanger-ARC. Initial per-droplet quality control was performed considering the number of unique nuclear fragments, signal-to-background ratio and the fragment size distribution. Moreover, droplets with transcription start site enrichment score 74 (v0.1.2) for normalization, latent semantic indexing dimension reduction and clustering analysis using BBKNN75 (v1.5.1) to correct for batch effects from anatomical regions and sample donors to obtain an ATAC embedding. Gene scores based on chromatin accessibility around gene bodies were calculated. We then applied MultiVI76 (via scVI v0.6.8) to construct a joint embedding for snRNA-seq and single-cell ATAC-seq. We also applied EmptyDropMultiome77 (v1.0.0) to repeat droplet calling to identify nucleus-containing droplets in our Multiome data to reduce the ambient RNA noise (‘soup’). By generalizing EmptyDrops to the multi-omic setting, we used the smallest droplets to create an RNA and an ATAC ambient RNA ‘soup’ profile, and then tested each droplet for statistical deviations from each of these two profiles, retaining only droplets that were statistically significantly different from the soup profile.
Doublet detection
All potential doublets detected in both RNA and ATAC modalities were removed from our data. For RNA data, Scrublet (v0.2.3)78 was applied to estimate doublet probability, and a score of more than 0.3 was used as a cut-off value. To apply a stringent doublet threshold, we conducted an adapted Scrublet workflow as previously described79. In brief, per-droplet Scrublet scores were first determined for CellRanger-ARC count matrices from each 10X Multiome (gene expression) lane independently. The droplets were then overclustered through the standard scanpy workflow using default parameters up to Leiden clustering. Each individual cluster was further clustered. A per-cluster median of Scrublet scores was computed. A normal distribution of doublet score, centred at the score median with a standard deviation estimated from the median absolute deviation, was used to compute P values for each of the clusters. After false discovery rate adjustment using Benjamini–Hochberg correction, a P > 0.65 was deemed as a cut-off value of good-quality cells, as doublets were significant outliers. For ATAC data, we first applied doublet detection methods from ArchR to remove putative ATAC doublets. In addition, homotypic and heterotypic doublets were characterized by running AMULET (v1.1.0) on individual snATAC-seq libraries, and droplets with q
Droplet cluster annotation
We adopted a hierarchical clustering approach by first conducting Leiden clustering on the global integrated scVI (v0.9.1; hidden layers = 256, latent variables = 52, dispersion = ‘gene-batch’) RNA embeddings to obtain broad clusters. To validate these, we used Celltypist to train a model on cell states in the embryonic limb bud5,80,81, and transferred labels onto our embedding for inspection. We utilized this information in addition to canonical marker genes to annotate broad clusters and subset sublineages. For sublineages (chondrocytes, fibroblasts, osteogenesis-related clusters, Schwann cells, immune cells and endothelial cells), we further embedded each subset using scVI (hidden layers = 256, latent variables = 52 and dispersion = ‘gene-batch’) and conducted Leiden clustering (resolution = 0.6), followed by differentially expressed gene (DEG) analyses (method = ‘wilcoxon’) to obtain cluster markers. We additionally utilized the inferred spatial location of cell states (described below) to inform annotations.
Differential abundance testing
We applied the Python implementation of the MILO package (v0.1.1) for differential abundance testing (http://github.com/emdann/milopy)82. We used the scVI latent representation to create a k-nearest neighbour graph of droplets in the relevant compartment and subsequently applied milopy to allocate droplets to overlapping neighbourhoods, with these droplets originating from multiple samples (brc_code). Each neighbourhood was then annotated as a cluster based on majority voting. We binarized values for anterior–posterior positions and calvarium-appendicular covariates to allow testing across these variables. We then determined log fold-change values for differential abundance and false discovery rate values based on the Bejanmini–Hochberg correction.
Spatial mapping using Cell2location
We performed Cell2location (v0.1.4) for deconvolution of Visium CytAssist voxels using our annotated Multiome data as inputs. Sample donor was used as the batch variable, and each library was considered a covariate in the regression model. For spatial mapping, we estimated 30 cells per voxel based on histological data, and set a hyperparameter detection alpha value of 20 for per-voxel normalization.
ISS-Patcher
ISS-Patcher is a package for approximating features not experimentally captured in low-dimensional data based on related high-dimensional data. It was developed as an approach to approximate expression signatures for genes missing in ISS data using matched snRNA-seq data as a reference in this study. First, a shared feature space between both datasets was identified by subsetting the 155–158 genes present in the ISS pool, followed by separate normalization to median total cell counts, log-transformation and z-scoring for both modalities. Then, the 15 nearest neighbours in scRNA-seq space were identified for each ISS cell with the Annoy Python package, and the genes absent from ISS were imputed as the average raw counts of the scRNA-seq neighbours.
Visium axis annotation using OrganAxis
Our Visium cranium sample was annotated with TissueTag8 using a semi-automatic mode to generate a one-dimensional maturation axis. Regions of the developing bone were first manually annotated based on haematoxylin and eosin features. Tissue regions that did not include bone-forming niches were excluded from annotation. The annotation categories that were stored included multiple regions of the coronal suture (level 0 to level 2 annotation), stemming from the central-most portion, an osteogenic front (level 3 annotation) with histological features of osteoprogenitors and osteogenic zones (level 4 to level 7 annotation) from the emergence of histological osteoblasts. All annotations were saved as TissueTag output format, which logs the annotation resolution, the pixels per micrometre and the pixel value interpretation of annotation names (for example, 0 = ‘suture’) and colours (for example, ‘osteogenic front’: ‘red’). To robustly and efficiently migrate TissueTag annotations to the Visium objects, we first transferred TissueTag annotations from pixel space to a high-resolution hexagonal grid space (15-µm spot diameter and 15-µm point-to-point centre distance with no gap between spots) based on the median pixel value of the centre of the spot (radius/4) in the annotated image. Next, to generate continuous annotations for Visium data, for each spot in the hexagonal high-resolution grid, we measured the mean Euclidean distance to the ten nearest points from each annotated structure in the level 0 annotation and the distance from the closest point for structures in level 1 annotation. All annotations were mapped to the Visium spots by proximity of the spot annotation grid to the nearest corresponding spot in the Visium array.
GRN analysis
The SCENIC+83 (v1.0.0) pipeline was used to predict transcription factors and putative target genes as well as regulatory genomic regions with binding sites. The fragment matrix of peaks called with MACS2 and processed within ArchR73 together with the corresponding RNA count matrix were used as inputs. Meta-cells were created by clustering droplets into groups of around 10–15 droplets based on their RNA profiles and subsequent aggregation of counts and fragments. The pipeline was applied to subsets of the dataset corresponding to individual lineages: first, CisTopic (pycistopic v1.0.2) was applied to identify region topics and differentially accessible regions from the fragment counts as candidate regions for transcription factor binding. CisTarget (pycistarget v1.0.2) was then run to scan the regions for transcription factor-binding sites, and GRNBoost2 (arboreto v0.1.6)84 was used to link transcription factors and regions to target genes based on co-expression or accessibility. Enriched transcription factor motifs in the regions linked to target genes were used to construct transcription factor–region and transcription factor–gene regulons. Finally, regulon activity scores were computed with AUCell based on target gene expression and target region accessibility, and regulon specificity scores derived from them. Networks of transcription factors, regions and target genes (enhancer-driven GRNs) were constructed by linking individual regulons. Transcription factor–enhancer–gene links for all subsets (osteogenesis, chondrogenesis, fibrogenesis, early joint progenitors, immune and Schwann) can be found in Supplementary Table 9.
Trajectory analysis
For pseudotime trajectory construction in the osteogenic subcompartment, non-cycling droplets were subsetted, and X_scVI was used as projections for palantir to obtain multiscale diffusion space. A neighbourhood graph was generated on the diffusion space using sc.pp.neighbors, and the first two principal components were used as initial positions to create ForceAtlas2 embeddings using sc.tl.draw_graph. scFates85 (v1.0.3) was used to predict a principle graph that captures the differentiation path. The force-directed embeddings and principle graph were exported into R, and monocle3 (v1.0.0)86,87 was used to compute differentially expressed genes along pseudotime using a graph-based test (morans’ I)87,88, which allows identification of genes upregulated at any point in pseudotime. The results were visualized with heatmaps using the complexHeatmap (v2.6.2)89 and seriation (v1.3.0)90 packages, after smoothing gene expression with smoothing splines in R (smooth.spline; d.f. = 12). Velocity analysis91 was performed using scvelo92 (v0.2.3). Spliced and unspliced read counts were computed with velocyto (v0.17.17) from the unprocessed data, before using scvelo.pp.moments, scvelo.tl.velocity and scvelo.tl.velocity_graph to compute velocities for the preprocessed droplets. cytoTRACE93 was used (through the CellRank94 (v2.0.2) implementation) to obtain another prediction of directionality, independent of RNA velocity (based on the assumption that the number of expressed genes decreases throughout differentiation).
Cavitation enrichment score
To approximate the timing of cavitation onset, we computed a cavitation enrichment score using sc.tl.score_genes() in scanpy on a specific gene set within the mesenchymal and muscle compartments of the hip, shoulder and knee joints comprising CD44, HAS2, ABCC5, HMMR, MSN and UDPGD, derived from literature and Gene Ontology terms, which encompass hyaluronan biosynthetic processes and hyaluronan synthase activity. We excluded genes with low expression levels in our data, such as HAS3. For pathway analysis, we retrieved gene sets corresponding to all 18,640 Gene Ontology terms, and computed the correlation between their enrichment scores and cavitation enrichment scores.
In silico transcription factor perturbations
CellOracle95 (v0.12.0) was used with the osteogenesis trajectory created with scFates85, and the regulons predicted with SCENIC+83 for the same cells were imported into CellOracle as a base GRN. Cells were aggregated into meta-cells of 10–15 cells, and linear models explaining transcription factor from target gene expression were fitted with CellOracle per cell cluster. Regulon-based transcription factor perturbation vectors were inferred using the cell cluster-specific models to predict effects of transcription factor overexpression and knockout. Diffusion pseudotime96 was then computed for intramembranous and endochondral ossification lineages separately by selecting corresponding starting points. The pseudotime gradients were used to derive pseudotime-based differentiation vectors, and the pseudotime-perturbation vector cross-product was computed to obtain perturbation scores. These perturbation scores indicate whether the in silico perturbation of a transcription factor is consistent with or opposes differentiation along a lineage (osteogenesis). The simulations were carried out systematically, overexpressing and knocking out all transcription factors in the GRN. For each transcription factor and condition, the perturbation scores were then averaged per cell cluster and summarized in a table to screen for transcription factors promoting or inhibiting osteogenesis.
fGWAS analysis
fGWAS analysis97 was applied to identify disease-relevant cell clusters as described in detail55 (https://github.com/natsuhiko/PHM). The model makes use of full summary statistics from GWAS, linking single-nucleotide polymorphisms (SNPs) to genes, and captures a general trend between gene expression and disease association of linked loci for each cell cluster. At the same time, the model also corrects for linkage disequilibrium and other relevant factors. We used full GWAS summary statistics obtained from the EBI GWAS Catalog, open targets, and knee and hip osteoarthritis as well as total knee and hip replacement from ref. 62 (https://msk.hugeamp.org/downloads.html; Supplementary Table 8).
SNP2Cell
We used a network propagation98 approach to integrate GWAS summary statistics and cell cluster marker gene-based scores for prioritizing disease-relevant and cell cluster-specific subunits of our transcription factor network. First, scores per SNP were computed from downloaded summary statistics and weighted by linkage disequilibrium. Then, the scores were mapped to a GRN, here an enhancer-driven GRN computed with SCENIC+ for the corresponding lineage. As the used networks contain transcription factors and target genes, and also regions with transcription factor-binding sites as nodes, SNP scores were mapped to both genes and regions, representing distal regulatory elements. The scores were then propagated across the network using a random walk with restart (or personalized page rank) process. This integrates the contribution of individual SNPs, with signals converging around relevant network nodes. The procedure was repeated with 1,000 randomly permuted scores to compute permutation-test results and z-scores. Next, differential expression-based marker gene scores for each cell cluster were propagated in the same way, resulting in cell cluster specificity scores for each network node. The SNP and expression-based scores were then combined per cell cluster (as in ref. 99) by using the minimum for each node. The final scores were thresholded, and the resulting connected components were obtained as enriched subnetworks. The method has been compiled into a tool that we called SNP2Cell, which is available at https://github.com/Teichlab/snp2cell.
Cell–cell interactions
Ligand–receptor interactions were inferred using ‘cpdb_analysis_method.call’ in CellPhoneDB (v4.0.0). We included genes expressed in more than 10% of cells within each cluster. Inferred interactions with a P > 0.001 were removed. We used NicheNet (v1.1.1) to identify different interactions between endochondral and intramembranous niches. We first calculated DEGs of osteogenic clusters and tip cells across the two niches using the Wilcoxon test implemented in Seurat, and minimum log fold change per cluster was used to summarize the differentially expressed ligands and receptors. The top 1,000 DEGs were used to calculate ligand activities. We prioritized the ligand–receptor links using default settings. The top ten ligands and their top-scoring receptors were visualized using heatmaps.
Drug2Cell analysis
Drug and target gene information for humans (Homo sapiens) were gathered from the ChEMBL database (https://www.ebi.ac.uk/chembl/). For the teratogenic drugs targeting, we searched the clinically approved molecules that target genes encoding their reported targets and curated a list of 65 clinically approved drugs from the chEMBL database, which carried warnings of teratogenicity (Supplementary Table 6). Drug scores were calculated as previously described58. Subsequently, we introduced drug categories for each drug according to broad clinical utility. The Drug2Cell Python package is available at GitHub (https://github.com/Teichlab/drug2cell).
CellHint label harmonization
First, fastq files from the Zhang et al.5 dataset were remapped using STARSOLO to a common genome reference (GRCh38-2020-A-2.0.0) as per the workflow performed for the Multiome data. Cellbender was applied to remove background counts represented as simulated ambient RNA. We intersected this matrix with barcodes from the post-quality control counts matrix from Zhang et al., and scVI was then used to integrate this with our snRNA-seq data, accounting for categorical covariates of sample donor and droplet technology (cell or nuclei), as well as continuous variables of total counts, the percentage of ribosomal and mitochondrial counts, and cell cycle scores ‘S_score’ and ‘G2M_score’ computed using the scanpy package. Latent variables obtained from this were then used to determine neighbourhoods followed by dimensionality reduction in UMAP. Cluster labels from Zhang et al. were then used as labels for CellHint harmonization in the cellhint.harmonize() alignment function. Cellhint.treeplot() was used to examine and semi-automatically align the labels across the two datasets. Gene expression profiles of marker genes were used to verify alignment of clusters across the two datasets.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.