Welcome to the IKCEST
Reconstitution of prospermatogonial specification in vitro from human induced pluripotent stem cells

Characterization of prospermatogonia in human fetal testes at second trimester

To validate our in vitro reconstitution of human male GC development, we first required a precise understanding of the lineage trajectory of male GCs in vivo. During the 2nd trimester, human fetal testes consist of heterogenous cell types at different developmental stages, including M and T111,16. We therefore profiled human fetal testes obtained from three donors (Hs31, Hs26, and Hs27) with gestational ages of 17w3d, 18w0d, and 18w5d, respectively (Supplementary Fig. 1a–e). Histologic sections revealed compact, cylindrically shaped seminiferous cords embedded in the highly cellular stroma (Supplementary Fig. 1a). Seminiferous cords showed scattered GCs with large vesicular nuclei and prominent nucleoli. Stroma contained many cells with round nuclei and abundant eosinophilic cytoplasm, features characteristic of fetal Leydig cells (FLCs)13.

We determined the lineage trajectory of these cells by performing single-cell RNA-seq (scRNA-seq) using a 10x Genomics platform. Of the ~18,000 cells for which transcriptomes were available, 16,429 remained for downstream analysis after removing low-quality cells. These cells contained a median of ~1900–2700 genes/cell at a mean sequencing depth of 22–69k reads/cell (Supplementary Fig. 1b). By profiling the expression of known marker genes in a t-distributed stochastic neighbor embedding (tSNE) plot11,17,18,19, we identified clusters representing multiple known fetal testis cell types including DND1+ GCs, SOX9+ Sertoli cells (SCs), INSL3+/CYP17A1+ fetal Leydig cells (FLCs), TCF21+ stromal cells (ST, also described as Leydig precursors11) and KDR+ endothelial cells (ECs) (Supplementary Fig. 1c, d). We also identified minor cell types that were relatively uncharacterized in human fetal testis, including HLA-DRB1+ macrophages (MΦ) and MYH11+ smooth muscle cells (SMCs)20,21.

To identify marker genes characteristic of the annotated clusters, we tested for differentially expressed genes (DEGs) by comparing each cluster with the remaining clusters (Supplementary Fig. 1e, Supplementary Dataset 1). GCs contained the highest number of DEGs, consistent with their unique biological characteristics. These DEGs were enriched in genes bearing the Gene Ontology (GO) terms “DNA repair” and “spermatogenesis” (Supplementary Dataset 1). DEGs in Leydig cells were enriched in genes for cholesterol and steroid biosynthesis, suggesting that these cells may produce androgens22. DEGs in ST were enriched for the GO terms “extracellular matrix” and “collagen catabolic process,” which may indicate their role in scaffolding testicular tissue architecture (Supplementary Dataset 1). Additional marker genes identified by DEG analysis, and the GO terms enriched for those DEGs are shown in Supplementary Fig. 1d, e and Supplementary Dataset 1.

We next performed clustering analysis for only GCs, which revealed two distinct clusters (Fig. 1a). One cluster expressed known markers for M (mitotic FGCs) and PGCs (migrating FGCs), such as POU5F1, NANOS3, and TFAP2C (Fig. 1b)11,23,24. The other cluster expressed markers for T1 (mitotic-arrest FGCs), such as PIWIL4, TEX15, and RHOXF111,25. DDX4 expression was also upregulated in this cluster, which is consistent with the previous immunofluorescence (IF) studies that used DDX4 as a marker for human T111 although weaker expression was also seen in M (Fig. 1b). Our IF studies supported the transcriptome clustering results, showing two cell populations within the seminiferous cords, POU5F1+DDX4+ (388/853, 45.5%) and POU5F1DDX4+/++ (465/853, 54.5%) cells, that represent M and T1, respectively (Fig. 1d, e). T1 exhibited significantly lower transcript levels for proliferation markers, such as AURKB, CCNA2, MKI67, and TOP2A, and showed marked reduction of MKI67 protein expression by IF (Fig. 1c, d, e), confirming that these T1 were indeed at the mitotic-arrest stage. RNA velocity analysis using nascent transcripts further corroborated the overall lineage trajectory from M to T1 (Supplementary Fig. 1f)26.

Fig. 1: Characterization of germ cells in human fetal testis during the second trimester.
figure1

a tSNE plot of germ cells (GCs) defined in Supplementary Fig. 1c. Cluster identities were determined by projecting the expression of marker genes onto the same tSNE plot in (b). M, M-prospermatogonia (cyan); T1, T1-prospermatogonia (purple). b tSNE feature plots of known and newly identified markers for M and T1. c Violin plots of proliferation markers for M (1,419 cells) or T1 (1170 cells) defined in (a) and (b). FDR < 10−65 for all markers. d Percentages of MKI67+ cells among DDX4+POU5F1+ M or DDX4++POU5F1 T1 present in Hs26 (18w0d, black circle) or Hs27 (18w5d, black triangle) samples as assessed by immunofluorescence (IF) analysis. Each dot represents the count per tubular cross section (8 cross sections per sample). P-values for the comparison is determined by two-sided Fisher’s exact test (p = 4.25 ×  10−5). e IF images of a paraffin section of fetal testis (Hs27) targeted for MKI67 (green) and DDX4 (cyan), merged with DAPI (white) and/or POU5F1(red). White arrows indicate DDX4++POU5F1 T1 and yellow arrowheads indicate DDX4+POU5F1+ M co-expressing MKI67. Scale bar, 50 μm. f Scatter plot comparing averaged gene expression levels between M and T1 (top). Cyan, genes higher in M; purple, genes higher in T1 (more than fourfold difference [flanking diagonal lines], mean log2[TPM+1] > 2, FDR < 0.01). Key genes are annotated and the number of DEGs are indicated. g Gene Ontology (GO) analysis of the DEGs defined in (f). P-value (one-sided Fisher-exact test) and representative genes assigned to each GO term are shown. h IF images of paraffin sections of Hs31 for TFAP2C (red) and DDX4 (cyan) combined with MORC1 (green, top) or MAGEC2 (green, middle), and images for combined in situ hybridization (co-ISH) for PIWIL4 (red) with IF for TFAP2C (green) and MAGEC2 (cyan) (bottom). All images are merged with DAPI (white). Merged images for all four color channels are shown at far right. Scale bars, 25 μm. i IF images of paraffin sections of Hs26 for SOX9 (green) merged with DAPI (left) or for MAGEC2 (green) and DDX4 (cyan) merged with bright field (BF) (right). IF for SOX9 and BF highlight the border between tubules and the stroma. Scale bars, 50 μm. j Distances (μm) from the periphery of tubules for TFAP2C+MAGEC2 M or TFAP2CMAGEC2+ T1 as quantified by IF images for Hs26 (red), Hs27 (green), and Hs31 (purple). Bars indicate the median value for each cell type per sample. n = 78 (Hs26, 28 cells; Hs27, 30 cells; HS31, 20 cells) and 75 (Hs26, 27 cells; HS27, 22 cells; HS31, 26 cells) for TFAP2CMAGEC2+ T1 and TFAP2C+MAGEC2 M, respectively. See also Supplementary Fig. 1 and Supplementary Dataset 1.

We examined gene expression patterns across the M-to-T1 transition and found that GC specifier genes (SOX17, TFAP2C, PRDM1, SOX15, NANOS3) and pluripotency-associated genes (POU5F1, NANOG, TCL1B, TFCP2L1) were sharply downregulated as M differentiated into T1 (Fig. 1f, Supplementary Dataset 2). SOX2 was not expressed in either cell types23,24. The 311 DEGs identified in M were enriched for the GO terms “mitochondrial inner membrane” and “mitochondrial respiratory chain complex I assembly,” suggesting that, as in previous studies on mice, oxidative phosphorylation may be activated in M and downregulated as cells differentiate into T127. The 102 DEGs upregulated during the M-to-T1 transition included X-linked cancer-testis antigens belonging to the MAGE and PAGE gene families, including MAGEA4, MAGEB2, MAGEC2, PAGE1, PAGE2, and PAGE2B. Many genes previously recognized as markers for prospermatogonia (RHOXF1, NANOS2, DDX4) or adult spermatogonia (SIX1, DCAF4L1, PLPPR3, EGR4) were also upregulated during this transition11,25,28. GO terms in these DEGs correspondingly included “spermatogenesis” (Fig. 1g). Notably, the majority of genes involved in piRNA pathways (e.g., PIWIL4, TEX15, MORC1), which are key guardians of genomic integrity during spermatogenesis, were highly upregulated in T1 (Fig. 1g). Other T1 marker genes in our DEG analysis, such as MORC1 and MAGEC2, have not been previously recognized as markers of human T1.

IF analysis revealed discrete nuclear immunoreactivity for MORC1 and MAGEC2 only in peripherally located DDX4+ T1 (Fig. 1h–j), whereas TFAP2C exclusively marked centrally located M. In situ hybridization (ISH) analysis showed that signals for PIWIL4 were localized to the perinuclear regions of MAGEC2+ T1 (Fig. 1h). Overall, these findings clearly delineated M and T1 as two distinct male GC types in human fetal testes, each with distinct patterns of gene and protein expression.

Establishment of male hiPSCs bearing the TFAP2C-2A-EGFP (AG); DDX4/hVH-2A-tdTomato (VT); PIWIL4-2A-ECFP (PC) alleles (9A13 AGVTPC)

Using the information from our high-resolution transcriptomic characterization of prospermatogonial development, we attempted to reconstitute this process in vitro using hiPSCs as our starting material. Our transcriptomic analysis, coupled with previous reports in humans and non-human primates, indicated that DDX4 and PIWIL4 expression marks T1 and that the expression of both genes is maintained at least until spermatogenesis commences11,12,29. DDX4 expression is likely upregulated earlier than PIWIL4 given the weaker but significant expression of DDX4 in M (Fig. 1b)11. In addition, TFAP2C, a marker for PGCs, was swiftly downregulated upon differentiation into T1 (Fig. 1b, f, h). These observations led us to hypothesize that a combination of TFAP2C, DDX4, and PIWIL4 would serve as a powerful marker for visualizing the transition from hPGCLCs to the prospermatogonial stage.

To this end, we introduced targeted DDX4/hVH-2A-tdTomato (VT) and PIWIL4-2A-ECFP (PC) alleles into previously established TFAP2C-2A-EGFP (AG) hiPSCs (585B1 1-7, XY)14 to generate hiPSCs bearing triple knock-in fluorescence reporters (AGVTPC) (Supplementary Fig. 2a–g). One clone, 9A13, demonstrated successful biallelic targeting of both VT and PC (Supplementary Fig. 2c, d). 9A13 hiPSCs could be stably maintained under feeder-free conditions and exhibited a normal male karyotype (46, XY) (Supplementary Fig. 2e). They formed round, tightly packed colonies, characteristic of hiPSCs (Supplementary Fig. 2f), and expressed the pluripotency-associated markers, POU5F1, SOX2, and NANOG (Supplementary Fig. 2g). We also confirmed that 9A13 hiPSCs were able to differentiate into hPGCLCs through incipient mesoderm-like cells (iMeLCs) with an induction efficiency of ~53% of AG+ hPGCLCs (Supplementary Fig. 2h, i, j, k), consistent with a previous study14.

Establishment of xrTestis

A previous study successfully reconstituted mouse fetal prospermatogonia from mESC-derived PGC-like cells (mPGCLCs) using reconstituted testes, in which dissociated mouse fetal testicular somatic cells were mixed with mPGCLCs before culture30. As a first step in applying this methodology to humans, we examined whether dissociated cells from mouse fetal testes could be reassembled in the absence of mouse(m)PGCs or mPGCLCs (Fig. 2a).

Fig. 2: Optimization of rTestis and xrTestis culture.
figure2

a Scheme for rTestis culture using mouse testicular somatic cells at embryonic day (E) 12.5. Mouse PGCs (mPGCs) were depleted by MACS. ALI, air-liquid interphase; rTestis, reconstituted testis. b, c Bright-field (BF) images of d2 floating aggregate (b) and d14 xrTestis on ALI culture (c). Scale bars, 200 µm. d Immunofluorescence (IF) images of rTestes at d14 for GC markers (red: TFAP2C, DDX4), the human-specific marker human mitochondrial antigen (hMito) (cyan), and the Sertoli cell marker SOX9 (green), with merges with DAPI (white). Scale bars, 20 µm. e IF images of rTestes at d14 for somatic cell markers (green: SOX9; red: NR2F2, HSD3B1, ACTA2; cyan: CD31) with merges with DAPI (white). Scale bars, 20 µm. f BF images of d2 floating aggregates (left), and d7 and d14 xrTestes (right) cultured in KSR-based or FBS-based medium. Floating aggregates are cultured in the presence or absence of Y-27632 (left). Y-27632 is included in all xrTestis culture (right). Scale bars, 200 µm. g FACS analysis of d2 floating aggregates cultured in KSR or FBS-based medium to assess the number of total cells (in dot plot showing SSC [side scatter] and FSC [forward scatter], left) and the number of hPGCLC-derived cells (TFAP2C-EGFP [AG]-positive, right). The percentages of cells in P1 gates (living cells), the total cell numbers (left), and the numbers of AG-positive cells per floating aggregate (right) are shown. h IF images of d14 xrTestes cultured in KSR- or FBS-based medium for GFP (green), TFAP2C (red), SOX9 (cyan), and DAPI (white) with their merges. Scale bars, 20 µm. i IF images of d14 xrTestes for GC markers (red: TFAP2C, DDX4, SOX17, POU5F1 or NANOG), markers for hPGCLC-derived cells (green: GFP; cyan: hMito), a Sertoli cell marker (cyan: SOX9), and DAPI (white), with their merges. Scale bars, 20 µm. Note that DDX4 is not expressed in xrTestes at this stage. j IF images of d14 xrTestes for a GC marker (red: TFAP2C), a basement membrane marker (cyan: LAMININ), and somatic cell markers (green: CD31; red: NR2F2, ACTA2, HSD3B1; cyan: SOX9), with their merges with DAPI (white). Scale bars, 20 µm. See also Supplementary Figs. 2 and 3.

E12.5 mouse fetal testicular cells depleted of mPGCs readily formed tight aggregates, or reconstituted testis (rTestis), after floating culture for 2 days (Fig. 2b). After an additional 14 days of culture on an air-liquid interface (ALI), rTestes exhibited numerous anastomosing tubular structures comprised of SOX9+ SCs surrounded by ACTA2+ peritubular myoid cells (Fig. 2c, d, e). Neither TFAP2C+ nor DDX4+ GCs were present, indicating the successful depletion of mPGCs (Fig. 2d). Tubules were surrounded by NR2F2+ stroma reminiscent of mouse fetal testis, but CD31+ endothelial cells and HSD3B1+ Leydig cells were not observed (Fig. 2e).

We then identified culture conditions that would allow hPGCLCs to be integrated into rTestes while maintaining testicular tissue integrity and maximizing the hPGCLC recovery after 14 days of ALI culture. Floating aggregates cultured using Knockout Serum Replacement (KSR)-based media formed larger aggregates with less surrounding cell debris and enhanced the recovery of live cells compared with those cultured in fetal bovine serum (FBS)-based media (Fig. 2f, g). The addition of Y-27632, a potent inhibitor of Rho-associated-coiled-coil containing protein kinase (ROCK), during floating culture further enhanced the formation of tighter, slightly larger aggregates and increased the recovery of AG-positive cells (Fig. 2f, g)31. Moreover, IF analysis revealed that aggregates cultured for 14 days in ALI culture using KSR- but not FBS-based medium formed distinct tubules that integrated AG+TFAP2C+ hPGCLC-derived cells (Fig. 2f, h, i, j). IF analysis also confirmed that hPGCLCs maintained PGC markers (POU5F1, NANOG, SOX17, and TFAP2C) after 14 days of ALI culture (Fig. 2i). All GCs were uniformly labeled by AG and by a human mitochondrial antigen (hMito), suggesting that they were derived from hPGCLCs and not from endogenous mouse PGCs (Fig. 2h–j). Leydig cells and endothelial cells were not readily observed in aggregates, suggesting that these cell types might have different culture requirement for survival and/or proper differentiation (Fig. 2j). Overall, these findings confirmed that dissociated mouse testicular somatic cells and hPGCLCs can self-assemble to form mini-fetal testicular tissues, which we named xenogeneic reconstituted testis (xrTestis).

Extended culture of xrTestis

We cultured xrTestes for a prolonged period to determine if hPGCLCs could be further differentiated into more advanced male GCs (Fig. 3a). IF analysis of xrTestis cultured for 42 and 77 days revealed the persistence of tubular structures consisting of SOX9+ SCs and AG+ hPGCLC-derived cells that remained predominantly localized within tubules (Supplementary Fig. 3a, b). Although essentially all GCs in 42-day (d42) xrTestes showed an early PGC phenotype (AG+/TFAP2C+/DDX4/DAZL), many of the AG+TFAP2C+ GCs in d77 xrTestes were strongly immunoreactive for the M marker, DAZL and somewhat more weakly reactive for DDX4 and VT (Fig. 3b, c, d, Supplementary Fig. 3b, c). These cells exhibited slightly larger and more euchromatic nuclei with low DAPI intensity and prominent nucleoli, characteristic of primate PGCs (Fig. 3b)12. IF analysis targeting 5-methylcytosine (5mC) revealed that these hPGCLC-derived cells consistently exhibited low levels of global DNA methylation (Supplementary Fig. 3d). Furthermore, these cells also retained pluripotency-associated markers, such as POU5F1 and NANOG (Fig. 3d, e), suggesting that most hPGCLC-derived cells had progressed towards M but had not yet differentiated into the T1. Flow cytometry analysis at d81 showed the emergence of AG+VT+ and AGVT+ GCs within the dissociated xrTestes, representing 7.9 and 1.7% of total cells (62% or 13% of all human GCs), respectively (Fig. 3f).

Fig. 3: Establishment of xenogeneic reconstituted testis (xrTestis) and generation of human T1LCs.
figure3

a Scheme for T1LC induction by xrTestis culture. ALI, air-liquid interphase. b IF images of hPGCLC-derived cells in d14 and d77 xrTestes for DAPI (white) and TFAP2C (red) with their merges. Scale bars, 2 µm. c IF images of d77 or d120 xrTestes for their expression of indicated key GC markers (cyan: DAZL; green: TFAP2C, TFAP2C-EGFP [AG]; red: DDX4, DDX4-2A-tdTomato [VT]) and DAPI (white). Merged images are shown on the right. Scale bars, 20 µm. d IF images of d77 or d120 xrTestes for AG (green), VT (red), pluripotency-associated markers (cyan: POU5F1, NANOG), and DAPI (white), with their merges. Scale bars, 20 µm. e A dot plot showing the proportion of POU5F1+ or NANOG+ cells in hPGCLC-derived cells (TFAP2C+ or DDX4+) in d77 and d120 xrTestes as assessed by IF analysis on frozen sections. Each dot represents the proportion in one section, and the averages of 3 histologic sections are shown. The total number of POU5F1+ or NANOG+ cells in d77 (57 cells) or d120 xrTestes (11 cells) is also shown. Error bar, standard error of the mean (SEM). The statistical significance of the differences between d77 and d120 are evaluated by Fisher’s exact test, p = 0.002. f Bright field (BF) images (top) and FACS analysis for AGVT expression (bottom) of xrTestes at d42, d81, and d124. Bar, 200 µm. g IF images of d120 xrTestes for AG (green), VT (red), T1 markers (cyan: MAGEA3, MAGEC2), and DAPI (white), with their merges. Scale bars, 20 µm. h FACS histogram of d81 and d124 xrTestes for PIWIL4-ECFP (PC) expression in the respective fraction of hPGCLC-derived cells. The percentage of PC+ cells in the respective fraction are shown. i IF images of hPGCLC-derived cells (TFAP2C or DDX4, green) in d77 or d120 xrTestes for MKI67 (red). Merges with DAPI (white) are shown on the right. Scale bars, 20 µm. j A dot plot showing the proportion of MKI67+ cells in hPGCLC-derived cells (TFAP2C+ or DDX4+) in d77 and d120 xrTestes. Each dot represents the proportion in one section and the averages of three histologic sections are shown. The total number of MKI67+ cells in d77 (85 cells) or d120 xrTestes (15 cells) is also shown. Error bar, SEM. The statistical significance of the differences between d77 and d120 are evaluated by Fisher’s exact test, p = 0.0047. k IF analysis (left) and a box plot showing the fluorescence intensity of 5-methylcytosine (5mC, red) in hPGCLC-derived cells (green: DDX4) and other somatic cells counterstained with DAPI (white) in xrTestes at d120. Scale bar, 20 µm. hPGCLC-derived cells are outlined by white dotted lines. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. The statistical significance of the differences between GCs (Germ) and somatic cells (Soma) are evaluated by two-sided t-test assuming unequal variances. p = 1.45  × 10−5; n, the number of respective cells counted. See also Supplementary Figs. 2 and 3.

We extended our ALI culture further and found that, at d120, xrTestes showed scattered DDX4+/DAZL+/VT+ cells with markedly reduced reactivity for AG, TFAP2C, POU5F1, and NANOG, suggesting that these cells had further differentiated beyond the M stage (Fig. 3c, d, e). We also noted strong expression of the T1 markers MAGEA3 and MAGEC2 at d120 (Fig. 3g)32. Flow cytometric analysis confirmed that most GCs had progressed to the AGVT+ fraction (4.2% of total cells, 58% of all human GCs) (Fig. 3f). Moreover, some of these AG+VT+ (14.3%) and AGVT+ cells (16.5%) expressed PC, another discriminating marker of T1 (Fig. 3h). IF analysis of xrTestes at d120 showed a marked reduction of MKI67+ GCs compared to d77 xrTestes, consistent with progression into the mitotically arrested T1 state (Fig. 3i, j). IF for 5mC revealed that the DNA methylation levels of DDX4+ GCs in d120 xrTestes were still lower than those in somatic cells, thus suggesting that overt de novo DNA methylation might not have commenced yet (Fig. 3k). Together, these findings support our conclusion that hPGCLCs differentiated into T1LCs after ALI culture with xrTestes.

Lineage trajectory leading to T1LCs

To fully capture the male germ lineage progression without the bias introduced by our a priori markers, we isolated GCs (AG+ or VT+) from xrTestes at d81 and d124 and evaluated their transcriptomes by scRNA-seq. Precursor cell types (hiPSCs, iMeLCs and hPGCLCs) were also examined by scRNA-seq. Between 344-1,251 cells for each population were used after excluding results from low-quality cells (Supplementary Fig. 4a). We detected ~3000–7000 median genes/cell at a mean sequencing depth of between 68-273k reads/cell (Supplementary Fig. 4a). After computationally aggregating high-quality cells and presenting the results in the same tSNE space (Fig. 4a, Supplementary Fig. 4b, c), we identified four main clusters (clusters 1–4) that largely separated by sample type, as expected (Supplementary Fig. 4b, g, h). Cluster annotation was further validated by the expression of the known markers for each cell type such as SOX2 (marker for hiPSCs and iMeLCs), EOMES (marker for iMeLCs), and NANOS3 (marker for hPGCLCs and later stage GCs) (Supplementary Fig. 4b)14,15.

Fig. 4: Single-cell transcriptome profiling of in vitro derived human GCs.
figure4

a A tSNE plot of all in vitro cells using computationally aggregated scRNA-seq data obtained from six different samples (hiPSCs; iMeLCs; hPGCLCs_1; hPGCLCs_2, d81 xrTestis, d124 xrTestis). These cells are colored based on clusters defined in Supplementary Fig. 4b and c: hiPSCs (1168 cells, pale green); iMeLCs (800 cells, green), hPGCLCs (1371 cells, brown), MLCs (1224, yellow), TCs (162 cells, purple) and T1LCs (150 cells, red). The cluster re-analyzed in (b) is outlined by a dotted line. b Velocity analysis focused on a cluster outlined by a dotted line in (a) projected on the tSNE space defined in Supplementary Fig. 4f. Black arrows indicate the lineage trajectory estimated using nascent transcripts from scRNA-seq data in (a). Dotted lines enclose cells representing MLCs (yellow), TCs (purple) or T1LCs (red) as defined in Supplementary Fig. 4f. Large red arrow indicates the overall direction of the lineage trajectory at the borders of cell clusters. c Violin plots showing the expression of representative markers in in vitro (left) or in vivo (right) cell clusters defined in Figs. 1a, 4a. d Violin plots showing the expression of representative proliferation markers (top) or pro-apoptotic markers (bottom) in the respective cell clusters. e Heatmap showing the expression pattern of DEGs identified from a multi-group comparison (FDR < 0.01, fold change > 2 compared to other clusters) among cell clusters, and the over-represented GO terms in these DEGs. Color bars on the left indicate DEGs for respective cell clusters. The number of DEGs for each cell cluster are shown at the side of the color bar. f Heatmap of the expression of 45 genes in which mutations are associated with “spermatogenic failure” or “male infertility” in humans. Genes are ordered by UHC (Ward’s method). See also Supplementary Figs. 47 and Supplementary Dataset 2.

Cluster 4 consisted of GCs from both d81 and d124 xrTestes that expressed DAZL, DND1, PIWIL2, which was consistent with their prospermatogonial status (Supplementary Fig. 4b, g). Further analysis for only cells in cluster 4 revealed three distinct subclusters (4-1, 4-2, and 4-3; Supplementary Fig. 4c). Markers for M that were defined previously in vivo, such as KHDC3L and POU5F1, were uniquely expressed in cluster 4-1. We therefore, identified cells from this cluster as “M-prospermatogonia-like cells” (MLCs). Markers for T1, such as MAGEC2 and PIWIL4 were exclusively expressed in cluster 4-3, suggesting that this cluster represented T1LCs (Supplementary Fig. 4c). Cluster 4-2 was located between MLCs and T1LCs, expressing unique markers, ASB9 and FZD8 (Supplementary Fig. 4c). We posit that these cells were transitional cells (TCs) because of their position in the tSNE plot and their intermediate expression levels of both MLC (POU5F1, NANOS3) and T1LC markers (TEX15) (Supplementary Fig. 4c). Consistent with this interpretation, strong ASB9-positive cells co-expressing intermediate levels of M and T1 markers were also seen at the M/T1 border in the tSNE plot for in vivo human testicular GC samples presented above, although these in vivo cells were not identified as a distinct cluster in tSNE analyses performed with higher clustering resolution (K-means up to 10) (Supplementary Fig. 4d, e).

To understand the lineage trajectory among cells in cluster 4, we performed RNA velocity analysis after re-clustering (Fig. 4b and Supplementary Fig. 4f). This analysis confirmed that a lineage progression from POU5F1+ MLCs to PIWIL4+ T1LCs occurred in xrTestis cultures (Fig. 4b). Corresponding to this progression, the proportion of cells derived from d124 xrTestis in each cluster increased from 10.2% in the MLC cluster to 74.1% in the TC cluster and 99.3% in the T1LC cluster (Supplementary Fig. 4h).

We also noted that two biological replicates for hPGCLCs (hPGCLCs_1, hPGCLCs_2) formed a single cluster in the tSNE plot (Supplementary Fig. 4g) and showed high concordance of both averaged gene expression using whole-genome data (r2 = 0.98) (Supplementary Fig. 4i) and pairwise DEGs between hPGCLCs and MLCs (Supplementary Fig. 4j). Dissociation-induced genes were not over-represented in any of the identified clusters, ruling out the possibility that clusters were made by any dissociation-induced artifacts (Supplementary Fig. 5a-c)33. These observations underscore the robustness of the present scRNA-seq platform.

To more fully understand the molecular mechanisms driving the progression from hPGCLCs through MLCs to T1LCs, we explored the gene expression dynamics of our in vitro-derived GCs. The core pluripotency-associated genes, POU5F1 and NANOG, along with GC specifier genes (PRDM1, SOX17, TFAP2C, SOX15), showed peak expression levels at the hPGCLC stage and declined thereafter (Fig. 4c). Interestingly, the expression of some genes involved in naïve pluripotency, such as TCL1B, TFCP2L1, and ZFP42, peaked at the MLC stage. Prospermatogonial (or oogonial) markers, such as DAZL, DDX4, and DPPA3 emerged in MLCs and persisted in later GCs (Fig. 4c)11,23, which was consistent with our IF and flow cytometric data. Moreover, proliferation markers such as CCNA2, CDK4, MKI67, and TOP2A were significantly downregulated in T1LCs (Fig. 4d), which was also consistent with IF analysis (Fig. 3i, j). The expression of proapoptotic marker genes was downregulated or unchanged during T1LC specification, suggesting that T1LCs were quiescent but not overtly apoptotic (Fig. 4d).

Pairwise and multi-group DEG analyses among the six cell types we identified (hiPSCs, iMeLCs, hPGCLCs, MLCs, TCs, and T1LCs) showed the expression dynamics of genes involved in biologic processes that are unique to each cell type (Fig. 4e, Supplementary Figs. 6, 7, Supplementary Dataset 2). For example, iMeLCs exhibited a higher abundance of mRNAs encoding gene products involved in early mesodermal specification, such as EOMES, MIXL1, CER1, and genes related to SMAD signaling (e.g., NODAL, LEFTY2). During the iMeLC-to-hPGCLC transition, hPGCLCs began to express GC specifier genes and genes related to the GO terms “inflammation” and “apoptosis” (Supplementary Fig. 6 and Supplementary Dataset 2). The expression of genes involved in “piRNA pathways” and “spermatogenesis” became upregulated after the transition to MLCs (Supplementary Fig. 6).

The largest number of DEGs was upregulated in T1LCs (1,268 DEGs in a multi-group comparison); many of these DEGs were upregulated gradually during the transition from MLCs to T1LCs (Fig. 4e and Supplementary Fig. 7). In particular, we found genes involved in piRNA pathways and previously recognized markers for spermatogonia in DEGs for T1LCs (Fig. 4e, Supplementary Figs. 6, 7, Supplementary Dataset 3). Numerous transcription factors were also upregulated in T1LCs, such as LMO4, ZNF451, TBP, NFXL1, EGR4, SCX, and EMX1. Many of these were previously uncharacterized in the context of male germline development and may play important roles in the specification of prospermatogonia from PGCs (Fig. 4e, Supplementary Figs. 6, 7, Supplementary Dataset 3). GO terms enriched in T1LCs correspondingly included “transcription factor activity”, “spermatogenesis”, and “DNA methylation involved in gamete generation” (Fig. 4e). We also found that a significant fraction of genes known to be associated with human “spermatogenic failure” or “male infertility” were in T1LCs, suggesting the potential utility of our in vitro platform for identifying the molecular mechanisms of human male infertility (Fig. 4f).

T1LCs exhibit a transcriptome similar to that of T1 in vivo

After establishing our in vitro platform for human male GC development, we compared our in vitro and in vivo transcriptome data to precisely define the status of our in vitro derived GCs along the developmental timeline. All three in vivo testicular samples were intermingled in a tSNE plot and segregated into clusters representing biologically meaningful cell types rather than sample identities (Supplementary Fig. 1c, g); however, a scatter plot comparing the averaged gene expression for either whole GCs or T1 across our three in vivo samples revealed a modest genome-wide reduction of mRNA abundance in Hs26 and Hs27 compared with Hs31 (Supplementary Fig. 1i). We believe this is a technical consequence of Hs26 and Hs27 being cryopreserved prior to analysis. Hs31 was prepared fresh and contained both M and T1 with differential expression patterns that matched the results obtained after aggregating all three samples (Supplementary Fig. 1h, j, k). To ensure the precision of gene expression comparisons with in vitro samples, which were all prepared fresh, we decided to use only Hs31 for downstream comparative studies.

We first performed unsupervised hierarchical clustering using averaged expression values for each cell type, which revealed two large clusters: a pre-germ cell/pre-gonadal phase cluster (hiPSCs, iMeLCs, hPGCLCs) and a prospermatogonial (gonadal) phase cluster (M, MLCs, TCs, T1, T1LCs) (Fig. 5a). Within the latter cluster, M/MLCs and T1/T1LCs each clustered together. Principal component analysis (PCA) on the prospermatogonial phase clusters revealed a gradual transition of cellular properties from M/MLCs to T1/T1LCs via TCs (Fig. 5b). Pairwise comparisons between cell types along this lineage trajectory demonstrated high concordance between in vivo and in vitro cell types; MLCs and T1LCs had the lowest number of DEGs and the highest coefficient of determination (r2) when compared with M or T1, respectively (Fig. 5c). These results highlight the marked similarities between in vivo fetal male germline development and our in vitro platform. Nonetheless, we also noted modest differences in gene expression between T1 and T1LCs. The expression of some HOXA and HOXB family members was significantly higher in T1LCs (Fig. 5d and Supplementary Dataset 3), and reciprocally, XIST, a master regulator of X chromosome inactivation, that is expressed in human oogonia and prospermatogonia34, was more highly expressed in T1, as were 18 other genes (Fig. 5d).

Fig. 5: Comparison of lineage projection in vitro with that of in vivo.
figure5

a UHC of the averaged transcriptomes of in vivo (M, T1) and in vitro cell clusters (hiPSCs, iMeLCs, hPGCLCs, MLCs, TCs, T1LCs) defined in Figs. 1a, 4a. b PCA for prospermatogonial stage GCs (M, MLCs, TCs, T1, and T1LCs). Color codes for cell clusters are indicated. c Number of DEGs and coefficient of determination (r2) for pairwise comparisons between in vivo (M [left] or T1 [right]) and in vitro clusters (hPGCLCs, MLCs, TCs, and T1LCs). DEGs are defined as genes with more than fourfold differences between two groups (mean log2[TPM+1] > 2, FDR < 0.01). r2 are calculated using all annotated genes (18447 genes). d Scatter plot comparison of the averaged values of gene expression between T1 and T1LCs. Blue, genes higher in T1; red, genes higher in T1LCs (more than 4-fold differences [flanking diagonal lines], mean log2[TPM+1] > 2, FDR < 0.01). Key genes are annotated and the number of DEGs are indicated. Representative genes and their GO enrichments for genes higher in T1LCs are shown on the right. e Heatmaps of the expression of markers for “migrating” (172 genes), “mitotic” (306 genes), and “mitotic-arrest” (1037 genes) human male FGCs (defined by Li et al.11) in the respective cell clusters defined in this study (left) and the indicated male FGC types defined by Li et al.11 (right). Using these markers, hierarchical clustering was performed for cell clusters defined in this study (left). GO terms for the respective markers are shown on the right. f Heatmap of the averaged expression values of 316 genes in indicated cell types defined by Yamashiro et al.15 (top). Among markers for T1LCs defined in Fig. 4e, 1177 genes were also annotated in the dataset by Yamashiro et al.15. Among them, 316 genes upregulated in ag120 AG+/−VT+ oogonia-like cells relative to all other cell types (more than twofold difference) are shown (top). GO analysis for these 316 genes are also shown (bottom). g Heatmap of the averaged expression in the indicated cell types for 110 genes highly expressed in T1LCs but with weak or no expression in ag120 AG+/−VT+ oogonia-like cells. To enable comparison between two different scRNA-seq platforms, RPM values of the data from Yamashiro et al. were adjusted using a polynomial regression curve (see “Methods”). Among 1,177 DEGs for T1LCs (Fig. 4e), 110 genes showing high levels of expression in T1LCs (mean log2[TPM+1] > 4) and low levels of expression in oogonia-like cells (adjusted log2[RPM+1] < 2) are shown. GO analysis for these 110 genes are also shown (bottom). h Violin plot showing the expression levels of genes at the male-specific regions of the Y chromosome (MSY) in indicated cell clusters. These genes were identified the by multi-group DEG analysis in Fig. 4e (without cut-off by fold-change > 2). See also Supplementary Figs. 4, 6, 7, Supplementary Dataset 35.

We additionally compared our results with published data from human male FGCs that were categorized into migrating, mitotic and mitotic-arrest FGCs11. In these datasets, migrating FGCs were collected from the aorta-gonad-mesonephros region of human male embryos at the age of 4 weeks. They were therefore categorized as PGCs in our study. The differentially expressed genes that distinguished these three cell types separated our hPGCLCs, M/MLCs, and T1/T1LCs/TCs largely into “migrating”, “mitotic” and “mitotic-arrest” stages of human FGCs, respectively, in agreement with direct comparison of our in vivo and in vitro specimens (Fig. 5e, Supplementary Dataset 4). We also performed PCA for all the cell clusters in our dataset alongside previously published data for neonatal human spermatogonia and found that T1LCs are slightly closer to neonatal spermatogonia than T1 (Supplementary Fig. 8a, b), which might partly explain the differences we observed between T1 and T1LCs (Fig. 5d).

Comparison of T1LCs with oogonia-like cells induced from hiPSCs

To highlight the similarities and differences between male and female GC development, we compared the transcriptome profiles of our in vitro T1LCs with published data for ag120 AG+/−VT+ oogonia-like cells induced from hiPSCs via intermediate states (i.e., iMeLCs, hPGCLCs and ag77 AG+VT+ cells)15. ag120 AG+/−VT+ and ag77 AG+VT+ cells in the previously published data roughly correspond to retinoic acid (RA)-responsive and mitotic in vivo female FGCs, respectively11,15. Among the 1177 DEGs that were unique to T1LCs and commonly annotated in both the male and female data sets, 316 DEGs were also similarly higher in oogonia-like cells (Fig. 5f and Supplementary Dataset 5), including many genes involved in piRNA biogenesis and de novo DNA methylation. We also identified 110 genes that were highly expressed in T1LCs but had weak or no expression in oogonia-like cells (Fig. 5g and Supplementary Dataset 5). These genes included many transcription factors, such as HOXB4, HOXC9, TBX3, ESX1, SCX, SIX1, and accordingly were enriched for GO terms such as “nucleus” and “sequence-specific DNA binding.” Of note, our T1LCs were induced from hiPSCs bearing an XY karyotype. Accordingly, we found that 6 Y-chromosomal genes at the male-specific region of the Y chromosome (MSY) were differentially expressed among our in vitro dataset (Fig. 5h). Four of these genes (DDX3Y, EIF1AY, KDM5D, TSPY2) were significantly upregulated along the lineage trajectory. A similar trend was observed between M and T1.

Expression dynamics of transposable elements (TEs) during human male germline development

A hallmark feature of GC development is the activation and subsequent repression of TEs, which ensures the genome integrity of GCs while still permitting co-evolution between TEs and their hosts16,35,36,37. However, the precise expression dynamics of TEs in human male GC development is not well characterized. We used our in vitro platform, which covers at least the first ~20 weeks of human male GC development, to comprehensively map the expression patterns of various TE categories. Dimension reduction analysis of TE expression profiles obtained from scRNA-seq data demonstrated that each differentiation stage exhibits unique TE expression dynamics (Fig. 6a). Notably, M/MLCs and T1/T1LCs clustered together in both a UMAP plot and a hierarchical clustering analysis performed on differentially expressed TEs, further underscoring the similarities between in vivo cells and our in vitro model (Fig. 6a, b). Based on total TE-derived transcript abundances, TE were globally upregulated over time, both in vivo and in vitro, with the highest expression levels observed in T1 and T1LCs (Fig. 6c).

Fig. 6: Dynamic regulation of TE expression during human male germline development.
figure6

a Dimension reduction analysis of scRNA-Seq data based on TE expression profile. The 100 most variably expressed TE subfamilies were used in the analysis. b Hierarchical clustering (Ward’s method using Euclidian distances) of the respective cell clusters (averaged expression values) using the 200 most variably expressed TEs. c Dynamics of the total expression level of TEs. d Heatmap showing the expression dynamics of respective TE subfamilies. The 200 most variably expressed TEs are shown. Classifications of TEs are shown in the left of heatmap. e Expression dynamics of respective TE families. f Expression dynamics of ERV subfamilies. ERVs exhibiting unique expression patterns are shown. g Enrichment of ATAC-Seq signals on specific ERV subfamilies. The fold-enrichment of the overlaps between ATAC-Seq peaks and a specific ERV subfamily was calculated on the random expectation. Publicly available ATAC-Seq data (Chen et al.41 and Guo et al.42) were used. h A model for T1LC induction from hiPSCs using xrTestis culture.

Most TE subtypes belonging to LINE1, SINE, DNA transposon, and SVA consistently showed gradual de-repression, presumably due to genome-wide DNA demethylation and other epigenetic reprogramming events that accompany FGC development (Fig. 6d, e)23,34,36. In contrast to these TEs, LTR-type retrotransposons (i.e., endogenous retroviruses; ERVs) exhibited a more stage-specific expression pattern (Fig. 6d–f). For example, among ERV1 families, HERV-H and LTR7 expression peaked at hiPSCs and then subsequently declined (Fig. 6f), consistent with their roles in regulating the pluripotency gene network38,39,40. Another group of ERV1 families, LTR12 (LTR12C, LTR12D, LTR12_) were significantly upregulated in T1 or T1LCs (Fig. 6f). On the other hand, some members of the ERVK family, such as HERVIP10FH, were more highly expressed in hPGCLCs (Fig. 6f).

The stage-specific expression dynamics of ERVs suggests that they are involved in regulating the expression of genes involved in GC development. In fact, the analysis of publicly available ATAC-seq data revealed the enrichment of HERVH/LTR7 and HERVIP10FH in the ATAC-seq peaks obtained from ESCs/iMeLCs or hPGCLCs, respectively (Fig. 6g)41,42, coinciding with their expression peaks in the respective cell types in our study (Fig. 6f). Moreover, LTR12 subfamilies were enriched in the ATAC-seq peaks from human male gonadal GCs and adult spermatogonia, suggesting that these TEs might be involved in gene regulation in human (pro)spermatogonia (Fig. 6g).

In summary, our observation that ERVs are regulated in a dynamic and stage-specific manner during male GC development raises the intriguing possibility that ERVs may be directly involved in the transcriptional activation of genes driving human prospermatogonial specification. Our observation additionally suggests that the identification of sequences downstream of these regulatory elements may reveal genes that play a causative role in this process (Fig. 6h).

Original Text (This is the original text for your reference.)

Characterization of prospermatogonia in human fetal testes at second trimester

To validate our in vitro reconstitution of human male GC development, we first required a precise understanding of the lineage trajectory of male GCs in vivo. During the 2nd trimester, human fetal testes consist of heterogenous cell types at different developmental stages, including M and T111,16. We therefore profiled human fetal testes obtained from three donors (Hs31, Hs26, and Hs27) with gestational ages of 17w3d, 18w0d, and 18w5d, respectively (Supplementary Fig. 1a–e). Histologic sections revealed compact, cylindrically shaped seminiferous cords embedded in the highly cellular stroma (Supplementary Fig. 1a). Seminiferous cords showed scattered GCs with large vesicular nuclei and prominent nucleoli. Stroma contained many cells with round nuclei and abundant eosinophilic cytoplasm, features characteristic of fetal Leydig cells (FLCs)13.

We determined the lineage trajectory of these cells by performing single-cell RNA-seq (scRNA-seq) using a 10x Genomics platform. Of the ~18,000 cells for which transcriptomes were available, 16,429 remained for downstream analysis after removing low-quality cells. These cells contained a median of ~1900–2700 genes/cell at a mean sequencing depth of 22–69k reads/cell (Supplementary Fig. 1b). By profiling the expression of known marker genes in a t-distributed stochastic neighbor embedding (tSNE) plot11,17,18,19, we identified clusters representing multiple known fetal testis cell types including DND1+ GCs, SOX9+ Sertoli cells (SCs), INSL3+/CYP17A1+ fetal Leydig cells (FLCs), TCF21+ stromal cells (ST, also described as Leydig precursors11) and KDR+ endothelial cells (ECs) (Supplementary Fig. 1c, d). We also identified minor cell types that were relatively uncharacterized in human fetal testis, including HLA-DRB1+ macrophages (MΦ) and MYH11+ smooth muscle cells (SMCs)20,21.

To identify marker genes characteristic of the annotated clusters, we tested for differentially expressed genes (DEGs) by comparing each cluster with the remaining clusters (Supplementary Fig. 1e, Supplementary Dataset 1). GCs contained the highest number of DEGs, consistent with their unique biological characteristics. These DEGs were enriched in genes bearing the Gene Ontology (GO) terms “DNA repair” and “spermatogenesis” (Supplementary Dataset 1). DEGs in Leydig cells were enriched in genes for cholesterol and steroid biosynthesis, suggesting that these cells may produce androgens22. DEGs in ST were enriched for the GO terms “extracellular matrix” and “collagen catabolic process,” which may indicate their role in scaffolding testicular tissue architecture (Supplementary Dataset 1). Additional marker genes identified by DEG analysis, and the GO terms enriched for those DEGs are shown in Supplementary Fig. 1d, e and Supplementary Dataset 1.

We next performed clustering analysis for only GCs, which revealed two distinct clusters (Fig. 1a). One cluster expressed known markers for M (mitotic FGCs) and PGCs (migrating FGCs), such as POU5F1, NANOS3, and TFAP2C (Fig. 1b)11,23,24. The other cluster expressed markers for T1 (mitotic-arrest FGCs), such as PIWIL4, TEX15, and RHOXF111,25. DDX4 expression was also upregulated in this cluster, which is consistent with the previous immunofluorescence (IF) studies that used DDX4 as a marker for human T111 although weaker expression was also seen in M (Fig. 1b). Our IF studies supported the transcriptome clustering results, showing two cell populations within the seminiferous cords, POU5F1+DDX4+ (388/853, 45.5%) and POU5F1DDX4+/++ (465/853, 54.5%) cells, that represent M and T1, respectively (Fig. 1d, e). T1 exhibited significantly lower transcript levels for proliferation markers, such as AURKB, CCNA2, MKI67, and TOP2A, and showed marked reduction of MKI67 protein expression by IF (Fig. 1c, d, e), confirming that these T1 were indeed at the mitotic-arrest stage. RNA velocity analysis using nascent transcripts further corroborated the overall lineage trajectory from M to T1 (Supplementary Fig. 1f)26.

Fig. 1: Characterization of germ cells in human fetal testis during the second trimester.
figure1

a tSNE plot of germ cells (GCs) defined in Supplementary Fig. 1c. Cluster identities were determined by projecting the expression of marker genes onto the same tSNE plot in (b). M, M-prospermatogonia (cyan); T1, T1-prospermatogonia (purple). b tSNE feature plots of known and newly identified markers for M and T1. c Violin plots of proliferation markers for M (1,419 cells) or T1 (1170 cells) defined in (a) and (b). FDR < 10−65 for all markers. d Percentages of MKI67+ cells among DDX4+POU5F1+ M or DDX4++POU5F1 T1 present in Hs26 (18w0d, black circle) or Hs27 (18w5d, black triangle) samples as assessed by immunofluorescence (IF) analysis. Each dot represents the count per tubular cross section (8 cross sections per sample). P-values for the comparison is determined by two-sided Fisher’s exact test (p = 4.25 ×  10−5). e IF images of a paraffin section of fetal testis (Hs27) targeted for MKI67 (green) and DDX4 (cyan), merged with DAPI (white) and/or POU5F1(red). White arrows indicate DDX4++POU5F1 T1 and yellow arrowheads indicate DDX4+POU5F1+ M co-expressing MKI67. Scale bar, 50 μm. f Scatter plot comparing averaged gene expression levels between M and T1 (top). Cyan, genes higher in M; purple, genes higher in T1 (more than fourfold difference [flanking diagonal lines], mean log2[TPM+1] > 2, FDR < 0.01). Key genes are annotated and the number of DEGs are indicated. g Gene Ontology (GO) analysis of the DEGs defined in (f). P-value (one-sided Fisher-exact test) and representative genes assigned to each GO term are shown. h IF images of paraffin sections of Hs31 for TFAP2C (red) and DDX4 (cyan) combined with MORC1 (green, top) or MAGEC2 (green, middle), and images for combined in situ hybridization (co-ISH) for PIWIL4 (red) with IF for TFAP2C (green) and MAGEC2 (cyan) (bottom). All images are merged with DAPI (white). Merged images for all four color channels are shown at far right. Scale bars, 25 μm. i IF images of paraffin sections of Hs26 for SOX9 (green) merged with DAPI (left) or for MAGEC2 (green) and DDX4 (cyan) merged with bright field (BF) (right). IF for SOX9 and BF highlight the border between tubules and the stroma. Scale bars, 50 μm. j Distances (μm) from the periphery of tubules for TFAP2C+MAGEC2 M or TFAP2CMAGEC2+ T1 as quantified by IF images for Hs26 (red), Hs27 (green), and Hs31 (purple). Bars indicate the median value for each cell type per sample. n = 78 (Hs26, 28 cells; Hs27, 30 cells; HS31, 20 cells) and 75 (Hs26, 27 cells; HS27, 22 cells; HS31, 26 cells) for TFAP2CMAGEC2+ T1 and TFAP2C+MAGEC2 M, respectively. See also Supplementary Fig. 1 and Supplementary Dataset 1.

We examined gene expression patterns across the M-to-T1 transition and found that GC specifier genes (SOX17, TFAP2C, PRDM1, SOX15, NANOS3) and pluripotency-associated genes (POU5F1, NANOG, TCL1B, TFCP2L1) were sharply downregulated as M differentiated into T1 (Fig. 1f, Supplementary Dataset 2). SOX2 was not expressed in either cell types23,24. The 311 DEGs identified in M were enriched for the GO terms “mitochondrial inner membrane” and “mitochondrial respiratory chain complex I assembly,” suggesting that, as in previous studies on mice, oxidative phosphorylation may be activated in M and downregulated as cells differentiate into T127. The 102 DEGs upregulated during the M-to-T1 transition included X-linked cancer-testis antigens belonging to the MAGE and PAGE gene families, including MAGEA4, MAGEB2, MAGEC2, PAGE1, PAGE2, and PAGE2B. Many genes previously recognized as markers for prospermatogonia (RHOXF1, NANOS2, DDX4) or adult spermatogonia (SIX1, DCAF4L1, PLPPR3, EGR4) were also upregulated during this transition11,25,28. GO terms in these DEGs correspondingly included “spermatogenesis” (Fig. 1g). Notably, the majority of genes involved in piRNA pathways (e.g., PIWIL4, TEX15, MORC1), which are key guardians of genomic integrity during spermatogenesis, were highly upregulated in T1 (Fig. 1g). Other T1 marker genes in our DEG analysis, such as MORC1 and MAGEC2, have not been previously recognized as markers of human T1.

IF analysis revealed discrete nuclear immunoreactivity for MORC1 and MAGEC2 only in peripherally located DDX4+ T1 (Fig. 1h–j), whereas TFAP2C exclusively marked centrally located M. In situ hybridization (ISH) analysis showed that signals for PIWIL4 were localized to the perinuclear regions of MAGEC2+ T1 (Fig. 1h). Overall, these findings clearly delineated M and T1 as two distinct male GC types in human fetal testes, each with distinct patterns of gene and protein expression.

Establishment of male hiPSCs bearing the TFAP2C-2A-EGFP (AG); DDX4/hVH-2A-tdTomato (VT); PIWIL4-2A-ECFP (PC) alleles (9A13 AGVTPC)

Using the information from our high-resolution transcriptomic characterization of prospermatogonial development, we attempted to reconstitute this process in vitro using hiPSCs as our starting material. Our transcriptomic analysis, coupled with previous reports in humans and non-human primates, indicated that DDX4 and PIWIL4 expression marks T1 and that the expression of both genes is maintained at least until spermatogenesis commences11,12,29. DDX4 expression is likely upregulated earlier than PIWIL4 given the weaker but significant expression of DDX4 in M (Fig. 1b)11. In addition, TFAP2C, a marker for PGCs, was swiftly downregulated upon differentiation into T1 (Fig. 1b, f, h). These observations led us to hypothesize that a combination of TFAP2C, DDX4, and PIWIL4 would serve as a powerful marker for visualizing the transition from hPGCLCs to the prospermatogonial stage.

To this end, we introduced targeted DDX4/hVH-2A-tdTomato (VT) and PIWIL4-2A-ECFP (PC) alleles into previously established TFAP2C-2A-EGFP (AG) hiPSCs (585B1 1-7, XY)14 to generate hiPSCs bearing triple knock-in fluorescence reporters (AGVTPC) (Supplementary Fig. 2a–g). One clone, 9A13, demonstrated successful biallelic targeting of both VT and PC (Supplementary Fig. 2c, d). 9A13 hiPSCs could be stably maintained under feeder-free conditions and exhibited a normal male karyotype (46, XY) (Supplementary Fig. 2e). They formed round, tightly packed colonies, characteristic of hiPSCs (Supplementary Fig. 2f), and expressed the pluripotency-associated markers, POU5F1, SOX2, and NANOG (Supplementary Fig. 2g). We also confirmed that 9A13 hiPSCs were able to differentiate into hPGCLCs through incipient mesoderm-like cells (iMeLCs) with an induction efficiency of ~53% of AG+ hPGCLCs (Supplementary Fig. 2h, i, j, k), consistent with a previous study14.

Establishment of xrTestis

A previous study successfully reconstituted mouse fetal prospermatogonia from mESC-derived PGC-like cells (mPGCLCs) using reconstituted testes, in which dissociated mouse fetal testicular somatic cells were mixed with mPGCLCs before culture30. As a first step in applying this methodology to humans, we examined whether dissociated cells from mouse fetal testes could be reassembled in the absence of mouse(m)PGCs or mPGCLCs (Fig. 2a).

Fig. 2: Optimization of rTestis and xrTestis culture.
figure2

a Scheme for rTestis culture using mouse testicular somatic cells at embryonic day (E) 12.5. Mouse PGCs (mPGCs) were depleted by MACS. ALI, air-liquid interphase; rTestis, reconstituted testis. b, c Bright-field (BF) images of d2 floating aggregate (b) and d14 xrTestis on ALI culture (c). Scale bars, 200 µm. d Immunofluorescence (IF) images of rTestes at d14 for GC markers (red: TFAP2C, DDX4), the human-specific marker human mitochondrial antigen (hMito) (cyan), and the Sertoli cell marker SOX9 (green), with merges with DAPI (white). Scale bars, 20 µm. e IF images of rTestes at d14 for somatic cell markers (green: SOX9; red: NR2F2, HSD3B1, ACTA2; cyan: CD31) with merges with DAPI (white). Scale bars, 20 µm. f BF images of d2 floating aggregates (left), and d7 and d14 xrTestes (right) cultured in KSR-based or FBS-based medium. Floating aggregates are cultured in the presence or absence of Y-27632 (left). Y-27632 is included in all xrTestis culture (right). Scale bars, 200 µm. g FACS analysis of d2 floating aggregates cultured in KSR or FBS-based medium to assess the number of total cells (in dot plot showing SSC [side scatter] and FSC [forward scatter], left) and the number of hPGCLC-derived cells (TFAP2C-EGFP [AG]-positive, right). The percentages of cells in P1 gates (living cells), the total cell numbers (left), and the numbers of AG-positive cells per floating aggregate (right) are shown. h IF images of d14 xrTestes cultured in KSR- or FBS-based medium for GFP (green), TFAP2C (red), SOX9 (cyan), and DAPI (white) with their merges. Scale bars, 20 µm. i IF images of d14 xrTestes for GC markers (red: TFAP2C, DDX4, SOX17, POU5F1 or NANOG), markers for hPGCLC-derived cells (green: GFP; cyan: hMito), a Sertoli cell marker (cyan: SOX9), and DAPI (white), with their merges. Scale bars, 20 µm. Note that DDX4 is not expressed in xrTestes at this stage. j IF images of d14 xrTestes for a GC marker (red: TFAP2C), a basement membrane marker (cyan: LAMININ), and somatic cell markers (green: CD31; red: NR2F2, ACTA2, HSD3B1; cyan: SOX9), with their merges with DAPI (white). Scale bars, 20 µm. See also Supplementary Figs. 2 and 3.

E12.5 mouse fetal testicular cells depleted of mPGCs readily formed tight aggregates, or reconstituted testis (rTestis), after floating culture for 2 days (Fig. 2b). After an additional 14 days of culture on an air-liquid interface (ALI), rTestes exhibited numerous anastomosing tubular structures comprised of SOX9+ SCs surrounded by ACTA2+ peritubular myoid cells (Fig. 2c, d, e). Neither TFAP2C+ nor DDX4+ GCs were present, indicating the successful depletion of mPGCs (Fig. 2d). Tubules were surrounded by NR2F2+ stroma reminiscent of mouse fetal testis, but CD31+ endothelial cells and HSD3B1+ Leydig cells were not observed (Fig. 2e).

We then identified culture conditions that would allow hPGCLCs to be integrated into rTestes while maintaining testicular tissue integrity and maximizing the hPGCLC recovery after 14 days of ALI culture. Floating aggregates cultured using Knockout Serum Replacement (KSR)-based media formed larger aggregates with less surrounding cell debris and enhanced the recovery of live cells compared with those cultured in fetal bovine serum (FBS)-based media (Fig. 2f, g). The addition of Y-27632, a potent inhibitor of Rho-associated-coiled-coil containing protein kinase (ROCK), during floating culture further enhanced the formation of tighter, slightly larger aggregates and increased the recovery of AG-positive cells (Fig. 2f, g)31. Moreover, IF analysis revealed that aggregates cultured for 14 days in ALI culture using KSR- but not FBS-based medium formed distinct tubules that integrated AG+TFAP2C+ hPGCLC-derived cells (Fig. 2f, h, i, j). IF analysis also confirmed that hPGCLCs maintained PGC markers (POU5F1, NANOG, SOX17, and TFAP2C) after 14 days of ALI culture (Fig. 2i). All GCs were uniformly labeled by AG and by a human mitochondrial antigen (hMito), suggesting that they were derived from hPGCLCs and not from endogenous mouse PGCs (Fig. 2h–j). Leydig cells and endothelial cells were not readily observed in aggregates, suggesting that these cell types might have different culture requirement for survival and/or proper differentiation (Fig. 2j). Overall, these findings confirmed that dissociated mouse testicular somatic cells and hPGCLCs can self-assemble to form mini-fetal testicular tissues, which we named xenogeneic reconstituted testis (xrTestis).

Extended culture of xrTestis

We cultured xrTestes for a prolonged period to determine if hPGCLCs could be further differentiated into more advanced male GCs (Fig. 3a). IF analysis of xrTestis cultured for 42 and 77 days revealed the persistence of tubular structures consisting of SOX9+ SCs and AG+ hPGCLC-derived cells that remained predominantly localized within tubules (Supplementary Fig. 3a, b). Although essentially all GCs in 42-day (d42) xrTestes showed an early PGC phenotype (AG+/TFAP2C+/DDX4/DAZL), many of the AG+TFAP2C+ GCs in d77 xrTestes were strongly immunoreactive for the M marker, DAZL and somewhat more weakly reactive for DDX4 and VT (Fig. 3b, c, d, Supplementary Fig. 3b, c). These cells exhibited slightly larger and more euchromatic nuclei with low DAPI intensity and prominent nucleoli, characteristic of primate PGCs (Fig. 3b)12. IF analysis targeting 5-methylcytosine (5mC) revealed that these hPGCLC-derived cells consistently exhibited low levels of global DNA methylation (Supplementary Fig. 3d). Furthermore, these cells also retained pluripotency-associated markers, such as POU5F1 and NANOG (Fig. 3d, e), suggesting that most hPGCLC-derived cells had progressed towards M but had not yet differentiated into the T1. Flow cytometry analysis at d81 showed the emergence of AG+VT+ and AGVT+ GCs within the dissociated xrTestes, representing 7.9 and 1.7% of total cells (62% or 13% of all human GCs), respectively (Fig. 3f).

Fig. 3: Establishment of xenogeneic reconstituted testis (xrTestis) and generation of human T1LCs.
figure3

a Scheme for T1LC induction by xrTestis culture. ALI, air-liquid interphase. b IF images of hPGCLC-derived cells in d14 and d77 xrTestes for DAPI (white) and TFAP2C (red) with their merges. Scale bars, 2 µm. c IF images of d77 or d120 xrTestes for their expression of indicated key GC markers (cyan: DAZL; green: TFAP2C, TFAP2C-EGFP [AG]; red: DDX4, DDX4-2A-tdTomato [VT]) and DAPI (white). Merged images are shown on the right. Scale bars, 20 µm. d IF images of d77 or d120 xrTestes for AG (green), VT (red), pluripotency-associated markers (cyan: POU5F1, NANOG), and DAPI (white), with their merges. Scale bars, 20 µm. e A dot plot showing the proportion of POU5F1+ or NANOG+ cells in hPGCLC-derived cells (TFAP2C+ or DDX4+) in d77 and d120 xrTestes as assessed by IF analysis on frozen sections. Each dot represents the proportion in one section, and the averages of 3 histologic sections are shown. The total number of POU5F1+ or NANOG+ cells in d77 (57 cells) or d120 xrTestes (11 cells) is also shown. Error bar, standard error of the mean (SEM). The statistical significance of the differences between d77 and d120 are evaluated by Fisher’s exact test, p = 0.002. f Bright field (BF) images (top) and FACS analysis for AGVT expression (bottom) of xrTestes at d42, d81, and d124. Bar, 200 µm. g IF images of d120 xrTestes for AG (green), VT (red), T1 markers (cyan: MAGEA3, MAGEC2), and DAPI (white), with their merges. Scale bars, 20 µm. h FACS histogram of d81 and d124 xrTestes for PIWIL4-ECFP (PC) expression in the respective fraction of hPGCLC-derived cells. The percentage of PC+ cells in the respective fraction are shown. i IF images of hPGCLC-derived cells (TFAP2C or DDX4, green) in d77 or d120 xrTestes for MKI67 (red). Merges with DAPI (white) are shown on the right. Scale bars, 20 µm. j A dot plot showing the proportion of MKI67+ cells in hPGCLC-derived cells (TFAP2C+ or DDX4+) in d77 and d120 xrTestes. Each dot represents the proportion in one section and the averages of three histologic sections are shown. The total number of MKI67+ cells in d77 (85 cells) or d120 xrTestes (15 cells) is also shown. Error bar, SEM. The statistical significance of the differences between d77 and d120 are evaluated by Fisher’s exact test, p = 0.0047. k IF analysis (left) and a box plot showing the fluorescence intensity of 5-methylcytosine (5mC, red) in hPGCLC-derived cells (green: DDX4) and other somatic cells counterstained with DAPI (white) in xrTestes at d120. Scale bar, 20 µm. hPGCLC-derived cells are outlined by white dotted lines. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. The statistical significance of the differences between GCs (Germ) and somatic cells (Soma) are evaluated by two-sided t-test assuming unequal variances. p = 1.45  × 10−5; n, the number of respective cells counted. See also Supplementary Figs. 2 and 3.

We extended our ALI culture further and found that, at d120, xrTestes showed scattered DDX4+/DAZL+/VT+ cells with markedly reduced reactivity for AG, TFAP2C, POU5F1, and NANOG, suggesting that these cells had further differentiated beyond the M stage (Fig. 3c, d, e). We also noted strong expression of the T1 markers MAGEA3 and MAGEC2 at d120 (Fig. 3g)32. Flow cytometric analysis confirmed that most GCs had progressed to the AGVT+ fraction (4.2% of total cells, 58% of all human GCs) (Fig. 3f). Moreover, some of these AG+VT+ (14.3%) and AGVT+ cells (16.5%) expressed PC, another discriminating marker of T1 (Fig. 3h). IF analysis of xrTestes at d120 showed a marked reduction of MKI67+ GCs compared to d77 xrTestes, consistent with progression into the mitotically arrested T1 state (Fig. 3i, j). IF for 5mC revealed that the DNA methylation levels of DDX4+ GCs in d120 xrTestes were still lower than those in somatic cells, thus suggesting that overt de novo DNA methylation might not have commenced yet (Fig. 3k). Together, these findings support our conclusion that hPGCLCs differentiated into T1LCs after ALI culture with xrTestes.

Lineage trajectory leading to T1LCs

To fully capture the male germ lineage progression without the bias introduced by our a priori markers, we isolated GCs (AG+ or VT+) from xrTestes at d81 and d124 and evaluated their transcriptomes by scRNA-seq. Precursor cell types (hiPSCs, iMeLCs and hPGCLCs) were also examined by scRNA-seq. Between 344-1,251 cells for each population were used after excluding results from low-quality cells (Supplementary Fig. 4a). We detected ~3000–7000 median genes/cell at a mean sequencing depth of between 68-273k reads/cell (Supplementary Fig. 4a). After computationally aggregating high-quality cells and presenting the results in the same tSNE space (Fig. 4a, Supplementary Fig. 4b, c), we identified four main clusters (clusters 1–4) that largely separated by sample type, as expected (Supplementary Fig. 4b, g, h). Cluster annotation was further validated by the expression of the known markers for each cell type such as SOX2 (marker for hiPSCs and iMeLCs), EOMES (marker for iMeLCs), and NANOS3 (marker for hPGCLCs and later stage GCs) (Supplementary Fig. 4b)14,15.

Fig. 4: Single-cell transcriptome profiling of in vitro derived human GCs.
figure4

a A tSNE plot of all in vitro cells using computationally aggregated scRNA-seq data obtained from six different samples (hiPSCs; iMeLCs; hPGCLCs_1; hPGCLCs_2, d81 xrTestis, d124 xrTestis). These cells are colored based on clusters defined in Supplementary Fig. 4b and c: hiPSCs (1168 cells, pale green); iMeLCs (800 cells, green), hPGCLCs (1371 cells, brown), MLCs (1224, yellow), TCs (162 cells, purple) and T1LCs (150 cells, red). The cluster re-analyzed in (b) is outlined by a dotted line. b Velocity analysis focused on a cluster outlined by a dotted line in (a) projected on the tSNE space defined in Supplementary Fig. 4f. Black arrows indicate the lineage trajectory estimated using nascent transcripts from scRNA-seq data in (a). Dotted lines enclose cells representing MLCs (yellow), TCs (purple) or T1LCs (red) as defined in Supplementary Fig. 4f. Large red arrow indicates the overall direction of the lineage trajectory at the borders of cell clusters. c Violin plots showing the expression of representative markers in in vitro (left) or in vivo (right) cell clusters defined in Figs. 1a, 4a. d Violin plots showing the expression of representative proliferation markers (top) or pro-apoptotic markers (bottom) in the respective cell clusters. e Heatmap showing the expression pattern of DEGs identified from a multi-group comparison (FDR < 0.01, fold change > 2 compared to other clusters) among cell clusters, and the over-represented GO terms in these DEGs. Color bars on the left indicate DEGs for respective cell clusters. The number of DEGs for each cell cluster are shown at the side of the color bar. f Heatmap of the expression of 45 genes in which mutations are associated with “spermatogenic failure” or “male infertility” in humans. Genes are ordered by UHC (Ward’s method). See also Supplementary Figs. 47 and Supplementary Dataset 2.

Cluster 4 consisted of GCs from both d81 and d124 xrTestes that expressed DAZL, DND1, PIWIL2, which was consistent with their prospermatogonial status (Supplementary Fig. 4b, g). Further analysis for only cells in cluster 4 revealed three distinct subclusters (4-1, 4-2, and 4-3; Supplementary Fig. 4c). Markers for M that were defined previously in vivo, such as KHDC3L and POU5F1, were uniquely expressed in cluster 4-1. We therefore, identified cells from this cluster as “M-prospermatogonia-like cells” (MLCs). Markers for T1, such as MAGEC2 and PIWIL4 were exclusively expressed in cluster 4-3, suggesting that this cluster represented T1LCs (Supplementary Fig. 4c). Cluster 4-2 was located between MLCs and T1LCs, expressing unique markers, ASB9 and FZD8 (Supplementary Fig. 4c). We posit that these cells were transitional cells (TCs) because of their position in the tSNE plot and their intermediate expression levels of both MLC (POU5F1, NANOS3) and T1LC markers (TEX15) (Supplementary Fig. 4c). Consistent with this interpretation, strong ASB9-positive cells co-expressing intermediate levels of M and T1 markers were also seen at the M/T1 border in the tSNE plot for in vivo human testicular GC samples presented above, although these in vivo cells were not identified as a distinct cluster in tSNE analyses performed with higher clustering resolution (K-means up to 10) (Supplementary Fig. 4d, e).

To understand the lineage trajectory among cells in cluster 4, we performed RNA velocity analysis after re-clustering (Fig. 4b and Supplementary Fig. 4f). This analysis confirmed that a lineage progression from POU5F1+ MLCs to PIWIL4+ T1LCs occurred in xrTestis cultures (Fig. 4b). Corresponding to this progression, the proportion of cells derived from d124 xrTestis in each cluster increased from 10.2% in the MLC cluster to 74.1% in the TC cluster and 99.3% in the T1LC cluster (Supplementary Fig. 4h).

We also noted that two biological replicates for hPGCLCs (hPGCLCs_1, hPGCLCs_2) formed a single cluster in the tSNE plot (Supplementary Fig. 4g) and showed high concordance of both averaged gene expression using whole-genome data (r2 = 0.98) (Supplementary Fig. 4i) and pairwise DEGs between hPGCLCs and MLCs (Supplementary Fig. 4j). Dissociation-induced genes were not over-represented in any of the identified clusters, ruling out the possibility that clusters were made by any dissociation-induced artifacts (Supplementary Fig. 5a-c)33. These observations underscore the robustness of the present scRNA-seq platform.

To more fully understand the molecular mechanisms driving the progression from hPGCLCs through MLCs to T1LCs, we explored the gene expression dynamics of our in vitro-derived GCs. The core pluripotency-associated genes, POU5F1 and NANOG, along with GC specifier genes (PRDM1, SOX17, TFAP2C, SOX15), showed peak expression levels at the hPGCLC stage and declined thereafter (Fig. 4c). Interestingly, the expression of some genes involved in naïve pluripotency, such as TCL1B, TFCP2L1, and ZFP42, peaked at the MLC stage. Prospermatogonial (or oogonial) markers, such as DAZL, DDX4, and DPPA3 emerged in MLCs and persisted in later GCs (Fig. 4c)11,23, which was consistent with our IF and flow cytometric data. Moreover, proliferation markers such as CCNA2, CDK4, MKI67, and TOP2A were significantly downregulated in T1LCs (Fig. 4d), which was also consistent with IF analysis (Fig. 3i, j). The expression of proapoptotic marker genes was downregulated or unchanged during T1LC specification, suggesting that T1LCs were quiescent but not overtly apoptotic (Fig. 4d).

Pairwise and multi-group DEG analyses among the six cell types we identified (hiPSCs, iMeLCs, hPGCLCs, MLCs, TCs, and T1LCs) showed the expression dynamics of genes involved in biologic processes that are unique to each cell type (Fig. 4e, Supplementary Figs. 6, 7, Supplementary Dataset 2). For example, iMeLCs exhibited a higher abundance of mRNAs encoding gene products involved in early mesodermal specification, such as EOMES, MIXL1, CER1, and genes related to SMAD signaling (e.g., NODAL, LEFTY2). During the iMeLC-to-hPGCLC transition, hPGCLCs began to express GC specifier genes and genes related to the GO terms “inflammation” and “apoptosis” (Supplementary Fig. 6 and Supplementary Dataset 2). The expression of genes involved in “piRNA pathways” and “spermatogenesis” became upregulated after the transition to MLCs (Supplementary Fig. 6).

The largest number of DEGs was upregulated in T1LCs (1,268 DEGs in a multi-group comparison); many of these DEGs were upregulated gradually during the transition from MLCs to T1LCs (Fig. 4e and Supplementary Fig. 7). In particular, we found genes involved in piRNA pathways and previously recognized markers for spermatogonia in DEGs for T1LCs (Fig. 4e, Supplementary Figs. 6, 7, Supplementary Dataset 3). Numerous transcription factors were also upregulated in T1LCs, such as LMO4, ZNF451, TBP, NFXL1, EGR4, SCX, and EMX1. Many of these were previously uncharacterized in the context of male germline development and may play important roles in the specification of prospermatogonia from PGCs (Fig. 4e, Supplementary Figs. 6, 7, Supplementary Dataset 3). GO terms enriched in T1LCs correspondingly included “transcription factor activity”, “spermatogenesis”, and “DNA methylation involved in gamete generation” (Fig. 4e). We also found that a significant fraction of genes known to be associated with human “spermatogenic failure” or “male infertility” were in T1LCs, suggesting the potential utility of our in vitro platform for identifying the molecular mechanisms of human male infertility (Fig. 4f).

T1LCs exhibit a transcriptome similar to that of T1 in vivo

After establishing our in vitro platform for human male GC development, we compared our in vitro and in vivo transcriptome data to precisely define the status of our in vitro derived GCs along the developmental timeline. All three in vivo testicular samples were intermingled in a tSNE plot and segregated into clusters representing biologically meaningful cell types rather than sample identities (Supplementary Fig. 1c, g); however, a scatter plot comparing the averaged gene expression for either whole GCs or T1 across our three in vivo samples revealed a modest genome-wide reduction of mRNA abundance in Hs26 and Hs27 compared with Hs31 (Supplementary Fig. 1i). We believe this is a technical consequence of Hs26 and Hs27 being cryopreserved prior to analysis. Hs31 was prepared fresh and contained both M and T1 with differential expression patterns that matched the results obtained after aggregating all three samples (Supplementary Fig. 1h, j, k). To ensure the precision of gene expression comparisons with in vitro samples, which were all prepared fresh, we decided to use only Hs31 for downstream comparative studies.

We first performed unsupervised hierarchical clustering using averaged expression values for each cell type, which revealed two large clusters: a pre-germ cell/pre-gonadal phase cluster (hiPSCs, iMeLCs, hPGCLCs) and a prospermatogonial (gonadal) phase cluster (M, MLCs, TCs, T1, T1LCs) (Fig. 5a). Within the latter cluster, M/MLCs and T1/T1LCs each clustered together. Principal component analysis (PCA) on the prospermatogonial phase clusters revealed a gradual transition of cellular properties from M/MLCs to T1/T1LCs via TCs (Fig. 5b). Pairwise comparisons between cell types along this lineage trajectory demonstrated high concordance between in vivo and in vitro cell types; MLCs and T1LCs had the lowest number of DEGs and the highest coefficient of determination (r2) when compared with M or T1, respectively (Fig. 5c). These results highlight the marked similarities between in vivo fetal male germline development and our in vitro platform. Nonetheless, we also noted modest differences in gene expression between T1 and T1LCs. The expression of some HOXA and HOXB family members was significantly higher in T1LCs (Fig. 5d and Supplementary Dataset 3), and reciprocally, XIST, a master regulator of X chromosome inactivation, that is expressed in human oogonia and prospermatogonia34, was more highly expressed in T1, as were 18 other genes (Fig. 5d).

Fig. 5: Comparison of lineage projection in vitro with that of in vivo.
figure5

a UHC of the averaged transcriptomes of in vivo (M, T1) and in vitro cell clusters (hiPSCs, iMeLCs, hPGCLCs, MLCs, TCs, T1LCs) defined in Figs. 1a, 4a. b PCA for prospermatogonial stage GCs (M, MLCs, TCs, T1, and T1LCs). Color codes for cell clusters are indicated. c Number of DEGs and coefficient of determination (r2) for pairwise comparisons between in vivo (M [left] or T1 [right]) and in vitro clusters (hPGCLCs, MLCs, TCs, and T1LCs). DEGs are defined as genes with more than fourfold differences between two groups (mean log2[TPM+1] > 2, FDR < 0.01). r2 are calculated using all annotated genes (18447 genes). d Scatter plot comparison of the averaged values of gene expression between T1 and T1LCs. Blue, genes higher in T1; red, genes higher in T1LCs (more than 4-fold differences [flanking diagonal lines], mean log2[TPM+1] > 2, FDR < 0.01). Key genes are annotated and the number of DEGs are indicated. Representative genes and their GO enrichments for genes higher in T1LCs are shown on the right. e Heatmaps of the expression of markers for “migrating” (172 genes), “mitotic” (306 genes), and “mitotic-arrest” (1037 genes) human male FGCs (defined by Li et al.11) in the respective cell clusters defined in this study (left) and the indicated male FGC types defined by Li et al.11 (right). Using these markers, hierarchical clustering was performed for cell clusters defined in this study (left). GO terms for the respective markers are shown on the right. f Heatmap of the averaged expression values of 316 genes in indicated cell types defined by Yamashiro et al.15 (top). Among markers for T1LCs defined in Fig. 4e, 1177 genes were also annotated in the dataset by Yamashiro et al.15. Among them, 316 genes upregulated in ag120 AG+/−VT+ oogonia-like cells relative to all other cell types (more than twofold difference) are shown (top). GO analysis for these 316 genes are also shown (bottom). g Heatmap of the averaged expression in the indicated cell types for 110 genes highly expressed in T1LCs but with weak or no expression in ag120 AG+/−VT+ oogonia-like cells. To enable comparison between two different scRNA-seq platforms, RPM values of the data from Yamashiro et al. were adjusted using a polynomial regression curve (see “Methods”). Among 1,177 DEGs for T1LCs (Fig. 4e), 110 genes showing high levels of expression in T1LCs (mean log2[TPM+1] > 4) and low levels of expression in oogonia-like cells (adjusted log2[RPM+1] < 2) are shown. GO analysis for these 110 genes are also shown (bottom). h Violin plot showing the expression levels of genes at the male-specific regions of the Y chromosome (MSY) in indicated cell clusters. These genes were identified the by multi-group DEG analysis in Fig. 4e (without cut-off by fold-change > 2). See also Supplementary Figs. 4, 6, 7, Supplementary Dataset 35.

We additionally compared our results with published data from human male FGCs that were categorized into migrating, mitotic and mitotic-arrest FGCs11. In these datasets, migrating FGCs were collected from the aorta-gonad-mesonephros region of human male embryos at the age of 4 weeks. They were therefore categorized as PGCs in our study. The differentially expressed genes that distinguished these three cell types separated our hPGCLCs, M/MLCs, and T1/T1LCs/TCs largely into “migrating”, “mitotic” and “mitotic-arrest” stages of human FGCs, respectively, in agreement with direct comparison of our in vivo and in vitro specimens (Fig. 5e, Supplementary Dataset 4). We also performed PCA for all the cell clusters in our dataset alongside previously published data for neonatal human spermatogonia and found that T1LCs are slightly closer to neonatal spermatogonia than T1 (Supplementary Fig. 8a, b), which might partly explain the differences we observed between T1 and T1LCs (Fig. 5d).

Comparison of T1LCs with oogonia-like cells induced from hiPSCs

To highlight the similarities and differences between male and female GC development, we compared the transcriptome profiles of our in vitro T1LCs with published data for ag120 AG+/−VT+ oogonia-like cells induced from hiPSCs via intermediate states (i.e., iMeLCs, hPGCLCs and ag77 AG+VT+ cells)15. ag120 AG+/−VT+ and ag77 AG+VT+ cells in the previously published data roughly correspond to retinoic acid (RA)-responsive and mitotic in vivo female FGCs, respectively11,15. Among the 1177 DEGs that were unique to T1LCs and commonly annotated in both the male and female data sets, 316 DEGs were also similarly higher in oogonia-like cells (Fig. 5f and Supplementary Dataset 5), including many genes involved in piRNA biogenesis and de novo DNA methylation. We also identified 110 genes that were highly expressed in T1LCs but had weak or no expression in oogonia-like cells (Fig. 5g and Supplementary Dataset 5). These genes included many transcription factors, such as HOXB4, HOXC9, TBX3, ESX1, SCX, SIX1, and accordingly were enriched for GO terms such as “nucleus” and “sequence-specific DNA binding.” Of note, our T1LCs were induced from hiPSCs bearing an XY karyotype. Accordingly, we found that 6 Y-chromosomal genes at the male-specific region of the Y chromosome (MSY) were differentially expressed among our in vitro dataset (Fig. 5h). Four of these genes (DDX3Y, EIF1AY, KDM5D, TSPY2) were significantly upregulated along the lineage trajectory. A similar trend was observed between M and T1.

Expression dynamics of transposable elements (TEs) during human male germline development

A hallmark feature of GC development is the activation and subsequent repression of TEs, which ensures the genome integrity of GCs while still permitting co-evolution between TEs and their hosts16,35,36,37. However, the precise expression dynamics of TEs in human male GC development is not well characterized. We used our in vitro platform, which covers at least the first ~20 weeks of human male GC development, to comprehensively map the expression patterns of various TE categories. Dimension reduction analysis of TE expression profiles obtained from scRNA-seq data demonstrated that each differentiation stage exhibits unique TE expression dynamics (Fig. 6a). Notably, M/MLCs and T1/T1LCs clustered together in both a UMAP plot and a hierarchical clustering analysis performed on differentially expressed TEs, further underscoring the similarities between in vivo cells and our in vitro model (Fig. 6a, b). Based on total TE-derived transcript abundances, TE were globally upregulated over time, both in vivo and in vitro, with the highest expression levels observed in T1 and T1LCs (Fig. 6c).

Fig. 6: Dynamic regulation of TE expression during human male germline development.
figure6

a Dimension reduction analysis of scRNA-Seq data based on TE expression profile. The 100 most variably expressed TE subfamilies were used in the analysis. b Hierarchical clustering (Ward’s method using Euclidian distances) of the respective cell clusters (averaged expression values) using the 200 most variably expressed TEs. c Dynamics of the total expression level of TEs. d Heatmap showing the expression dynamics of respective TE subfamilies. The 200 most variably expressed TEs are shown. Classifications of TEs are shown in the left of heatmap. e Expression dynamics of respective TE families. f Expression dynamics of ERV subfamilies. ERVs exhibiting unique expression patterns are shown. g Enrichment of ATAC-Seq signals on specific ERV subfamilies. The fold-enrichment of the overlaps between ATAC-Seq peaks and a specific ERV subfamily was calculated on the random expectation. Publicly available ATAC-Seq data (Chen et al.41 and Guo et al.42) were used. h A model for T1LC induction from hiPSCs using xrTestis culture.

Most TE subtypes belonging to LINE1, SINE, DNA transposon, and SVA consistently showed gradual de-repression, presumably due to genome-wide DNA demethylation and other epigenetic reprogramming events that accompany FGC development (Fig. 6d, e)23,34,36. In contrast to these TEs, LTR-type retrotransposons (i.e., endogenous retroviruses; ERVs) exhibited a more stage-specific expression pattern (Fig. 6d–f). For example, among ERV1 families, HERV-H and LTR7 expression peaked at hiPSCs and then subsequently declined (Fig. 6f), consistent with their roles in regulating the pluripotency gene network38,39,40. Another group of ERV1 families, LTR12 (LTR12C, LTR12D, LTR12_) were significantly upregulated in T1 or T1LCs (Fig. 6f). On the other hand, some members of the ERVK family, such as HERVIP10FH, were more highly expressed in hPGCLCs (Fig. 6f).

The stage-specific expression dynamics of ERVs suggests that they are involved in regulating the expression of genes involved in GC development. In fact, the analysis of publicly available ATAC-seq data revealed the enrichment of HERVH/LTR7 and HERVIP10FH in the ATAC-seq peaks obtained from ESCs/iMeLCs or hPGCLCs, respectively (Fig. 6g)41,42, coinciding with their expression peaks in the respective cell types in our study (Fig. 6f). Moreover, LTR12 subfamilies were enriched in the ATAC-seq peaks from human male gonadal GCs and adult spermatogonia, suggesting that these TEs might be involved in gene regulation in human (pro)spermatogonia (Fig. 6g).

In summary, our observation that ERVs are regulated in a dynamic and stage-specific manner during male GC development raises the intriguing possibility that ERVs may be directly involved in the transcriptional activation of genes driving human prospermatogonial specification. Our observation additionally suggests that the identification of sequences downstream of these regulatory elements may reveal genes that play a causative role in this process (Fig. 6h).

Comments

    Something to say?

    Log in or Sign up for free

    Disclaimer: The translated content is provided by third-party translation service providers, and IKCEST shall not assume any responsibility for the accuracy and legality of the content.
    Translate engine
    Article's language
    English
    中文
    Pусск
    Français
    Español
    العربية
    Português
    Kikongo
    Dutch
    kiswahili
    هَوُسَ
    IsiZulu
    Action
    Related

    Report

    Select your report category*



    Reason*



    By pressing send, your feedback will be used to improve IKCEST. Your privacy will be protected.

    Submit
    Cancel