Inputs and Outputs

Inputs

The expected input to nuc2seg is the Xenium analysis directory.

We expect a single directory in which every file keeps its default name, with no prefix or suffix, as described in the Xenium documentation: https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/analysis/xoa-output-at-a-glance

Specifically, our algorithm reads the nucleus_boundaries.parquet and transcripts.parquet files, so both must be present under exactly those names.
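
As a quick sanity check before starting a run, the naming requirement above can be verified with a few lines of Python. This is only a minimal sketch: the helper missing_xenium_files is a hypothetical name and is not part of nuc2seg. It uses only the two file names mentioned above.

```python
from pathlib import Path

# File names taken from the text above; nuc2seg expects these default names.
REQUIRED_XENIUM_FILES = ("nucleus_boundaries.parquet", "transcripts.parquet")


def missing_xenium_files(xenium_dir):
    """Return the required file names that are absent from xenium_dir.

    Hypothetical helper for illustration; it is not part of nuc2seg.
    """
    root = Path(xenium_dir)
    return [name for name in REQUIRED_XENIUM_FILES if not (root / name).is_file()]
```

An empty list means both files were found. A non-empty list names the files that are missing or have been renamed.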

Outputs

nuc2seg produces several output data files:

  • preprocessed.h5: Rasterized transcript and nucleus data used for training the segmentation model.

  • weights.ckpt: Saved model weights. If the model has not converged, you can resume training from this checkpoint.

  • segmentation.h5: An HDF5 file containing the rasterized segmentation of the cells.

  • shapes.parquet: A parquet file containing the non-rasterized shapes of the cell segmentation.

  • anndata.h5ad: AnnData file based on the cell segmentation (contains only transcripts that fall within cell segments).

  • spatialdata.zarr: SpatialData Zarr directory with the cell segmentation added as a new layer. It can be used for visualization with napari-spatialdata or for plotting with spatialdata_plot.

And several plots:

  • prediction_plots/: A directory containing plots of the model’s angle predictions on the input data.

    We create one hundred plots, each covering a single tile, to allow a quick visual inspection of the model’s predictions.

  • cell_typing_plots/: A directory containing plots of the model’s cell typing predictions on the input data.

    Includes AIC and BIC for all k-values, as well as cell type probabilities and relative gene expression for each k-value.

  • segmentation.png: A plot of the final segmentation. Red is the original nucleus, and blue is the predicted cell boundary.

  • class_assignment.png: A plot of the final cell typing. Each cell segment is colored according to its predicted cell type.
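
After a run, it can be handy to see at a glance which of the outputs listed above were actually produced. The sketch below assumes that all outputs are written into one output directory, which this section does not state explicitly. The helper output_inventory is a hypothetical name and is not part of nuc2seg.

```python
from pathlib import Path

# Names from the lists above; True marks entries described as directories.
EXPECTED_OUTPUTS = {
    "preprocessed.h5": False,
    "weights.ckpt": False,
    "segmentation.h5": False,
    "shapes.parquet": False,
    "anndata.h5ad": False,
    "spatialdata.zarr": True,
    "prediction_plots": True,
    "cell_typing_plots": True,
    "segmentation.png": False,
    "class_assignment.png": False,
}


def output_inventory(output_dir):
    """Map each expected output name to True if it exists in output_dir.

    Hypothetical helper; assumes a single output directory for all results.
    """
    root = Path(output_dir)
    inventory = {}
    for name, is_dir in EXPECTED_OUTPUTS.items():
        path = root / name
        inventory[name] = path.is_dir() if is_dir else path.is_file()
    return inventory
```

Entries that map to False point to a step that did not finish or that wrote its result somewhere else.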