Chapter 2 Introduction

2.1 Why do you need `masmr`?

Existing MERFISH decoding workflows are typically designed to be “plug and play”: requiring minimal manual adjustment or re-parameterisation. These one-size-fits-all approaches work well when their assumptions about data hold true, but return dramatically reduced performance and erroneous decoding when not.

masmr was thus designed because we wanted more control over the MERFISH decoding, beyond what was offered by existing software. Our package is also designed to have an intuitive syntax, so that users can adapt the software without needing deep technical expertise.

2.2 Background

For most MERFISH pipelines, one wants to accomplish the following tasks:

Spotcalling: This involves reading each fluorescence image, performing the necessary image processing steps, decoding, and returning of spot coordinates. This is typically performed one image / field-of-view (FOV) at a time. Ideally, users optimise their spotcalling pipeline on one FOV before looping through all the other FOVs, applying the same spotcalling criteria.
Cell segmentation: Cell segmentation involves identifying nuclei or cells in an image and recording locations of each cell mask. Again, this is typically performed one FOV at a time.
Stitching: Stitching involves the alignment of partially overlapping FOVs, so that a coherent image and coordinate system is obtained.
Synthesis: Having processed every FOV, the final step is to consolidate all outputs into an easy-to-read format. Care must be taken to ensure coordinate information is comparable between spots and cell masks, and spots are assigned to the correct cells.

It is recommended that each of the above tasks be accomplished with separate scripts. Furthermore, time-saving is possible by performing tasks (1), (2), and (3) in parallel, before consolidation in step (4).

2.3 Quick start scripts

Below, we provide several quick start scripts for users who wish to immediately proceed with a default MERFISH pipeline. Details on what each step is doing is provided in the subsequent chapters.

Spotcalling and Cell segmentation scripts can be run in parallel to save time. Stitching can also be run alongside Spotcalling and Cell segmentation, or after the previous two scripts. Finally, the Synthesis script is run to consolidate outputs of the previous three scripts.

For 3D image stacks, our default processing is to perform a maximum intensity projection. But users may simply loop through each Z plane if they so wish. Kindly refer to Troubleshooting (Chapter 8) for details.

2.3.1 Quick start: Spotcalling

In the Spotcalling script, we loop through FOVs, performing default image processing (with getImageMetrics). Processed images are registered to a reference image (with registerImages) so that pixels from the same spots are maximally overlapped. Information from the same FOV are then consolidated into a dataframe (with consolidateImageMetrics) that has pixels in rows and metrics (e.g. pixel intensity per image) in columns. Pixels are then assigned an identity on decoding (with decode). This dataframe is then successively filtered through a series of recommended filters; users may add / drop filters as needed.

Per FOV, the final output is a dataframe containing spot locations as rows (a single pixel per spot), and accumulated meta data in columns.

library(masmr)

## Preparation ahead of looping through FOVs
establishParams( ... , verbose = T ) #User to input. Verbose = T returns messages keeping track of progress.
readImageMetaData()
prepareCodebook()
getAnchorParams()
establishCleanSlate()

## Alternatively, users can keep verbose = T but send output to messages.Rout
# notVerbose = F
# if( notVerbose ){
#   outMessageFile <- file( paste0(params$out_dir, 'messages.Rout'), open = 'wt' )
#   sink(outMessageFile, type = 'message')
#   message("\nRedirecting messages to messages.Rout...\n\n")
# }

## Start loop
for( fovName in params$fov_names ){
  
  ## Report current FOV / iteration 
  message( paste0(
    '\n', fovName, ': ', 
    which(params$fov_names==fovName), ' of ', 
    length(params$fov_names), '...' ) )
  
  ## Check if file has been created, skip if yes
  if( params$resumeMode ){
    checkFile <- file.exists( paste0(params$out_dir, 'SPOTCALL_', fovName, '.csv.gz') )
    if(checkFile){
      message('FOV has been processed...Skipping...')
      next
    }
  }
  
  ## Keep track of the current_fov
  params$current_fov <- fovName
  
  ## Load current FOV images
  imList <- readImageList( params$current_fov )
  
  ## If a Z-stack, maximum intensity projection
  imList <- maxIntensityProject( imList )
  
  ## Check if image is largely empty, if so, skip
  percBlank <- mean( sapply(1:length(imList), function(i){
    sum( imList[[i]] < params$brightness_min[i] ) / length(imList[[i]])
  }) )
  if( percBlank >= 0.95 ){ next }
  
  ## Get image metrics
  getImageMetrics( imList )
  
  ## Register
  forReg <- list()
  for(i in 1:length(imList)){
    forReg[[i]] <- imNormalise( imMetrics$COARSE[[i]] + imMetrics$DECODE[[i]] )
  }
  names(forReg) <- names(imList)
  registerImages(forReg)
  
  ## Consolidate into pixel wise metric dataframe
  spotcalldf <- consolidateImageMetrics()
  cleanUp( c('spotcalldf', cleanSlate) )
  
  ## Decode and annotate blanks
  spotcalldf <- decode(spotcalldf)
  spotcalldf$BLANK <- scoreBlanks( spotcalldf ) >0
  
  ## Binarise and threshold on Hamming distance
  spotcalldf <- bitsFromDecode(spotcalldf, trainIndex = !spotcalldf$BLANK)
  spotcalldf <- filterDF(spotcalldf, spotcalldf$HD>1)
  
  ## Distance-based filtering
  filtout <- rep(0, nrow(spotcalldf))
  for( distanceMetric in c('COS', 'EUC', 'L2E') ){
    thresh <- thresh_Quantile( 
      spotcalldf[,distanceMetric], 
      label = spotcalldf$BLANK, 
      quantileFalse = 0.5, quantileTrue = 0.5 )
    filtout <- filtout + spotcalldf[,distanceMetric] > thresh
  }
  spotcalldf <- filterDF(spotcalldf, filtout>0)
  
  ## Filter nonblanks that are next to blanks
  spotcalldf$NEXTTOBLANK <- spatialIsAdjacent( spotcalldf, 'BLANK' )
  spotcalldf <- filterDF(spotcalldf, spotcalldf$NEXTTOBLANK & !spotcalldf$BLANK)
 
  ## Filter using DFF0 metrics
  spotcalldf <- scoreDeltaF( spotcalldf )
  thresh <- thresh_Quantile( 
    spotcalldf[,'DFF0_DECODE'], 
    label = spotcalldf$BLANK, 
    quantileFalse = 0.25, quantileTrue = 0.25 )
  spotcalldf <- filterDF(spotcalldf, spotcalldf[,'DFF0_DECODE'] < thresh & spotcalldf$F0_DECODE!=0)
  
  ## Filter based on HNN distance
  spotcalldf$HNNDIST <- spatialHNNDist( spotcalldf )
  spotcalldf <- filterDF( spotcalldf, spotcalldf$HNNDIST > 5 | is.na(spotcalldf$HNNDIST) )
  
  ## Filter based on PBLANK  
  forExclusion <- c('Xm', 'Ym', 'fov', 'g', 'WX', 'WY', 'IDX', 
                    colnames(spotcalldf)[grepl('_\\d+$|^DFF0_|^F1_|CV_|LAB$', colnames(spotcalldf))])
  spotcalldf$PBLANK <- scoreClassifier(spotcalldf, 'BLANK', variablesExcluded = forExclusion)
  thresh <- thresh_Quantile(spotcalldf$PBLANK, spotcalldf$BLANK, quantileFalse = 0.5, quantileTrue = 0.25)
  spotcalldf <- filterDF(spotcalldf, spotcalldf$PBLANK > thresh)
  
  ## Cluster
  clusterInfo <- spatialClusterLeiden( spotcalldf, minNeighbours = 3, maxInterSpotDistancePixels = 3)
  spotcalldf[,colnames(clusterInfo)] <- clusterInfo
  maxClusterSize = 250
  spotcalldf <- filterDF(spotcalldf, !(spotcalldf$CENTROID & spotcalldf$CLUSTER_SIZE < maxClusterSize) )
  saveDF( spotcalldf )
  
  cleanUp()
}

2.3.2 Quick start: Cell segmentation

Cell segmentation loops through DAPI images. The default model specified by establishCellSegModel is cellpose (nuclei model).

Per FOV, the final output is a dataframe with pixels in rows and the cell mask that has been assigned to that pixel.

library(masmr)

## Preparation ahead of looping through FOVs
establishParams( ... , verbose = T ) #User to input
readImageMetaData( 'dapi_dir' ) #Switch to DAPI images and prepare meta data
establishCellSegModel()
establishCleanSlate()

## Start loop
for( fovName in params$fov_names ){
  
  ## Report current FOV / iteration 
  message( paste0(
    '\n', fovName, ': ', 
    which(params$fov_names==fovName), ' of ', 
    length(params$fov_names), '...' ) )
  
  ## Check if file has been created, skip if yes
  if( params$resumeMode ){
    checkFile <- file.exists( 
      paste0(params$out_dir, 'CELLSEG_',
             paste(sort(ls(cellSeg)), collapse=''), '_', 
             fovName, '.csv.gz') )
    if(checkFile){
      message('FOV has been processed...Skipping...')
      next
    }
  }
  
  ## Keep track of the current_fov
  params$current_fov <- fovName
  
  ## Load current FOV image
  im <- readImageList( params$current_fov )[[1]] #List of 1
  
  ## Cell segmentation and saving
  cellMask <- runCellSegModel(im)
  saveCellSegMasks(cellMask, im = im)
  
  cleanUp()
}

2.3.3 Quick start: Stitching

The Stitching script identifies the necessary rigid transformation required so that adjacent FOVs overlap appropriately.

Per FOV, the final output is a dataframe with neighbouring FOVs in rows, and the vectors needed for moving these FOVs to maximise overlap with the reference FOV. Two vectors are returned in the script below – one for stitching MERFISH images, and one for stitching DAPI images – but only one is required.

library(masmr)

## Preparation ahead of looping through FOVs
establishParams( ... , verbose = T ) #User to input
establishCleanSlate()

## Start loop
for( fovName in params$fov_names ){
  
  ## Report current FOV / iteration 
  message( paste0(
    '\n', fovName, ': ', 
    which(params$fov_names==fovName), ' of ', 
    length(params$fov_names), '...' ) )
  
  ## Check if file has been created, skip if yes
  if( params$resumeMode ){
    checkFile <- file.exists( paste0(params$out_dir, 'STITCH_', fovName, '.csv') )
    if(checkFile){
      cat('FOV has been processed...Skipping...')
      next
    }
  }
  
  ## Keep track of the current_fov
  params$current_fov <- fovName
  
  ## Stitch with respect to reference bit
  imList <- readImagesForStitch( subDirectory = 'IM', loadProcessedImages =  T ) #Assume spotcalling done
  stitchResults <- stitchImages( imList )
  
  ## Optional: Stitch with respect to DAPI image
  df <- stitchResults
  colnames(df)[2] <- paste0('merfish_', colnames(stitchResults)[2]) #Save the stitch results from merfish in its own column
  imList <- readImagesForStitch( subDirectory = 'DAPI', loadProcessedImages =  T ) #Assume cell segmentation done
  stitchResults <- stitchImages( imList )
  
  ## Optional: If have both merfish and cell segmentation stich vectors, consolidate into a single dataframe
  colnames(stitchResults)[2] <- paste0('dapi_', colnames(stitchResults)[2])
  stitchResults[,colnames(df)[2]] <- df[match(stitchResults[,1], df[,1]),2]
  
  ## Save stitch dataframe
  saveStitch( stitchResults )
  
  cleanUp()
}

2.3.4 Quick start: Synthesis

This is the last script. Synthesis combines all the outputs from the previous three scripts into a series of dataframes, across every processed FOV. Additionally, users can plotQC to return some default QC plots to evaluate their pipeline’s performance.

Hereafter, should users wish to save space, they can delete the intermediate results returned by the previous three scripts.

library(masmr)

## Establish params
establishParams( ... , verbose = T ) #User to input

## Function to synthesise data
synthesizeData()

## Function to generate QC plots
plotQC()