Chapter 2 Introduction
2.1 Why do you need masmr
?
Existing MERFISH decoding workflows are typically designed to be “plug and play”: requiring minimal manual adjustment or re-parameterisation. These one-size-fits-all approaches work well when their assumptions about data hold true, but return dramatically reduced performance and erroneous decoding when not.
masmr
was thus designed because we wanted more control over the MERFISH decoding, beyond what was offered by existing software. Our package is also designed to have an intuitive syntax, so that users can adapt the software without needing deep technical expertise.
2.2 Background
For most MERFISH pipelines, one wants to accomplish the following tasks:
Spotcalling: This involves reading each fluorescence image, performing the necessary image processing steps, decoding, and returning of spot coordinates. This is typically performed one image / field-of-view (FOV) at a time. Ideally, users optimise their spotcalling pipeline on one FOV before looping through all the other FOVs, applying the same spotcalling criteria.
Cell segmentation: Cell segmentation involves identifying nuclei or cells in an image and recording locations of each cell mask. Again, this is typically performed one FOV at a time.
Stitching: Stitching involves the alignment of partially overlapping FOVs, so that a coherent image and coordinate system is obtained.
Synthesis: Having processed every FOV, the final step is to consolidate all outputs into an easy-to-read format. Care must be taken to ensure coordinate information is comparable between spots and cell masks, and spots are assigned to the correct cells.
It is recommended that each of the above tasks be accomplished with separate scripts. Furthermore, time-saving is possible by performing tasks (1), (2), and (3) in parallel, before consolidation in step (4).
2.3 Quick start scripts
Below, we provide several quick start scripts for users who wish to immediately proceed with a default MERFISH pipeline. Details on what each step is doing is provided in the subsequent chapters.
Spotcalling and Cell segmentation scripts can be run in parallel to save time. Stitching can also be run alongside Spotcalling and Cell segmentation, or after the previous two scripts. Finally, the Synthesis script is run to consolidate outputs of the previous three scripts.
For 3D image stacks, our default processing is to perform a maximum intensity projection. But users may simply loop through each Z plane if they so wish. Kindly refer to Troubleshooting (Chapter 8) for details.
2.3.1 Quick start: Spotcalling
In the Spotcalling script, we loop through FOVs, performing default image processing (with getImageMetrics
). Processed images are registered to a reference image (with registerImages
) so that pixels from the same spots are maximally overlapped. Information from the same FOV are then consolidated into a dataframe (with consolidateImageMetrics
) that has pixels in rows and metrics (e.g. pixel intensity per image) in columns. Pixels are then assigned an identity on decoding (with decode
). This dataframe is then successively filtered through a series of recommended filters; users may add / drop filters as needed.
Per FOV, the final output is a dataframe containing spot locations as rows (a single pixel per spot), and accumulated meta data in columns.
library(masmr)
## Preparation ahead of looping through FOVs
establishParams( ... , verbose = T ) #User to input. Verbose = T returns messages keeping track of progress.
readImageMetaData()
prepareCodebook()
getAnchorParams()
establishCleanSlate()
## Alternatively, users can keep verbose = T but send output to messages.Rout
# notVerbose = F
# if( notVerbose ){
# outMessageFile <- file( paste0(params$out_dir, 'messages.Rout'), open = 'wt' )
# sink(outMessageFile, type = 'message')
# message("\nRedirecting messages to messages.Rout...\n\n")
# }
## Start loop
for( fovName in params$fov_names ){
## Report current FOV / iteration
message( paste0(
'\n', fovName, ': ',
which(params$fov_names==fovName), ' of ',
length(params$fov_names), '...' ) )
## Check if file has been created, skip if yes
if( params$resumeMode ){
<- file.exists( paste0(params$out_dir, 'SPOTCALL_', fovName, '.csv.gz') )
checkFile if(checkFile){
message('FOV has been processed...Skipping...')
next
}
}
## Keep track of the current_fov
$current_fov <- fovName
params
## Load current FOV images
<- readImageList( params$current_fov )
imList
## If a Z-stack, maximum intensity projection
<- maxIntensityProject( imList )
imList
## Check if image is largely empty, if so, skip
<- mean( sapply(1:length(imList), function(i){
percBlank sum( imList[[i]] < params$brightness_min[i] ) / length(imList[[i]])
}) )if( percBlank >= 0.95 ){ next }
## Get image metrics
getImageMetrics( imList )
## Register
<- list()
forReg for(i in 1:length(imList)){
<- imNormalise( imMetrics$COARSE[[i]] + imMetrics$DECODE[[i]] )
forReg[[i]]
}names(forReg) <- names(imList)
registerImages(forReg)
## Consolidate into pixel wise metric dataframe
<- consolidateImageMetrics()
spotcalldf cleanUp( c('spotcalldf', cleanSlate) )
## Decode and annotate blanks
<- decode(spotcalldf)
spotcalldf $BLANK <- scoreBlanks( spotcalldf ) >0
spotcalldf
## Binarise and threshold on Hamming distance
<- bitsFromDecode(spotcalldf, trainIndex = !spotcalldf$BLANK)
spotcalldf <- filterDF(spotcalldf, spotcalldf$HD>1)
spotcalldf
## Distance-based filtering
<- rep(0, nrow(spotcalldf))
filtout for( distanceMetric in c('COS', 'EUC', 'L2E') ){
<- thresh_Quantile(
thresh
spotcalldf[,distanceMetric], label = spotcalldf$BLANK,
quantileFalse = 0.5, quantileTrue = 0.5 )
<- filtout + spotcalldf[,distanceMetric] > thresh
filtout
}<- filterDF(spotcalldf, filtout>0)
spotcalldf
## Filter nonblanks that are next to blanks
$NEXTTOBLANK <- spatialIsAdjacent( spotcalldf, 'BLANK' )
spotcalldf<- filterDF(spotcalldf, spotcalldf$NEXTTOBLANK & !spotcalldf$BLANK)
spotcalldf
## Filter using DFF0 metrics
<- scoreDeltaF( spotcalldf )
spotcalldf <- thresh_Quantile(
thresh 'DFF0_DECODE'],
spotcalldf[,label = spotcalldf$BLANK,
quantileFalse = 0.25, quantileTrue = 0.25 )
<- filterDF(spotcalldf, spotcalldf[,'DFF0_DECODE'] < thresh & spotcalldf$F0_DECODE!=0)
spotcalldf
## Filter based on HNN distance
$HNNDIST <- spatialHNNDist( spotcalldf )
spotcalldf<- filterDF( spotcalldf, spotcalldf$HNNDIST > 5 | is.na(spotcalldf$HNNDIST) )
spotcalldf
## Filter based on PBLANK
<- c('Xm', 'Ym', 'fov', 'g', 'WX', 'WY', 'IDX',
forExclusion colnames(spotcalldf)[grepl('_\\d+$|^DFF0_|^F1_|CV_|LAB$', colnames(spotcalldf))])
$PBLANK <- scoreClassifier(spotcalldf, 'BLANK', variablesExcluded = forExclusion)
spotcalldf<- thresh_Quantile(spotcalldf$PBLANK, spotcalldf$BLANK, quantileFalse = 0.5, quantileTrue = 0.25)
thresh <- filterDF(spotcalldf, spotcalldf$PBLANK > thresh)
spotcalldf
## Cluster
<- spatialClusterLeiden( spotcalldf, minNeighbours = 3, maxInterSpotDistancePixels = 3)
clusterInfo colnames(clusterInfo)] <- clusterInfo
spotcalldf[,= 250
maxClusterSize <- filterDF(spotcalldf, !(spotcalldf$CENTROID & spotcalldf$CLUSTER_SIZE < maxClusterSize) )
spotcalldf saveDF( spotcalldf )
cleanUp()
}
2.3.2 Quick start: Cell segmentation
Cell segmentation loops through DAPI images. The default model specified by establishCellSegModel
is cellpose
(nuclei
model).
Per FOV, the final output is a dataframe with pixels in rows and the cell mask that has been assigned to that pixel.
library(masmr)
## Preparation ahead of looping through FOVs
establishParams( ... , verbose = T ) #User to input
readImageMetaData( 'dapi_dir' ) #Switch to DAPI images and prepare meta data
establishCellSegModel()
establishCleanSlate()
## Start loop
for( fovName in params$fov_names ){
## Report current FOV / iteration
message( paste0(
'\n', fovName, ': ',
which(params$fov_names==fovName), ' of ',
length(params$fov_names), '...' ) )
## Check if file has been created, skip if yes
if( params$resumeMode ){
<- file.exists(
checkFile paste0(params$out_dir, 'CELLSEG_',
paste(sort(ls(cellSeg)), collapse=''), '_',
'.csv.gz') )
fovName, if(checkFile){
message('FOV has been processed...Skipping...')
next
}
}
## Keep track of the current_fov
$current_fov <- fovName
params
## Load current FOV image
<- readImageList( params$current_fov )[[1]] #List of 1
im
## Cell segmentation and saving
<- runCellSegModel(im)
cellMask saveCellSegMasks(cellMask, im = im)
cleanUp()
}
2.3.3 Quick start: Stitching
The Stitching script identifies the necessary rigid transformation required so that adjacent FOVs overlap appropriately.
Per FOV, the final output is a dataframe with neighbouring FOVs in rows, and the vectors needed for moving these FOVs to maximise overlap with the reference FOV. Two vectors are returned in the script below – one for stitching MERFISH images, and one for stitching DAPI images – but only one is required.
library(masmr)
## Preparation ahead of looping through FOVs
establishParams( ... , verbose = T ) #User to input
establishCleanSlate()
## Start loop
for( fovName in params$fov_names ){
## Report current FOV / iteration
message( paste0(
'\n', fovName, ': ',
which(params$fov_names==fovName), ' of ',
length(params$fov_names), '...' ) )
## Check if file has been created, skip if yes
if( params$resumeMode ){
<- file.exists( paste0(params$out_dir, 'STITCH_', fovName, '.csv') )
checkFile if(checkFile){
cat('FOV has been processed...Skipping...')
next
}
}
## Keep track of the current_fov
$current_fov <- fovName
params
## Stitch with respect to reference bit
<- readImagesForStitch( subDirectory = 'IM', loadProcessedImages = T ) #Assume spotcalling done
imList <- stitchImages( imList )
stitchResults
## Optional: Stitch with respect to DAPI image
<- stitchResults
df colnames(df)[2] <- paste0('merfish_', colnames(stitchResults)[2]) #Save the stitch results from merfish in its own column
<- readImagesForStitch( subDirectory = 'DAPI', loadProcessedImages = T ) #Assume cell segmentation done
imList <- stitchImages( imList )
stitchResults
## Optional: If have both merfish and cell segmentation stich vectors, consolidate into a single dataframe
colnames(stitchResults)[2] <- paste0('dapi_', colnames(stitchResults)[2])
colnames(df)[2]] <- df[match(stitchResults[,1], df[,1]),2]
stitchResults[,
## Save stitch dataframe
saveStitch( stitchResults )
cleanUp()
}
2.3.4 Quick start: Synthesis
This is the last script. Synthesis combines all the outputs from the previous three scripts into a series of dataframes, across every processed FOV. Additionally, users can plotQC
to return some default QC plots to evaluate their pipeline’s performance.
Hereafter, should users wish to save space, they can delete the intermediate results returned by the previous three scripts.
library(masmr)
## Establish params
establishParams( ... , verbose = T ) #User to input
## Function to synthesise data
synthesizeData()
## Function to generate QC plots
plotQC()