Chapter 8 Troubleshooting

The masmr package is designed to enable users to build their own custom pipelines. Users are encouraged to play around with an FOV or two, to determine optimal parameter choices, before looping through their entire data set.

Below, we provide some suggestions for users seeking to improve their pipelines. This has been organised by potential problems, possible causes (and how to identify them), and potential solutions.

Too few spots

Poor choice of image processing. This is best identified during the spotcalling script itself, rather than the synthesis step at the end. Users are encouraged to plot their processed images using plot(imager::as.cimg( imageMatrix )). [The as.cimg offers considerable speed up.] Users may attempt manual decoding on a small region of one or two FOVs (i.e. look at images and determine the sequence of pixel ON/OFF), and ensure that their pipeline returns results that agree with their manual decoding. This pipeline – optimised on a subset of FOVs – can then be looped through all FOVs to apply consistent spotcalling standards.
- Potential solution: Changing the function used to return MASK in getImageMetrics. Larger masks means more pixels are brought forwards for subsequent decoding.
- Potential solution: Changing the function used to return DECODE in getImageMetrics. As far as possible, attempt to return a bimodal distribution where there is clear separation of OFF and ON bits by intensity.
Filtering criteria are too strict. This is also best identified during the spotcalling script itself. Users are encouraged to use the filterDF function to keep track of how spot profiles per FOV are changing with each filter step.
- Potential solution: Do not add more blanks during prepareCodebook – i.e. set exhaustiveBlanks = F – or increase strictness for criteria in calling blanks with scoreBlanks. This is because many subsequent steps rely on blank detection: more blanks likely increases strictness of filtering pixels.
- Potential solution: During bitsFromDecode, swapping from the default f1 to fb allows users to upweight / downweight recall with respect to precision using fBeta. E.g. fBeta >1 upweights recall and minimises false negatives. Alternatively, users may skip bitsFromDecode entirely (though this is not recommended).
- Potential solution: If filtering with thresh_Quantile, consider tuning quantileFalse and quantileTrue. E.g. if labels is a vector where a pixel being a blank is TRUE, and the metric is a vector of probability of being a blank (e.g. returned from scoreClassifier), decreasing quantileTrue from the default of 0.5 to 0.25 increases the tolerance for keeping in blanks. [The direction to move quantileFalse and quantileTrue depends on the metric in question, and how the labels are coded.]
- Potential solution: Reduce the number filtering steps. This is perhaps best done when several steps may be redundant given later filtering: e.g. filtering using DFF0 metrics may be redundant if there is a later scoreClassifier step that uses DFF0 as a covariate.
- Potential solution: Increase the maxInterSpotDistancePixels during spatialClusterLeiden so that more distal spots are considered per cluster. [In general, this has a larger impact than filtering using HNN distances – which similarly removes loner spots (though both can be tweaked).] Alternatively, users could also reduce the minimum number of pixels required per cluster (default = 3).

Too few cellular spots

See above recommendations for “Too few spots”.
Too few cell masks returned during cell segmentation. This can be evaluated from CellMetrics plots, and the percent of counts deemed cellular in FPKMCorrelation. Users should also plot cell masks for a chosen FOV.
- Potential solution: For cellpose, retrain models or relax cell segmentation parameters.
- Potential solution: For stardist – which returns masks faster than cellpose – input variously brightened images and consolidate cell masks using consolidateCellSegMasks. Cell segmentation parameters can also be relaxed for stardist, but brightness seems to be a larger contributor to detection rate.
Cells too small. In this scenario, we expect there to be low percent of counts deemed cellular in FPKMCorrelation, despite there being sufficient numbers of cells per unit area.
- Potential solution: Retrain cellpose models to return larger masks. Run cellpose with different diameter parameters, and consolidate with consolidateCellSegMasks.
- Potential solution: Dilate existing masks prior to saving using imager::dilate_square(cellMask, X). [X should be at least 3 for isotropic dilation by 1 pixel.] This works for both cell segmentation models.
- Potential solution: Instead of dilating cells, non-cellular spots may be assigned to cell centroids by proximity. Functionally similar to dilating cell masks.

Weak correlation to FPKM / high blank %

Suggests high false positive rate. Do the opposite of recommendations for “Too few spots”: i.e. increase strictness of filtering criteria. If possible, users may attempt manual decoding on a small region of one or two FOVs, and ensure that their pipeline returns similar results: make use of the troubleshoot plots returned by filterDF.
Bit-specific differences in detection rate. This can be evaluated from BitwiseErrorRates and BitDetectionRate plots after synthesis. [This may also cause too few spots to be returned – but in general, there tends to be variation across genes: where some genes are too few, and some too abundant.]
- Potential solution: Ensure that getAnchorParams is returning reasonable intensity ceilings and floors for each bit. Users should not use empty FOVs as anchor FOVs.
- Potential solution: Sometimes, bit-specific differences may be expected if there is uneven representation of counts across bits (e.g. when abundant genes have an ON first bit, while rare genes have OFF). [This is best avoided during data acquisition: by ensuring even bit representation when designing the probe library.] Users may need to have custom image processing per bit (as opposed to looping a consistent function across all bits).

Issues with image handling

java.lang.OutOfMemoryError. This will be likely flagged when reading images during spotcalling, but may crop up during cell segmentation. This is an error that accompanies the RBioFormats package; loading .dax files uses the inbuilt readBin function in R and should thus not return this error.
- Potential solution: Increasing memory prior to running your scripts with options(java.parameters = "-Xmx8g").
- Potential solution: Unloading the RBioFormats with unloadNamespace(RBioFormats) every so often, and avoid attaching or loading the package (e.g. with library(RBioFormats)).
- Potential solution: Ensure images are loaded one by one (done by specifying series parameter). Also ideally load images one channel or slice at a time (done by including a subset = list() parameter). [Details can be found in the Chapter on spotcalling, and the documentation for RBioFormats::read.image() and masmr::readImage.] In the worst case, images can be loaded into memory as tiles (by subsetting on x and y) for better memory management.
Images look like noise when loaded. We expect this to be more likely for .dax images and may be due to incorrect image size input, or incorrect endian-ness (default: little). Alternatively, user has attempted to load a 3D image stack as 2D image – which will cause problems regardless of image file format.
- Potential solution: Hard code image sizes by specifying nrows and ncols in readImage/readImageList. Similarly, set endian to big if needed (depends on instrument, and should be recorded in image metadata files).
- Potential solution: A less likely alternative reason is that the wrong number of bytes has been assigned to each element in the .dax file. This is default 2 (for 16 bit integers); but should be reduced to 1 if data was saved as an integer value between 0 and 255 (i.e. 8 bit integer). [Most microscopes no longer use 8 bit integers.]
Wanting to work with individual Z slices. For various reasons (e.g. Z slice distances are large, desiring high resolution spot localisation etc), users may wish to process individual Z slices rather than relying on the default maximum intensity projection (MIP).
- Potential solution: Z slices can be loaded one at a time by specifying the zIndex in readImage/readImageList. getAnchorParams – which aims to find reasonable thresholds across multiple FOVs – should have its imageFunctions modified to avoid MIP as a pre-processing step, and instead have a specific Z slice loaded (example code provided below). registerImages has a chosenZslice parameter that allows users to specify a specific Z slice to perform registration against (that said, MIP should still be optimal here because position shifts affect the whole Z stack, and not just a single Z slice). Downstream functions (e.g. getImageMetrics, consolidateImageMetrics, and functions that work on spotcalldf rather than images) should not be affected and can remain unaltered.

Below is an example of how to create a function that returns the brightness floor for a single Z slice, for input into imageFunctions of getAnchorParams:

## This function is looped across FOVs
## Its input is an image / image stack called im
## The ellipsis is used in R to allow your function to take in more unnamed parameters
imsBrightnessMin_OneZSlice <- function( im, userChosenZSlice = 1, ... ){
  
  ## Check that image is indeed a stack
  if( length(dim(im)) > 2 ){
    ## If yes, subset the chosen zSlice
    im <- im[,,userChosenZSlice]
  }
  
  ## Make use of the default way of getting minimum brightness
  ## Note that parameters specified by ellipsis are fed forwards into this function
  result <- imsBrightnessMin(im, ...)
  return(result)
}

## Do the same for max brightness, and input your new functions into getAnchorParams
zi = 1 # The first z Slice = 1, second = 2 etc
getAnchorParams(
  out_dir = paste0(params$out_dir, 'ZSlice', zi, '/') #Save results in its own folder
  imageFunctions = list(
    'brightness_min' = imsBrightnessMin_OneZSlice,
    'brightness_max' = imsBrightnessMax_OneZSlice
  ),
  userChosenZSlice = zi #Parameters for your image function can be fed from getAnchorParams itself
)