Chapter 8 Troubleshooting
The masmr
package is designed to enable users to build their own custom pipelines. Users are encouraged to play around with an FOV or two, to determine optimal parameter choices, before looping through their entire data set.
Below, we provide some suggestions for users seeking to improve their pipelines. This has been organised by potential problems, possible causes (and how to identify them), and potential solutions.
Too few spots
- Poor choice of image processing. This is best identified during the spotcalling script itself, rather than the synthesis step at the end. Users are encouraged to plot their processed images using
plot(imager::as.cimg( imageMatrix ))
. [Theas.cimg
offers considerable speed up.] Users may attempt manual decoding on a small region of one or two FOVs (i.e. look at images and determine the sequence of pixel ON/OFF), and ensure that their pipeline returns results that agree with their manual decoding. This pipeline – optimised on a subset of FOVs – can then be looped through all FOVs to apply consistent spotcalling standards.- Potential solution: Changing the function used to return
MASK
ingetImageMetrics
. Larger masks means more pixels are brought forwards for subsequent decoding. - Potential solution: Changing the function used to return
DECODE
ingetImageMetrics
. As far as possible, attempt to return a bimodal distribution where there is clear separation of OFF and ON bits by intensity.
- Potential solution: Changing the function used to return
- Filtering criteria are too strict. This is also best identified during the spotcalling script itself. Users are encouraged to use the
filterDF
function to keep track of how spot profiles per FOV are changing with each filter step.- Potential solution: Do not add more blanks during
prepareCodebook
– i.e. setexhaustiveBlanks = F
– or increase strictness for criteria in calling blanks withscoreBlanks
. This is because many subsequent steps rely on blank detection: more blanks likely increases strictness of filtering pixels.
- Potential solution: During
bitsFromDecode
, swapping from the defaultf1
tofb
allows users to upweight / downweight recall with respect to precision usingfBeta
. E.g.fBeta
>1 upweights recall and minimises false negatives. Alternatively, users may skipbitsFromDecode
entirely (though this is not recommended). - Potential solution: If filtering with
thresh_Quantile
, consider tuningquantileFalse
andquantileTrue
. E.g. if labels is a vector where a pixel being a blank isTRUE
, and the metric is a vector of probability of being a blank (e.g. returned fromscoreClassifier
), decreasingquantileTrue
from the default of 0.5 to 0.25 increases the tolerance for keeping in blanks. [The direction to movequantileFalse
andquantileTrue
depends on the metric in question, and how the labels are coded.] - Potential solution: Reduce the number filtering steps. This is perhaps best done when several steps may be redundant given later filtering: e.g. filtering using
DFF0
metrics may be redundant if there is a laterscoreClassifier
step that usesDFF0
as a covariate. - Potential solution: Increase the
maxInterSpotDistancePixels
duringspatialClusterLeiden
so that more distal spots are considered per cluster. [In general, this has a larger impact than filtering using HNN distances – which similarly removes loner spots (though both can be tweaked).] Alternatively, users could also reduce the minimum number of pixels required per cluster (default = 3).
- Potential solution: Do not add more blanks during
Too few cellular spots
See above recommendations for “Too few spots”.
Too few cell masks returned during cell segmentation. This can be evaluated from
CellMetrics
plots, and the percent of counts deemed cellular inFPKMCorrelation
. Users should also plot cell masks for a chosen FOV.- Potential solution: For
cellpose
, retrain models or relax cell segmentation parameters. - Potential solution: For
stardist
– which returns masks faster thancellpose
– input variously brightened images and consolidate cell masks usingconsolidateCellSegMasks
. Cell segmentation parameters can also be relaxed forstardist
, but brightness seems to be a larger contributor to detection rate.
- Potential solution: For
Cells too small. In this scenario, we expect there to be low percent of counts deemed cellular in
FPKMCorrelation
, despite there being sufficient numbers of cells per unit area.- Potential solution: Retrain
cellpose
models to return larger masks. Runcellpose
with different diameter parameters, and consolidate withconsolidateCellSegMasks
.
- Potential solution: Dilate existing masks prior to saving using
imager::dilate_square(cellMask, X)
. [X
should be at least 3 for isotropic dilation by 1 pixel.] This works for both cell segmentation models. - Potential solution: Instead of dilating cells, non-cellular spots may be assigned to cell centroids by proximity. Functionally similar to dilating cell masks.
- Potential solution: Retrain
Weak correlation to FPKM / high blank %
Suggests high false positive rate. Do the opposite of recommendations for “Too few spots”: i.e. increase strictness of filtering criteria. If possible, users may attempt manual decoding on a small region of one or two FOVs, and ensure that their pipeline returns similar results: make use of the troubleshoot plots returned by
filterDF
.Bit-specific differences in detection rate. This can be evaluated from
BitwiseErrorRates
andBitDetectionRate
plots after synthesis. [This may also cause too few spots to be returned – but in general, there tends to be variation across genes: where some genes are too few, and some too abundant.]- Potential solution: Ensure that
getAnchorParams
is returning reasonable intensity ceilings and floors for each bit. Users should not use empty FOVs as anchor FOVs. - Potential solution: Sometimes, bit-specific differences may be expected if there is uneven representation of counts across bits (e.g. when abundant genes have an ON first bit, while rare genes have OFF). [This is best avoided during data acquisition: by ensuring even bit representation when designing the probe library.] Users may need to have custom image processing per bit (as opposed to looping a consistent function across all bits).
- Potential solution: Ensure that
Issues with image handling
java.lang.OutOfMemoryError
. This will be likely flagged when reading images during spotcalling, but may crop up during cell segmentation. This is an error that accompanies theRBioFormats
package; loading.dax
files uses the inbuiltreadBin
function in R and should thus not return this error.- Potential solution: Increasing memory prior to running your scripts with
options(java.parameters = "-Xmx8g")
. - Potential solution: Unloading the
RBioFormats
withunloadNamespace(RBioFormats)
every so often, and avoid attaching or loading the package (e.g. withlibrary(RBioFormats)
). - Potential solution: Ensure images are loaded one by one (done by specifying
series
parameter). Also ideally load images one channel or slice at a time (done by including asubset = list()
parameter). [Details can be found in the Chapter on spotcalling, and the documentation forRBioFormats::read.image()
andmasmr::readImage
.] In the worst case, images can be loaded into memory as tiles (by subsetting onx
andy
) for better memory management.
- Potential solution: Increasing memory prior to running your scripts with
- Images look like noise when loaded. We expect this to be more likely for
.dax
images and may be due to incorrect image size input, or incorrect endian-ness (default:little
). Alternatively, user has attempted to load a 3D image stack as 2D image – which will cause problems regardless of image file format.- Potential solution: Hard code image sizes by specifying
nrows
andncols
inreadImage
/readImageList
. Similarly, setendian
tobig
if needed (depends on instrument, and should be recorded in image metadata files). - Potential solution: A less likely alternative reason is that the wrong number of bytes has been assigned to each element in the
.dax
file. This is default2
(for 16 bit integers); but should be reduced to1
if data was saved as an integer value between 0 and 255 (i.e. 8 bit integer). [Most microscopes no longer use 8 bit integers.]
- Potential solution: Hard code image sizes by specifying
- Wanting to work with individual Z slices. For various reasons (e.g. Z slice distances are large, desiring high resolution spot localisation etc), users may wish to process individual Z slices rather than relying on the default maximum intensity projection (MIP).
- Potential solution: Z slices can be loaded one at a time by specifying the
zIndex
inreadImage
/readImageList
.getAnchorParams
– which aims to find reasonable thresholds across multiple FOVs – should have itsimageFunctions
modified to avoid MIP as a pre-processing step, and instead have a specific Z slice loaded (example code provided below).registerImages
has achosenZslice
parameter that allows users to specify a specific Z slice to perform registration against (that said, MIP should still be optimal here because position shifts affect the whole Z stack, and not just a single Z slice). Downstream functions (e.g.getImageMetrics
,consolidateImageMetrics
, and functions that work onspotcalldf
rather than images) should not be affected and can remain unaltered.
- Potential solution: Z slices can be loaded one at a time by specifying the
Below is an example of how to create a function that returns the brightness floor for a single Z slice, for input into imageFunctions
of getAnchorParams
:
## This function is looped across FOVs
## Its input is an image / image stack called im
## The ellipsis is used in R to allow your function to take in more unnamed parameters
<- function( im, userChosenZSlice = 1, ... ){
imsBrightnessMin_OneZSlice
## Check that image is indeed a stack
if( length(dim(im)) > 2 ){
## If yes, subset the chosen zSlice
<- im[,,userChosenZSlice]
im
}
## Make use of the default way of getting minimum brightness
## Note that parameters specified by ellipsis are fed forwards into this function
<- imsBrightnessMin(im, ...)
result return(result)
}
## Do the same for max brightness, and input your new functions into getAnchorParams
= 1 # The first z Slice = 1, second = 2 etc
zi getAnchorParams(
out_dir = paste0(params$out_dir, 'ZSlice', zi, '/') #Save results in its own folder
imageFunctions = list(
'brightness_min' = imsBrightnessMin_OneZSlice,
'brightness_max' = imsBrightnessMax_OneZSlice
),userChosenZSlice = zi #Parameters for your image function can be fed from getAnchorParams itself
)