Introduction to spatialTIME

Create Multiplex ImmunoFlourescent (mif) object

spatialTIME functions use a custom mif object which can be created using create_mif.


x <- create_mif(clinical_data = example_clinical,
                sample_data = example_summary,
                spatial_list = example_spatial,
                patient_id = "deidentified_id", sample_id = "deidentified_sample",
                clean_columns = TRUE)
x
#> 229 patients spanning 229 samples and 5 spatial data frames were found

Four slots are available in an mif object. spatial is a list of data frames with each data frame containing the spatial information for a single sample. If this list is unnamed, then names with be assigned based off the image_tag value. clinical is a data frame that contains relevant patient level data - at a minimum this data frame should contain patient level IDs (one row per patient). sample is a data frame containing any sample level values and contain sample names and patient IDs at a minimum (one row per sample). An empty list for derived is created as a place to store results from the spatialTIME package within the object (this it the temporary usage - need to implement R6 classes to automatically update the argument supplied to the function).

Plotting Cores

An individual plot for each core (each sample) is created. Plots can be assigned to an R object, such as within the empty derived slot and printed to a PDF if a file name is provided.

mnames <- c("foxp3_opal_620_positive", "cd3_opal_570_positive", "cd8_opal_520_positive",
            "pd1_opal_650_positive", "pdl1_opal_540_positive")

mlabels <- c("FOXP3", "CD3", "CD8", "PD1", "PDL1")

x[["derived"]][["spatial_plots"]] <- plot_immunoflo(x, plot_title = "deidentified_sample", 
                                                    mnames = mnames, mlabels = mlabels, 
                                                    cell_type = "classifier_label")

x[["derived"]][["spatial_plots"]][[4]]

Estimating the degree of spatial clustering

Ripley’s \(K\) measures the average number of neighboring cells across each cell, that is the average number of cells within a specified radius of a cell. Ripley’s \(K\) is computed as follows:

\[K(r) = \frac{1}{n}\sum_{i=1}^{n}w_{ij}{\bf 1}_{(d(x_i,x_j)\le r)}\],

where \(r\) is the specified radius, \(d(x_i,x_j)\) is the distance between the \(i^{th}\) and \(j^{th}\) cell, \({\bf 1}_A\) is indicator function of event \(A\), and \(w_{ij}\) is the weights that are assigned for border corrections. The expected value of \(K(r)\) is \(\pi r^2\), thus \(K\) is expected to grow as a quadratic function of \(r\).

There are several edge correction, our studies has included a small number of cells and we recommend using the isotropic or transnational edge correction, as opposed to the ‘border’ edge correction. The main goal of the edge correction is to account for the fact that there are unobserved points outside of the region, and the assumption is that the location of these cells has the same distribution as the study region.

There are several methods for normalization with help improve the interpretation of Ripley’s \(K\). We include Besag’s L which is computed by \(L(r) = \sqrt{\frac{K(r)}{\pi}}\), and the expect value of \(r\). Hence, \(L\) is expected to grow proportionally with \(r\). The expected value \(K\) and \(L\) both hinge on the assumption that the underlying point process follows the so called complete spatial randomness assumption which asserts that the locations of cells can neither form a regular patterns nor clusters.

Damage can occur to TMAs, due to how they are collect. This damage can lead to rips and tears in the TMA which results in regions where it appears that cells cannot be locations, which is not actually the case. Thus the theoretical estimate for CSR may not be accurate, to address this the cell locations can be permuted and the permutation distribution of \(K\) or \(L\) is a TMA specific measure of CSR. The ripleys_k function reports a permuted and theoretical estimate of CSR, the observed value for \(K\) (kestimation = TRUE) or \(L\) (kestimation = FALSE), and the full permutation distribution of \(K\) or \(L\) if keep_perm_dis = TRUE.

Currently, the number of permutations is 1, but this should be increased to at least 100.


x[["derived"]][["ripleys"]] <- ripleys_k(mif = x, id = "deidentified_sample", 
                                         mnames = mnames[1], num_permutations = 1,
                                         edge_correction = 'translation', r = seq(0,100,10),
                                         kestimation = TRUE, keep_perm_dis = FALSE, 
                                         mlabels = mlabels[1])

x[["derived"]][["spatial_plotsfoxp3"]] <- plot_immunoflo(x, plot_title = "deidentified_sample", 
                                                    mnames = mnames[1], mlabels = mlabels[1], 
                                                    cell_type = "classifier_label", 
                                                    mcolors = '#522D80')


#For additional plots, uncomment out this section
#library(rlist)
#library(tidyr)
# library(ggplot2)
# library(dplyr)
# library(grid)

# plot_data = x[["derived"]][["ripleys"]] %>%
#   select(deidentified_sample, marker, r_value,
#          `Observed K`, `Theoretical CSR`, `Permuted CSR`) %>%
#   pivot_longer(cols = 4:6, names_to = 'Estimate', values_to = 'K')
# 
# plotlist = lapply(setNames(as.character(unique(x[["derived"]][["ripleys"]]$deidentified_sample)),
#                            as.character(unique(x[["derived"]][["ripleys"]]$deidentified_sample))),
#                   function(a){
#                     left = x[["derived"]][["spatial_plotsfoxp3"]][[a]] +
#                       theme(legend.position = 'none',
#                             title = element_blank(),
#                             axis.text = element_blank(),
#                             axis.ticks = element_blank(),
#                             aspect.ratio = 1)
# 
#                     right = ggplot(data = plot_data %>% filter(deidentified_sample == a),
#                                    aes(x = r_value, K, color = Estimate)) +
#                       geom_line() +
#                       theme_bw()
#                     return(ggarrange(plotlist = list(left, right), align = 'h'))
# })
# 
# 
# 
# ggarrange(plotlist = plotlist, ncol = 1, common.legend = TRUE, legend = 'bottom',
#           labels = names(plotlist), hjust = 0.0)

Estimating the colocalization of cell types

fda

#Define the makerers on interest
mnames_pairs <- list(list("cd8_opal_520_positive", "foxp3_opal_620_positive"))

mlabels_pairs <- list(list("CD8+", "FOXP3"))

x[["derived"]][["bi-ripleys"]] <- bi_ripleys_k(mif = x, id = "deidentified_sample", 
                                               mnames = mnames_pairs, num_permutations = 1,
                                               r_range = seq(0, 50, 5), edge_correction = "translation", 
                                               kestimation = TRUE, keep_perm_dis = FALSE,
                                               mlabels = mlabels_pairs)


x[["derived"]][["spatial_bivariate"]] <- plot_immunoflo(x, plot_title = "deidentified_sample", 
                                                    mnames = unlist(mnames_pairs[[1]]), mlabels = unlist(mlabels_pairs[[1]]), 
                                                    cell_type = "classifier_label", 
                                                    mcolors = c('#522D80','#F56600'))

#For additonal plots uncomment this section

#library(rlist)
#library(tidyr)
# library(ggplot2)
# library(dplyr)
# library(grid)
# plot_data = x[["derived"]][["bi-ripleys"]] %>% 
#   select(deidentified_sample, anchor_marker, comparison_marker, r_value,
#          `Observed K`, `Theoretical CSR`, `Permuted CSR`) %>%
#   pivot_longer(cols = 5:7, names_to = 'Estimate', values_to = 'K')
# 
# plotlist = lapply(setNames(as.character(unique(x[["derived"]][["bi-ripleys"]]$deidentified_sample)),
#                            as.character(unique(x[["derived"]][["bi-ripleys"]]$deidentified_sample))),
#                   function(a){
#                     left = x[["derived"]][["spatial_bivariate"]][[a]] +
#                       theme(legend.position = 'none',
#                             title = element_blank(),
#                             axis.text = element_blank(),
#                             axis.ticks = element_blank(),
#                             aspect.ratio = 1)
# 
#                     right = ggplot(data = plot_data %>% filter(deidentified_sample == a),
#                                    aes(x = r_value, K, color = Estimate)) +
#                       geom_line() +
#                       theme_bw()
#                     return(ggarrange(plotlist = list(left, right),align = 'h'))
# })
# 
# 
# 
# ggarrange(plotlist = plotlist, ncol = 1, common.legend = TRUE, legend = 'bottom',
#           labels = names(plotlist), hjust = 0)

Comparing the Theoretical and Permuted estimates of CSR


x[["derived"]][["ripleys"]] <- ripleys_k(mif = x, id = "deidentified_sample", 
                                         mnames = mnames[1], num_permutations = 1,
                                         edge_correction = 'translation', r = seq(0,100,10),
                                         kestimation = TRUE, keep_perm_dis = TRUE, 
                                         mlabels = mlabels[1])

#library(rlist)
#library(tidyr)
# library(ggplot2)
# library(dplyr)
# library(grid)

# hist_data = x[["derived"]][["ripleys"]] %>% 
#   select(deidentified_sample, marker, r_value,`Permuted CSR`) %>%
#   group_by(deidentified_sample, r_value) %>% 
#   summarize(`Permuted Mean` = mean(`Permuted CSR`)) %>%
#   mutate(`Theoretical CSR` = pi*r_value^2) %>%
#   right_join(x[["derived"]][["ripleys"]]) %>%
#   pivot_longer(cols = c(`Observed K`, `Permuted Mean`, `Theoretical CSR`),
#                values_to = 'vert_line', names_to = 'Estimate') %>%
#   filter(r_value >= 50) %>%
#   mutate(r_value = factor(r_value))
#   
# 
# ggplot(data = hist_data, aes(x = `Permuted CSR`, y=..count../3)) + 
#   geom_histogram(col = 'white') + facet_grid(r_value~deidentified_sample, 
#                                              scale = 'fixed') + 
#   geom_vline(aes(xintercept = vert_line, color = Estimate)) +
#   ylab('Count')