Spatial Association Detector(SPADE)

Load data and package

library(sf)
library(tidyverse)
library(magrittr)
library(gdverse)

ushi = read_sf(system.file('extdata/USHI.gpkg',package = 'gdverse'))
ushi
## Simple feature collection with 1037 features and 19 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 295713 ymin: 3784916 xmax: 324202.2 ymax: 3804669
## Projected CRS: WGS 84 / UTM zone 49N
## # A tibble: 1,037 × 20
##      LPI   LSI   GAR    DAR      WAR   TER  NDVI   NDWI   DTH   DTP    DTR   DTW    BH    BD
##    <dbl> <dbl> <dbl>  <dbl>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
##  1  57.2 10.5  0.630 0.273  0.00219  0.979 0.497 -0.356  5.66 1.08  0.934   1.16 3.16  0.635
##  2  85.7  2.35 0.905 0.0952 0        0.977 0.514 -0.288  5.66 1.41  0.0807  1.57 0.310 0.139
##  3  41.2  7.85 0.436 0.449  0.000672 0.972 0.365 -0.296  4.38 1.88  0.516   2.37 3.99  0.671
##  4  39.1  5.40 0.534 0.377  0        0.966 0.395 -0.299  4.25 1.97  0.442   2.81 4.63  0.634
##  5  32.0  8.04 0.405 0.437  0        0.968 0.354 -0.302  3.65 1.34  0.615   2.55 4.07  0.642
##  6  58.3  7.91 0.332 0.615  0.000377 0.966 0.312 -0.271  3.69 0.700 0.592   2.25 7.91  0.777
##  7  98.4  3.34 0.984 0.0161 0        0.948 0.451 -0.248  5.39 1.36  0.124   3.64 0.216 0.111
##  8  44.6 17.6  0.416 0.502  0.000672 0.949 0.365 -0.295  5.11 1.20  0.926   2.50 6.49  0.747
##  9  35.9  9.81 0.464 0.467  0.00134  0.936 0.379 -0.309  7.10 2.01  1.03    2.75 5.49  0.692
## 10  44.3  9.49 0.403 0.504  0.000547 0.926 0.310 -0.278  8.51 2.68  0.658   3.20 3.27  0.601
## # ℹ 1,027 more rows
## # ℹ 6 more variables: RL <dbl>, RFD <dbl>, SVF <dbl>, FAR <dbl>, SUHI <dbl>,
## #   geom <POLYGON [m]>
Overview Map of the Study Area Regarding SUHII

Overview Map of the Study Area Regarding SUHII

This spatial polygon vector data suhii is the streets divided based on roads within the Ring expressway of Xi’an City, and the attribute data are the SUHII(Intensity of surface urban heat island effect) and its influence factors.

The meanings of each explanatory variable are as follows,and can be divided into three categories:

  1. Landscape pattern and Land cover characteristic indicators
  1. Distances from cold and heat sources indicators
  1. Urban built space form indicators

And the SUHI stands for surface urban heat island effect intensity.

Spatial pattern of SUHI

global spatial autocorrelation of SUHI

here I use geocomplexity to calculate the global Moran’s I:

set.seed(123456789)

gmi = geocomplexity::moran_test(ushi)
gmi
## ***                 global spatial autocorrelation test
Variable MoranI EI VarI zI pI
LPI 0.298685*** -0.0009653 0.0002938 17.48 9.972e-69
LSI 0.370536*** -0.0009653 0.0002938 21.67 1.853e-104
GAR 0.421646*** -0.0009653 0.0002938 24.65 1.648e-134
DAR 0.493366*** -0.0009653 0.0002938 28.84 3.526e-183
WAR 0.222443*** -0.0009653 0.0002938 13.03 3.955e-39
TER 0.969507*** -0.0009653 0.0002938 56.62 0
NDVI 0.335154*** -0.0009653 0.0002938 19.61 6.518e-86
NDWI 0.24996*** -0.0009653 0.0002938 14.64 7.965e-49
DTH 0.975702*** -0.0009653 0.0002938 56.98 0
DTP 0.763939*** -0.0009653 0.0002938 44.62 0
DTR 0.81955*** -0.0009653 0.0002938 47.87 0
DTW 0.826076*** -0.0009653 0.0002938 48.25 0
BH 0.419635*** -0.0009653 0.0002938 24.54 2.964e-133
BD 0.310015*** -0.0009653 0.0002938 18.14 7.414e-74
RL 0.29453*** -0.0009653 0.0002938 17.24 6.796e-67
RFD 0.296268*** -0.0009653 0.0002938 17.34 1.171e-67
SVF 0.439656*** -0.0009653 0.0002938 25.71 5.121e-146
FAR 0.413345*** -0.0009653 0.0002938 24.17 2.287e-129
SUHI 0.535956*** -0.0009653 0.0002938 31.32 1.128e-215

The global Moran’I Index of SUHI is 0.535956 and the P value is 1.128e-215,which shows that SUHI in the main urban area of Xi’an has significant positive spatial autocorrelation in the global scale.

local spatial autocorrelation of SUHI

We will use tidyrgeoda to run the LISA, more details see here

library(tidyrgeoda)

ushi %>%
  mutate(lisa = st_local_moran(.,'SUHI',
                               wt = st_contiguity_weights(.),
                               significance_cutoff = 0.05)) %>%
  select(lisa) %>%
  ggplot() +
  geom_sf(aes(fill = lisa),lwd = .1,color = 'grey') +
  scale_fill_lisa(name = 'SUHI-LISA') +
  theme_bw() +
  theme(
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank(),
    legend.title = element_text(size = 7.5),
    legend.text = element_text(size = 7.5),
    legend.key.size = unit(.5, 'cm')
  )

The global and local spatial autocorrelation shows that SUHI’s strong spatial dependence.

Spatial dependence was neglected in native geodetector, which leds to the development of SPADE(spatial association detector).

OPGD modeling

ushi_opgd = opgd(SUHI ~ ., data = st_drop_geometry(ushi),
                 discvar = names(select(st_drop_geometry(ushi),-SUHI)),
                 cores = 6)
ushi_opgd
##                 OPGD Model                  
## ***          Factor Detector            
## 
## | variable | Q-statistic |   P-value    |
## |:--------:|:-----------:|:------------:|
## |   DAR    | 0.43965943  | 9.280000e-10 |
## |   NDVI   | 0.40392817  | 4.020000e-10 |
## |   GAR    | 0.38280266  | 3.850000e-10 |
## |   RFD    | 0.27730957  | 1.540000e-10 |
## |    BD    | 0.26966147  | 5.330000e-10 |
## |    BH    | 0.25846868  | 2.610000e-10 |
## |   NDWI   | 0.21994544  | 8.280000e-10 |
## |   FAR    | 0.20832309  | 2.230000e-10 |
## |   SVF    | 0.15333436  | 6.410000e-10 |
## |    RL    | 0.13261626  | 4.350000e-10 |
## |   TER    | 0.12144556  | 3.750000e-10 |
## |   WAR    | 0.10718665  | 7.480000e-10 |
## |   DTP    | 0.09015746  | 6.150000e-10 |
## |   DTW    | 0.08565851  | 1.184870e-07 |
## |   DTH    | 0.08466484  | 1.470018e-03 |
## |   LSI    | 0.08447881  | 2.057000e-09 |
## |   DTR    | 0.07530100  | 2.973400e-08 |
## |   LPI    | 0.03320575  | 2.950997e-02 |

You can access the detailed q statistics by ushi_opgd$factor.

ushi_opgd$factor
## # A tibble: 18 × 3
##    variable `Q-statistic` `P-value`
##    <chr>            <dbl>     <dbl>
##  1 DAR             0.440   9.28e-10
##  2 NDVI            0.404   4.02e-10
##  3 GAR             0.383   3.85e-10
##  4 RFD             0.277   1.54e-10
##  5 BD              0.270   5.33e-10
##  6 BH              0.258   2.61e-10
##  7 NDWI            0.220   8.28e-10
##  8 FAR             0.208   2.23e-10
##  9 SVF             0.153   6.41e-10
## 10 RL              0.133   4.35e-10
## 11 TER             0.121   3.75e-10
## 12 WAR             0.107   7.48e-10
## 13 DTP             0.0902  6.15e-10
## 14 DTW             0.0857  1.18e- 7
## 15 DTH             0.0847  1.47e- 3
## 16 LSI             0.0845  2.06e- 9
## 17 DTR             0.0753  2.97e- 8
## 18 LPI             0.0332  2.95e- 2

SPADE modeling

SPADE explicitly considers the spatial variance by assigning the weight of the influence based on spatial distribution and also minimizes the influence of the number of levels on PD values by using the multilevel discretization and considering information loss due to discretization.

When response variable has a strong spatial dependence, maybe SPADE is a best choice.

The biggest difference between SPADE and native GD and OPGD in actual modeling is that SPADE requires a spatial weight matrix to calculate spatial variance.

In spade function, when you not provide a spatial weight matrix, it will use 1st order inverse distance weight by default, which can be created by inverse_distance_weight().

coords = ushi |>
  st_centroid() |>
  st_coordinates()

wt1 = inverse_distance_weight(coords[,1],coords[,2])

You can also use gravity model weight by assigning the power parameter in inverse_distance_weight() function.

wt2 = inverse_distance_weight(coords[,1],coords[,2],power = 2)

using spatial weight matrix from sdsfun

I have also developed the sdsfun package to facilitate the construction of spatial weight matrices, which requires an input of an sf object.

wt3 = sdsfun::spdep_contiguity_swm(ushi)

Or using a spatial weight matrix based on distance kernel functions.

wt4 = sdsfun::spdep_distance_swm(ushi,k = 3, kernel = 'gaussian')

In the following section we will execute SPADE model using spatial weight matrix wt3 which is constructed by queen contiguity.

The test of SPADE model significance in gdverse is achieved by randomization null hypothesis use a pseudo-p value, this calculation is very time-consuming. Default gdverse sets the permutations parameter to 0 and does not calculate the pseudo-p value. If you want to calculate the pseudo-p value, specify the permutations parameter to a number such as 99,999,9999, etc.

run SPADE

ushi_spade = spade(SUHI ~ ., data = ushi, wt = wt3, cores = 12)
ushi_spade
## ***         Spatial Association Detector         
## 
## | variable | Q-statistic |      P-value      |
## |:--------:|:-----------:|:-----------------:|
## |   DAR    | 0.51354520  | No Pseudo-P Value |
## |   NDVI   | 0.47435470  | No Pseudo-P Value |
## |   NDWI   | 0.43652650  | No Pseudo-P Value |
## |   GAR    | 0.42037121  | No Pseudo-P Value |
## |   RFD    | 0.35902741  | No Pseudo-P Value |
## |    BD    | 0.35397385  | No Pseudo-P Value |
## |    BH    | 0.32947163  | No Pseudo-P Value |
## |   TER    | 0.31556969  | No Pseudo-P Value |
## |   FAR    | 0.22828609  | No Pseudo-P Value |
## |   DTR    | 0.22245435  | No Pseudo-P Value |
## |   DTW    | 0.19009188  | No Pseudo-P Value |
## |   DTP    | 0.14935992  | No Pseudo-P Value |
## |   LSI    | 0.14875273  | No Pseudo-P Value |
## |    RL    | 0.13422740  | No Pseudo-P Value |
## |   SVF    | 0.13294874  | No Pseudo-P Value |
## |   LPI    | 0.13054984  | No Pseudo-P Value |
## |   DTH    | 0.08346346  | No Pseudo-P Value |
## |   WAR    |     NaN     | No Pseudo-P Value |
plot(ushi_spade,slicenum = 8)

You can also access the detailed q statistics by ushi_spade$factor

ushi_spade$factor
## # A tibble: 18 × 3
##    variable `Q-statistic` `P-value`        
##    <chr>            <dbl> <chr>            
##  1 DAR             0.514  No Pseudo-P Value
##  2 NDVI            0.474  No Pseudo-P Value
##  3 NDWI            0.437  No Pseudo-P Value
##  4 GAR             0.420  No Pseudo-P Value
##  5 RFD             0.359  No Pseudo-P Value
##  6 BD              0.354  No Pseudo-P Value
##  7 BH              0.329  No Pseudo-P Value
##  8 TER             0.316  No Pseudo-P Value
##  9 FAR             0.228  No Pseudo-P Value
## 10 DTR             0.222  No Pseudo-P Value
## 11 DTW             0.190  No Pseudo-P Value
## 12 DTP             0.149  No Pseudo-P Value
## 13 LSI             0.149  No Pseudo-P Value
## 14 RL              0.134  No Pseudo-P Value
## 15 SVF             0.133  No Pseudo-P Value
## 16 LPI             0.131  No Pseudo-P Value
## 17 DTH             0.0835 No Pseudo-P Value
## 18 WAR           NaN      No Pseudo-P Value

The result of WAR is NA and you can also see a big difference between the OPGD and SPADE model. The results of SPADE are more reliable in most cases.

We demonstrate the rationality of NA values calculated of WAR in the following section:

SPADE NA result explained

WAR = ushi$WAR
skimr::skim(WAR)
Data summary
Name WAR
Number of rows 1037
Number of columns 1
_______________________
Column type frequency:
numeric 1
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
data 0 1 0.01 0.04 0 0 0 0 0.73 ▇▁▁▁▁
ggplot(data = ushi) +
  geom_histogram(aes(WAR),
               color='white',
               fill='gray60') +
  scale_y_continuous(expand = c(0,0)) +
  theme_classic()

moments::skewness(WAR)
## [1] 11.6917
shapiro.test(WAR)
## 
##  Shapiro-Wilk normality test
## 
## data:  WAR
## W = 0.14693, p-value < 2.2e-16

From the histogram of WAR,its skewness and the result of shapiro.test, you will find that the WAR variable is heavily skewed, with a large number of zeros, which do not provide sufficient information for modeling SUHI.

In fact, the WAR variable represents the proportion of the water area of each block, and the WAR value is NA in the SPADE result probably because it can’t provide sufficient information for modeling SUHI.

Let’s look at the specific calculation process of the PSMD(power of spatial and multilevel discretization determinant) value corresponding to WAR:

3:22 %>%
  purrr::map_dbl(\(.k) st_unidisc(ushi$WAR,.k) %>%
                  factor_detector(ushi$WAR, .) %>%
                  {.[[1]]})
##  [1] 0.00000000 0.05795105 0.08069138 0.10177158 0.12117267 0.13904284 0.15732689 0.17377009
##  [9] 0.19092712 0.20432876 0.22190151 0.23780494 0.25057152 0.26498423 0.28119643 0.29570781
## [17] 0.30353784 0.32068449 0.33494156 0.34531717

3:22 %>%
  purrr::map_dbl(\(.k) st_unidisc(ushi$WAR,.k) %>%
                   psd_spade(ushi$WAR, ., wt3))
##  [1]  0.0000000 -0.4065480 -0.3942882 -0.4219084 -0.6231376 -0.7273617 -0.7462245 -0.5883158
##  [9] -0.4720283 -0.4902740 -0.3850580 -0.3185052 -0.3402303 -0.3119740 -0.3468160 -0.3244028
## [17] -0.3453713 -0.3221619 -0.2500152 -0.2302331

3:22 %>%
  purrr::map_dbl(\(.k) st_unidisc(ushi$WAR,.k) %>%
                   cpsd_spade(ushi$SUHI, ushi$WAR, ., wt3))
##  [1]         NaN -0.15912828 -0.21327657 -0.27536506 -0.09128105 -0.03375476 -0.15168436
##  [8] -0.18804177 -0.23465648 -0.29911716 -0.35586536 -0.41570044 -0.43172541 -0.41268215
## [15] -0.52127019 -0.57350020 -0.48780407 -0.34204498 -0.60269473 -1.11353340

In most cases, we use linear regression to explore linear relationships between variables and select appropriate variables for subsequent modeling through methods such as VIF. Here we examine the effect of removing the WAR variable on VIF and linear models:

lm.modelOne = lm(SUHI ~ ., data = st_drop_geometry(ushi))
summary(lm.modelOne)
## 
## Call:
## lm(formula = SUHI ~ ., data = st_drop_geometry(ushi))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1991 -0.6899 -0.0366  0.6837  3.7666 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  22.156854   5.003326   4.428 1.05e-05 ***
## LPI          -0.002280   0.002701  -0.844  0.39872    
## LSI          -0.004190   0.023320  -0.180  0.85746    
## GAR           6.869974   0.902865   7.609 6.27e-14 ***
## DAR           6.746594   0.829310   8.135 1.19e-15 ***
## WAR           5.283058   1.784052   2.961  0.00313 ** 
## TER          -1.388985   0.670337  -2.072  0.03851 *  
## NDVI        -15.268701   1.620501  -9.422  < 2e-16 ***
## NDWI        -15.408610   2.127737  -7.242 8.72e-13 ***
## DTH           0.015604   0.012255   1.273  0.20318    
## DTP           0.091178   0.098705   0.924  0.35584    
## DTR          -0.069387   0.064638  -1.073  0.28331    
## DTW          -0.195546   0.067418  -2.901  0.00381 ** 
## BH            0.197986   0.034505   5.738 1.26e-08 ***
## BD            6.151962   2.181999   2.819  0.00490 ** 
## RL            0.008593   0.228118   0.038  0.96996    
## RFD           0.117208   0.741697   0.158  0.87447    
## SVF         -22.397352   4.954590  -4.521 6.89e-06 ***
## FAR          -1.260657   0.112759 -11.180  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.06 on 1018 degrees of freedom
## Multiple R-squared:  0.6173, Adjusted R-squared:  0.6106 
## F-statistic: 91.23 on 18 and 1018 DF,  p-value: < 2.2e-16

car::vif(lm.modelOne) |>
  tibble::as_tibble_row() |>
  tidyr::pivot_longer(everything(),
                      names_to = 'variable',
                      values_to = 'VIF')
## # A tibble: 18 × 2
##    variable    VIF
##    <chr>     <dbl>
##  1 LPI        1.68
##  2 LSI        1.50
##  3 GAR       31.1 
##  4 DAR       28.5 
##  5 WAR        5.85
##  6 TER        1.18
##  7 NDVI      20.5 
##  8 NDWI      13.2 
##  9 DTH        1.15
## 10 DTP        2.06
## 11 DTR        1.15
## 12 DTW        2.00
## 13 BH        31.9 
## 14 BD       105.  
## 15 RL         2.96
## 16 RFD       89.4 
## 17 SVF        3.02
## 18 FAR       31.3
lm.modelTwo = lm(SUHI ~ ., data = st_drop_geometry(ushi) |> select(-WAR))
summary(lm.modelTwo)
## 
## Call:
## lm(formula = SUHI ~ ., data = select(st_drop_geometry(ushi), 
##     -WAR))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1752 -0.7118 -0.0227  0.6896  3.6784 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  21.011090   5.007323   4.196 2.95e-05 ***
## LPI          -0.000917   0.002671  -0.343 0.731475    
## LSI          -0.002172   0.023399  -0.093 0.926046    
## GAR           5.283103   0.729402   7.243 8.64e-13 ***
## DAR           5.300795   0.672909   7.877 8.53e-15 ***
## TER          -1.174510   0.668949  -1.756 0.079431 .  
## NDVI        -13.026040   1.438097  -9.058  < 2e-16 ***
## NDWI        -10.693806   1.416860  -7.548 9.82e-14 ***
## DTH           0.011362   0.012217   0.930 0.352593    
## DTP           0.115158   0.098747   1.166 0.243809    
## DTR          -0.058443   0.064778  -0.902 0.367161    
## DTW          -0.226617   0.066850  -3.390 0.000726 ***
## BH            0.192503   0.034587   5.566 3.34e-08 ***
## BD            5.951972   2.189252   2.719 0.006665 ** 
## RL            0.061282   0.228288   0.268 0.788413    
## RFD           0.100502   0.744497   0.135 0.892644    
## SVF         -19.414910   4.869603  -3.987 7.17e-05 ***
## FAR          -1.227348   0.112624 -10.898  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.064 on 1019 degrees of freedom
## Multiple R-squared:  0.614,  Adjusted R-squared:  0.6076 
## F-statistic: 95.36 on 17 and 1019 DF,  p-value: < 2.2e-16

car::vif(lm.modelTwo) |>
  tibble::as_tibble_row() |>
  tidyr::pivot_longer(everything(),
                      names_to = 'variable',
                      values_to = 'VIF')
## # A tibble: 17 × 2
##    variable    VIF
##    <chr>     <dbl>
##  1 LPI        1.63
##  2 LSI        1.50
##  3 GAR       20.2 
##  4 DAR       18.6 
##  5 TER        1.16
##  6 NDVI      16.0 
##  7 NDWI       5.83
##  8 DTH        1.13
##  9 DTP        2.05
## 10 DTR        1.15
## 11 DTW        1.96
## 12 BH        31.8 
## 13 BD       104.  
## 14 RL         2.94
## 15 RFD       89.4 
## 16 SVF        2.90
## 17 FAR       31.0

You can see that the model does not change much before and after removing the WAR variable.In other words, WAR does not do much to model SUHI.

This suggests another use of SPADE model for selecting modeling variables.