Compact letter displays from the output of any statistical package

06 Jul, 2023

In my field, we are often interested in comparing central tendencies (e.g., means) among multiple discrete groups. Post hoc tests like Tukey’s test are useful for this, and letters are commonly assigned to statistical groups using the algorithm of Piepho (2004, Journal of Computational and Graphical Statistics). There are many options for running statistical tests, including base R functions, but when using tidyverse packages I prefer rstatix. However, as far as I know, no package currently generates compact letter displays (CLDs) from the output of rstatix functions. It turns out that this is not difficult if we use the function already available for getting CLDs from the output of base R functions (multcompView::multcompLetters()). The code below works for rstatix, but in principle the same approach can work with any set of pairwise p-values (e.g., from phytools::phylANOVA), as long as we put them in a named vector whose names are the hyphenated group pairs.

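We load dplyr for the pipe. The other packages (rstatix, stringr, tibble, and multcompView) are called with the :: prefix, so they need to be installed but not attached.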

library(dplyr)

    ##
    ## Attaching package: 'dplyr'

    ## The following objects are masked from 'package:stats':
    ##
    ##     filter, lag

    ## The following objects are masked from 'package:base':
    ##
    ##     intersect, setdiff, setequal, union

Quick look at the dataset.

head(iris)

    ##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    ## 1          5.1         3.5          1.4         0.2  setosa
    ## 2          4.9         3.0          1.4         0.2  setosa
    ## 3          4.7         3.2          1.3         0.2  setosa
    ## 4          4.6         3.1          1.5         0.2  setosa
    ## 5          5.0         3.6          1.4         0.2  setosa
    ## 6          5.4         3.9          1.7         0.4  setosa

Use rstatix to run a statistical test.

test <- iris %>% rstatix::tukey_hsd(Sepal.Length ~ Species)
test

    ## # A tibble: 3 × 9
    ##   term    group1     group2     null.value estimate conf.low conf.high    p.adj
    ## * <chr>   <chr>      <chr>           <dbl>    <dbl>    <dbl>     <dbl>    <dbl>
    ## 1 Species setosa     versicolor          0    0.93     0.686     1.17  3.39e-14
    ## 2 Species setosa     virginica           0    1.58     1.34      1.83  3   e-15
    ## 3 Species versicolor virginica           0    0.652    0.408     0.896 8.29e- 9
    ## # ℹ 1 more variable: p.adj.signif <chr>

You need a named vector with the hyphenated group pairs as names and the p-values as values.

p_values <- test %>%
  dplyr::mutate(groups = stringr::str_c(group1, group2, sep = "-")) %>% # concatenate group1 and group2 with a hyphen
  dplyr::select(groups, p.adj) %>% # keep just the group pairs and adjusted p-values
  tibble::deframe() # turn into a named vector
p_values

    ##    setosa-versicolor     setosa-virginica versicolor-virginica
    ##             3.39e-14             3.00e-15             8.29e-09
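
The same named-vector format can be built from a matrix of pairwise p-values. As a minimal sketch, assuming a lower-triangular matrix like the one returned by base R's pairwise.t.test() (other functions may lay out their matrices differently, so check the row and column names first), something like the following should work. The names p_mat, idx, and p_vec are only illustrative.

```r
# Pairwise t-tests with Holm correction; $p.value is a lower-triangular
# matrix with NA in the unused cells
pw <- pairwise.t.test(iris$Sepal.Length, iris$Species, p.adjust.method = "holm")
p_mat <- pw$p.value

# Positions of the cells that actually hold a p-value
idx <- which(!is.na(p_mat), arr.ind = TRUE)

# Pull out the p-values and name each one "groupA-groupB"
p_vec <- p_mat[idx]
names(p_vec) <- paste(
  colnames(p_mat)[idx[, "col"]],
  rownames(p_mat)[idx[, "row"]],
  sep = "-"
)
p_vec
```

p_vec has the same shape as p_values, so it can be passed to multcompView::multcompLetters() in the same way.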

Get the letters, then tidy the result into a tibble.

letters <- multcompView::multcompLetters(p_values)$Letters
letters

    ##     setosa versicolor  virginica
    ##        "a"        "b"        "c"

letters <- letters %>% # a named character vector
  tibble::as_tibble(rownames = "group") %>% # turn into a tibble, keeping the names as a column
  dplyr::rename(letter = value) # give the letter column a more meaningful name
letters

    ## # A tibble: 3 × 2
    ##   group      letter
    ##   <chr>      <chr>
    ## 1 setosa     a     
    ## 2 versicolor b     
    ## 3 virginica  c
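
To reuse these steps, they can be wrapped in a small helper. The following is a sketch rather than a polished implementation: cld_from_pairs() is a hypothetical name, and it assumes a data frame with group1, group2, and p.adj columns like the rstatix output above. It also stops if a group name contains a hyphen, because the hyphen is what separates the two group names in each pair.

```r
# Hypothetical helper: compact letter display from a table of pairwise tests.
# Assumes columns group1, group2, and p.adj (as in rstatix::tukey_hsd() output).
cld_from_pairs <- function(tests, threshold = 0.05) {
  stopifnot(all(c("group1", "group2", "p.adj") %in% names(tests)))

  group1 <- as.character(tests$group1)
  group2 <- as.character(tests$group2)
  if (any(grepl("-", c(group1, group2), fixed = TRUE))) {
    stop("Group names must not contain hyphens; rename them first.")
  }

  # Named vector: "group1-group2" -> adjusted p-value
  p_values <- stats::setNames(tests$p.adj, paste(group1, group2, sep = "-"))

  # Pairs with p-values below the threshold are treated as different
  cld <- multcompView::multcompLetters(p_values, threshold = threshold)$Letters

  data.frame(group = names(cld), letter = unname(cld), stringsAsFactors = FALSE)
}

# Usage with the adjusted p-values printed earlier, typed in by hand
tests <- data.frame(
  group1 = c("setosa", "setosa", "versicolor"),
  group2 = c("versicolor", "virginica", "virginica"),
  p.adj  = c(3.39e-14, 3e-15, 8.29e-9)
)
cld_from_pairs(tests)
```

Calling cld_from_pairs(test) on the tukey_hsd() result from earlier should give the same letters as the step-by-step version.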
