
Tidyverse’s group_by() does not work as expected


For a project that involves signal detection theory, I want to calculate discernment scores for different combinations of Hit Rates (HR) and False Alarm Rates (FAR).

I use the group_by() and summarise() functions to count how often each discernment value occurs in my data.

However, the summarized data frame contains several rows with the same value for discernment. Maybe I'm doing something obviously wrong; I provide a reproducible example below.

# minimal reproducible example
library(tidyverse)

# Generate all possible combinations of HR and FAR
HR_values <- seq(0, 1, by = 0.01)  # From 0 to 1 in increments of 0.01
FAR_values <- seq(0, 1, by = 0.01)

# Create a grid of all combinations of HR and FAR
combinations <- expand.grid(HR = HR_values, FAR = FAR_values)

# Calculate discernment
combinations <- combinations %>% 
  mutate(discernment = HR - FAR)

# check how often each discernment value occurs
occurences <- combinations %>%
  group_by(discernment) %>%
  summarise(n_occurrences = n()) %>% 
  arrange(discernment)

# weirdly, several supposedly unique discernment values appear in different rows
# e.g. we'd expect only one line for the value '-0.95', but get two lines
occurences %>% 
  slice(1:7) 

# but if we call 
occurences %>% 
  filter(discernment == -0.95) 
# we only get one of the two lines returned

# this suggests that they might in fact not be the same value.

# if we add a factor version, we can see that it doesn't change anything for the '-0.95' rows
occurences <- occurences %>% 
  mutate(discernment_as_factor = as.factor(discernment))

occurences %>% 
  slice(6:7) 

# but it does change for other values: '-0.01', for example, turns out to be '-0.00999999999999998' when converted to a factor
occurences %>% 
  slice(310:317) 
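
One possible explanation, which is an assumption on my part and not something confirmed above, is floating-point representation: most multiples of 0.01 cannot be stored exactly as doubles, so two HR - FAR differences that both print as -0.95 may differ in their last bits and end up in separate groups. The sketch below rebuilds the same grid, rounds discernment to two decimals (the precision of the inputs) before counting, and uses dplyr's near() for a tolerance-based comparison. The names n_raw and occurrences_rounded are introduced here only for illustration.

```r
library(dplyr)

# Same grid as in the example above
combinations <- expand.grid(HR = seq(0, 1, by = 0.01),
                            FAR = seq(0, 1, by = 0.01)) %>%
  mutate(discernment = HR - FAR)

# Number of distinct raw doubles; values that print identically
# can still count as different here
n_raw <- combinations %>%
  distinct(discernment) %>%
  nrow()

# Assumption: rounding to the precision of the inputs (2 decimals)
# collapses values that differ only by floating-point noise
occurrences_rounded <- combinations %>%
  mutate(discernment = round(discernment, 2)) %>%
  count(discernment, name = "n_occurrences")

n_raw                      # compare this with the row count below
nrow(occurrences_rounded)  # one row per value from -1 to 1 in steps of 0.01

# Tolerance-based comparison instead of exact equality with ==
n_near <- combinations %>%
  filter(near(discernment, -0.95)) %>%
  nrow()
```

If the two row counts differ, the extra rows in the unrounded summary come from values that are numerically distinct but print the same. Rounding changes the grouping key itself, so it is only appropriate when the intended values really lie on a 0.01 grid, as they do here.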

I use:
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS 15.0

and
tidyverse_2.0.0


