Coin example: is it fair or not?
knitr::include_graphics(path = 'img/coin.jpeg')
Let’s conduct an experiment => throw \(100\) times and count #heads. Let’s say we observed an extreme \(64\) heads. Do we have enough evidence to say coin if unfair?
4 things to remember when calculating p-values:
\(Prob(Obs|H_0) = Prob(64 \text{ heads}|H_0:\text{coin is fair})\)
Let’s do a simulation:
set.seed(42)
repeats = 10000
res = sapply(1:repeats, function(num) {
stat = sample(x = c('heads', 'tails'), replace = T, size = 100)
sum(stat == 'heads')
})
Draw the distribution of #heads:
main1 = '10000 x throw coin 100 times experiment! (coin is FAIR)'
hist(res, main = main1, xlab = 'Number of heads')
# plot(density(res), main = main1) # smoothed out
Where does the observation (\(64\) heads) ‘sit’ in this distribution?
hist(res, xlim = c(20,80), main = 'Null distr', xlab = 'Number of heads')
abline(v = 64, col = 'red')
The empirical p-value is the proportion of experiments that yielded equal or more than the observed #heads (\(64\)):
p_val = sum(sort(res) >= 64)/length(res) # length(res) = 10000
p_val
[1] 0.0029
Since the above p-value is lower than a predefined threshold (e.g. \(0.05\)) I can deduce that the coin is unfair. More extreme observations would yield smaller p-values, e.g. for #heads = \(80\):
sum(sort(res) > 80)/length(res) # length(res) = 10000
[1] 0
For this simple example, we could use statistical theory to calculate exact p-value (fair coin toss follows binomial distribution):
\[P(\text{#heads} \ge 64 | 100 \text{ tosses of a fair coin}) = \\ 1 - P(\text{#heads} \le 63)\]
In R it’s easy to calculate using the (cumulative)
distribution function of the binomial pbinom:
1 - pbinom(q = 63, size = 100, prob = 0.5)
[1] 0.00331856
For \(80\) heads the p-value can be exactly found now:
1 - pbinom(q = 79, size = 100, prob = 0.5)
[1] 5.579545e-10
ranks = readRDS(file = 'ranks.rds')
We have conducted a benchmark as follows:
E.g. for the CoxNet model the performance across all
omic combinations is:
coxnet_scores =
ranks$mat['coxnet',] %>%
tibble::enframe(name = 'Task', value = 'Cindex') %>%
mutate(Task = forcats::fct_reorder(Task, Cindex, .desc = TRUE))
coxnet_scores %>%
ggplot(aes(x = Task, y = Cindex)) +
geom_point() +
geom_hline(yintercept = 0.5, linetype = 'dotted', color = 'red') +
labs(x = 'Omic Combinations', y = 'Harrell\'s C-index', title = 'CoxNet') +
ylim(c(0.4,0.6)) +
geom_text(aes(label = rank(-Cindex)), hjust = 0.5, vjust = -1) +
theme_bw(base_size = 14) +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
Can I define a statistic/score per omic that will tell me how important that particular omic is (and strength of importance with the p-value)?
How to define importance in the above context?
Let’s use some toy data with \(3\) omics and all combos (\(2^3 - 1 = 7\)) - why not \(8\)?:
set1 = LETTERS[1:3]
set1
[1] "A" "B" "C"
get_subsets = function(set) {
lapply(1:length(set), combinat::combn, x = set, simplify = FALSE) %>%
unlist(recursive = FALSE) %>%
lapply(function(sub_set) {
paste0(sub_set, collapse = '-')
}) %>%
unlist(use.names = FALSE)
}
subsets = get_subsets(set1)
subsets
[1] "A" "B" "C" "A-B" "A-C" "B-C" "A-B-C"
How many times every omic appears in the above list? For \(n\) omics?
# Answer: $2^{(n-1)}$
# In the example: n = 3 => 2^2 = 4 times
Example rankings - 3 x same figure, change color if omic combo has:
set.seed(42)
scores = rnorm(1:7, mean = 0.5, sd = 0.1)
tbl = tibble(Task = subsets, meas = scores) %>%
mutate(Task = forcats::fct_reorder(Task, meas, .desc = TRUE))
p = tbl %>%
ggplot(aes(x = Task, y = meas)) +
geom_point(size = 3) +
geom_hline(yintercept = 0.5, linetype = 'dotted', color = 'red') +
labs(x = 'Omic Combinations', y = 'Harrell\'s C-index', title = 'Toy Example') +
ylim(c(0.4,0.7)) +
geom_text(aes(label = rank(-meas)), size = 10, hjust = 0.5, vjust = -1) +
theme_bw(base_size = 20)
mycol1 = ifelse(grepl(pattern = 'A', x = levels(tbl$Task)), 'red', 'black')
mycol2 = ifelse(grepl(pattern = 'B', x = levels(tbl$Task)), 'blue', 'black')
mycol3 = ifelse(grepl(pattern = 'C', x = levels(tbl$Task)), 'purple', 'black')
# A more important
p + theme(axis.text.x = element_text(color = mycol1))
# B next
p + theme(axis.text.x = element_text(color = mycol2))
# C last
p + theme(axis.text.x = element_text(color = mycol3))
A more important omic will be in more combos towards the left of the above figures!
Let’s define based on the more left-idea!
Define as score the sum of ranks (the lower the better):
A => \(1+2+3+4=10\)
(best possible, \(4\) most-left
ranks)B => \(1+3+6+7=17\)C => \(1+4+5+6=16\)
(a little better than \(8\))
Worst score would be?
\[score = 1 - \frac{sum(ranks) - best(score)}{worst(score) - best(score)}\]
Therefore we have:
A: \(score = 1 - \frac{10 -
10}{22 - 10}=1\)B: \(score = 1 - \frac{17 -
10}{22 - 10}=0.41\)C: \(score = 1 - \frac{16 -
10}{22 - 10}=0.5\) Observation: every omic in the toy example will take \(4\) values from the below rankings:
nomics = 3
ranks = 1:(2^nomics-1)
ranks
[1] 1 2 3 4 5 6 7
We just need to randomly select \(4\) of these and add them (+normalization step) to get a possible rank-sum score. Of course we will repeat this procedure multiple times:
nsets = 2^(nomics-1) # 4
# smaller possible rank sum (all left ranks)
best_rs = sum(1:2^(nomics-1)) # 10
# larger possible rank sum (all right ranks)
worst_rs = sum(seq(to = 2^nomics-1, length.out = 2^(nomics-1))) # 22
# generate null dist - same for each omic
# use simple Monte Carlo random sampling
set.seed(42)
nsamples = 1000
null_nrs = sapply(1:nsamples, function(s) {
rank_sum = sum(sample(x = ranks, size = nsets, replace = FALSE))
1 - (rank_sum - best_rs)/(worst_rs - best_rs)
})
hist(null_nrs)
A omic is important at the \(0.05\) significance level:
scoreA = 1
pval_A = (sum(null_nrs >= scoreA)+1)/(length(null_nrs)+1)
pval_A
[1] 0.03996004
scoreB = 0.41
pval_B = (sum(null_nrs >= scoreB)+1)/(length(null_nrs)+1)
pval_B
[1] 0.7052947
scoreC = 0.5
pval_C = (sum(null_nrs >= scoreC)+1)/(length(null_nrs)+1)
pval_C
[1] 0.5864136
Another hypothetical example with a \(score = 0.92\):
score = 0.92
pval = (sum(null_nrs >= score)+1)/(length(null_nrs)+1)
pval
[1] 0.03996004
Unique scores in the null distribution very few:
unique(null_nrs)
[1] 0.25000000 0.33333333 0.41666667 0.50000000 1.00000000 0.91666667
[7] 0.16666667 0.58333333 0.83333333 0.00000000 0.66666667 0.75000000
[13] 0.08333333
Let’s focus on A omic and focus not on rankings but on
the actual performance scores:
tbl %>%
ggplot(aes(x = Task, y = meas)) +
geom_point(size = 3) +
geom_hline(yintercept = 0.5, linetype = 'dotted', color = 'red') +
labs(x = 'Omic Combinations', y = 'Harrell\'s C-index', title = 'Toy Example') +
ylim(c(0.4,0.7)) +
geom_text(aes(label = sprintf("%0.3f", round(meas, digits = 3))), size = 7, hjust = 0.5, vjust = -1) +
theme_bw(base_size = 20) +
theme(axis.text.x = element_text(color = mycol1))
tbl = tbl %>%
mutate(grp = case_when(
grepl(pattern = 'A', x = Task) ~ 'A',
TRUE ~ 'not A')
)
tbl %>%
ggplot(aes(x = grp, y = meas, fill = grp)) +
geom_boxplot() +
theme_bw(base_size = 20)
A = tbl %>% filter(grp == 'A') %>% pull(meas)
notA = tbl %>% filter(grp == 'not A') %>% pull(meas)
# is performance scores from 'A' distribution less than 'notA'?
wilcox.test(x = A, y = notA, alternative = 'greater')
Wilcoxon rank sum exact test
data: A and notA
W = 12, p-value = 0.02857
alternative hypothesis: true location shift is greater than 0
Results are close - the calculation of W statistic is
similar to our logic (sum of ranks)
The idea for this statistic came from viewing this as a gene enrichment problem (see References).
Let’s view an example using the research results
and consider the Clinical as the omic of interest:
coxnet_scores = coxnet_scores %>%
mutate(grp = case_when(
grepl(pattern = 'Clinical', x = Task) ~ 'Clin',
TRUE ~ 'not-Clin')
)
clin = coxnet_scores %>% filter(grp == 'Clin') %>% pull(Cindex)
not_clin = coxnet_scores %>% filter(grp == 'not-Clin') %>% pull(Cindex)
Histograms/Density of the performance scores (C-indexes), comparing omic-combos that had Clinical features included vs those that did not:
ggplot(coxnet_scores, aes(y = Cindex, fill = grp)) +
geom_histogram(bins = 20) +
geom_density(alpha = 0.3) +
ylim(c(0.4, 0.6)) +
coord_flip() +
theme_bw(base_size = 14)
Warning: Removed 4 rows containing missing values (`geom_bar()`).
The empirical Cumulative Distribution Function for each group is:
ggplot(coxnet_scores, aes(Cindex, colour = grp)) +
stat_ecdf() +
labs(y = 'Empirical CDF') +
theme_bw(base_size = 14)
ks.test(x = clin, y = not_clin, alternative = 'less')
Exact two-sample Kolmogorov-Smirnov test
data: clin and not_clin
D^- = 0.62083, p-value = 0.0009361
alternative hypothesis: the CDF of x lies below that of y
Usually it’s better to not re-invent the wheel but sometimes we need to!