jnmaciuch
Senior Member (Voting Rights)
Your intercept only comparison is not correct:Here is some R code simulating this with random data where NII-RF is correlated to age and to ME/CFS status:
Code:total_trials <- 1000 more_significant_count <- 0 for (i in 1:total_trials) { age <- rnorm(50, mean = 45, sd = 10) mecfs_status <- rbinom(50, 1, prob = 0.5) niirf <- age + mecfs_status + rnorm(50, mean = 0, sd = 1) model1 <- lm(niirf ~ 1) # Intercept only model2 <- lm(niirf ~ mecfs_status) anova_1_2 <- anova(model1, model2) anova_1_2_pval <- anova_1_2[['Pr(>F)']][[2]] model3 <- lm(niirf ~ age) model4 <- lm(niirf ~ mecfs_status + age) anova_3_4 <- anova(model3, model4) anova_3_4_pval <- anova_3_4[['Pr(>F)']][[2]] if (anova_1_2_pval > anova_3_4_pval) { more_significant_count <- more_significant_count + 1 } } print(paste0("In ", more_significant_count, " out of ", total_trials, " total trials (", 100*more_significant_count/total_trials, "%), the p-value when adding ME/CFS status to an intercept-only model was higher (less significant) than the p-value when adding ME/CFS status to a model with an age covariate.")) anova_1_2 anova_3_4
Output:
To get an idea of what the p-values are, I looked at the ANOVA results for the last trial:
When comparing the models without age, the p-value was 0.48. With age, it was 0.03.Code:> anova_1_2 Analysis of Variance Table Model 1: niirf ~ 1 Model 2: niirf ~ mecfs_status Res.Df RSS Df Sum of Sq F Pr(>F) 1 49 3947.8 2 48 3906.9 1 40.879 0.5022 0.4819 > anova_3_4 Analysis of Variance Table Model 1: niirf ~ age Model 2: niirf ~ mecfs_status + age Res.Df RSS Df Sum of Sq F Pr(>F) 1 48 52.230 2 47 47.129 1 5.1011 5.0872 0.0288 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Result:total_trials <- 1000
more_significant_count <- 0
for (i in 1:total_trials) {
age <- rnorm(50, mean = 45, sd = 10)
mecfs_status <- rbinom(50, 1, prob = 0.5)
niirf <- age + mecfs_status + rnorm(50, mean = 0, sd = 1)
model1 <- lm(niirf ~ 0) # Intercept only
model2 <- lm(niirf ~ 0 + mecfs_status)
anova_1_2 <- anova(model1, model2)
anova_1_2_pval <- anova_1_2[['Pr(>F)']][[2]]
model3 <- lm(niirf ~ age)
model4 <- lm(niirf ~ mecfs_status + age)
anova_3_4 <- anova(model3, model4)
anova_3_4_pval <- anova_3_4[['Pr(>F)']][[2]]
if (anova_1_2_pval > anova_3_4_pval) {
more_significant_count <- more_significant_count + 1
}
}
print(paste0("In ", more_significant_count, " out of ", total_trials, " total trials (", 100*more_significant_count/total_trials, "%), the p-value when adding ME/CFS status to an intercept-only model was higher (less significant) than the p-value when adding ME/CFS status to a model with an age covariate."))
anova_1_2
anova_3_4
"In 13 out of 1000 total trials (1.3%), the p-value when adding ME/CFS status to an intercept-only model was higher (less significant) than the p-value when adding ME/CFS status to a model with an age covariate."
or, allowing intercept fitting (the better comparison):
Result:for (i in 1:total_trials) {
age <- rnorm(50, mean = 45, sd = 10)
mecfs_status <- rbinom(50, 1, prob = 0.5)
niirf <- age + mecfs_status + rnorm(50, mean = 0, sd = 1)
model1 <- lm(niirf ~ 0) # Intercept only
model2 <- lm(niirf ~ mecfs_status)
anova_1_2 <- anova(model1, model2)
anova_1_2_pval <- anova_1_2[['Pr(>F)']][[2]]
model3 <- lm(niirf ~ age)
model4 <- lm(niirf ~ mecfs_status + age)
anova_3_4 <- anova(model3, model4)
anova_3_4_pval <- anova_3_4[['Pr(>F)']][[2]]
if (anova_1_2_pval > anova_3_4_pval) {
more_significant_count <- more_significant_count + 1
}
}
"In 0 out of 1000 total trials (0%), the p-value when adding ME/CFS status to an intercept-only model was higher (less significant) than the p-value when adding ME/CFS status to a model with an age covariate."
[note: sorry for all the editing, was writing quickly. I don't think the intercept is the major problem here--I figured it would be faster to write code that tests the issue I brought up directly (see post #86)]
Last edited: