Patriarchy makes women ill

Post by **bjn** » Thu Nov 19, 2020 8:35 pm

Ars has an interesting review of two papers.

The first compares the health of women among the matrilineal Mosuo people of China. Some Mosuo households are patriarchal, like those in most patriarchal societies, with a man as the head of the household. Others are still traditionally matriarchal: the woman is the head of the household, the husband lives with his original family, and the children stay with the mother. Otherwise the households are part of the same culture and environment.

They compared the health of the women in the two types of households and it was markedly different, with hypertension and chronic inflammation significantly higher in the patriarchal households. For men, it was pretty much the same in both types of household.

Patriarchy sucks.

Post by **sTeamTraen** » Fri Nov 20, 2020 1:04 am

bjn wrote: Thu Nov 19, 2020 8:35 pm They compared the health of the women in the two types of households and it was markedly different, with hypertension and chronic inflammation significantly higher in the patriarchal households. For men, it was pretty much the same in both types of household.

Well, not quite. There are very few cases and not a very large sample size. The authors made their data available (hooray!), so we can check.

For chronic inflammation, in the patriarchal communities I get:

Code:

> fisher.test(matrix(c(98,9,61,2), ncol=2))
p-value = 0.2155
95 percent confidence interval:
 0.03654554 1.81669405

and in the matriarchal communities:

Code:

> fisher.test(matrix(c(134,5,58,4), ncol=2))
p-value = 0.4618
95 percent confidence interval:
 0.3523594 8.9022510

For hypertension, in the patriarchal communities I get:

Code:

> fisher.test(matrix(c(164,104,103,51), ncol=2))
p-value = 0.2507
95 percent confidence interval:
 0.5028465 1.2065136

and in the matriarchal communities:

Code:

> fisher.test(matrix(c(243,102,123,68), ncol=2))
p-value = 0.1748
95 percent confidence interval:
 0.8871498 1.9490081

In those matrix(c()) things, the values from left to right are the counts of, respectively, women, no case; women, case; men, no case; men, case. ("Case" means elevated CRP for inflammation, or hypertension.)
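
To make that layout explicit, here is a minimal sketch of the first test (inflammation, patriarchal communities) with the cells labelled. The counts are the ones in the `fisher.test()` call above; the `dimnames` labels and the variable names are illustrative additions, not taken from the dataset. `matrix()` fills column by column, so the first two values form the women's column.

```r
# Patriarchal communities, chronic inflammation (elevated CRP).
# matrix() fills column-wise: column 1 = women, column 2 = men.
crp_patri <- matrix(
  c(98, 9,   # women: no case, case
    61, 2),  # men:   no case, case
  ncol = 2,
  dimnames = list(outcome = c("no case", "case"),  # illustrative labels
                  sex     = c("women", "men"))
)
print(crp_patri)

res <- fisher.test(crp_patri)

# Percentage of cases in each column, to read alongside the p-value
pct_cases <- 100 * crp_patri["case", ] / colSums(crp_patri)
print(round(pct_cases, 2))
print(res$p.value)
```

Labelling the cells this way makes it harder to transpose a table by accident when typing the counts in by hand.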

None of these would normally be considered a significant result, nor would we have great confidence that they would replicate with a different sample. Sadly, PNAS publishes a lot of papers like this. Their review process for the social sciences is woefully inadequate.

So while patriarchy does indeed suck, this article doesn't provide very good evidence for what it's claiming about its physiological effects.

And FWIW, the second paper mentioned in the Ars article seems to be even worse. We really should not be publishing this kind of mediation analysis any more. Back in April of this year, I blogged here about why not.

Post by **bjn** » Fri Nov 20, 2020 9:22 am

Thanks for the analysis and doing what you do best.

Post by **Bird on a Fire** » Fri Nov 20, 2020 11:03 am

sTeamTraen wrote: Fri Nov 20, 2020 1:04 am
bjn wrote: Thu Nov 19, 2020 8:35 pm They compared the health of the women in the two types of households and it was markedly different, with hypertension and chronic inflammation significantly higher in the patriarchal households. For men, it was pretty much the same in both types of household.
Well, not quite. There are very few cases and not a very large sample size. The authors made their data available (hooray!), so we can check.

<snip>

Hmm - not sure about this analysis. You show that there's no statistically significant difference between sexes within each community, but the article's claim is that the sex ratio of cases changes between communities, which is a different question. It's entirely plausible that the change in ratio could be statistically detectable even if the sex difference itself isn't.

Annoyingly I don't have access to the article and can't find a pdf anywhere to see what analysis the authors do.

Post by **sTeamTraen** » Fri Nov 20, 2020 3:01 pm

Bird on a Fire wrote: Fri Nov 20, 2020 11:03 am Hmm - not sure about this analysis. You show that there's no statistically significant difference between sexes within each community, but the article's claim is that the sex ratio of cases changes between communities, which is a different question. It's entirely plausible that the change in ratio could be statistically detectable even if the sex difference itself isn't.

Annoyingly I don't have access to the article and can't find a pdf anywhere to see what analysis the authors do.

To my surprise the PDF wasn't even available on Sci-Hub, so I've put it here.

My analyses are an attempt to reproduce the claims made in both the Ars Technica piece and the article itself, that women had significantly worse outcomes in patriarchal communities and significantly better outcomes in matriarchal ones. However, you are probably right in suggesting that whether men or women do better or worse in each type of community ought to be the outcome of interest. In the PNAS article, the authors presented the first type of comparison only narratively (with no chi-square test statistics), and they didn't mention the second.

I did the analyses from my previous post in a hurry last night using Excel to count the cases, but today I wrote some proper code to do it, so it was easy to extend that to the question of whether women and/or men do better across types of community. Here's what I got. The first four results are the same as in my previous post.

Code:

Inflammation (CRP) analysis for patriarchal groups
Females: total=107 cases=9 (8.41%)
Males: total=63 cases=2 (3.17%)
p-value for difference by Fisher's exact test=0.216

Hypertension analysis for patriarchal groups
Females: total=268 cases=104 (38.81%)
Males: total=154 cases=51 (33.12%)
p-value for difference by Fisher's exact test=0.251

Inflammation (CRP) analysis for matriarchal groups
Females: total=139 cases=5 (3.60%)
Males: total=62 cases=4 (6.45%)
p-value for difference by Fisher's exact test=0.462

Hypertension analysis for matriarchal groups
Females: total=345 cases=102 (29.57%)
Males: total=191 cases=68 (35.60%)
p-value for difference by Fisher's exact test=0.175

Inflammation (CRP) analysis for females across groups
Patriarchal: total=107 cases=9 (8.41%)
Matriarchal: total=139 cases=5 (3.60%)
p-value for difference by Fisher's exact test=0.163

Inflammation (CRP) analysis for males across groups
Patriarchal: total=63 cases=2 (3.17%)
Matriarchal: total=62 cases=4 (6.45%)
p-value for difference by Fisher's exact test=0.440

Hypertension analysis for females across groups
Patriarchal: total=268 cases=104 (38.81%)
Matriarchal: total=345 cases=102 (29.57%)
p-value for difference by Fisher's exact test=0.020

Hypertension analysis for males across groups
Patriarchal: total=154 cases=51 (33.12%)
Matriarchal: total=191 cases=68 (35.60%)
p-value for difference by Fisher's exact test=0.650

So out of eight 2x2 analyses, one (hypertension for females; better outcome in the matriarchal communities) has a "conventionally significant" p value; this is interesting, but some way from conclusive.
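
One way to put a number on "some way from conclusive" is to adjust for having run eight tests. This is a minimal sketch, assuming Holm's method is an acceptable choice here; it is not something the paper or the analysis above used. The p-values are copied from the output above, and the labels are illustrative.

```r
# The eight Fisher p-values listed above, in the order reported.
# The names are illustrative labels, not variables from the dataset.
p <- c(crp_patri_by_sex = 0.216, ht_patri_by_sex = 0.251,
       crp_matri_by_sex = 0.462, ht_matri_by_sex = 0.175,
       crp_women_by_type = 0.163, crp_men_by_type = 0.440,
       ht_women_by_type  = 0.020, ht_men_by_type  = 0.650)

# Holm's step-down adjustment controls the family-wise error rate and
# does not require the tests to be independent (these eight are not:
# the same cells are reused across comparisons).
p_holm <- p.adjust(p, method = "holm")

print(round(p_holm, 3))
print(any(p_holm < 0.05))
```

Under that adjustment the smallest p-value, 0.020, becomes 8 × 0.020 = 0.16, so none of the eight comparisons stays below 0.05.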

I used the dataset file supplied by the authors here, and my code is ready to run from here.

Normally the correct way to test this kind of thing is with an interaction (gender x community type x outcome). By coincidence I was reading another paper just a week ago which claimed to have done a 2x2x2 interaction of this kind, but didn't give any details (probably because the research was entirely fake --- one of Nicolas Guéguen's), so I asked around and it turns out that interactions are quite difficult to do in a logistic regression/contingency table framework; several people who know their stuff gave me different answers (see thread here).

The PNAS authors have built something using a Bayesian modelling package which they claim shows an effect; this may be as good a way as any other. I need to write to them anyway, because the numbers that they reported in each condition for hypertension don't match what my code calculated, so I will ask them about their reasoning and keep this thread informed. I will also ask one of my Bayesian friends what they think of the analyses (Bayesians *love* to explain other people's Bayesian models).

But in the case of inflammation, at least, I still don't think you can do much with 9 cases in one group and 2 in the other; sampling error is going to kill any chance of generalizability. I also really want to know why the authors took their continuous measures of CRP and hypertension and categorised them. This is not generally considered a good practice, as it necessarily discards information and loses statistical power.
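
The power cost of categorising can be illustrated with a small simulation. This is a sketch on simulated data only, not the Mosuo data: the group size, effect size and cut-off are all assumptions chosen for illustration, and the continuous measure is assumed to be normally distributed.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_per_group <- 100   # assumed group size
effect      <- 0.4   # assumed true difference in means, in SD units
n_sims      <- 1000

one_sim <- function() {
  a <- rnorm(n_per_group, mean = 0)
  b <- rnorm(n_per_group, mean = effect)

  # Continuous analysis: two-sample t-test on the raw values
  p_cont <- t.test(a, b)$p.value

  # Dichotomised analysis: "case" if above a fixed cut-off,
  # then Fisher's exact test on the resulting 2x2 table
  cutoff <- effect / 2
  tab <- rbind(c(sum(a > cutoff), sum(a <= cutoff)),
               c(sum(b > cutoff), sum(b <= cutoff)))
  p_dich <- fisher.test(tab)$p.value

  c(continuous = p_cont, dichotomised = p_dich)
}

p_vals <- replicate(n_sims, one_sim())  # 2 x n_sims matrix
power  <- rowMeans(p_vals < 0.05)       # share of runs with p < 0.05
print(power)
```

The same simulated difference is detected less often once the measure has been cut into case / no case, which is the information loss described above.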

Post by **Bird on a Fire** » Fri Nov 20, 2020 4:19 pm

sTeamTraen wrote: Fri Nov 20, 2020 3:01 pm To my surprise the PDF wasn't even available on Sci-Hub, so I've put it here.

Thanks!

sTeamTraen wrote: Fri Nov 20, 2020 3:01 pm However, you are probably right in suggesting that whether men or women do better or worse in each type of community ought to be the outcome of interest. In the PNAS article, the authors presented the first type of comparison only narratively (with no chi-square test statistics), and they didn't mention the second.

I interpreted the following from the abstract

We tested the hypothesis that female-biased gender norms ameliorate gender disparities in health by comparing gender differences in inflammation and hypertension among the matrilineal and patrilineal Mosuo of China. Widely reported gender disparities in health were reversed among matrilineal Mosuo compared with patrilineal Mosuo, due to substantial improvements in women’s health, with no concomitant detrimental effects on men.

as meaning that they had compared the frequency of each condition between the sexes in both kinds of society, and then compared those two differences. Now I can see the paper, it seems that indeed they have:

Using Bayesian logistic regression, we modeled an interaction between gender and kinship to test whether men and women experience differences in chronic inflammation and hypertension in matrilineal and patrilineal communities, controlling for age (Table 1). Fig. 1 shows the predicted probabilities of elevated CRP and hypertension resulting from this model. We find reversed gender disparities in both inflammation and hypertension under matriliny, driven primarily by reduced probabilities of both chronic inflammation (0.02) and hypertension (0.17) for women in matriliny compared with patriliny (inflammation: 0.05, ∆ = 0.03; hypertension: 0.32, ∆ = 0.15). These effects were robust to controls for both age and body mass index (SI Appendix, Table S1).

So they use a logistic regression rather than a series of chi-square tests, which seems like a better use of the data to me, maximising the information from relatively rare observations. Chi-squared is a particularly low-powered test so it's not necessarily surprising that it doesn't find effects that regression can, especially breaking it down by sex and community instead of doing an omnibus analysis.

sTeamTraen wrote: Fri Nov 20, 2020 3:01 pm Normally the correct way to test this kind of thing is with an interaction (gender x community type x outcome).

That's pretty much what their analysis is: a logistic regression for each health condition, with gender x community type as predictors.

sTeamTraen wrote: Fri Nov 20, 2020 3:01 pm The PNAS authors have built something using a Bayesian modelling package which they claim shows an effect; this may be as good a way as any other. I need to write to them anyway, because the numbers that they reported in each condition for hypertension don't match what my code calculated, so I will ask them about their reasoning and keep this thread informed. I will also ask one of my Bayesian friends what they think of the analyses (Bayesians *love* to explain other people's Bayesian models).

It's just a standard formulation of a logistic regression:

To model the effect of matriliny on elevated inflammation and hypertension, we estimated the logit function using a binomial prior. Predictor variables were given weakly regularizing priors (μ = 0, σ = 10), which make the model more skeptical of nonzero parameter estimates than the flat priors assumed by a frequentist approach. Patrilineal women were used as the reference category for all models. Age was included as a control in all models because the data show a monotonically increasing prevalence of elevated inflammation and hypertension with age for all groups. The models were specified as follows:
Pr(outcome) ∼ Binomial(1, p)
Logit(p) ∼ α + β1 × age + β2 × men + β3 × matriliny + β4 × men × matriliny
β1 ∼ Normal(0, 10)
β2 ∼ Normal(0, 10)
β3 ∼ Normal(0, 10)
β4 ∼ Normal(0, 10).

The 'regularising priors' thing just means that the estimation procedure starts off by assuming the effect of sex or matriliny is 0 and that it's equally likely to be positive or negative. How much evidence it needs to move its estimate away from 0 depends on the second parameter of the normal distribution. In BUGS and JAGS that parameter is the precision, which is the reciprocal of the variance you might be expecting, and on that reading ~N(0,10) would be a big spike at 0. Here, though, the paper gives it as σ = 10, and Stan parameterises the normal by its standard deviation, so the prior is centred on 0 but quite wide on the logit scale, i.e. only weakly regularising, as the authors say.
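
How wide that prior is depends on how the 10 is read. A small sketch of the two readings, assuming a coefficient on the logit scale; the ±1 window and the variable names are arbitrary choices for illustration.

```r
# Two readings of Normal(0, 10) for a logistic-regression coefficient
sd_if_sd        <- 10            # 10 read as a standard deviation
sd_if_precision <- 1 / sqrt(10)  # 10 read as a precision (1 / variance)

# Prior probability that the coefficient lies within +/- 1 logit of zero
mass_within_1 <- function(s) {
  pnorm(1, mean = 0, sd = s) - pnorm(-1, mean = 0, sd = s)
}

prior_mass <- c(read_as_sd        = mass_within_1(sd_if_sd),
                read_as_precision = mass_within_1(sd_if_precision))
print(round(prior_mass, 3))
```

Read as a standard deviation, which is how the quoted passage labels it (σ = 10), only about 8% of the prior mass sits within one logit of zero; read as a precision, nearly all of it does.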

The package they used is the one from Richard McElreath's Statistical Rethinking book, which I've seen highly recommended but not used myself. (It uses Stan for the Bayesian programming bits, whereas most of the work on the obscure classes of hierarchical models I use has been done in BUGS or now JAGS).

I'd expect you ought to be able to get pretty similar results using a frequentist logistic regression, though, especially as the posterior probability distributions look relatively symmetrical.

Post by **sTeamTraen** » Fri Nov 20, 2020 4:31 pm

Bird on a Fire wrote: Fri Nov 20, 2020 4:19 pm I'd expect you ought to be able to get pretty similar results using a frequentist logistic regression, though, especially as the posterior probability distributions look relatively symmetrical.

If you have pointers for doing a logistic regression with interaction terms using categorical predictors, I'm all ears!

Post by **Bird on a Fire** » Fri Nov 20, 2020 5:36 pm

sTeamTraen wrote: Fri Nov 20, 2020 4:31 pm
Bird on a Fire wrote: Fri Nov 20, 2020 4:19 pm I'd expect you ought to be able to get pretty similar results using a frequentist logistic regression, though, especially as the posterior probability distributions look relatively symmetrical.
If you have pointers for doing a logistic regression with interaction terms using categorical predictors, I'm all ears!

What are the issues with fitting a glm in R?

There may well be a conceptual reason I'm unaware of, which could have motivated the authors' choice to go Bayesian

Post by **sTeamTraen** » Fri Nov 20, 2020 5:40 pm

Bird on a Fire wrote: Fri Nov 20, 2020 5:36 pm What are the issues with fitting a glm in R?

There may well be a conceptual reason I'm unaware of, which could have motivated the authors' choice to go Bayesian

No, it's that I don't know how to do it. But I suspect this is a good opportunity to learn.

(I mean, it's no problem to assemble the glm() call, but there are always so many options that one could consider. Whenever I confidently present almost any model made with a technique that I'm using for the first time, people tend to point out "obvious" options that I didn't even know existed.)

So, with that out of the way:

Code:

> crp.logit <- glm(elevatedCRP ~ Male + Matriliny + Male*Matriliny, family = "binomial", data=df.crp)
> summary(crp.logit)

Call:
glm(formula = elevatedCRP ~ Male + Matriliny + Male * Matriliny, 
    family = "binomial", data = df.crp)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.4192  -0.4192  -0.2707  -0.2707   2.6268  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -2.3877     0.3483  -6.855 7.11e-12 ***
Male            -1.0300     0.7986  -1.290    0.197    
Matriliny       -0.9007     0.5734  -1.571    0.116    
Male:Matriliny   1.6442     1.0547   1.559    0.119    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 155.72  on 370  degrees of freedom
Residual deviance: 152.25  on 367  degrees of freedom
AIC: 160.25

Number of Fisher Scoring iterations: 6

> ht.logit <- glm(htn ~ Male + Matriliny + Male*Matriliny, family = "binomial", data=df.ht)
> summary(ht.logit)

Call:
glm(formula = htn ~ Male + Matriliny + Male * Matriliny, family = "binomial", 
    data = df.ht)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.9911  -0.9382  -0.8372   1.3759   1.5611  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -0.4555     0.1254  -3.634  0.00028 ***
Male            -0.2474     0.2122  -1.166  0.24361    
Matriliny       -0.4126     0.1721  -2.397  0.01653 *  
Male:Matriliny   0.5228     0.2860   1.828  0.06751 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1227.3  on 957  degrees of freedom
Residual deviance: 1221.2  on 954  degrees of freedom
AIC: 1229.2

Number of Fisher Scoring iterations: 4

(The lines starting with ">" assume you have run my Gist script.)
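
For anyone without the Gist or the dataset file to hand, the CRP model above can also be rebuilt from the cell counts posted earlier in the thread. This is a minimal sketch using the grouped-binomial form of `glm()`; the names `crp_counts` and `crp.grouped` are hypothetical and do not come from the Gist script.

```r
# Cases and totals for elevated CRP, from the Fisher's-test summaries
# earlier in the thread. Row order: patriarchal women, patriarchal men,
# matriarchal women, matriarchal men.
crp_counts <- data.frame(
  Male      = c(0, 1, 0, 1),
  Matriliny = c(0, 0, 1, 1),
  cases     = c(9, 2, 5, 4),
  total     = c(107, 63, 139, 62)
)

# Grouped-binomial form of the same model: the response is a two-column
# matrix of (cases, non-cases) per cell instead of one row per person.
crp.grouped <- glm(cbind(cases, total - cases) ~ Male * Matriliny,
                   family = binomial, data = crp_counts)

# Estimates, standard errors, z values and p values
print(round(coef(summary(crp.grouped)), 4))
```

The coefficients and standard errors should agree with the per-person fit above, since the model has one parameter per cell. The deviances will not match, because the grouped fit measures them against four cells rather than 371 individuals.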

Post by **Bird on a Fire** » Fri Nov 20, 2020 5:58 pm

sTeamTraen wrote: Fri Nov 20, 2020 5:40 pm
Bird on a Fire wrote: Fri Nov 20, 2020 5:36 pm What are the issues with fitting a glm in R?

There may well be a conceptual reason I'm unaware of, which could have motivated the authors' choice to go Bayesian
No, it's that I don't know how to do it. But I suspect this is a good opportunity to learn.

<snip>

Nice one! I was just generating a tutorial using random data and that's the method I would have chosen too.

There's some useful stuff on getting different test statistics and model tables out of the results on this page https://stats.idre.ucla.edu/r/dae/logit-regression/ e.g. if you want a table of ORs.
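
As a rough illustration of the odds-ratio table, here is a sketch for the unadjusted hypertension model, rebuilt from the counts posted earlier in the thread rather than from the data file. The names `ht_counts`, `ht.grouped` and `or_table` are hypothetical, and `confint.default()` is used for Wald intervals, which is a simplification compared with profile-likelihood intervals.

```r
# Hypertension cases and totals per cell, from the summaries posted earlier.
# Row order: patriarchal women, patriarchal men,
# matriarchal women, matriarchal men.
ht_counts <- data.frame(
  Male      = c(0, 1, 0, 1),
  Matriliny = c(0, 0, 1, 1),
  cases     = c(104, 51, 102, 68),
  total     = c(268, 154, 345, 191)
)

ht.grouped <- glm(cbind(cases, total - cases) ~ Male * Matriliny,
                  family = binomial, data = ht_counts)

# Exponentiate the coefficients and their Wald 95% limits to get
# odds ratios (confint.default uses the normal approximation).
or_table <- exp(cbind(OR = coef(ht.grouped), confint.default(ht.grouped)))
print(round(or_table, 3))
```

On this scale the interaction is an odds ratio of roughly 1.69, with an interval that includes 1, which matches the p = 0.068 reported for `Male:Matriliny` above.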

As an aside, if you've figured out logistic regressions in R that quickly I don't think you'd have much trouble at least trying to read/run through the code for their Bayesian analysis - the rethinking package is designed for beginners and specifies the model explicitly, so you might be able to at least check if their analysis replicates. It's on github apparently.

Post by **sTeamTraen** » Fri Nov 20, 2020 6:09 pm

Bird on a Fire wrote: Fri Nov 20, 2020 5:58 pm As an aside, if you've figured out logistic regressions in R that quickly I don't think you'd have much trouble at least trying to read/run through the code for their Bayesian analysis - the rethinking package is designed for beginners and specifies the model explicitly, so you might be able to at least check if their analysis replicates. It's on github apparently.

I already have their code. It's a bit clunky because it doesn't format the output other than charts, so I think I will have to examine lots of variables to rebuild the tables. But the model specification for the logit looks fairly straightforward.

Their analyses do require a fuckton of software, though, which for me is also --- along with the non-significant interactions in the GLM model --- a sign that a great many simpler options may have been tried and rejected before they settled on this model (which, AFAIK, was not pre-registered anywhere). Fortunately I already had Stan installed from a couple of months ago --- it needed a trip to StackExchange because of an R4.x issue, IIRC. R never feels very solid to me with all these package dependencies, and the occasional message like "Do you want to compile this package from source? WELL, DO YA, PUNK?"

Post by **Bird on a Fire** » Fri Nov 20, 2020 6:30 pm

I do find the diversity of packages for R a bit of a blessing and a curse - on the one hand, there's lots of different ways to do things, and loads of different things you can do. On the other, it's hard to keep track of everything, and sometimes you end up needing to install a bajillion package dependencies.

FWIW though the analysis I'm working on at the moment uses maybe 10 different packages, and I promise I'm not up to anything sinister. There's a few I like for different data-munging methods, a few for plotting, a few needed to do particular analyses in particular ways.

For Bayesian stuff I often find I like one package's way of handling data, another's of visualising priors, another for fitting and converging the model, and then a couple more for posterior checks, plus whatever I'm doing for plotting (usually a few hundred different packages that all start with 'gg').

I think 'loads of packages' is quite normal for the generation of analysts who've been brought up using R.

Post by **sTeamTraen** » Fri Nov 20, 2020 10:17 pm

Wow --- adding age to the logistic regressions (as the authors did with their Bayesian models) makes a huge difference to the estimates. For example, for hypertension:

Code:

> summary(ht.logit)

Call:
glm(formula = htn ~ Male + Matriliny + Male * Matriliny, family = "binomial", 
    data = df.ht)

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -0.4555     0.1254  -3.634  0.00028 ***
Male            -0.2474     0.2122  -1.166  0.24361    
Matriliny       -0.4126     0.1721  -2.397  0.01653 *  
Male:Matriliny   0.5228     0.2860   1.828  0.06751 .  

> summary(ht.logit.a)

Call:
glm(formula = htn ~ Male + Matriliny + Male * Matriliny + Age, 
    family = "binomial", data = df.ht)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -4.052324   0.332626 -12.183  < 2e-16 ***
Male           -0.329885   0.235330  -1.402  0.16098    
Matriliny      -0.806737   0.196961  -4.096  4.2e-05 ***
Age             0.072951   0.006073  12.012  < 2e-16 ***
Male:Matriliny  0.842498   0.319289   2.639  0.00832 **

All of the p values go down, and the estimates go up in magnitude, when age is added, which to me implies that there is some kind of suppression effect going on (because normally adding age would leave less variance to be explained by the other predictors). It's not clear why they added age to the model, or why they didn't at least give numbers for models without age (cf. here). It's normal to control for age and sex when running an RCT, but that's about improving the precision of the estimates. Miller and Chapman's (2001, 10.1037/0021-843x.110.1.40) observations about the limits of what you can meaningfully do with covariates still hold.

Post by **Bird on a Fire** » Sat Nov 21, 2020 2:40 am

They do say in the methods that

Age was included as a control in all models because the data show a monotonically increasing prevalence of elevated inflammation and hypertension with age for all groups.

which seems reasonable - those are conditions that become more common with age. I assume adjusting for age is quite common in studies of inflammation or hypertension, as it is with things like cancer or bone density.

If the samples have a different age they'd need to remove that source of variance for the rest of the analysis to be meaningful, for instance.

Post by **sTeamTraen** » Sat Nov 21, 2020 12:52 pm

Bird on a Fire wrote: Sat Nov 21, 2020 2:40 am They do say in the methods that
Age was included as a control in all models because the data show a monotonically increasing prevalence of elevated inflammation and hypertension with age for all groups.
which seems reasonable - those are conditions that become more common with age. I assume adjusting for age is quite common in studies of inflammation or hypertension, as it is with things like cancer or bone density.

If the samples have a different age they'd need to remove that source of variance for the rest of the analysis to be meaningful, for instance.

That's true, but when adding a covariate results in the magnitude of your coefficients going up, you stop and take a look at the data.

Here, I think the problem could be that the Male*Matriliny interaction term is quite strongly correlated with other predictors (i.e., its own components), but not the outcome. Specifically, across the full dataset and per-outcome subsets, it correlates .62-.67 with Male and .41-.44 with Matriliny, but only .02 with the outcome variable. Adding variables that are correlated with other predictors but not with the outcome leads to classical suppression (increased estimates) in continuous regression; suppression has been poorly studied in logistic regression (I have thought about doing a Master's in Statistics and making this my research topic, but I'm old and my maths probably isn't good enough), but I think it would be brave to imagine that something similar does not occur simply because you have a logit link function.
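
Those correlations can be checked from the cell counts alone. A minimal sketch for the hypertension subset, using the totals and cases posted earlier in the thread; the vectors are reconstructed one row per person, and the variable names are illustrative.

```r
# Group sizes and hypertension cases, in the order:
# patriarchal women, patriarchal men, matriarchal women, matriarchal men
n     <- c(268, 154, 345, 191)
cases <- c(104, 51, 102, 68)

# One row per person, group by group
Male        <- rep(c(0, 1, 0, 1), times = n)
Matriliny   <- rep(c(0, 0, 1, 1), times = n)
Interaction <- Male * Matriliny  # 1 only for men in matriarchal households

# Outcome: within each group, the cases first and then the non-cases
htn <- unlist(Map(function(k, m) rep(c(1, 0), times = c(k, m - k)), cases, n))

cors <- c(with_Male      = cor(Interaction, Male),
          with_Matriliny = cor(Interaction, Matriliny),
          with_outcome   = cor(Interaction, htn))
print(round(cors, 2))
```

The correlation between the product term and its components is built into 0/1 dummy coding: the product can only be 1 when both components are 1.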

Post by **sTeamTraen** » Wed Nov 25, 2020 1:48 pm

I wrote to the authors to point out that a few of the reported percentages of people in each category in the article don't match the dataset:

Code:

CRP/patrilineal/women, dataset says 8.3%, article says 8.4%
HT/patrilineal/women, dataset says 38.8%, article says 33.3%
HT/patrilineal/men, dataset says 33.1%, article says 26.1%
HT/matrilineal/women, dataset says 29.6%, article says 25.6%
HT/matrilineal/men, dataset says 35.6%, article says 27.8%

which seemed like a reasonable way to get a conversation started, in that there's not really any possibility for interpretation. They wrote back to say "Oops, we'll issue a correction". I'm not sure if these discrepancies affect the results, in that they don't seem to change the relations across groups (and in any case the authors didn't actually apply any statistical test to their claims about percentages, although those --- rather than the regressions --- did become the basis of the Ars Technica piece). I have asked a follow-up question about whether they are concerned about the small number of cases for inflammation, and (obliquely) why they presented the percentages without a statistical test.
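
For reference, the hypertension percentages on the "dataset says" side follow directly from the cases and totals posted earlier in the thread. A small sketch of that arithmetic; the data frame and its column names are illustrative, and the article figures are copied from the list above.

```r
# Hypertension cases and totals per group, as counted from the dataset,
# alongside the percentages printed in the article
ht <- data.frame(
  group       = c("patrilineal women", "patrilineal men",
                  "matrilineal women", "matrilineal men"),
  cases       = c(104, 51, 102, 68),
  total       = c(268, 154, 345, 191),
  article_pct = c(33.3, 26.1, 25.6, 27.8)
)

ht$dataset_pct <- round(100 * ht$cases / ht$total, 1)
ht$difference  <- round(ht$dataset_pct - ht$article_pct, 1)
print(ht)
```

The gaps are between 4 and 8 percentage points, all in the same direction, with the article's figures lower than the dataset's.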

I asked a Bayesian-minded friend what he thought of the numbers for the regressions and he said:
"As far as I can see this is weak evidence for the hypertension data" (he didn't comment on the inflammation data)
"It's quite possible that they did Bayesian regression because Chi2 didn't show significance, but IMO a Bayesian regression is generally more informative than a Chi2, so it's hard to prove that that is the reason."

The "weak evidence" comment makes sense, given what the non-Bayesian regressions seem to be showing. But I guess if you go to collect data from nigh on a thousand people in a remote part of China you're probably going to want to get a publication in a good journal out of it, and sadly, you don't get that by reporting that you only found weak evidence.
