The language of race, ethnicity, and ancestry in human genetic research

Post by **Fishnut** » Mon Jun 21, 2021 9:40 pm

A new paper by Adam Rutherford and colleagues looks at the language used in genetic research, discusses how some people are using that language to promote racist ideologies, and suggests better terminology that could prevent this misappropriation.

I know this has been the topic of discussion in several threads but I've got very ropey wifi at the moment and am too lazy to search for the most appropriate one so figured I'd start a new thread.

The paper gives a couple of examples where ethnic labels group people together erroneously, including this one:

Another example in frequent use is “Bantu”, which effectively refers to a very broad linguistic grouping comprising hundreds of millions of people in Africa, speaking over 400 distinct languages or dialects. There is some overlap between genetic clusters and Bantu-speaking dialects or languages, but not across the whole group. Furthermore, the word Bantu was used as a catch-all term in apartheid-era South Africa for many different Black African peoples, including groups that were not Bantu-speaking, and had widespread derogatory use in that society. The continued use of these and other similar terms is particularly a problem for longitudinal studies that stretch back decades into the past when such terminologies were current. But even contemporary public health and governmental datasets use census terms which are often arbitrary, outmoded and inconsistent.

It explains why these groupings are so important in genetic research:

...human genetics is an inherently statistical science, one that describes correlations between genomic and phenotypic variation, and attempts to distinguish genetic and environmental effects on phenotypes. The way we group people plays a central role in these analyses, and in many cases, categories enable us to compare and contrast phenotypes and genotypes, and use simpler and more interpretable statistical models, which add to our power of discovery.

In other words, if you group people one way you'll get different results than if you group them another way. And if you're grouping them using racist catch-all terms for "not like us", then don't be that surprised if you get racist results.

They make it clear that while these groupings may be spurious, they still matter:

These categories [of race and ethnicity], although socially constructed, can have profound biological effects. For example, by influencing a person’s geographical surroundings, their levels of chronic stress, their access to resources, and other aspects of their life history, they may have a major impact on prenatal and childhood development and the expression of human traits.

Thus, the social categories and other groupings that individuals belong to are inescapable components of genetics research. However, within the human genetics community, some aspects of the academic language used to describe groups and subsets of people may foster erroneous beliefs beyond academia about human biology and the nature of these categories. Such descriptions frequently invoke concepts of ancestry and population structure, for reasons we will discuss below. But ancestry itself is often a poorly understood concept, and its relationship to genetic data is not straightforward.

...it is sometimes assumed, incorrectly, that these labels, self-identified or otherwise assigned, can be used straightforwardly as a proxy for genetic ancestry. This reinforces the commonly-held but simplistic assumption that differences between ethnic groups are substantially due to genetic differences, and are in some sense innate.

This is a key point:

framing social environments in purely genetic terms may seem to imply a genetic explanation, rather than societal, cultural and historical explanations, for the existence of these categories and the differences between them. Moreover, some of the labels used are simply archaic and discredited terms from earlier eras of anthropology and human genetics, and therefore inappropriate as descriptors of human groupings today.

They then list the terms they recommend against using, explaining their reasoning and offering alternatives, while emphasising that they expect people to disagree with them and are writing to stimulate discussion.

Post by **Bird on a Fire** » Tue Jun 22, 2021 4:54 pm

Yes, I think this is one of those cases where academics are chatting to each other in their silos without always thinking too hard about how their work may be misused or misunderstood, and it's certainly worth raising the point.

In terms of the use of "Bantu", I've only seen it in this context to refer to the "Bantu expansion", an inferred wave of migration across sub-Saharan Africa by speakers of the ancient Proto-Bantu language. It's generally assumed that languages spread primarily through the movement of people (which is, of course, a hypothesis that can be tested with genetics to some extent). So their suggested replacement, "Bantu-speaking", wouldn't really cover it; they'd need something long-winded like "descended from speakers of Proto-Bantu", which 2000+ years later won't overlap that well with the modern group of Bantu speakers.

As with the example of "Native American", geneticists are often trying to understand historical gene flow rather than define present-day groups. I don't think any geneticists intend to suggest that people with some European (etc.) ancestry can't or shouldn't identify as "Native American" nowadays; they're just using it as a convenient label for the genes native to the Americas before European colonisation. Again, their suggestion of "member of X tribe" works when describing the present-day origins of samples, but the historical groups researchers are trying to learn about may have had a completely different tribal identity. And obviously it wouldn't make sense to include European-descended genetic samples when studying pre-Columbian gene flow.

I suppose the way around this would be "We used samples from present-day group X with Y, Z characteristics in an attempt to isolate genes from historical group A," and then to make it very clear that the paper is about A and not X.

The underlying problem, of course, is people wrongly thinking that somebody's genetic background tells you much about their present identity, or that population-level variation in traits is large enough to justify treating individuals from different backgrounds differently. I'd hope that people doing genetic research would be keen to avoid having their work misused in that way.

Thanks for sharing.
