Race, ethnicity and ancestry. How are they different? And, as testing and cell therapy science evolves, do they still impact human leukocyte antigen (HLA) genotyping? That was the focus of discussion on an American Society for Histocompatibility & Immunogenetics (ASHI) podcast.
The ASHI hosts—Kelley Hitchman, PhD, Eric Weimer, PhD, and Jeremy Sherrill—dug into the details with Abeer Madbouly, PhD, CIBMTR® principal bioinformatics scientist. The content has been edited for length and clarity.
Hosts: We’ve been using HLA typing data hand-in-hand with ancestry, ethnicity and race for a long time, especially in imputation. What is the difference between race, ethnicity, and ancestry and which should we be using when we talk about HLA typing?
Dr. Madbouly: This is a very important question. When we talk about race, we mostly mean the color of a person’s skin and their ancestral heritage, with some association with geographic ancestry. Race is also in large part a social construct, and some people may identify differently as time goes by.
Ethnicity is more related to shared culture. So, for example, when we say Hispanic ethnicity, this is linked to Spanish speaking populations. If we’re including individuals from populations in Latin America, however, I like to use the term Latino. That’s because you have individuals who speak Portuguese, not Spanish, like Brazilians who don’t identify as Hispanic. So, ethnicity is more tied into cultural aspects.
Ancestry is tied to genetics. This is what relates most closely to HLA, particularly geographical ancestry. HLA has been shaped over generations by what different populations have gone through, from natural disasters to forced migrations to wars to pandemics. This is what really matters when we’re talking about HLA, and particularly imputation.
Hosts: Based on where we are in the field of HLA typing, how important do you see ancestry being going forward?
Dr. Madbouly: I get that question a lot. People think that because we’re doing high-resolution HLA typing for the NMDP Registry℠, for example, that we potentially won’t need imputation and, therefore, won’t need the ancestry information for HLA anymore. I know the solid organ field is also moving in the direction of high-resolution typing.
However, we will always need this information. NMDP℠ has been in existence for more than 35 years and typing technologies have evolved. The NMDP Registry still has more than half of our members typed with some ambiguity.
That means we still have potential donors who have suboptimal typing or significant gaps. Only one-third of registry members are typed at DPB1, for example. We still need ancestry data quite a bit for imputation and matching in the hematopoietic stem cell transplant field.
Outside the realm of matching models, we always need to collect this data for planning. We use this information to identify populations that are underserved and need more recruitment. This helps us plan donor recruitment strategies accordingly.
We also must report match rates for multiple populations in the United States to the Health Resources & Services Administration—known as HRSA—because we have a contractual agreement with the U.S. government.
Hosts: A lot of the race, ethnicity and ancestry information we receive is self-reported by the person we are HLA typing. How does self-reporting change the way we can look at this data and use this data?
Dr. Madbouly: Self-reporting race and ethnicity is not immune from errors. But we can’t do genome-wide SNP typing for all 8 million-plus donors on the NMDP Registry. That’s cost prohibitive and it still won’t solve some of our problems.
Self-reporting race and ethnicity is really our segue to the genetic information we need to better serve patients. We’ve tried to improve how we collect this information and do analysis around the HLA data so we can group people in an efficient way in populations that will be most representative of the HLA variation.
We now ask people to self-report beyond the five broad census groups. They can choose Ashkenazi, Farsi, Iranian, Central Asian, Puerto Rican and so on. We’re collecting all this information to improve the way we match donors and patients from these ethnicities.
We need this information because haplotype frequencies are used to generate predictions and haplotypes change for every population. A haplotype common in one population might not be in another and that impacts imputation. We need this information to generate more accurate predictions.
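As a rough illustration of why the choice of population frequencies matters for imputation, the Python sketch below weighs the haplotype pairs consistent with an ambiguous typing by their expected frequency in one population. Everything in it is an assumption made for illustration: the haplotype labels H1 to H3, the population names POP_A and POP_B, the frequency values, the impute helper and the use of Hardy-Weinberg proportions. It is not NMDP's matching model and it does not use real registry data.

```python
# Hypothetical toy haplotype frequencies for two populations (not real data).
HAPLOTYPE_FREQS = {
    "POP_A": {"H1": 0.10, "H2": 0.01, "H3": 0.05},
    "POP_B": {"H1": 0.01, "H2": 0.10, "H3": 0.05},
}


def impute(candidates, freqs):
    """Return a posterior probability for each candidate haplotype pair.

    candidates: haplotype pairs consistent with the ambiguous typing.
    freqs: mapping of haplotype -> frequency in the chosen population.
    Assumes Hardy-Weinberg proportions (an illustrative simplification).
    """
    weights = {}
    for h1, h2 in candidates:
        weight = freqs.get(h1, 0.0) * freqs.get(h2, 0.0)
        if h1 != h2:
            weight *= 2  # a heterozygous pair can arise in two orders
        weights[(h1, h2)] = weight
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate pair has a nonzero frequency")
    return {pair: weight / total for pair, weight in weights.items()}


# One ambiguous typing that is consistent with two possible haplotype pairs.
candidates = [("H1", "H3"), ("H2", "H3")]
post_a = impute(candidates, HAPLOTYPE_FREQS["POP_A"])
post_b = impute(candidates, HAPLOTYPE_FREQS["POP_B"])

for name, posterior in (("POP_A", post_a), ("POP_B", post_b)):
    best = max(posterior, key=posterior.get)
    print(name, best, round(posterior[best], 3))
```

With these toy numbers, the same ambiguous typing resolves most probably to H1/H3 when the POP_A frequencies are used and to H2/H3 when the POP_B frequencies are used, which is the effect described above.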
Hosts: You talked about how having race and ethnicity is important to increase the granularity for the less common race and ethnicity groups. Isn’t some of the race and ethnicity information encoded into the genetics itself?
Dr. Madbouly: Yes, you have the genetic marker in the actual HLA, but you still need the haplotype frequencies. Here’s why. If you look at the composition of the NMDP Registry, two-thirds of our donors are of European ancestry. The remaining third are from all other areas of the world.
That means we have groups, like Alaskan Natives, South Central American Black or Caribbean Hispanic, for whom we don’t have a huge sample size of donors on the registry. However, we try as much as possible to formulate haplotype frequencies from these samples to be representative of these populations. The larger the sample, the better the frequencies will be, and the more representative of the population they will be.
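The link between sample size and frequency quality can be sketched with a simple binomial approximation. The Python snippet below assumes haplotype frequencies are estimated by directly counting phased haplotypes, two per donor, which is a simplification (haplotype frequencies from unphased typing are typically estimated with methods such as expectation-maximization). The function name, the example frequency and the sample sizes are hypothetical.

```python
import math


def frequency_standard_error(p, n_donors):
    """Approximate standard error of a haplotype frequency estimate.

    p: true haplotype frequency in the population (hypothetical value).
    n_donors: number of sampled donors; each contributes two haplotypes.
    Uses the binomial approximation sqrt(p * (1 - p) / (2 * n_donors)).
    """
    chromosomes = 2 * n_donors
    return math.sqrt(p * (1 - p) / chromosomes)


p = 0.001  # a rare haplotype, chosen for illustration
se_small = frequency_standard_error(p, 1_000)    # small population sample
se_large = frequency_standard_error(p, 100_000)  # sample 100 times larger

print(f"relative error, 1,000 donors:   {se_small / p:.2f}")
print(f"relative error, 100,000 donors: {se_large / p:.2f}")
```

Under this approximation, a sample 100 times larger shrinks the standard error by a factor of 10, so rare haplotypes in small samples are estimated with much greater relative uncertainty.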
Let’s say you have a relatively rare HLA type like the one I have. I don’t have a match on the registry. I’m Egyptian by ancestry and I have the malaria-protective alleles B51 and B53. You don’t find those a lot and there aren’t a lot of Egyptians on the registry. But when you’re imputing, you don’t want to use the European haplotype frequencies in imputation. You want to use the Northern African ones or the other populations that have these alleles.
You’re probably not going to get a very high frequency of my genotype anywhere, so you want to have the population representation as much as possible, particularly for these underrepresented types.
Hosts: It seems that it would be most beneficial if we could get a huge HLA database that has some of the ethnicity information encoded into it so it starts to address some of the finer granularity you’ve discussed. What are your thoughts on that and what is step one?
Dr. Madbouly: We’ve actually already taken step one. We make our frequencies publicly available for research because the NMDP Registry arguably holds the most diverse HLA dataset in the world. We make our frequencies available so researchers can benefit from that diversity and get more knowledge of HLA in different populations.
We published the most recent frequencies in early 2023 and the match projections and frequencies are also available with this publication. We’re constantly striving towards this goal.
There are other organizations that are also trying to focus on other groups like Caribbean groups or Latino groups, for example, because we know these populations are underrepresented in multiple databases.
Let’s face it. When you’re doing health care research, most of the samples are white. We know that and there are many reasons for it. But we are always trying to get more diverse data for research and for better matching. It’s a complex problem, but we try to make the data publicly available whenever we have it.
Access the publicly available dataset from NMDP: HLA frequencies dataset