Interview with Kelly Wright, Sociolinguist

Alex Mitchell
5 min read · Jun 19, 2020

Kelly Wright is a linguistics researcher at the University of Michigan, using algorithms to uncover unconscious bias. I interviewed her about data, race and language.

Kelly, can you tell us a bit about yourself and your work?

I call myself an experimental sociolinguist. I’m primarily interested in the intersection of cognition and social structure; how we make sense of the world as we’re moving through it. That’s how we speak, how we write, and what we hear when we listen to someone speak.

Part of what I look at is how texts created by people with implicit biases let those biases come through. To take an everyday example, when you see someone walking towards you, before they say hello you have a sense of how they will sound based on how they look, how they carry themselves and so on. And when they don’t sound like you expect, it surprises you, it troubles you.

In practical terms, this means that if people don’t associate Blackness with competence — when they see a Black person at the absolute top of their game, talking with expertise — they still question them or think they don’t have that expertise.

My work is on sports journalism. I started by noticing that people spoke about Serena Williams in such a different way than they spoke about her contemporaries, so I built a corpus of articles on Serena Williams. Just counting the words shows that people are not using the same words with the same frequency to describe her as her contemporaries.

That led to the corpus I have now. I’m using 100 years of American sports journalism to see how Black people are described.

What did you find about how Black people are described differently?

One word really stands out: athletic. All the people these articles describe are athletes at the highest level of their sports! But Black athletes are described using this word by a 3:1 ratio. It plays into the stereotype that Black people are naturally athletic.

Then you’ve got words like insatiable, unstoppable, using these animalistic terms to describe Black athletes. On the other side, you’ve got a phrase like “class act”, which works the same way as “articulate”; you saw me and you didn’t expect me to sound smart, but I sounded smart. In the same way, it’s communicating that on some level the writer expected these sportspeople to behave like trash, but they didn’t.

Can you talk us through building the corpus?

The corpus was easier to get than you might think. If you find this interesting, you can build a corpus for yourself in a weekend. It starts out with an advanced search on Google. You put in your search terms, put in restrictions and use a piece of software that pulls all the URLs from a Google search. Then you take out the noisy results, and after that there are a couple of pieces of code that strip all the text out, sort the text and count the words.

The information is there! The ability to do this should not be confined to the academy.
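
For readers who want to try the last step Kelly describes (strip the text out, then count the words), here is a minimal Python sketch using only the standard library. It is an illustration, not her code: the names `TextExtractor`, `strip_text` and `count_words` are made up for this example, `sample_page` is an invented stand-in for a downloaded article, and the tokenisation rule is an assumption.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def count_words(text):
    """Lowercase the text and count alphabetic tokens (assumed rule)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


# Invented stand-in for one downloaded article page.
sample_page = """
<html><head><style>p {color: red}</style></head>
<body><h1>Match report</h1>
<p>An athletic display. A truly athletic, unstoppable display.</p>
<script>var x = 1;</script></body></html>
"""

counts = count_words(strip_text(sample_page))
for word, n in counts.most_common(3):
    print(word, n)
```

Running the same two functions over every saved page and adding the resulting counters together would give a corpus-wide word count.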

Can you talk us through building the algorithm?

The algorithm looks for imbalances and shows me asymmetries. For something like “athletic”, we would expect it to be even across races and genders. I tell the algorithm that I expect everything to be even. In a perfect world, we would just be reporting sports!

Then I ask the algorithm to sort the corpus into two meaningful categories. Because it’s balanced for race and gender, it can choose which category is most informative. And it sorted them into categories based on race. 10% of the data is already marked up, already categorised for race and gender. Then you get it to predict the remaining 90%. And you open up these categories and look at which athletes are in what category.

Normally you’d have a lot of gradience in the middle. For example, for some members of those categories it would be only 49% or 53% likely that the member belongs in that category. But mine was a straight line down the middle in the two groups. The white athletes in the corpus were all 3% likely to be Black and the Black athletes 96% likely to be Black.

I didn’t believe these results when I got them. It’s not meant to work this way. I checked with our code office, I checked with my PhD advisor and they said, what you’ve done looks right but you shouldn’t get these results. Finally, I checked my code with an expert in the field who said — this is the result you got. This is right.

Just counting words was enough to show that racialization is real.
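
To make the "label a small part, predict the rest" idea concrete, here is a rough Python sketch. It is not the model from the interview: it swaps in a small naive Bayes classifier written with the standard library, purely because it fits in a few lines. The documents, the neutral labels `group_a` and `group_b`, and the function names are all invented for illustration. The point is only to show what a per-document probability looks like: values near 50% are the "gradience" Kelly mentions, and values near 0% or 100% mean the two groups separate cleanly.

```python
import math
from collections import Counter


def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (tokens, label). Returns the fitted counts."""
    word_counts = {}        # label -> Counter of words
    doc_counts = Counter()  # label -> number of documents
    vocab = set()
    for tokens, label in labelled_docs:
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_proba(model, tokens):
    """Return {label: probability} using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalise in log space to avoid numerical underflow.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


# Invented toy data: a small labelled portion and an unlabelled remainder.
labelled = [
    ("powerful athletic unstoppable".split(), "group_a"),
    ("athletic explosive powerful".split(), "group_a"),
    ("smart disciplined technical".split(), "group_b"),
    ("technical smart composed".split(), "group_b"),
]
unlabelled = [
    "athletic powerful finish".split(),
    "smart technical win".split(),
]

model = train_naive_bayes(labelled)
predictions = [predict_proba(model, doc) for doc in unlabelled]
for doc, probs in zip(unlabelled, predictions):
    print(" ".join(doc), {k: round(v, 2) for k, v in probs.items()})
```

With real articles, the labelled list would be the hand-categorised portion of the corpus and the unlabelled list the remainder.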

How did you get to see the most important words?

As well as the counting I asked the algorithm to do the prediction task, using a random forest model to output a list of words that are important.

It’s not just the words, but also how they are used. So “culture” occurs in a 2:1 ratio in the Black corpus. For a Black athlete, it’s more likely to be talking about infusing their, the player’s, culture with their sport. With the white athlete it’s more likely to be talking about the culture of the sport itself.

You get all kinds of words you wouldn’t expect. “Rolex” only occurs in the white corpus. Athletes of both races have endorsement deals with Rolex, but sports writers only talk about it when it’s white players.

You could write a lot of words associated with race, and never get to Rolex. But you can see it with an analysis like this.
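
Ratios like the 2:1 for "culture", and words that appear in only one half of the corpus, can be read straight off two word-count tables. The Python sketch below shows one way to do that, under stated assumptions: the function name `compare` is made up, the counts are invented toy numbers rather than figures from Kelly's corpus, and frequencies are divided by each sub-corpus's size so that a larger sub-corpus does not inflate the ratio.

```python
from collections import Counter


def compare(counts_a, counts_b):
    """Compare relative word frequencies between two sub-corpora.

    Returns (ratios, only_a, only_b): ratios maps each shared word to
    rate_a / rate_b; only_a and only_b list words found in one side only.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    ratios = {}
    only_a, only_b = [], []
    for word in set(counts_a) | set(counts_b):
        rate_a = counts_a[word] / total_a  # Counter returns 0 if missing
        rate_b = counts_b[word] / total_b
        if rate_b == 0:
            only_a.append(word)
        elif rate_a == 0:
            only_b.append(word)
        else:
            ratios[word] = rate_a / rate_b
    return ratios, sorted(only_a), sorted(only_b)


# Invented toy counts for two sub-corpora of equal size.
corpus_a = Counter({"culture": 4, "game": 6})
corpus_b = Counter({"culture": 2, "game": 7, "watch": 1})

ratios, only_a, only_b = compare(corpus_a, corpus_b)
print("ratio for 'culture':", round(ratios["culture"], 2))
print("only in corpus_b:", only_b)
```

A ratio near 1 means a word is used about equally often on both sides; the further it drifts from 1, the stronger the asymmetry.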

What sort of changes would you like to see coming out of your work?

Changing the assumptions we make when we perceive the world is difficult. It’s a generational task.

We don’t really teach linguistics anywhere outside of graduate school. People don’t learn about language in history, biology, even art. But we use language every day! It’s important to make people aware of the amazing things that language can do, how powerful it is. The more you say things over and over, the more they become the established way of doing things. That’s how these conversations about nationalism and ideology happen: it’s saying something over and over again. It takes individual effort to change.

My discipline is about recasting linguistics as public scholarship. There are many people who don’t know they are being oppressed, people who don’t know that their voice is part of how they are perceived. As linguists it’s our responsibility to make this clear — that the way people use language impacts the legal system, healthcare, housing, career advancement. It affects people everywhere.

It’s important to say here that we are all racialized people. Just because whiteness acts as a default it doesn’t mean it’s not a race. My work is not about how people of colour are described, it’s about how people are described.

And everyone has an accent! Just spending time with your own voice can be really informative.

What sorts of changes would you like to see coming out of your work?

We’ve seen motion like this in our lifetime. You see mental health and disability advocates who have person-first language to give minorities more agency. You see the feminist movement getting chairperson and not chairman as the thing we say. And now, a lot of people have experienced or are experiencing change in the way they use pronouns. That’s a place to point to and say — yes, we can do this work. Every time you as one person do the thing, it has an effect on the world at large.

It sounds like a huge task, and it is, but I am hopeful.

Alex note: if you enjoyed this interview, you can also try my newsletter — https://tinyletter.com/feministfriday

Alex Mitchell

Collected the Complete Short Stories of F. Scott Fitzgerald. Edits the Feminist Friday newsletter. Also I’m a data analyst.