
Examining what is statistically true with Kareem Carr

Chris Hayes speaks with Harvard PhD student Kareem Carr about the role of biostatistics in our lives and why it’s essential to have a healthy level of skepticism of numbers.

Statistics plays a role in virtually every facet of our lives. And throughout the pandemic, we’ve heard more stats than ever before, whether through headlines about Covid infection rates or vaccine effectiveness. But how are these figures calculated? How do we know when data is manipulated for nefarious reasons, and when it represents some true thing out there in the world? Lucky for us, Harvard PhD student Kareem Carr joined WITHpod for a heady conversation to break all of that down and more. Earlier in 2021, he shook up Twitter with a post about two plus two equaling five, a thread aimed at provoking some meditations on the nature of mathematical truth. He joins to discuss that, the importance of neutral AI algorithms, why statistics are anti-racist and why it’s essential to have a healthy level of skepticism of numbers. Side note: we’re approaching our holiday WITHpod Mailbag. Email us at withpod@gmail.com to share what you love about the podcast and what’s on your mind.

Note: This is a rough transcript — please excuse any typos.

Kareem Carr: If we do a mathematical analysis on data, are we doing knowledge generation? And are there circumstances under which we might not? One of the weak points in generating knowledge from data is if our assumptions are not accurate, our assumptions about physical reality are not accurate. That's basically why I started thinking really deeply about mathematical systems and the extent to which they represent reality. (MUSIC)

Chris Hayes: Hello, and welcome to "Why Is This Happening?" with me your host, Chris Hayes. One of the strange features of the pandemic era is how much time we spend looking at charts and numbers. It's become just, you know, the curve, the epidemiological curve.

The cases go up, or they go down. They never go flat. They're always going up or down. They kind of, like, dominate our visual field. I don't know, particularly me in my line of work I think about it a lot, I look at it a lot, and I'm constantly sorting through numbers and statistics, the rankings of which states are more vaccinated, which percentage of hospitalization cases are vaccinated people versus unvaccinated?

What percentage of people who are vaccinated have breakthrough cases? There's just this constant barrage of numbers and statistics to try to get our arms around understanding this fundamental question of where the virus is and what it's doing.

And the surveillance, which is the term they use in public health that leads into those numbers, it's like a form of x-ray vision or echolocation, right? There's this thing that's out there that's invisible, which is the microscopic virus with its proteins that are invading our systems.

And to try to get our eyes on it, what we do is we have processes to record cases and report them that produce these things like maps. And they produce charts, and they produce knowledge about the world. And then at the same time, as one is immersed in these statistics, one finds that people who want to deny the pandemic is real or deny the severity of the pandemic or deny the efficacy of vaccines have their own statistics. (LAUGH)

If you walk into the world of sort of COVID and vaccine denial, what you'll find is a lot of numbers, a lot of statistics. There was one particularly egregious example, which was a sort of open vaccine incident reporting system that Tucker Carlson and others used in this incredibly bad faith attempt to scare people about getting the vaccine and ergo make more people sick and die.

I don't know if that's their intent, but that's the effect. You know, the vaccine reporting system is basically anyone can say, like, "Oh, after the vaccine, this happened to me," right? Well, (LAUGH) so they would take that. They'd say, like, "Oh, look, you know, thousands of people died after taking the vaccine."

Like, well, yeah, hundreds of millions of people got shots, and, well, things can happen, right? So it was such to me a pure example of, like, the manipulation of numbers and statistics towards nefarious ends. And I've been thinking a lot about, like, how to sift through statistics, how to sift through all the data that we're constantly being thrown.

How we form our opinions about the world. How to develop a kind of sensibility or expertise for data and statistics, and then a kind of deeper almost philosophical question which is about, like, their actual nature as knowledge. Like, is a given piece of data a thing about the world or simply a kind of, like, utterance of something expressive by a person who's trying to make a point?

And that's, like, flirting with a sort of postmodern view of this, but all of these ideas have played out in the feed of someone that I follow and really have come to admire, who works and thinks at the intersection of race and pop culture and sociology and statistics, particularly biostatistics.

He's a guy by the name of Kareem Carr. He's a biostatistics PhD student at Harvard. He's a researcher. He's got a huge following online that follows his biostats research. And the thing that tipped me over, I had sort of put him in my head of, like, "Oh, we should talk to that guy, he's an interesting dude."

He tweeted something about statistics being anti-racist the other day, that rigorous use of data and numbers is essentially an anti-racist activity because it deconstructs and lays bare the sort of biases and prejudices upon which racism is built. And I thought to myself, "That's a provocative and profound point." And so I'm really happy to have Kareem Carr on the program. Kareem, what's up, man?

Kareem Carr: Oh hey Chris, I'm really glad to be here.

Chris Hayes: I would say that there's not a ton of statisticians who've managed to (LAUGH) catch my eye online or, like, achieve a little bit of a following. So you grew up in the Caribbean, is that right?

Kareem Carr: Yup. So I'm from a small island called St. Kitts. It's part of the Federation of St. Kitts and Nevis. A lotta people haven't heard of it. It's the smallest country in the Western hemisphere. It's, like, 50,000 people. It's, like, two islands, St. Kitts and Nevis. And Alexander Hamilton was born on Nevis.

Chris Hayes: Right. (LAUGH) That I think is the one that people remember. Were you, like, a math person growin' up?

Kareem Carr: Yeah. I was definitely, like, my dad's an accountant, so yeah, I was very much exposed to numbers when I was growing up. I remember, like, you know, I would do math. And I think we had, like, new math at the time, and then my dad would show me, like, old math.

And, you know, so it would, like, kinda light my imagination about, you know, the idea that you could, like, do things different ways and get the same answer. So I always had this, like, baseline interest in math. And I guess the thing that tipped me over was that I'm not really sure when, but sometime around my preteens I read this book called Men of Mathematics.

You know, unfortunately it kinda has a sexist title, but it was, like, this PR campaign for mathematicians where it was just saying how awesome, you know, Newton and Gauss and all these people were. So it really got me into thinking, like, I would like to be like that.

Chris Hayes: Can we talk about math and math education for a little bit? Because I imagine you have undergraduates that you're dealing with.

Kareem Carr: Yeah.

Chris Hayes: Well, first of all, the thing you said about using different means to solve problems, I don't know if you've seen there's, like, this kind of viral TikTok genre that shows people from different countries at a chalkboard doing complex equations.

I mean, they're not that complex. They're just, like, multiplying two three-digit numbers. But it's wild, because they do them differently. And, like, if you're from Japan or from India, and, like, there are some shortcuts that people are using. Like, there are definitely faster and slower ways of doing it, and I definitely think we got taught the slower way. (LAUGH)

Kareem Carr: So yeah, I guess it seems like maybe there was more of an emphasis on creativity and context than there is on just rote, like, cranking out numbers. And I think it's good for, like, the motivation of students to kind of give them context.

But there's also that baseline mechanical skill that you have to develop, and so I think that that part is almost, like, important. Because, like, if it takes you too long to do arithmetic, it's gonna take you that much longer to do algebra.

If it takes you too long to do algebra, then you're gonna struggle with calculus. So, like, I've TA'd people where I've seen that that's, like, their problem. Like, they do, like, an arithmetic problem and they pull out the calculator, and they're just adding, like, eight plus seven. And it's, like, if you're gonna do that, like, every time you're doing a calculus problem, you're never gonna get there.

Chris Hayes: I mean, I also think there's this really interesting question of, like, mathematical aptitude, which I think that in the U.S. we have a kind of bad way of thinking about it, which is, like, there are math people and non-math people. And people are, like, "Oh, I'm a math person," "oh, I'm not a math person."

And people make jokes about it all the time. Journalists will be, like, "I went into journalism to stay away from math." And that's partly because math is hard, because the abstract manipulation of formal systems is basically the most difficult thing that we do with our brains. So, like, of course it's hard.

Kareem Carr: Yeah. Math is essentially pretty simple. Like, if you're doing arithmetic. I guess as, like, a technical problem, it's extremely simple. Like, if I had access to neurons and I could, like, build a machine to do arithmetic, I don't think it would take that many neurons, right? Like, we know this from, like, computer science and the type of stuff we've been developing over the last, you know, few decades.

Chris Hayes: Yeah, I mean, Ada Lovelace and Charles Babbage basically, you know, they built that mechanically back in the 19th century based on just switches essentially.

Kareem Carr: So in theory, like, the machinery in our head should be able to do arithmetic several times over, but we, like, I guess the way we're, you know, evolved, it makes math difficult. And on the other hand, like, I think figuring out, like, social problems or figuring out, like, political situations is, like, an inherently way more complicated problem. We need to figure out a way of clearing our minds and letting go of the noise and sort of, like, focusing. Because I think ultimately, it's not that hard a task. I think we could figure it out. I think we need more patience, honestly. I think that's the problem.

Chris Hayes: There was a little, very funny, kinda mathematical Twitter dust-up about two plus two equaling five that you were involved in that I think might be interesting to sort of walk through for folks. And it's possible they saw that, 'cause I think at one point it was trending. Like, it was one of these weird things caught in a viral updraft. What happened there?

Kareem Carr: I think I'll just keep it high level and say that the thing that people were really concerned about, like, why it went viral, is partly that it was, like, arguing: are there fundamental truths, you know, that are just rigid and firm? That you can sort of rely on? When I went into math, that's what attracted me was, like, "Oh, you can get a right answer, and, you know, there's not that much ambiguity."

Chris Hayes: I mean, just to be clear, right, it starts a debate about what the nature of math is, right?

Kareem Carr: Yeah.

Chris Hayes: Is the nature of math essentially a set of formal systems that describes the structure of the universe inherently? Or is it a language we play, like chess or, you know, Swahili or, like, any other language, that we can do stuff with 'cause we're humans, and that can correspond to the world because we're doing that, but that doesn't inherently map onto the structure of the universe? Like, that's the sort of core question that you were engaged in, right?

Kareem Carr: Yeah. So I mean, I guess, okay, so here's an example: is Harry Potter a wizard? In some sense, in the world of Harry Potter, yes, Harry Potter is a wizard. But as, like, an actual fact of the world, there is no Harry Potter, there are no wizards. There's no equation between Harry Potter and wizards, right?

So if you're, like, bringing a science mind to it, it doesn't make any sense to ask, "Is Harry Potter a wizard?" And so in the same way, if I ask you, "Is two plus two equal to four?" Like, is that statement more, like, "Is Harry Potter a wizard," i.e., like, completely disconnected from reality?

You know, it's just this abstract system as you say. Or is it grounded in material facts? And I feel like when I ask someone, "Is two plus two equal to four," if I told them it was equal to five, the way that they would push back would be to say, "Oh well, I can count on my fingers, like, what are you talking about?"

Or if I had apples, you can kind of see that, you know, very concretely two plus two is equal to four. So clearly they're grounding that belief in material facts. 'Cause if it's just, like, an abstract system then, okay, I can't dispute. Like, if you like the rules of that particular game, then I might not be able to move you away from that.

And if I say, "Oh well, two plus two is equal to something else in a different system," that's just a different game. And, you know, if we're all sitting down deciding to play Monopoly and I decide I'm gonna play checkers instead, I'm being a jerk.

So, like, a lot of the anger is just, like, the idea that I'm just being a jerk by not following the rules that we're supposed to be following, like, we've agreed there's a social contract. So the other half of it is the idea that, "Okay, this thing is grounded in reality."

And then so I started just throwing up examples where two plus two is not equal to four. For instance, I was sort of saying, like, "Okay, like, let's say I give you some fluid, like, let's say 200 mL of one fluid, 200 mL of another clear fluid," and we, like, combine them.

And, you know, turns out empirically we see, "Oh, okay, it's not 400 mL." And then I say, "Well, the reason is one of the clear fluids was water and the other one was ethanol. And when you combine them, they actually contract in volume." And so, like, here's a concrete example.

Like, if we're using concrete examples, here's a concrete example where two plus two didn't really equal four. So then you might say, "Well, you know, come on Kareem, you just add like things, right, that's the problem." So then I might say, "Okay, but if you have to know what the things are before you can know if you can add them in a way that makes sense, then that's kind of a hole there, right?"

Like, that's kind of a problem. We move on and we say, "Okay, now we're just gonna add two containers full of water, like, 200 mL of one, 200 mL of another fluid that now we're sure they're water. But one is at 20° and the other one is at 40°."

So you combine it, and now it, like, shrinks, right? Like, the average temperature's lower and so now they contract. And now it's less again. You know, okay, well, then they need to be the same temperature. So if the question is, like, how many of these caveats do you have to add, it seems like you actually have to know a lot about the material things before you can say "Is two plus two equal to four"?

And so my argument was sort of, like, there's this kind of connection between the mathematics, right, like, the descriptive mathematics and the physical system. That means that, like, sometimes two plus two might not equal four for deep physical reasons.
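A quick back-of-the-envelope version of that liquid example, a minimal sketch using approximate room-temperature densities (the mixture density is a rough, approximate figure used only for illustration): mass adds when you combine the two liquids, but volume doesn't.

```python
# Rough check of the "200 mL + 200 mL" example.
# All densities are approximate; the mixture density for a roughly
# half-and-half ethanol-water blend is an approximate figure.

RHO_WATER = 0.998      # g/mL, around 20 C (approximate)
RHO_ETHANOL = 0.789    # g/mL, around 20 C (approximate)
RHO_MIXTURE = 0.93     # g/mL, approximate density of the resulting blend

water_ml, ethanol_ml = 200.0, 200.0

# Mass is conserved when you mix; volume is not.
total_mass = water_ml * RHO_WATER + ethanol_ml * RHO_ETHANOL   # grams
mixed_volume = total_mass / RHO_MIXTURE                        # mL

print(f"naive sum of volumes: {water_ml + ethanol_ml:.0f} mL")
print(f"estimated mixed volume: {mixed_volume:.0f} mL")  # noticeably less than 400
```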

Chris Hayes: Right. But what occasioned the provocation? Like, why are you thinking about this question about what the nature of math is? What is the truth value of that sentence, and what does it represent? Does it represent a rule in a formal system that's a truth, like, "Harry Potter's a wizard," or knights can only, you know, go up and then over, right? Or that it's the Rosetta stone of the universe, right? (LAUGH)

Kareem Carr: So I'm a biostatistician. So I'm using math to represent reality. I mean, most statisticians do. So that's my issue that, like, when I apply math to a particular system, it's up in the air whether the things that I derive from the mathematics are true or not.

And that's deeply important to data analysis, right, because that's where math interfaces with physical reality, right? We do some analysis, so I mean, it gets to that question you were talking about, about knowledge generation. If we do a mathematical analysis on data, are we doing knowledge generation?

And are there circumstances under which we might not? One of the weak points in generating knowledge from data is if our assumptions are not accurate, our assumptions about physical reality are not accurate. That's basically why I started thinking really deeply about mathematical systems and the extent to which they represent reality.

Chris Hayes: This gets to this question about how our statistical modeling or the statistics we use represent reality, and the degree to which they're used to accurately reflect it or not. I think part of the thing that you're also sort of getting at with this sort of thought experiment, about whether two plus two equals five or what the nature of this mathematical truth is, just cashes out in this very real way, which is that there is an assumed authority to the statistic that defeats other things.

Kareem Carr: Mm-hmm.

Chris Hayes: In argument despite the fact that, like, the statistic, there are all kinds of true statistics that are wildly misleading or don't actually accurately reflect reality. For instance, like, you know, one about the survival rate of people that get COVID, right? You know, it's really high, yes. I mean, that's true. You see this all the time, "99% of people survive." Well, yeah, what they're saying isn't wrong. It's just that it is a way of describing the world that's insufficient or misleading.

Kareem Carr: Yeah. I mean, that's a very important distinction I think. The idea of percentages versus the absolute numbers. Even though it's less than 1% of Americans, it's a lot of people. So I feel like you can definitely get into trouble, and so that's, like, the first step of statistics.

Like, the first problem that you can have, which is, like, what is gonna be your measure of your outcome, and does it correspond to what you care about? So, like, I did a lot of statistical consulting, and the first question you always ask when someone comes in is, like, "What is the thing that you really want to ask?" Because the way that you translate that into numbers is, like, there's different ways to go. And depending on the way you go, you're gonna emphasize or deemphasize certain things.

Chris Hayes: Right. I mean, again you're getting back to this sort of deep question about the fact that this is not, like, there's some answer that's in a dark room. And you go in with a flashlight, and if you shine it there then that's it, right?

Kareem Carr: Right.

Chris Hayes: You're saying that, like, when you're dealing with problems of sufficient sophistication and sufficient difficulty that are mathematical and empirical questions that involve some statistical analysis, you're making choices about what you're looking for.

Kareem Carr: Yeah. Just a huge number of choices. So I think people just don't really get that. So I like to say that, like, statistics is critical thinking with numbers. And if you think about, like, what's good about numbers, one of the things that's good is that they're explicit, so you can kind of look and see what procedures people are using.

And they're kind of standardized, so, you know, like, we have calculus. As long as you give me some numbers, I mean, that's what a lot of statisticians do. You know, you give me some numbers, I plug it into some algorithm that I already have ready to go that just needs numbers.

You know, like, just being skeptical about numbers is kind of, like, a gateway to just statistical thinking. So, like, let's say I say, like, something, like, "Oh, like, nine out of ten dentists recommend my favorite gum." Right? Now the first question is, like, "What do you mean recommend?"

Right, like, what happened? Did you give them, like, a really crappy gum and then your gum that you wanted them to choose, and they chose that? Were they, like, "Eh, it's okay?" Were they, like, strongly enthusiastic? You know, what does strongly enthusiastic even mean?

You know, so I feel like just in general, when you just start asking those kind of questions, you immediately get into, like, statistical territory. Because that's what, like, we statisticians care about. We care about, like, what is the variation in answers over the multiple choices that you might have?

Also, like, in that example, like, if I said, "Nine out of ten," you know, you might ask, like, "What if I had a bigger sample? Like, ten seems small. Maybe it should be a thousand." You know, then you might ask, well, maybe is it representative?

You know, like, if it's dentists that you paid off, then maybe it's not representative of the whole population. And all of those are statistical questions. Like, how can you tell if a sample is representative? How can you tell if a sample is large enough?

How can you tell if the numerical value you get is different between two groups? Like, I think it's under-recognized in the public how much statistics actually plays into these questions. Like, people throw out, like, the skepticism, but I don't think they take in the idea that, hey, there's a field of study that's devoted to exploring these questions and how, for example, basic assumptions might affect, like, what conclusion you come to.

It's basically nontrivial both to translate something into mathematics and to translate something out of the mathematics into a final conclusion. And then it's also nontrivial to take a conclusion from an experiment and generalize it to something in the real world.
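To put rough numbers on the sample-size question in the dentist example: the same headline rate of nine out of ten carries very different uncertainty at different sample sizes. A minimal sketch using the standard Wilson score interval, with made-up counts:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Same headline rate, very different uncertainty.
for k, n in [(9, 10), (900, 1000)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n} recommend: 95% CI roughly {lo:.2f} to {hi:.2f}")
```

With 10 dentists, "nine out of ten" is consistent with anything from roughly 60% to nearly 100%; with 1,000 it pins the rate down much more tightly.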

Chris Hayes: Right. I want to talk more about that, but just to follow up on this, like, that point about the nine out of ten dentists, like, there's two things there, I think, for people thinking statistically. Like, one is, I always say when we're just dealing with this in the newsroom, like, if a number seems crazy, it's probably wrong.

Or if it seems too good to be true, it probably is. (LAUGH) Like, I mean, doesn't mean necessarily, but just, like, you know, if you see a really shocking statistic, you should approach with care. And then that point you make there, this is a key point. Every statistic you encounter in the world, nine out of ten dentists, right, is a product of a set of methodological choices.

Kareem Carr: Yeah, exactly. I mean, I really hope people will not just stop there. Just to say it, like, throw out, like, "Well, you made a lotta choices, I don't know." But I mean, I think on top of that, you can say, "If I explore the range of possible choices, how much does it change the answer?"

'Cause that gives you a sense of the uncertainty in the answer. I feel like anybody can be skeptical, but doing something with the skepticism is harder. And, like, that's to me what statistics brings to the table, which is these are all just branches of statistical practice.

Like, varying different assumptions in your analysis to see how it varies the outcome. Thinking about what if I collected the data under certain different circumstances. So, like, imagine you spent 100 days collecting some data, a priori, there's nothing different about any of the different days, right?

Like, so I mean, you could imagine an alternative world where, you know, on day 20 you were sick, you didn't go in. And ultimately you would think that, like, day 20 shouldn't be critical to your conclusion, right? You still have, like, 99 other data points.

So if you do an analysis, and day 20 was critical, that should stand out for you, right? And so, like, just kind of doing that kind of, like, "Well, what if I leave one data collection day out of my analysis and then look at each?" You know, like, I'd leave out day one and look at the analysis. I leave out day two, I look at the analysis, and so on.

Chris Hayes: Right, 'cause is there some single anomalous day that's driving the outcome?

Kareem Carr: Exactly. So I feel like what statisticians bring to the table is the idea that, okay, like, we started out with a criticism of this analysis. Like, hmm, it seems like day 20's really factoring in. But we then take the extra step of proposing a solution. Let's look at the variation across, like, different analyses if we leave different days out and see, you know, is this an outlier? And then we can start to think about what we can do from there.
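A minimal sketch of the leave-one-day-out check Kareem describes, on made-up data where one day is deliberately made anomalous so the check has something to find:

```python
import random

random.seed(0)

# Pretend these are 100 days of measurements; day 20 is artificially extreme
# so the leave-one-out check has something to flag (purely illustrative data).
daily_values = [random.gauss(10, 1) for _ in range(100)]
daily_values[19] = 40.0  # an anomalous day

full_mean = sum(daily_values) / len(daily_values)

for day, _ in enumerate(daily_values, start=1):
    rest = [v for i, v in enumerate(daily_values, start=1) if i != day]
    loo_mean = sum(rest) / len(rest)
    # Flag days whose removal shifts the mean by a lot (threshold is arbitrary).
    if abs(loo_mean - full_mean) > 0.2:
        print(f"day {day} looks influential: mean {full_mean:.2f} -> {loo_mean:.2f}")
```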

Chris Hayes: Right. So I think I want to sorta capture two key concepts here. One is that, like, the skepticism is important in terms of seeing a statistic and understanding it's the product of a bunch of methodological choices. But then the next step from that isn't, like, a total, like, relativism, which is that, like, it's all essentially socially constructed. There are good and bad choices. There are better and worse methodologies. And there are models or outcomes that better or worse represent either the world or the answer to some important question.

Kareem Carr: Exactly. As a statistician, I'm biased. I want to say, like, we're unique in how we do things. And I think we are, but lots of other people can analyze data. Like, physicists are very adventurous. They like going out to their field and analyzing things.

Economists are very adventurous. They do lots of things. There's lots of people that are capable of doing mathematical analysis, but I think specifically what, like, a statistics trained person brings to the table is the idea of (SIREN) but what is your degree of confidence in this thing that you have said? Like, to what extent is this thing you are saying truth, you know, or knowledge, as you were saying?

Chris Hayes: There's been an interesting conversation kind of in the world of social science about what's called the replication crisis, which is there'll be some social psych experiment about whether people can identify a color and how fast they can, right?

And then someone will come along and rerun the experiment, and they're not getting the same results. And in a lot of surveys, particularly in social psychology but in a lot of fields in the social sciences, we've seen what's called this replication crisis.

People rerun the same experiments, they're not getting the same results. And when you look under the hood of that, part of what people have been talking about is essentially an aggressive use of statistics in some cases to generate what are quote "statistically significant outcomes," that when you look under the hood of what they did aren't really significant. And I wonder how much you've thought about that, because it's a central epistemic statistical crisis in social sciences right now.

Kareem Carr: Oh yeah, I've thought about this lots. (LAUGH)

Chris Hayes: This is a great tease. We're gonna go deep on the replication crisis right after we take this quick break.

Chris Hayes: Okay, so the replication crisis. What are your thoughts on it?

Kareem Carr: So you said aggressive use of statistics. Aggressive use of bad statistics, that's the first part. Like, they're doing bad statistics. One of the things that I think is the heart of the failure is, like, a failure to model what researchers are actually doing. So I'm gonna try to avoid mathematical terminology.

Chris Hayes: Please.

Kareem Carr: So I'll just say that statisticians really early on came up with a sort of almost platonic model of how you would do an experiment, where, you know, like, the scientist goes in and they have what's called a null hypothesis, which means that they have this preconception of what it would look like if the effect that they're trying to examine doesn't exist. And then they're just gonna see by doing the experiment, did the data that they observe differ significantly from data where they would expect there was no effect?

Chris Hayes: Let's just concretize this, just because this will be useful to people. So you're intervening on a variable, right? So there's the null hypothesis. So right now I have a new puppy.

Kareem Carr: Awesome.

Chris Hayes: We're trying to house train the puppy. So the question is, well, how do you stop the puppy from peeing inside, which we've been largely successful at but not 100%. So the null hypothesis is, right now the standard is we have water out for the puppy all the time. That's the null hypothesis. What happens if we only put water out in the morning, right? That's the intervention, and then the data we capture is the amount of accidents that happen in the house, right?

Kareem Carr: Yeah, exactly.

Chris Hayes: That's the variable, that's the outcome. So we're intervening on this one thing, which is we're running an experiment of we're changing the water availability and we're measuring the accidents that happen in the house. This is just to concretize the sort of basic framework the statistics and the scientific method uses for investigating the cause and effect between a variable and an outcome.

Kareem Carr: Right, exactly. Imagine, you know, whether the puppy has peed or not is, like, a coin flip, like, literally a coin flip. Like, some deity up in the sky flips a coin, and if the coin lands head the puppy pees, and if it doesn't then the puppy doesn't, right?

Chris Hayes: Just completely random. (LAUGH)

Kareem Carr: Right, yeah. Well, random with a particular, like, rate. The idea is that when you do that intervention, it's gonna be less than 50/50 going forward, right? And so that's fundamentally a question about probability, right? And that's where statistics comes in.

So it's kind of this idea that you can kind of think about what the world ought to look like if what you're doing doesn't matter. And then you can see if the data differs significantly from that world. So there's a bit of hubris there, right, which is that you know what the world is like. And so if you're wrong, then your statistics will be wrong.
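To make the null-hypothesis idea concrete with the puppy example: treat a hypothetical baseline accident rate of 50 percent per day as the null, take a made-up observed count after the intervention, and ask how surprising that count would be if the intervention did nothing. A toy sketch, with every number invented:

```python
from math import comb

def binom_pvalue_less(k: int, n: int, p_null: float) -> float:
    """P(X <= k) under a Binomial(n, p_null) null -- a one-sided p-value."""
    return sum(comb(n, i) * p_null**i * (1 - p_null)**(n - i) for i in range(k + 1))

# Hypothetical numbers: baseline accident rate of 50% per day (the null),
# and after the intervention we observe 4 accident days out of 20.
n_days, accidents, p_null = 20, 4, 0.5

p_value = binom_pvalue_less(accidents, n_days, p_null)
print(f"one-sided p-value: {p_value:.4f}")
# A small p-value says data this extreme would be rare if the intervention
# truly did nothing -- it does not, by itself, say why the rate changed.
```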

Chris Hayes: Let's just stop that for a second, which is that, like, what we need to be doing before we run this is that we need a good body of data about accidents with the dog. Because otherwise we don't know what we're comparing against.

Kareem Carr: Yeah.

Chris Hayes: And if you start the experiment after only a week, that may not be a big enough sample size, right? Like, maybe your expectation of your sort of null set hypothesis is wrong, because you're not sampling broadly enough to get an actual sense of, like, to your point the hubris of what the world is like in absence of the intervention.

Kareem Carr: Exactly. And yeah, scientists get caught with that all the time. They have an inaccurate view of what the world is like. Like, what really messes with your head now is, like, it's not just the world as you understand it as an uninvolved observer. It also involves the experimenter. So if you don't have your stuff together and you mess up, like, sometimes you don't do the intervention that you were planning to do.

Chris Hayes: Be, like, "I forgot to take the water away," or "I put the water out too late," or "I wasn't putting it out the same time every day," right? These, like, little fluctuations.

Kareem Carr: Right. And the other thing is that, like, a lotta times, like, you know, science is frustrating. You know, you're doing your experiment. You're maybe not paid the best. You know, you're an underpaid PhD student or whatever, (LAUGH) and you're just having a tough time.

And so, like, you maybe start an experiment. And then it's not going the way you want, and then you stop it, right? But the idea of, like, it's not going the way you want, that kind of biases, like, what data you get, right? And that happens a lot. Like, that's a real problem.

So, like, I'm saying that sometimes just by chance you're gonna see low numbers. And the reason you might just see low numbers is just by chance, because you've been trying lots of different things. And so what happens is basically you got lucky.

You got lucky on your results. So a lotta times, like, when they're doing the statistics, they don't factor in experimental behavior. They have an optimistic view of how they do things, that, you know, they're very systematic and that they're not, like, cherry-picking. A lotta times it's easy to deceive yourself into cherry-picking.

Chris Hayes: Gotcha.

Kareem Carr: And then the next level, the analyst, so me, the statistician. So I'm coming in and analyzing the data, and I'm also frustrated and I'm also trying lots of things. And I can also get lucky in my analysis where just for whatever reason, the analysis kind of tilts my way, right?

Like, 'cause I said, like, it's subjective and you can try different things with the data, like, different assumptions. And so, like, I'm also kind of repeating my analysis, and I might get lucky about the combination of variables I use. And then that might come out to show the experiment is significant.

Somehow when you're a statistician, you need to factor all those things in. I would say a big part of what's driving the replication crisis is statistics that's not up to the job. Like, I would argue that if you could take into account all of that extra complexity, if you could mathematically model the starts and restarts of the research, or if you could mathematically model all the options that you're trying as you're analyzing the data, then I would say that you could get better replication rates.

Chris Hayes: I see, because those things are essentially outside the analysis as it stands now, and because they're not being captured then you're not actually rerunning things precisely the same. And if you're not rerunning the things as precisely the same, then you're not gonna get the replication.

Kareem Carr: Yeah. And I guess the other thing you really want to do is just avoid cherry-picking.

Chris Hayes: I mean, there's a sort of interesting question there about, like, what's cherry-picking and what's not, right? (LAUGH) Like, let's talk about the vaccine efficacy data. Because to me, like, the vaccine efficacy data and a lot of the vaccine data is just, like, very much at one pole of things where it's just, like, it's pretty clear.

And there's lots of things you can do with the data, and the signal emerges from the noise time and time again. You know, that's reassuring, both because of what it means about it. But it's also, like, to me the vaccine question is the question where, like, all these abstract statistical questions are playing out in front of us right now. Like, the rubber is really hitting the road.

Kareem Carr: Yeah. The way in which you can really tell if your statistics are correct is if you can do a real experiment, right? So if you can ahead of time say what the results of some experiment are going to be, that's, like, the real test. And so the problem we kind of have generally is that people are not as careful in their analyses as they should be, and they're more overconfident than they should be.

So I mean, everyone who's ever heard of statistics I think has heard the idea that, like, correlation is not causation, right? And that's generally true, but there are conditions under which you can conclude causation from a statistical analysis.

And so basically the first step in kind of getting a reliable analysis from data is following these rules. And unfortunately, like, the most important part of the rule is that you have to be able to draw out the, what's called, like, the causal structure of your problem. So you have to know what causes what accurately, like, within your experiment, and then you can do a statistical analysis that you can be confident will be reliable.
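A sketch of why the causal structure matters, on simulated data: a hidden common cause makes a do-nothing "treatment" look correlated with the outcome, and adjusting for that variable, which requires knowing the structure, makes the spurious effect disappear. All numbers and variable names here are invented for illustration.

```python
import random

random.seed(1)

# Simulate a world where a confounder Z drives both "treatment" X and outcome Y,
# and X has NO direct effect on Y.
data = []
for _ in range(10_000):
    z = random.gauss(0, 1)
    x = 1 if z + random.gauss(0, 1) > 0 else 0   # treatment more likely when z is high
    y = 2 * z + random.gauss(0, 1)               # outcome driven by z only
    data.append((z, x, y))

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison: treated vs. untreated, ignoring Z -- looks like a big "effect".
naive = mean([y for z, x, y in data if x == 1]) - mean([y for z, x, y in data if x == 0])

# Adjusted comparison: compare within narrow bands of Z, then average.
diffs = []
for band in [i / 2 for i in range(-4, 4)]:
    stratum = [(x, y) for z, x, y in data if band <= z < band + 0.5]
    treated = [y for x, y in stratum if x == 1]
    control = [y for x, y in stratum if x == 0]
    if treated and control:
        diffs.append(mean(treated) - mean(control))

print(f"naive difference:    {naive:.2f}")        # substantially above zero
print(f"adjusted difference: {mean(diffs):.2f}")  # close to zero, the true effect
```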

Chris Hayes: I don't think I understood that. (LAUGH) So let's go back to the vaccine example, right? So that's a randomized controlled trial, which is the sort of gold standard, right?

Kareem Carr: Yes.

Chris Hayes: So it's double-blind. The experimenters don't know who's getting the placebo and who's getting the vaccine. The people don't know who's getting the placebo and who's getting the vaccine. And it's a large sample size, and they're running this, and then they're monitoring the outcomes, right?

And, you know, there's all kinds of things to worry about in that statistically, like, if you happen to have some selection bias in the people that are willing to volunteer for a vaccine, for instance. Maybe people that are willing to volunteer for a vaccine are much more careful, right?

Kareem Carr: Mm-hmm.

Chris Hayes: And, you know, and maybe that means something for what the data does. Now that effect should be taken out by the placebo vaccine intervention, right, hopefully controlling for that. But the other question is, like, maybe there's geographical dispersion. There's a lotta things to worry about when you're running this, and they run them at very, very large, like, huge scale, these randomized controlled trials to get to where we are now.

Kareem Carr: Yeah. Randomization addresses selection bias. So if you properly randomized, that shouldn't be a problem. I mean, randomized controlled trials are the gold standard for a reason, and that's the reason. That, like, randomization puts you on a footing where you should be reasonably confident that, like, your input is causally related to what you get as an output. I think what you're saying is that the result might not be generalizable.

Chris Hayes: Right.

Kareem Carr: So the randomized controlled trial, let's say you just did it on women, right? Like, no men at all, you just did it on women. The proper way to proceed from that is to say, "Okay, this vaccine is effective in women, and, you know, it's whatever percent the trial says." If you did it on a random sample of Americans, then the proper thing to say is, "This vaccine seems to be effective in Americans, you know, this is the percentage we get."

Chris Hayes: I mean, that's exactly what they did with age groups, right? I mean, precisely along those lines. And we saw them sort of move down the chain of, "Yes, this is for this age group or that age group."

Kareem Carr: Right. So yeah, so you can get in trouble when you generalize whatever, like, variable you generalize on. That could actually, like, flip things. So I don't know. Like, if you tried it on women, and then testosterone or Y chromosomes and whatever happened to be important, then that might change your outcome.

If age is important, then that might change your outcome. So you're always on the safest ground to just, like, randomly sample from a particular population, and then only generalize your results to that population. And then as you take it out of context, it can be a little dangerous. And that's where you need to understand, like, the deep causal structure.

Chris Hayes: I see what you're saying about generalizability. And the way that's been solved in this case is that, you know, what's remarkable about what we're seeing in the vaccine undertaking, right, is that you have randomized controlled trials across the world happening, like, all these different places. Whether it's among Rwandans or among Belgians or among Japanese. Like, all those countries are running randomized controlled trials.

Kareem Carr: Yeah. I guess it just depends on whether you think if they don't matter then you don't need to do that kind of specific measuring. But I mean, it's the safest policy to basically try it everywhere. I mean, that's why they do it, because it's a smart idea, which is to just try in multiple contexts. And if you're seeing the same answer in multiple, like, very varied contexts, then it can give you confidence that maybe we do kind of understand it completely and there's not gonna be a surprise.

Chris Hayes: I want you to talk a little bit about the idea of statistics being anti-racist, which you tweeted about the other day. Because you tweet a fair amount about race and the intersection of race and statistics. And obviously, statistics have been used to profoundly racist ends for many, many years and in many venues to this day.

I mean, go google "race crime statistics," and you will get a real helping of that. And so I would like to hear you talk through how you think about the intersection of race and racism and racial hierarchy and statistics, how they've been used to sort of uphold it, and how they can be used to dismantle it.

Kareem Carr: So a lot of the early founders of statistics, they were looking at human populations. And a lot of them were just, like, sort of open eugenicists. So this would be, like, Galton or the statistician Fisher. And, you know, they were early founders.

They were, like, brilliant people, but they had these biases, and they had this, like, underlying interest in sort of, like, sussing human difference. I would say, like, I would strongly emphasize that the field has very much moved on from that.

Like, you know, very recently there was a prize named after Fisher that was renamed for a Black mathematician. So there's been a lot of energy around moving away from those roots. So how statistics can be relevant to, like, anti-racism. So I think a lot of people who are attracted to this kind of idea, to, like, what a racial difference is, they naturally look at statistics.

And I guess even people who are kind of trying to, like, explore disparities, a lotta times they're quoting statistics too. Like, you know, like, maybe the Black infant mortality rate, or Black crime rates or whatever. Like, there's a lot of energy around trying to, like, understand what's going on.

And lots of times people bring statistics to that. One of the main functions of statisticians is to sort of act as referees. So I was kind of telling you about, like, that whole idea of, like, trying to, like, question the result and look at uncertainty, what is the uncertainty in a result. And so, like, a lot of the function of statisticians is just to stop overclaiming. Like, to tell people, "Hey, this is just an association, and, you know, the uncertainty is much bigger than you're saying."

Chris Hayes: Kareem, I'm a cable news pundit. I get paid to overclaim. (LAUGH)

Kareem Carr: So yeah, so I guess we're natural enemies. (LAUGHTER) So there's just all this overclaiming. So statisticians kind of tend to get in and sort of, like, put a lid on that. 'Cause I think when you actually look into statistics, it's not as clear as people say.

Like, so, for example, like, there's no clear definition of race. And I mean, like, you could say that from a biology point of view, but it's also a statistical thing. Like, it's very hard to come up with a single gene where you can say, "This gene is in, like, white people and not in Black people whatsoever."

Like, gene variants tend to spread over multiple populations. So then you might take it up another level of saying, like, "Well, maybe there are patterns in the gene differences." And even in that, you see, like, the distributions tend to overlap.

I mean, obviously people vary across the earth, but it's not clear that they vary as clusters, right? Like, because, you know, just naturally if group A is next to group B, of course they're gonna be kind of similar, right? So people tend to vary across the earth in this kind of continuous way, but to then say it's a cluster, like, that's overclaiming.

Chris Hayes: Right. You mean even just the category of a racial category is your point?

Kareem Carr: Yeah.

Chris Hayes: Right, that, like, we draw this circle around a thing. That there's a gradated variance of human difference, and then we come in with lines and we draw them around this set of dots. And we say, "Inside this set of dots is Black, and outside that set of dots is white, and this set of dots is." That's the social production of the category of race, which to your point is a social construct for exactly that reason. Because human difference is this gradated variance across this huge canvas.

Kareem Carr: Yeah. The metaphor I like to use is, like, country borders and actual, like, geographic variation. So, you know, like, the world naturally has variation, but, like, we're the ones that come in and put these hard boundaries and say, "This part is Mexico and this part is America," or whatever.

Chris Hayes: Right. In the case of the U.S., it's literally the same river. Like, for huge parts of it, or in Canada, like, the same forest, the same species of trees on one side and the other that we draw the line through.

Kareem Carr: And the other thing I think statistics can do is it can shed a light on unfair practices. So I've been seeing a lot of really great work where people maybe will, like, look at people's loan rates based on credit scores. And they see that, even when you control for all these factors, you know, there seems to be a difference that matches up with race, right?

So it's sort of, like, calling that process to account, to say, "Hey, you know, you say that you're making decisions based on all these criteria, but it seems like even when all the criteria are over and done with, there seems to be a difference."
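The kind of audit Kareem is describing is often run as a regression: control for the stated criteria and then check whether a group indicator still carries weight. A toy sketch on fabricated loan data, using statsmodels' formula interface; every number and variable name below is made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000

# Fabricated applicants: two groups with the same distribution of credit scores
# and income, but the simulated pricing quietly adds 0.4 points for group B.
group = rng.choice(["A", "B"], size=n)
credit_score = rng.normal(700, 50, size=n)
income = rng.normal(60_000, 15_000, size=n)
rate = (
    12.0
    - 0.01 * (credit_score - 700)
    - 0.00001 * (income - 60_000)
    + np.where(group == "B", 0.4, 0.0)
    + rng.normal(0, 0.3, size=n)
)

df = pd.DataFrame({"rate": rate, "credit_score": credit_score,
                   "income": income, "group": group})

# "Even when you control for all these factors..." -- the group coefficient
# estimates the gap that the stated criteria cannot explain.
model = smf.ols("rate ~ credit_score + income + C(group)", data=df).fit()
print(model.params["C(group)[T.B]"])  # should land near the built-in 0.4 gap
```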

And so it, like, pushes people to, like, "Well, maybe you need to explain what you're doing." So I feel like it can make, like, murky processes more explicit. Another thing that I find really exciting is artificial intelligence systems. So one thing that we've found is that when artificial intelligence systems are trained on biased data, so, for example, like, if you have a group of people that are rating a professor or something like that, you can kind of see, like, oh, they have some biases there.

Or when you train a language model, and it starts, you know, spouting the N-word, then you kind of know. You know, like, you train models on data, and a lot of data's just the internet. And a lotta people on the internet are racist. (LAUGH)

Chris Hayes: That's an amazingly succinct formulation of the problem. Garbage in, garbage out.

Kareem Carr: And I feel like that's interesting. I mean, it's sad, but it's also cool, because the model is learning structural racism. That's how I understand it. I think it's kind of cool, because it gives an actual quantitative way of saying, "Hey, this system is biased. Like, we learned this algorithm which should be neutral, but clearly there's some racism out there because the model learned the racism."

Chris Hayes: So what are examples? Like, I think I've read a few stories about this, about AI models that produced these biased, racist outcomes because the data set they're training on is some huge vast internet data set. And the internet data set is full of racism. But, like, what's an example of that?

Kareem Carr: Well, I mean, they are pretty ubiquitous. Like, there're some that are, like, a visual algorithm that's supposed to recognize people but, like, classifies a Black person as a gorilla. Or a very striking example where, like, there was this algorithm that was trained to generate what's called a sentiment score.

So basically, it's trained to say whether something is a happy statement or a sad statement. Like, whether it was associated with negative emotion or something like that. So, like, if you feed in something, like, "It was a great day," it should have a high score.

And if you're, like, "Football sucks," it should have a negative score. Something like that. And so a particular researcher found that when they took the language model and applied it to just really neutral things like Mexican food versus French food versus Japanese food, they were getting, like, negative scores.

Chris Hayes: Oh, fascinating.

Kareem Carr: So you should basically get zero.

Chris Hayes: Right, because there's no sentiment. If you say, like, Mexican food, French food, the model should be saying zero across the board because there's no sentiment embedded. But because of whatever was being fed in that produced the algorithm, it was giving negative sentiment scores for things like Mexican food.

Kareem Carr: Yeah. And I mean, like, a lot of people want to say that algorithm didn't work, but I feel like it did work. (LAUGH) It learned, like, actual people's sentiments, right, that they have a negative sentiment. Basically they're biased. And I feel like that's helpful, because, you know, for people of color, a lot of times they maybe get messages that what they're perceiving isn't real. And this is kind of a validation that, yes, this is real.
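A self-contained sketch of that kind of audit: score sentences that should be neutral and see whether the model systematically pushes some of them away from zero. The tiny "learned" lexicon below is invented to stand in for a model trained on biased text; real audits probe trained embeddings or language models.

```python
# A toy stand-in for a sentiment model "learned" from biased text: the lexicon
# has picked up a negative weight for one cuisine purely from its training data.
learned_lexicon = {
    "great": 2.0, "sucks": -2.0, "food": 0.0, "day": 0.0,
    "french": 0.0, "japanese": 0.0, "mexican": -0.8,  # bias absorbed from the data
}

def sentiment_score(text: str) -> float:
    """Average the learned word weights; unknown words count as neutral."""
    words = text.lower().split()
    return sum(learned_lexicon.get(w, 0.0) for w in words) / len(words)

# The audit: nominally neutral phrases should all score near zero.
probes = ["french food", "japanese food", "mexican food"]
for sentence in probes:
    print(f"{sentence!r}: {sentiment_score(sentence):+.2f}")
# Any systematic non-zero score on neutral probes is evidence the model
# inherited bias from whatever it was trained on.
```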

Chris Hayes: That is so fascinating, right. So if you have completely neutral AI algorithms that are training on data and then becoming racist, what you can say is that they're picking up on something out in the world that they're learning, and they're actually learning it correctly, in the sense that they're, like, accurately reflecting say sentiments out there based on the data that they're feeding on, that they're training on.

Kareem Carr: Right. And then statistics provides the solution as well. Like, ways that you could both detect when this is happening and then ways that you might be able to, like, adjust the algorithms to be more race neutral.

Chris Hayes: Are there a lot of people doing that kind of work in statistics?

Kareem Carr: Purely in statistics, I would say, like, there is work. There's a group called AI Ethics. It's, like, a subfield of traditional computer science. They're very involved in doing this sort of, I would say, empirical work of cataloging how all these algorithms work, and then doing, like, the, you know, really good work on thinking about fairness. So, like, how would you measure fairness and that kind of thing.

I guess on the other side, I would say that in mainstream statistics, we're doing a lot of work on "What does race even mean when you put it into an algorithm?" You can imagine that you have some outcome that differs because of race, and there could just be a lot of different mechanisms.

So I guess the most simple one is it might be hereditary, right? Like, 'cause race is at least a little bit correlated with lineage, right? Like, so imagine, like, you see, like, a difference in diabetes rates. That might be, like, hereditary.

Chris Hayes: Okay, you're saying between, like, say, Black Americans and white Americans?

Kareem Carr: Yeah, it could be that. But it could also be my perception of myself. So maybe my health behaviors I would say, "Oh, these things are for me, these things are not for me, because, you know, like, it's not part of my culture," or whatever.

So I might be making decisions like that. There might also be the possibility that other people are treating me different, so maybe, you know, I'm not getting the kinda healthcare that I'm supposed to get because of discrimination. It could also be, like, historical legacy.

Like, so, you know, because of things that happened way in the past, different groups are just on different social trajectories and that's why you're seeing this difference. And then there's another discussion that's going on, which is, can we even understand race as a cause? So can race even cause an outcome? So it's kind of, like, a fusion of, like, statistical work and then deep investigations of what causation means.

Chris Hayes: Yeah. Our good friend, Issa Kohler-Hausmann is doing some work on that at Yale at the law school there. And I've had conversations with her about this. And talk a little bit more about what that means, 'cause I think it's a little hard to process. Like, what does it mean to have race as a cause?

Kareem Carr: Well, you know, if you think about, like, physics, right? Like, when you think about something causing some outcome, usually they're kind of directly physically related, right? Like, either some particle, like, physically interacts with some other particle as an exchange of energy, or maybe they have some kind of explicit property.

Like, Particle A has a mass, Particle B has a different mass. And then because of the force of gravity, like, they're, you know, like, they're influenced by this thing that, like, causes them to act differently. If you're sort of used to thinking about causation like that, when you get to thinking about race and you try to think about, like, "Well, how does this abstract category cause these disparate outcomes?"

And so then that pulls you into thinking about mechanisms, right? And so those were the potential mechanisms I outlined. Like, so whenever I see race act in the world, maybe it's acting through all of those different mechanisms. In particular, how do I know what the different ratios are?

What is the ratio of mechanisms for any particular thing that I observe? So the skepticism about race being a cause is basically the idea that race might not be one thing. Like, race as a causal element might not be one thing. And so, like, mathematically, you know, you can do a lotta things with numbers, but not everything makes sense.

Chris Hayes: Yeah. So when we say, like, you know, "there's higher maternal mortality for Black women, right, in birth," which is true. And there's a crazy disparate number, right? It's, like, well, it's not the color of someone's skin clearly that's causing that. Then there might be, like, right, there might be a hereditary aspect, then maybe socioeconomic, so maybe we play with that.

But to the extent that these differences keep appearing when you filter against different things, then you're left with this category called race, but you haven't really explained anything. Because, like, it's, like, "Well, we've run the regressions and it's not an education effect and it's not a geographical effect, and it's not a socioeconomic effect, it's not a wealth effect, it's a race effect." It's, like, what does that mean? Like, what work is that doing, right? It doesn't explain anything to say something is a race effect.

Kareem Carr: Yeah, exactly. So, like, as a scientist you want to kinda understand what you're doing, right? Like, you want to understand what this causal element is, how does it act in the world. And so that's kind of one of the reasons why, like, there's maybe a push to disambiguate, like, the different ways in which it might cause whatever the outcome is that you observe. (MUSIC)

Chris Hayes: Kareem Carr is a biostatistics PhD student at Harvard and a researcher. He's got a big online following, and you can follow him on Twitter @Kareem_Carr. That's K-A-R-E-E-M underscore C-A-R-R. Kareem, great to have you on the program.

Kareem Carr: Thanks for having me.

Chris Hayes: Once again, my great thanks to Kareem Carr for a great and very heady conversation. I think I'm going to have to listen back over it and see if I can parse it all out. I hope you guys enjoyed it. It is that time of year, you know, it's your favorite time of year.

What time is it? The holiday mailbag episode of WITHpod. We want to hear from you, whatever's on your mind. Let us know any questions, comments, feedback, stuff you like about the show, stuff you love about the show, stuff you absolutely love about the show, all of it. Send those to our email, WITHpod@gmail.com. Tweet us with the hashtag #WITHpod.

"Why is This Happening?" is presented by MSNBC and NBC News, produced by the All In team and features music by Eddie Cooper. You can see more of our work, including links to things we mentioned here by going to NBCNews.com/whyisthishappening.
