Proteins are one of the main drivers of human diseases. Scientists are now mapping all of the proteins in the human body in a similar way to how the Human Genome Project mapped genes. On this episode of The Show About Science, Neil Kelleher, PhD invites Nate to his lab on the campus of Northwestern University to explain how it all works.
Photo credit: Jenny Butkus
Learn more about Neil’s work here: proteomics.northwestern.edu/
Nate: Hello everyone, and welcome to another episode of The Show About Science. This is your host, Nate. So back in the nineties, scientists were hard at work cataloging the genes in the human body. This groundbreaking initiative was called the Human Genome Project. On today’s episode of the show, we’re going to learn about the sequel, which stars proteins.
So there are about 20,000 genes that make up our DNA and genes are the blueprints for making proteins. So how many proteins exist in our body? And what can we learn about human health from identifying them? To help me answer these questions, I reached out to today’s guest.
Neil Kelleher: Hi, my name’s Neil.
I am Director of Northwestern Proteomics or protein analysis infrastructure, or just the machines and process that we use to weigh molecules basically, and count atoms.
Nate: Our journey down the protein rabbit hole began oddly enough, by looking at the floor in a building right by Lake Michigan.
Neil Kelleher: Yeah. So we’re in Silverman Hall and staring at the structure of Lyrica. Also called pregabalin and it’s got how many carbon atoms Nate?
Nate: I believe when I last counted, it was eight carbon, 17 hydrogen, two oxygen and one nitrogen.
Neil Kelleher: Yep. And this has helped a lot of people in pain from like diabetic nerve pain and other things.
Nate: And I think you said 3 billion dollars it made the inventor.
Neil Kelleher: Yeah. Pfizer sells this drug and it’s helped a lot of people. And Rick Silverman here at Northwestern invented it.
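A note for readers following along: the atom count Nate gives for Lyrica (eight carbon, 17 hydrogen, two oxygen, one nitrogen) is enough to work out roughly what one molecule weighs. The sketch below is not from the episode; it assumes standard rounded average atomic masses, and the names `ATOMIC_MASS`, `PREGABALIN`, and `molecular_mass` are made up for illustration.

```python
# Rounded average atomic masses in daltons (standard periodic-table values,
# added here as an assumption; they are not quoted in the episode).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Atom counts as Nate reads them off the floor: C8 H17 N1 O2.
PREGABALIN = {"C": 8, "H": 17, "N": 1, "O": 2}


def molecular_mass(atom_counts):
    """Add up the mass of every atom in the molecule."""
    return sum(ATOMIC_MASS[element] * count for element, count in atom_counts.items())


mass = molecular_mass(PREGABALIN)
print(f"Pregabalin weighs about {mass:.2f} daltons")
```

This is the same idea Neil describes later in the episode: weighing a molecule is, in effect, counting its atoms.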
Nate: Wait a minute. Is this another molecule?
Neil Kelleher: That is cool.
That’s another inlaid thing on the floor here in Silverman Hall.
Nate: Yeah. And there’s like a little snake or something wrapping around it, or a ribbon.
Neil Kelleher: Yeah. Right.
Nate: And it looks like one of those oxygen or carbon molecules.
Neil Kelleher: Yeah. It’s the same color. Mostly for the decor. But this is what we call a nucleosome. So that’s supposed to be DNA wrapped twice around a group of eight proteins.
Nate: Ooh. That’s cool.
Neil Kelleher: And, and that’s how you get your genome, which is all your DNA, 3 billion letters, As, Gs, Cs and Ts, right?
Neil Kelleher: And how do you get a, it’s a meter long if you stretched out all the DNA. In your genome, it’s this long.
Nate: Yes. But you wouldn’t be able to see it.
Neil Kelleher: You wouldn’t.
Nate: Because this is so small.
Neil Kelleher: You’re right. And so how do you pack that into a little 10 micron cell? Or…
Nate: Gotta break it up!?
Neil Kelleher: No, you have to package it.
And there’s about a hundred million of these little balls made of proteins and the DNA gets packaged like the old phone cords that get super coiled.
Nate: How many would that be for the whole body?
Neil Kelleher: Like 10 to the 13 or, yeah, I’m using geek speak.
Nate: But like, billions or trillions.
Neil Kelleher: Yeah. It’s trillions.
It’s 10 trillion different cells that make up your body.
Neil Kelleher: And in almost every one of them, you have the whole genome. Stuffed inside the nucleus.
Nate: And this is the representation of just one of those things of DNA wrapped around eight proteins.
Neil Kelleher: Exactly. And that’s called a nucleosome, but it’s just, yeah.
It’s how we evolved to pack the genome into a cell.
Nate: This must be what being in college feels like. I mean, I’m just looking at the floor and I’m already learning something new. I can’t wait to see how much I’ll have learned by the end of this episode.
Alright, so. Where are we going next?
Neil Kelleher: Yeah, Nate, I thought we would swing down into the instrument bay, where we have a lot of instruments that measure how big molecules are and how much they weigh or their molecular weight.
Nate: All right, sounds good. Let’s go.
Neil Kelleher: Okay. Rock and roll. Here we go.
Elevator: Going Down.
Neil Kelleher: We even have an elevator that talks to us.
Neil Kelleher: And, we’re going down to the place where we analyze protein molecules.
Nate: All right. Sounds good.
Nate: And just like that, the doors open and we’re in the basement of Silverman Hall.
Neil Kelleher: There’s all sorts of science equipment down here.
Nate: Yeah. I can see it through those windows.
One quick walk down the hall later and we had arrived.
Neil Kelleher: Okay. Hang on to your hat. So you’ll probably hear a lot of background noise.
Nate: Yep. Yeah. I, I definitely do.
Neil Kelleher: Yeah. What do you see, Nate?
Nate: Uh, so this thing looks like a printer except smarter. That thing looks like a, a big feat of human engineering. Uh, bunch of wires, circuit boards.
Neil Kelleher: Yep.
Nate: Oh, this thing looks like a 3D printer, except with vials.
Neil Kelleher: Right. Yeah. No. And so we, we put in, like, we start with like human blood or cells. Process it. Separate proteins and then weigh all those proteins.
Nate: So how do you know what protein it is that you’re measuring?
Neil Kelleher: Yeah. That’s a great question. That’s like the core, the main thing that we’re supposed to do in proteomics.
Which we can define later, but the analysis of proteins, you wanna know which protein, you wanna identify which protein it is. And that is another way of saying what human gene gave rise to that protein. And there are 20,300 human genes. And the way that we do that is to basically take the protein.
And destroy it, controllably, break it up, take it out behind the woodshed, break it into a bunch of pieces. Uh, splinter it, get the masses of those pieces. And they are like a fingerprint.
Nate: Yes, the molecular woodshed.
Neil Kelleher: Yeah, that’s right. It’s like a sawmill or it’s like, yeah. And, and we identify the proteins by breaking them into pieces, measuring the mass of the pieces and we say, oh, this is from that human gene.
And so we can tell which gene the protein came from and all the decorations on the protein.
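A note for readers: the "fingerprint" idea Neil describes (break the protein into pieces, weigh the pieces, match them to a gene) can be sketched in a few lines. This is a toy illustration, not the lab's actual software. It assumes rounded whole-number masses for each amino acid, a made-up three-entry database with hypothetical gene names, and fragments that are simply the first 1, 2, 3... amino acids of the chain; real instruments and search tools handle far more detail.

```python
# Rounded (nominal) masses in daltons for the 20 amino acids as they sit
# inside a protein chain. Standard values, added here as an assumption.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Hypothetical database: gene names and sequences are invented for this sketch.
DATABASE = {
    "GENE_A": "ELVISPRESLEY",
    "GENE_B": "PEPTIDE",
    "GENE_C": "MASSSPEC",
}


def fragment_masses(sequence):
    """Return the mass of every prefix piece of the chain (1 letter, 2 letters, ...)."""
    total = 0
    masses = set()
    for amino_acid in sequence:
        total += RESIDUE_MASS[amino_acid]
        masses.add(total)
    return masses


def identify(observed_masses):
    """Pick the database entry whose predicted pieces match the most observed masses."""
    scores = {
        gene: len(fragment_masses(sequence) & observed_masses)
        for gene, sequence in DATABASE.items()
    }
    return max(scores, key=scores.get)


# Masses of the pieces E, EL, ELV and ELVI, as if read off the instrument.
observed = {129, 242, 341, 454}
print(identify(observed))
```

The design choice here is to score by counting shared masses, which is the simplest possible matching rule; it is only meant to show why a set of piece masses can point back to one gene.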
Nate: What do you mean by decorations on the protein?
Neil Kelleher: Yeah. So when we sequenced the human genome, we were able to tell, okay, well, here’s what should be your proteins, right? 3% of the genome codes for proteins, you know, the 20 amino acids that make up the sequence of a protein. We can predict what the proteins will be. But here’s the thing. When you actually create your body and you’re living, there’s a lot of changes to proteins.
Nate: And so, yeah, you’re predicting which proteins you’re measuring, but since the human body is sometimes creating new proteins, you don’t know for sure exactly which proteins you’re measuring.
Neil Kelleher: The exact form of the protein. So you may know, Hey, it’s from this human gene. Oh, okay. That’s the identification of the protein, but it comes in like a hundred different forms, different flavors. Yeah. Like, let’s say there’s a human gene that encodes the protein ice cream. Okay, great. Then you say, oh, well we identified ice cream as distinct from other foods, so, okay.
We got a protein called ice cream, but that’s not enough. What is actually driving human biology is all the different flavors of that ice cream. So if you have chocolate, vanilla, strawberry, that’s a way of saying different flavors of the same thing. So like in your liver, you could be expressing the chocolate form of the protein, but in your kidney or in your brain, you have vanilla or strawberry. Different forms of the proteins. And we call those proteoforms.
Nate: Vanilla for life.
Neil Kelleher: But that’s why I think the next step after the genome project is to determine all these different flavors of all the different proteins in our bodies. And that’s what we’re trying to do.
Nate: So after leaving the instrument bay, I was intrigued, but I still had a number of questions and we still needed Neil to define proteomics for us.
So what is proteomics?
Neil Kelleher: That’s a great question, Nate. And do you know what genomics is? The genome?
Nate: What is genomics?
Neil Kelleher: Yeah. Okay, cool. Like the genome, the human genome is all of our DNA, right?
As, Gs, Cs and Ts, those letters that spell out our DNA. So the study of all the genes is genomics. Mm. So the study of all the proteins is proteomics.
Nate: I see.
Neil Kelleher: Yeah.
Nate: And so I should probably have said this earlier, but what is a protein?
Neil Kelleher: Wow. That’s a foundational question. Yeah. Perfect. So a protein is a string of amino acids.
So like amino acids are things like lysine, arginine, there’s 20 of them. So like as DNA has four letters, the A, G, C and T that make up the DNA. Amino acids, there’s 20 flavors of amino acids, or 20 different kinds of amino acids. And then they just string up and there could be 500 of them in a row. And then you have a 500 amino acid long protein.
And when I say sequence them, you, you determine the order of the amino acids. So, oh, it goes lysine, arginine, histidine, tyrosine, proline. And then you determine that 500 of them in a row. That is a protein and it folds up and it goes off and does jobs in your body.
Nate: And so the amino acids, they’re, they’re like a kind of molecule and they have like their own atoms. So like, just take any one of them, like, lysine I think you said.
Neil Kelleher: Sure.
Nate: And so like what molecules would be in a typical amino acid?
Neil Kelleher: So you remember down on the floor, you saw the drug Lyrica on the floor. That is basically, like, an amino acid.
Nate: Mm. So it would have like carbon, hydrogen, nitrogen oxygen in an amino acid?
Neil Kelleher: Yep.
Nate: So really those four molecules are the building blocks of amino acids, proteins, and the, the body.
Neil Kelleher: That’s exactly right. You’re built up in layers. And so you start with atoms in the amino acids, then the amino acids string together to form a protein. Proteins, get together and form parts of cells. And then you have cells. And that’s what constructs your body and keeps you healthy.
Nate: I asked Neil to demonstrate how this works or how this could work in a protein. And what he did was he decided to invent a whole new protein on the spot.
…so tell me the first letter.
Neil Kelleher: I think E let’s do E and then L V I S and then P R E S L E Y. There you go. All right. Those are all that, that would spell a peptide. That would spell a protein Elvis Presley.
Nate: And just like that we had sequenced the Elvis Presley protein.
Now this protein could actually be made, but it doesn’t currently exist in real life.
Neil Kelleher: You can make it and analyze it.
Nate: [Whispering] It’s the Elvis Presley protein.
Neil Kelleher: It’s the Elvis Presley protein.
Concert Announcer: All right. Elvis has left the building.
Nate: No, he hasn’t left the building. He’s just in the basement being analyzed. Don’t worry. We’ll come back to him later. But before we get back to Elvis, we need to talk a little bit about the Human Genome Project.
So what was the human genome project and how does it relate to proteomics?
Neil Kelleher: Yeah, that’s a great question. Right? So the human genome project was a project funded by the United States government for about $4 billion. And it ran between 1993 say and 2003. Somewhere in that decade. And it was this amazing big science project.
It was like the moonshot. It’s like, okay, we’re going to the moon. And we just did this amazing thing. Being able to map and sequence the human genome, which is to determine the 3.1 billion, with a B, letters of the genome. So we were able to at the molecular level, the level of chemistry, sequence our DNA.
And that was just a revolution in understanding biology.
Nate: So how is proteomics kind of the sequel to the Human Genome Project?
Neil Kelleher: Wow. That’s a great word, sequel. I like that a lot. It, it is. It’s an obvious next step, not to say that it’s easy. But once you have the sequence or, you know, the genome, what we know now 20 years later is that there’s 20,000 human genes. Each of which creates a protein and that protein goes off and does things. It has a life of its own, and it comes in many different flavors and it has many different functions. So like in your liver or your kidney, or your eyes, you will have different proteins that make up your body. And this is unknown.
This is a frontier that has not been mapped out, and it’s really holding us back for improving people’s lives and detecting disease earlier and more precisely. So the genome project was $4 billion. We have a similar project of about that scale. A 10 year long [project]. It’s called the Human Proteoform Project.
And it would determine a few billion proteoforms for a dollar each.
Nate: Wait a dollar each.
Neil Kelleher: Yeah. So in 1995, along with very questionable music in that decade, we had reached a dollar per base. So to sequence DNA, like A G C, C T A. There’s only four letters. We could do that as a species. Right. Human beings invented the technology to do that for $1 per base.
So if it was 3 billion bases in the genome, how much would it be?
Nate: 3 billion, 3 billion dollars.
Neil Kelleher: That’s right. And so that’s all we’re saying. A group of like 400 scientists around the world kind of got together and were like, hey, we need to do the same thing, a dollar per proteoform. And that’s the Human Proteoform Project. And what we would do, like, for the Elvis Presley protein.
Neil Kelleher: It’s like, let’s see…
Nate: The made up Elvis Presley protein.
Neil Kelleher: It is a made up…
Nate: We apologize to our listeners. You do not have Elvis Presley inside of you.
Neil Kelleher: No.
Yeah. And so if you were to sequence this made up Elvis Presley protein, you would spell out the letters, the 12 letters, and it would be, let’s see if I can do this.
Uh oh, this is a quiz. It’s glutamic acid. E. And then L is leucine. V is valine. Isoleucine is I. S is serine.
Neil Kelleher: P is proline. R is, uh, wait for it. Arginine. E is glutamic acid. S is serine. Leucine again. E again, lots of E and everybody’s into Wordle, right? Have you heard this Wordle?
Nate: Oh, I, I, I do Wordle every day.
I also do Worldle every day.
Neil Kelleher: Wow. I’m gonna have to look up what the heck Worldle is.
Nate: Yep. I I’ll start for you.
Neil Kelleher: Okay. And then just to finish off Presley, it’s E is glutamic acid and then Y is tyrosine. And that’s what it means to sequence the protein Elvis Presley. And they each have a mass. So like E is 129. L is 113. V is 99.
Nate: And what do those numbers mean?
Neil Kelleher: That’s the mass of all the atoms. So each atom has an atomic mass. And so if you…
Nate: And, and that’s like their number on the periodic table, right?
Neil Kelleher: That’s exactly right.
They each have a mass, right. Yeah. Perfect. And so the, the value of knowing the mass of the protein is it can be used to identify the protein and it has all the decorations on there as well.
The mass of the chemical changes to the protein. But here’s the thing. This is why we need the Human Proteoform Project to generate the phone book or the catalog of all the proteoforms that exist in our cells and in our body fluids. Because then the mass, we would know what that thing was. And so it’s sort of a discovery, technology, development project, but directly tied to human disease because all diseases involve human proteins.
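A note for readers: the per-letter masses Neil quotes (E is 129, L is 113, V is 99) can be added up for the whole made-up Elvis Presley protein. The sketch below is an illustration, not Neil's method. The three masses he quotes are from the episode; the rounded masses for the other letters (I, S, P, R, Y) and the extra 18 for one water molecule on the ends of the chain are standard textbook values added here as assumptions, and the function names are invented.

```python
# One-letter code -> (name, rounded mass in daltons inside a chain).
# E, L and V masses are quoted in the episode; the rest are assumed
# standard nominal values.
AMINO_ACIDS = {
    "E": ("glutamic acid", 129),
    "L": ("leucine", 113),
    "V": ("valine", 99),
    "I": ("isoleucine", 113),
    "S": ("serine", 87),
    "P": ("proline", 97),
    "R": ("arginine", 156),
    "Y": ("tyrosine", 163),
}

# A free-standing chain carries one extra water (H2O) across its two ends.
WATER = 18


def spell_out(sequence):
    """Translate one-letter codes into amino acid names, in order."""
    return [AMINO_ACIDS[letter][0] for letter in sequence]


def peptide_mass(sequence):
    """Sum the per-letter masses and add one water for the chain ends."""
    return sum(AMINO_ACIDS[letter][1] for letter in sequence) + WATER


elvis = "ELVISPRESLEY"
print(len(elvis), "letters:", ", ".join(spell_out(elvis)))
print("Approximate mass:", peptide_mass(elvis), "daltons")
```

With these rounded values the 12-letter peptide comes out to 1433 daltons. A catalog of proteoforms, as Neil describes it, would let a measured mass like this be looked up rather than worked out from scratch.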
Nate: So you got into like science and scientific engineering at a young age, but when and how did you get interested in proteomics?
Neil Kelleher: Yeah, in the nineties, I went to graduate school, and I was interested in proteins. I was interested in enzymes. So enzymes are proteins that do a specific thing in your body.
And, that put me on a course to study proteins and how they work, how they’re modified and decorated, and that can change their function. Like you could turn a protein on or off.
Neil Kelleher: There’s a switch called a, well… Do you want the geeky word or do you…
Neil Kelleher: You do. Okay. We’re all in for, I love it. It’s called a post translational modification.
Nate: Or I assume a PTM.
Neil Kelleher: A PTM. You know, sorry, pounding the table. Cause you got it. Yeah. A PTM. That’s right. We all are running around. Hey, how do we detect PTMs and detect what they do. How do we assign their function in our bodies? And that’s what we do.
Nate: I’m writing PTM on the whiteboard nearby.
That was what that sound was.
Neil Kelleher: PTM that’s right. Like Elvis Presley, the letters that you can add on top of the letters, a decoration, a PTM, and that could turn the function of that on or off. And that’s why we need to know all the decorations.
Nate: Is there, like, an example of a PTM that, like, people don’t really know about, but they use every day?
Neil Kelleher: Oh yeah. So there’s about 300 different decorations, different PTMs known in biology. But there’s some really common ones. And we’re sort of done discovering them, but, there’s one called phosphorylation. It’s a phosphorus with four oxygens.
Neil Kelleher: And then that gets stuck on a serine or a tyrosine or a threonine.
Any of those amino acids, you stick a phosphorylation on. And guess what goes crazy in cancer?
Neil Kelleher: Phosphorylation. The whole signaling network of the cell gets out of control. And that is what drives your cells to proliferate and for people to get really sick.
Nate: And so you wanna turn that off.
Neil Kelleher: Right. You wanna know exactly how phosphorylation is happening in the cells, and that’s why we need to do phospho proteoform measurement, which now we’re getting really geeky and long words.
But yeah, we just need to know basically what protein molecules are there.
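A note for readers: the link between decorations and "flavors" can be made concrete with the made-up Elvis Presley peptide. The sketch below is an illustration under stated assumptions, not the lab's workflow. It assumes each phosphorylation adds about 80 daltons (the net gain of HPO3, since one oxygen of the phosphate already belongs to the amino acid), that only serine, threonine and tyrosine can carry it, and that the undecorated peptide weighs about 1433 daltons (rounded residue masses plus one water). All names in the code are hypothetical.

```python
from itertools import combinations

# Assumed nominal mass added by one phosphorylation (net gain of HPO3).
PHOSPHO_SHIFT = 80

# Amino acids Neil names as phosphorylation targets: serine, threonine, tyrosine.
PHOSPHO_TARGETS = set("STY")


def phospho_proteoforms(sequence, base_mass):
    """List every on/off combination of phosphorylation sites with its mass.

    Returns (positions, mass) pairs, where positions are 1-based indexes
    of the phosphorylated amino acids.
    """
    sites = [
        position
        for position, amino_acid in enumerate(sequence, start=1)
        if amino_acid in PHOSPHO_TARGETS
    ]
    forms = []
    for count in range(len(sites) + 1):
        for chosen in combinations(sites, count):
            forms.append((chosen, base_mass + count * PHOSPHO_SHIFT))
    return forms


# 1433 is the assumed rounded mass of the undecorated made-up peptide.
forms = phospho_proteoforms("ELVISPRESLEY", 1433)
for positions, mass in forms:
    print(positions or "no phosphate", mass)
```

Three possible sites give eight on/off combinations, yet only four distinct masses, because forms with the same number of phosphates weigh the same. That is one reason the pieces also need to be weighed: the total mass alone cannot say which site carries the decoration.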
Nate: So after talking to Neil, I was thinking – this all is so cool, but it also seems really hard and complicated. So can any like person just do this work or does it take a very smart scientist?
Neil Kelleher: People when they approach science, they think, oh, I can’t do that. Right. It’s so hard to understand. You’re helping to show people it’s just a decision. Like I’m not that, I mean, I hustle a lot. I try really hard, but I wasn’t gifted with like, you know what I would consider like, you know…
Nate: I mean the scientist could come from anywhere. Yeah. It’s like in Ratatouille. Anyone could cook, anyone can science.
Neil Kelleher: Anyone can science and you just, it’s just what you choose to spend time on.
Neil Kelleher: And it’s really, we need so many more people in STEM, you know, careers, and to be persistent and I think more and more people are, and they will.
Nate: Well, whether you choose to spend your time cooking or sciencing. Not a word, but I’m going with it, you are always welcome to listen to The Show About Science. So, thank you for being on the show Neil.
Neil Kelleher: Thanks, Nate. Great to spend some time with you and thanks for your interest in proteins.
Yeah, it’s a really cool subject.
There you have it, folks. The Show About Science is complete. Music on today’s episode was composed by the amazing Breakmaster Cylinder with additional music by Epidemic Sound. And as always, our theme song was composed by Jeff, Dan and Theresa Brooks.
Okay, dad, you can shut the recording off.