What's the Longest Word in the English Language?
And what does it have to do with the unusual chemistry of carbon?
Shakespeare had a go at the longest word in the English language with “honorificabilitudinitatibus.” If you play the game of stacking suffixes and prefixes together, you can get “antidisestablishmentarianism,” one letter longer for a total of 28 letters. But the longest word by far appeared in 1964 in Chemical Abstracts, a dictionary-like reference for chemists. The word describes a protein in what’s called the tobacco mosaic virus, and it runs 1,185 letters long. Besides being too long to write here, it tells us a lot about the unusual chemistry of carbon.
Sam Kean: What’s the longest word in the English language? Shakespeare had a go at it, with “honorificabilitudinitatibus.” Which, depending on whom you ask, either means “the state of being loaded with honors,” or is an anagram declaring that Francis Bacon, not the Bard of Avon, really wrote Shakespeare’s plays.
But that word, a mere 27 letters, doesn’t stretch nearly long enough to count as the longest word in the English language.
Of course, determining the longest word is a bit of a mug’s game. What even qualifies as English can vary from one context to another. Shakespeare’s word was spoken by a clown in Love’s Labor’s Lost, and obviously comes from Latin. But maybe we shouldn’t count foreign words, even in English sentences.
Plus, if you play the game of stacking suffixes and prefixes together, you can get “antidisestablishmentarianism,” twenty-eight letters. Or what about nonsense words like “supercalifragilisticexpialidocious,” thirty-four letters. Under those rules, you can string letters along pretty much indefinitely.
But let’s adopt a sensible definition: the longest word to appear in an English-language document whose purpose was not to set the record for listing the longest word ever. In that case, the word we’re after appeared in 1964 in Chemical Abstracts, a dictionary-like reference for chemists.
The word describes a protein in what’s called the tobacco mosaic virus. This anaconda runs 1,185 letters long. It reads...
...and so on. If you really want to hear me stumble my way through it, there’s a little bonus on my website. But the real point of all this is that there’s a cool little chemistry lesson buried inside this longest word. Because if we look a little deeper, we wouldn’t even have this longest word without the very special chemistry of the element carbon.
From the Science History Institute this is Sam Kean and the Disappearing Spoon—a topsy-turvy science-y history podcast. Where footnotes become the real story.
The tobacco mosaic virus was the first virus ever discovered, in 1892. It infects tobacco plants, obviously. It was also the first virus to have its shape and structure studied in a rigorous way. Some of this work was done by none other than Rosalind Franklin of DNA double-helix fame.
However, the word here describes an individual protein in the virus. And proteins are built up from the most versatile element on the periodic table, carbon.
Specifically, carbon forms the backbone of amino acids. Cells make proteins by stringing amino acids together like beads on a necklace. For instance, this tobacco virus protein contains 159 amino acids in a row.
Now, the names of most individual amino acids end in the same three letters, i-n-e. There’s glycine [GLIGH-seen] and isoleucine [eye-so-LOO-seen] and proline [PRO-leen] and one that was first discovered in asparagus, called asparagine.
But when biochemists are talking about stringing amino acids together into proteins, they take a linguistic shortcut. They turn the i-n-e at the end into a y-l—an lll sound, to make the word flow a little smoother. That’s why the tobacco virus protein reads acetyl-seryl-tyrosyl-seryl..., and so on, with all those lll sounds lulling you along.
Taken in order, one after the other, these linked lll words describe the protein’s structure precisely. That is, if you want to know the order of the amino acids in the protein, you can just read them off in a sequence. That’s why biochemists in the 1950s and early 1960s gave molecules official names like acetylseryl... et cetera. So they could reconstruct the whole molecule from the name alone.
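For listeners who think in code, here is a minimal Python sketch of the naming convention described above. It's an illustration under stated assumptions, not an official nomenclature tool: the function names `residue_form` and `systematic_name` are hypothetical, the sequence is assumed to be given as full amino-acid names from first to last, and the acetyl group is treated as a simple text prefix. One wrinkle the rule of thumb glosses over is that a few residues don't just swap i-n-e for y-l (asparagine becomes asparaginyl, glutamine becomes glutaminyl), so the sketch keeps those in a small lookup table.

```python
# Hypothetical helpers illustrating how the old systematic protein names were built:
# every amino acid except the last switches to its "-yl" residue form,
# and the pieces are run together in sequence order.

# Residue forms that don't follow the simple "-ine" -> "-yl" swap.
IRREGULAR_FORMS = {
    "asparagine": "asparaginyl",
    "glutamine": "glutaminyl",
    "cysteine": "cysteinyl",
    "tryptophan": "tryptophyl",
    "aspartic acid": "aspartyl",
    "glutamic acid": "glutamyl",
}


def residue_form(amino_acid: str) -> str:
    """Return the '-yl' form used when an amino acid is linked to the next one."""
    name = amino_acid.lower()
    if name in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[name]
    if name.endswith("ine"):
        return name[: -len("ine")] + "yl"  # e.g. glycine -> glycyl
    raise ValueError(f"no residue form known for {amino_acid!r}")


def systematic_name(sequence: list[str], prefix: str = "") -> str:
    """Join residues in order; only the final amino acid keeps its full name."""
    if not sequence:
        raise ValueError("sequence must contain at least one amino acid")
    *linked, last = [aa.lower() for aa in sequence]
    return prefix + "".join(residue_form(aa) for aa in linked) + last


# The opening of the tobacco virus name, cut off after three amino acids:
print(systematic_name(["serine", "tyrosine", "serine"], prefix="acetyl"))
# prints: acetylseryltyrosylserine
```

Each linked amino acid adds somewhere between five letters (seryl) and twelve (phenylalanyl), which is how a chain of 159 of them ends up with a name more than a thousand letters long.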
Now, this system was exact, if a little exhausting. Historically, the tendency to mash words together like this has its roots in the influence that German scientists, and especially the compound-crazy German language, had on chemistry.
But what allows cells to string amino acids together in the first place? Mostly the special properties of carbon.
Now, I know this might be like a bad acid-flashback to high school chemistry, but the overall idea here is pretty simple. Because I have a secret for you. You can actually summarize half of all chemistry in one sentence—in one little rule.
Think of atoms like tiny balls. Inside each ball are particles called electrons. Some atoms have just a few electrons, some have several dozen electrons.
Now, these electrons are stacked in different layers inside the atom. Some electrons huddle in layers near the center. Others hover in layers near the surface.
And the secret rule that summarizes half of chemistry is this: most atoms want eight electrons in their outer energy level, the level nearest the surface. Sensibly enough, it’s called the octet rule.
And what drives most of chemistry is that atoms that don’t have eight electrons in the outer level will do whatever it takes to reach eight. Sometimes they add electrons. Sometimes they shed electrons. Regardless, they’ll fight, beg, barter, steal, make and break alliances, and pull off whatever dirty tricks they need to in order to get eight. They almost always want eight.
So, how does that play out in real life? Consider oxygen.
Oxygen has six electrons in its outer energy level. Six is two below eight, so oxygen is always hunting around for two additional electrons from other atoms.
And it turns out that oxygen is one of the more aggressive atoms in chemistry. We usually think about oxygen as nice and beneficial, since we breathe it. But it’s actually a chemical bully—always beating up on other atoms and stealing their electrons. And it does so to get to eight electrons in its outer layer.
Or consider sodium. Sodium has just one electron in its outer energy level. So does it aggressively go after seven more electrons? Nope. It’s much easier to go the other way, and shed its lone outer electron.
Once it does shed that lone outer electron, then the next energy level beneath it becomes the new outer layer. This new outer layer already has eight electrons, so once again, sodium is happy.
Okay, fine, but how does this help us understand chemistry? Well, consider sodium and chlorine together. As I mentioned, sodium has one extra electron to shed. And chlorine has seven electrons in its outer layer, one short of eight.
So sodium and chlorine are natural partners. Sodium gives up its electron, chlorine takes it in. The result is sodium chloride, or common table salt. Again, that’s most of chemistry right there—atoms will fight, beg, barter, steal, make and break alliances to move up or down to eight electrons in their outer energy level.
But then consider carbon. Unlike sodium or chlorine, which are each just one electron away from eight, carbon has four electrons in its outer level. That leaves it stranded in no-man’s-land, equally distant from happiness on either end.
So instead of shedding or stealing electrons, carbon tends to share them with other atoms. It’s one of the elements that forms alliances.
And to be frank, carbon has pretty low standards for sharing. It’s thirsty, it’s desperate, it’s willing to latch onto almost anything.
But in many ways, that promiscuity is carbon’s greatest virtue. Carbon can share electrons with up to four other atoms at once, on all sides. This in turn allows carbon to build complex chains, or even three-dimensional webs of molecules. And because it shares electrons instead of stealing them, the bonds that carbon forms with these other atoms are steady and stable.
That’s a good thing for living creatures like us. We need those complex chains and webs to do complicated things inside our bodies. Without this feature of carbon, life as we know it simply wouldn’t exist.
And here’s where we finally get back to amino acids. They’re the building blocks of the proteins that run our bodies. If you look at the structure of amino acids, they consist of various parts. But all amino acids have a core of two carbon atoms and one nitrogen atom.
Nitrogen is a lot like carbon. It has five electrons in its outer energy level, which is still fairly far from eight, so it also tends to share electrons and form alliances. That makes carbon and nitrogen natural allies.
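The electron bookkeeping above can be boiled down to a toy rule of thumb, sketched here in Python. This is a deliberate oversimplification: the cutoff is an assumption chosen to reproduce the episode's examples, not a real chemical law, since real chemistry also depends on things like how tightly each atom holds its electrons. The names `octet_tendency`, `Tendency`, and `TRANSFER_CUTOFF` are hypothetical. The input is the number of outer-level electrons (what chemists call valence electrons). The idea: count how many electrons an atom would need to gain or shed to reach eight; if that's two or fewer, it transfers electrons outright, and otherwise it shares.

```python
from dataclasses import dataclass

# Toy assumption, not a chemical law: atoms within this many electrons of an
# octet (by gaining or shedding) are treated as transferring electrons outright.
TRANSFER_CUTOFF = 2


@dataclass
class Tendency:
    action: str     # "shed", "gain", or "share"
    electrons: int  # electrons shed or gained; for "share", electrons still needed


def octet_tendency(valence: int) -> Tendency:
    """Guess how an atom with `valence` outer electrons reaches eight."""
    if not 1 <= valence <= 7:
        raise ValueError("valence must be 1-7; an atom with 8 is already satisfied")
    to_gain = 8 - valence  # electrons needed to fill the outer level
    to_shed = valence      # electrons to lose so the full level beneath is exposed
    if to_shed <= TRANSFER_CUTOFF and to_shed < to_gain:
        return Tendency("shed", to_shed)
    if to_gain <= TRANSFER_CUTOFF and to_gain < to_shed:
        return Tendency("gain", to_gain)
    return Tendency("share", to_gain)


for element, valence in [("sodium", 1), ("carbon", 4), ("nitrogen", 5),
                         ("oxygen", 6), ("chlorine", 7)]:
    print(element, octet_tendency(valence))
```

Under this toy rule sodium sheds one, oxygen gains two, chlorine gains one, and carbon and nitrogen share, needing four and three more electrons respectively. For those two, that count matches the number of bonds each typically forms, which is why carbon can link up with as many as four neighbors.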
When cells are building proteins, they take the individual amino acids and link them together, one after another. A carbon at the head of one amino acid shares an electron with a nitrogen at the tail end of another amino acid, snapping them together. And since all amino acids have a carbon on one end and a nitrogen on the other, you can pretty much snap amino acids together for as long as you want, in a gigantic chain.
That’s how we get long complicated proteins that run our bodies. Not coincidentally, it’s also how we get really, really long names for proteins. The long words are a direct result of the chemistry of carbon and nitrogen.
Thanks to advances in technology since the 1960s, scientists nowadays can describe vastly longer proteins than the tobacco mosaic virus protein mentioned before. In fact, some publications, including the Guinness Book of World Records, once listed the longest word in the English language as a 1,913-letter word for another protein.
But they’re wrong. That word would indeed be 60 percent longer than the tobacco virus protein name. But trust me, I’ve searched through many, many dusty volumes of that chemical reference book, Chemical Abstracts. And that 1,913-letter word doesn’t appear in print there.
Why? Because scientists in the mid-1960s could see where things were headed. Pretty soon they’d have 2,000-letter words, then 3,000-letter words, and even longer. And frankly, they were tired of having to string them together and spell-check all that crap.
So they ended up dropping the whole unwieldy German system of just mashing the lll words together. Instead, they started using shorter names, even for official purposes. For instance, the 1,913-letter word is officially known as the tryptophan synthetase α protein. That’s just 28 letters, way shorter than the 1,185-letter protein I’m about to read in a moment.
For this reason, the Guinness Book of World Records has since stripped the 1,913-letter word of its title as the longest. That means the tobacco mosaic virus protein is officially the longest word that’s ever appeared in print for non-record-setting purposes. And given the changing fashion in chemical names, it’s likely to hold that record for a long time.
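The letter counts quoted in this episode are easy to check yourself. Here's a small Python sketch; `letter_count` and `percent_longer` are hypothetical helper names, and the sketch assumes that only alphabetic characters count, so spaces and hyphens are skipped while the Greek α counts as one letter.

```python
def letter_count(text: str) -> int:
    """Count alphabetic characters, skipping spaces, hyphens, and punctuation."""
    return sum(1 for ch in text if ch.isalpha())  # Greek letters like α count too


def percent_longer(longer: int, shorter: int) -> float:
    """How much longer one length is than another, as a percentage."""
    return (longer - shorter) / shorter * 100


for word in ["honorificabilitudinitatibus",
             "antidisestablishmentarianism",
             "supercalifragilisticexpialidocious",
             "tryptophan synthetase α protein"]:
    print(letter_count(word), word)

# The 1,913-letter name versus the 1,185-letter tobacco virus name:
print(f"{percent_longer(1913, 1185):.0f}% longer")
```

The counts come out to 27, 28, 34, and 28, and the last comparison works out to about 61 percent, in line with the roughly 60 percent mentioned earlier.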
As a final note, the current record-holder for the most gargantuan protein in nature has a name that would run 189,819 letters if spelled out. That would fill roughly 110 pages of an average book. But its official name, mercifully, is titin [titan], just five letters long.
This is the Disappearing Spoon podcast, brought to you by the Science History Institute. Find out more about their library, museum, and multimedia magazine at sciencehistory.org.
Make sure you check out the Science History Institute’s other awesome podcast, Distillations. You can find their in-depth narrative stories and interviews about everything from Space Junk to Sex, Drugs, and Migraines anywhere you get your podcasts, and on their website: Distillations.org.
You can find more incredible stories from my books at samkean.com. You can also book me as a speaker at your school or event.
If you like this podcast, please support it at patreon.com/disappearingspoon. It costs as little as seven cents per day. You can also get bonus episodes and signed books.
Please spread the word to others as well, and subscribe in iTunes, Stitcher, or other places.
This episode was written by me, Sam Kean. It was mixed by Jonathan Pfeffer, and produced by Mariel Carr and Rigoberto Hernandez.
Thanks for listening...