Hello Whale

I still find myself wishing that Wired had gone with the headline that I did, but that’s neither here nor there.

What is here and there is one of the most fascinating use cases of AI that I have read about. The title of the article is “How to Use AI to Talk to Whales—and Save Life on Earth”. The second half may be a bit of hyperbole – we’ll find out, one way or another – but the article is well worth reading.

Humans have always known how to listen to other species, of course. Fishers throughout history collaborated with whales and dolphins to mutual benefit: a fish for them, a fish for us. In 19th-century Australia, a pod of killer whales was known to herd baleen whales into a bay near a whalers’ settlement, then slap their tails to alert the humans to ready the harpoons. (In exchange for their help, the orcas got first dibs on their favorite cuts, the lips and tongue.) Meanwhile, in the icy waters of Beringia, Inupiat people listened and spoke to bowhead whales before their hunts. As the environmental historian Bathsheba Demuth writes in her book Floating Coast, the Inupiat thought of the whales as neighbors occupying “their own country” who chose at times to offer their lives to humans—if humans deserved it.


That sent me down one of the weirder Wikipedia rabbit holes I’ve been down recently, where I learned about the Killer Whales of Eden and about Old Tom. I also learned about the Earth Species Project, and its idea is quite something:

The motivating intuition for ESP was that modern machine learning can build powerful semantic representations of language which we can use to unlock communication with other species.


It’s early days yet, of course, but this paper made for interesting reading:

We introduce the Bioacoustic Cocktail Party Problem Network (BioCPPNet), a lightweight, modular, and robust U-Net-based machine learning architecture optimized for bioacoustic source separation across diverse biological taxa. Employing learnable or handcrafted encoders, BioCPPNet operates directly on the raw acoustic mixture waveform containing overlapping vocalizations and separates the input waveform into estimates corresponding to the sources in the mixture. Predictions are compared to the reference ground truth waveforms by searching over the space of (output, target) source order permutations, and we train using an objective function motivated by perceptual audio quality. We apply BioCPPNet to several species with unique vocal behavior, including macaques, bottlenose dolphins, and Egyptian fruit bats, and we evaluate reconstruction quality of separated waveforms using the scale-invariant signal-to-distortion ratio (SI-SDR) and downstream identity classification accuracy. We consider mixtures with two or three concurrent conspecific vocalizers, and we examine separation performance in open and closed speaker scenarios. To our knowledge, this paper redefines the state-of-the-art in end-to-end single-channel bioacoustic source separation in a permutation-invariant regime across a heterogeneous set of non-human species. This study serves as a major step toward the deployment of bioacoustic source separation systems for processing substantial volumes of previously unusable data containing overlapping bioacoustic signals.


(If more than half of this flew over your head, as it did mine, ask ChatGPT for an ELI5.)
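For anyone who would rather see two of the ideas in that abstract than read an ELI5, here is a minimal Python sketch. It computes the scale-invariant signal-to-distortion ratio (SI-SDR): with α = ⟨ŝ, s⟩ / ‖s‖², SI-SDR = 10 log₁₀(‖αs‖² / ‖ŝ − αs‖²), i.e. how much of the estimated waveform ŝ is a rescaled copy of the reference s versus everything else. It then tries every pairing of model outputs with reference sources and keeps the best one, which is what “permutation-invariant” refers to. This is not BioCPPNet’s code: the function names, the small `eps` guard, and the use of SI-SDR as the score during the permutation search are my own assumptions (the paper describes its training objective only as “motivated by perceptual audio quality”).

```python
import itertools
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio, in dB (eps is an assumed guard)."""
    # Remove the mean so a constant offset is not counted as signal.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference: alpha * ref is the "target" part.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    # Whatever is left over counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10 * math.log10((target_energy + eps) / (noise_energy + eps))


def best_permutation(estimates, references):
    """Try every (output, target) pairing and keep the one with the highest mean SI-SDR.

    Returns (perm, score), where perm[i] is the index of the estimate paired
    with reference i.
    """
    best_score, best_perm = -math.inf, None
    for perm in itertools.permutations(range(len(estimates))):
        scores = [si_sdr(estimates[p], ref) for p, ref in zip(perm, references)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_score, best_perm = mean_score, perm
    return best_perm, best_score


if __name__ == "__main__":
    # Two toy "vocalizers" with orthogonal waveforms (hypothetical data).
    refs = [[1, 0, -1, 0, 1, 0, -1, 0], [0, 1, 0, -1, 0, 1, 0, -1]]
    # The "model" returns them in the opposite order and at different volumes.
    ests = [[0.5 * x for x in refs[1]], [2 * x for x in refs[0]]]
    perm, score = best_permutation(ests, refs)
    print(perm)  # (1, 0): output 1 is matched to source 0, output 0 to source 1
```

The search is exhaustive, so it costs n! comparisons for n sources; with the two or three concurrent vocalizers the paper considers, that is just 2 or 6 pairings. Scoring against a fixed order instead would penalize a model that separates the voices correctly but returns them in the “wrong” slots, and the scale invariance means a quieter or louder copy of the right call is not punished either.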

Part of what makes this so fascinating is that it involves two separate problems. First, how does AI “learn” language? Second, how do animals communicate?

But these new machine learning methods bypassed semantics altogether. They treated languages as geometric shapes and found where the shapes overlapped. If a machine could translate any language into English without needing to understand it first, Raskin thought, could it do the same with a gelada monkey’s wobble, an elephant’s infrasound, a bee’s waggle dance? A year later, Raskin and Selvitelle formed Earth Species.
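The “geometric shapes” idea can be made concrete with a toy example. In the translation methods the quote describes, words become points (embeddings) in a space, and if two languages’ point clouds have similar shapes, one can be rotated onto the other. Below is a minimal Python sketch of one common way to do that rotation step, orthogonal Procrustes, cut down to two dimensions where the best rotation has a closed form; it is not necessarily what Raskin had in mind. Everything in it is an assumption for illustration: the made-up 2-D “embeddings”, the English and Spanish words, and the fact that a couple of anchor pairs are given. Real systems work in hundreds of dimensions and often have to discover those pairs without a dictionary, which is the hard part.

```python
import math


def fit_rotation(src_points, tgt_points):
    """Least-squares rotation angle mapping 2-D src_points onto tgt_points (pairs known)."""
    # Maximizing sum(t . R(theta) s) over theta gives theta = atan2(b, a).
    a = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(src_points, tgt_points))
    b = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(src_points, tgt_points))
    return math.atan2(b, a)


def rotate(point, theta):
    """Rotate a 2-D point counter-clockwise by theta radians."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def translate(word, src_space, tgt_space, theta):
    """Rotate a source word's point into the target space; return the nearest target word."""
    px, py = rotate(src_space[word], theta)
    return min(tgt_space,
               key=lambda w: (tgt_space[w][0] - px) ** 2 + (tgt_space[w][1] - py) ** 2)


if __name__ == "__main__":
    # Hypothetical toy embeddings: the Spanish cloud is the English one turned 90 degrees.
    english = {"dog": (1.0, 0.0), "cat": (0.9, 0.3), "sun": (0.0, 1.0)}
    spanish = {"perro": (0.0, 1.0), "gato": (-0.3, 0.9), "sol": (-1.0, 0.0)}
    # Fit the rotation from two anchor pairs only; "cat" is never shown to the fit.
    theta = fit_rotation([english["dog"], english["sun"]],
                         [spanish["perro"], spanish["sol"]])
    print(round(math.degrees(theta)))                 # 90
    print(translate("cat", english, spanish, theta))  # gato
```

The word “cat” gets translated correctly even though the fit never saw it, purely because the two clouds share a shape. With more dimensions the same step is usually solved with a singular value decomposition of the paired points’ cross-covariance, and restricting the map to a rotation (or reflection) keeps distances within each cloud intact, which is what lets the shape do the work.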


Raskin, one of the co-founders of the Earth Species Project, thinks that AI will let us understand animals in much the same way that better optics helped seventeenth-century astronomers understand the night sky.

This might be a good time to say “OK, cool, but so what, exactly?”

If you’re wondering what all of this is actually good for, here is one answer:

Teaming up with DFO and Rainforest Connection, we used deep neural networks to track, monitor and observe the orcas’ behavior in the Salish Sea, and send alerts to Canadian authorities. With this information, marine mammal managers can monitor and treat whales that are injured, sick or distressed. In case of an oil spill, the detection system can allow experts to locate the animals and use specialized equipment to alter the direction of travel of the orcas to prevent exposure.
To teach a machine learning model to recognize orca sounds, DFO provided 1,800 hours of underwater audio and 68,000 labels that identified the origin of the sound. The model is used to analyze live sounds that DFO monitors across 12 locations within the Southern Resident Killer Whales’ habitat. When the model hears a noise that indicates the presence of a killer whale, it’s displayed on the Rainforest Connection (a grantee of the Google AI Impact Challenge) web interface, and live alerts on their location are provided to DFO and key partners through an app that Rainforest Connection developed.


So not only can AI begin to “learn” these languages, but we are already putting that ability to good use – and we’re just getting started.

But learning these languages is complicated, because animals don’t just “speak” them. That’s where the second question comes in: how do animals communicate?

Ari Friedlaender has something that Earth Species needs: lots and lots of data. Friedlaender researches whale behavior at UC Santa Cruz. He got started as a tag guy: the person who balances at the edge of a boat as it chases a whale, holds out a long pole with a suction-cupped biologging tag attached to the end, and slaps the tag on a whale’s back as it rounds the surface. This is harder than it seems. Friedlaender proved himself adept—“I played sports in college,” he explains—and was soon traveling the seas on tagging expeditions.

Friedlaender’s multifaceted data is especially useful for Earth Species because, as any biologist will tell you, animal communication isn’t purely verbal. It involves gestures and movement just as often as vocalizations. Diverse data sets get Earth Species closer to developing algorithms that can work across the full spectrum of the animal kingdom. The organization’s most recent work focuses on foundation models, the same kind of computation that powers generative AI like ChatGPT. Earlier this year, Earth Species published the first foundation model for animal communication. The model can already accurately sort beluga whale calls, and Earth Species plans to apply it to species as disparate as orangutans (who bellow), elephants (who send seismic rumbles through the ground), and jumping spiders (who vibrate their legs). Katie Zacarian, Earth Species’ CEO, describes the model this way: “Everything’s a nail, and it’s a hammer.”


Opportunity costs are everywhere, right? This very cool technology is no exception – the benefits are real, but so are the risks:

Rutz is careful to say that generating calls will be a decision made thoughtfully, when the time requires it. In a paper published in Science in July, he praised the extraordinary usefulness of machine learning. But he cautions that humans should think hard before intervening in animal lives. Just as AI’s potential remains unknown, it may carry risks that extend beyond what we can imagine. Rutz cites as an example the new songs composed each year by humpback whales that spread across the world like hit singles. Should these whales pick up on an AI-generated phrase and incorporate that into their routine, humans would be altering a million-year-old culture. “I think that is one of the systems that should be off-limits, at least for now,” he told me. “Who has the right to have a chat with a humpback whale?”


Read the whole article, as always, to get the full flavor of what is possible and what the risks and dangers might be. Like pretty much everybody else on the planet, I have been keen to find out what AI can do. Given your outlook on AI, you may want to replace the word “keen” with something that better fits your worldview. But I think it is safe to say that most of us are curious to find out what AI can do.

And this is certainly one of the cooler (coolest?) applications of AI that I have come across.

Let me end with this delightful but also haunting paragraph:

If you could speak to a whale, what would you say? Would you ask White Gladis, the killer whale elevated to meme status this summer for sinking yachts off the Iberian coast, what motivated her rampage—fun, delusion, revenge? Would you tell Tahlequah, the mother orca grieving the death of her calf, that you, too, lost a child? Payne once said that if given the chance to speak to a whale, he’d like to hear its normal gossip: loves, feuds, infidelities. Also: “Sorry would be a good word to say.”