The Power of Suggestion: Birding in the Age of AI
By Ryan Nakano
Whenever I hear the phrase “the power of suggestion”, I immediately think of magic, or rather, the role of the magician. Suggestion allows the magician to influence our perceptions in order to pull off stunning feats that feel real, sometimes real enough to make a lasting impression on the way we see the world. For birders in the modern age, just like townsfolk in the Arthurian legend, the most prominent and powerful magician, hands down, is the one and only Merlin.
When an email “Is there any data on Merlin Sound ID accuracy?” apparated out of thin air and into my inbox from our GGBA listserv, I clicked and disappeared down a rabbit hole.
Note: This blog does not answer the original poster’s question (apologies, Steve). Rather, what started as a single question multiplied very rapidly into a steady stream of questions, aka more rabbits: How does Merlin Sound ID even work? If a bird chirps in a forest but only AI identifies it, can it really be listed? If you only use Merlin, are you truly “birding”? What does using AI mean for your potential to learn bird species? Maybe more importantly, what might using AI mean for our relationship to birds, other birders, and nature in general?
The Reveal
Okay, before we get back to the rabbit hole, it feels important to break the cardinal rule for all magicians: we must reveal (at least slightly) how the act is done. In this case, we must explain what Merlin Sound ID is and how it works.
Merlin Sound ID is a machine learning (AI) feature of the Cornell Lab of Ornithology’s Merlin Bird ID app. More generally, the Merlin app helps users identify birds from live recordings (Merlin Sound ID), from photos, and through other tools like its “step-by-step” function, which uses location and observed field markings.
For the purposes of this blog, pay no mind to the other features. We are only here for Merlin Sound ID.
Trust me, it’s more than enough.
Merlin Sound ID works by capturing your recording in the field in real time and mapping the frequencies in the recording visually as a spectrogram. Enter magic, or, the power of suggestion.
The Macaulay Library holds nearly 2.4 million sound recordings of birds, making it the largest repository of bird audio in the world. Using a machine learning model (a deep convolutional neural network, whatever that means), Merlin Sound ID has been trained on at least 140 hours of bird audio from the Library that has been tagged by sound ID experts to correspond to specific species. When you start recording, the app predicts a particular bird species using an algorithm whose computational values were adjusted during training, matching the real-time spectrogram in your hand to the closest spectrogram it “knows”, or… something like that.
And Abra Kadabra Alakazam! The Merlin Sound ID “creates as it speaks!”
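For the curious, here is a toy version of the trick in Python. To be clear, this is not Merlin's code and not a neural network. It is a minimal sketch that turns sound into a spectrogram and then picks the closest labelled example by simple similarity, which is a much cruder stand-in for what a trained model does. The "birds" are pure tones, and every name in it (`spectrogram`, `profile`, `suggest`, the three made-up species) is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 22_050  # samples per second (an assumed value for this sketch)


def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def tone(freq_hz, seconds=1.0):
    """A pure tone standing in for a bird call (real calls are far richer)."""
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t)


def profile(signal):
    """Average energy per frequency bin, scaled to unit length."""
    spec = spectrogram(signal).mean(axis=1)
    return spec / np.linalg.norm(spec)


def suggest(recording, references):
    """Return the label whose profile is most similar to the recording (cosine)."""
    query = profile(recording)
    scores = {name: float(query @ profile(ref)) for name, ref in references.items()}
    return max(scores, key=scores.get), scores


# Hypothetical "library" of labelled reference sounds.
references = {
    "low whistler": tone(1_000),
    "mid warbler": tone(3_000),
    "high trill": tone(6_000),
}

# A noisy "field recording" of something singing near 3 kHz.
rng = np.random.default_rng(0)
recording = tone(3_050) + 0.3 * rng.standard_normal(SAMPLE_RATE)

best, scores = suggest(recording, references)
print(best, scores)
```

Notice that `suggest` always returns some label, even for a reversing garbage truck. The output is a suggestion ranked by similarity, not a confirmation, which is worth keeping in mind for what comes next.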
Misdirection (When Merlin Gets it Wrong)
Now that we pseudo understand the trick, let’s get back to the rabbit.
Imagine a field. Flying high above is a raptor. You look up and see what appears to be a juvenile Red-tailed Hawk. It calls out. You look down and a tiny wizard appears in the palm of your hand waving a wand up and down mirroring the sound of the bird’s call. “My friend, that right there, is a Bald Eagle.” The wizard is wise, having obtained the knowledge of thousands, maybe millions of mortals. You blink and look up again. You start to wonder if the wizard is correct.
Or, given that some birds perform their own kind of magic by mimicking the calls of other birds, or get caught up in a chorus of closely related species at the same time and place, the wizard cannot split their calls and thus casts a misinformed spell.
This same wizard has even been known to transmogrify nonbirds into birds. My favorite is the reversing garbage truck into a Great Blue Heron. How does Merlin do it?!
Are these accurate examples of Merlin misidentification? Maybe not, but hey, we didn’t use ChatGPT to write them, so kudos to us.
On the other hand, these analogies and scenarios do beg the question: what is the cost of Merlin’s occasional misdirection?
For one’s own personal knowledge and learnings, this kind of misdirection could easily lead to false associations between certain bird species and their calls in the future. After all, if one is just beginning their birding journey, it seems easy enough to not question a tool built on the collected evidence of experts.
Add on top of this the desire to contribute “one’s ID” to a community science tool like eBird, and the misdirection has the potential to spread, creating a cluttered environment of mis-IDs for eBird moderators to sort out. (Round of applause for our moderators, magicians in their own right.)
For our moderators’ sake, please don’t list on eBird if your only form of ID comes from Merlin.
Levitation (Who wants to walk the walk, when we can float?)
New thought experiment: What if Merlin Sound ID always got it right, never making an ID mistake?
Time travelling back a couple decades, do you remember the moment when David Blaine levitated? I do. The year was 1997, and I was at home watching this man hover what felt like a foot off the ground. While he appeared suspended in mid-air I was suspending my own disbelief. Two thoughts occurred in my adolescent mind at the time. It’s not possible AND how quickly can I learn to levitate?
Being out in the field with people who can proficiently bird by ear feels like magic, doesn’t it? You’ll be strolling about and all of a sudden they’ll stop dead in their tracks and start calling out common names like an elementary school teacher in the middle of roll call. I’m not there yet, and honestly I don’t necessarily intend to be. Maybe I hear a Red-winged Blackbird or California Towhee. Still, I’m always impressed by others’ ability to pick out species by the sound of their vocalizations alone. They’ve done the work. Playing back recordings, connecting visual cues with calls. It’s incredible what we humans are capable of when passion meets persistence. That said, there are times when I’ve caught myself thinking: it’s not possible, and how can I learn quickly?
If Merlin Sound ID never made mistakes, then potentially, the quickest and most understandable path toward building my knowledge, or toward knowing a specific bird call, would be to open the app and hit record. Boom, Brown-headed Cowbird.
The thing is, a magician, or rather a wizard, is not known for being knowledgeable per se, but wise. Wisdom comes when one learns to wield knowledge and apply it effectively in any given situation, drawing from one’s experience. That said, the two are related. Which brings us back to David.
When David Blaine levitates, he draws from his understanding and perfected practice of Ed Balducci’s levitation, which itself requires one to understand one’s physical position to others as well as the power of physical movements to redirect the attention of an audience. The trick is an embodied practice, learned through trial and error to produce the most accurate effect possible. In short, it takes real in-the-field, or in this case, on the street, practice. Does David Blaine shortcut the process? No. But, and here’s where things get interesting, he does film the trick, capturing the audience’s reaction, and then edits in footage of the same trick at the same location where he is assisted by wire, thereby exaggerating the levitation in post-production.
What are we meant to take away from this?
In a 2024 study “Does Using Artificial Intelligence in Citizen Science Support Volunteers’ Learning? An Experimental Study in Ornithology” published in Citizen Science: Theory and Practice, researchers found that “novice participants of community science projects significantly improve their content knowledge of familiar birds in their neighbourhood, and that eBird users outperform Merlin users on the knowledge post-test. Although AI may improve volunteer productivity and retention, there is a risk that it may reduce their learning.”
In the study, three groups with no prior birding knowledge were compared: one group used eBird to try to identify birds they observed, one used Merlin to do the same, and a control group did not participate in birding at all. Comparing pre- and post-identification tests between the groups, researchers found that the group that improved the most was the eBird group, theorizing, based on participant testimonials, that “students learn better when they have to search for information” as it forces the individual to slow down, observe carefully, and compare details. Interestingly enough, 26% of this eBird group did not pass and finish the post-identification test compared to 6% of those who used Merlin.
Still following the ball? Just like the use of post-production in David’s levitation videos, the use of Merlin becomes a powerful tool to attract and inspire a larger audience to watch or listen in awe to the magic that surrounds them. It may even hold their attention long enough to spark a curiosity about the practice itself. At the same time, the tool cannot replace the practice, a practice solely responsible for training the tool in the first place.
Linking Three Rings (Materials, Competence, and Meaning)
In the article “Birdwatching in the digital age: how technologies shape relationships to birds” published in BioScience in May, we receive a useful framework in which to further discuss tools like Merlin Sound ID and other “automated species identification tools”. As a social practice, birding can be broken down into three components: materials, competences, and meanings. As new tools are introduced these three components shift, not unlike the thinking of Canadian philosopher Marshall McLuhan, who believed that media technologies were not only an extension of the senses but a catalyst for changing our lived experience and understanding of reality.
Start with materials. Given Merlin Sound ID’s ability to automatically identify bird species (albeit faulty at times), one could conclude that it makes all other materials obsolete in the field. But because it is faulty, binoculars, scopes, and cameras that provide visual confirmation for ID continue to be necessary.
With competences (skills and techniques), it’s equally complicated. Of course, one has to learn how to download and use Merlin, but honestly, the learning curve is the click of a button. Like I said, magic.
As for Sound ID itself, what does the technology do for our ears if we are not hard of hearing? Does it cement connections between a certain call and bird species when making its predictions, or does it operate like the cell phone, where we once memorized our best friends’ numbers and have since forgotten them, reassured that they will always be contained in one safe place for when we need them most?
In my novice-birding experience using Merlin, the app does seem to encourage me to continue listening for a particular call after its initial prediction, and maybe not surprisingly, cues me into looking for a particular species. I think more than anything, this “knowing” what I’m looking for or listening to before I can confirm or cross-reference with other sources, sharpens my desire to positively ID the bird in question. It’s like having a little clue on a map without a key. All of a sudden my productivity from having this clue skyrockets because I’m now in full birding mode, activating all senses.
Another reason for entering full birding mode — if Merlin is the expert wizard and he heard a particular bird species, you bet I want the challenge of matching this magician, of taking Merlin up on his sidequest. Why can’t I be satisfied with challenging myself? I’ll save that for another blog. I guess what I’m saying is, I think the value of Merlin Sound ID is less about competency and more about renewed curiosity and a strange relationship to being challenged by others. The other thing I’m saying is, maybe curiosity and a challenge can lead to competency, potentially even accelerating competency over time simply from my renewed productivity via a greater awareness of my surroundings.
So what does it mean to use such a tool?
As far as our relationship to birds is concerned, I think it means several things. For one, it reminds us just how incredibly diverse the language of birds really is, even within a single species. In the tool’s misidentification, it highlights the prevalence of mimicry and the close taxonomic proximity of some species to others. If the theory that AI increases productivity in community science is correct, it might also mean that the use of automatic species identification tools could bring us ever more valuable data to inform our conservation efforts now and in the future. At the same time, just like the camera is sometimes seen as being a distraction from being truly present in nature, looking at one’s phone every other minute to see what it heard seems to distance us from our connection to a nature we are, and are supposed to be, a part of.
Being out in the field with a friend who looks at their phone in real time and declares a name does not a magician make. Yet, if that same friend trained under Merlin, extending their ear to narrow their sight, filtering the chatter for a single call, before calling out without the aid of their mentor, then yes, magic indeed, passed on like an incantation.
A quick note: If your aim is neither wisdom nor magic, more power to you (literally).
I want to pause here for an acknowledgment and yet another rabbit hole. The acknowledgment is that, if you live in an area with a relatively supported birding community, then it is highly likely that you have your own human Merlins who are hopefully willing to impart their knowledge, share their secrets, and build and normalize a culture of social learning when it comes to birding.
As we’ve learned from the use of social media, sometimes our technology which is marketed to help connect us to each other ends up isolating our experiences and disincentivizing the social benefit that comes with learning and growing in community. Why ask a friend when you can ask Google? Good question, let’s ask Google, becomes our default response and on and on. Truly, it is okay to be a magician’s assistant. Not only will you extend your learning process over time, but also, you benefit the Magician who now gets to further solidify their learning by passing on their knowledge, and you get to be in the theater, among the audience, where the magic happens. Something about this feels important.
Two other notes: If you do not live in an area with a supported birding community, then you can think of Merlin Sound ID as filling in a particular gap. Also, I’m not sure if it goes without saying but, Merlin Sound ID for birders who are deaf or who have varying degrees of hearing loss could potentially serve an even more important role depending on its use and need.
Out of Thin Air: The Final Act
Now that you are thoroughly hypnotized under the spell of a long and arduous rant, I commit my final act. This entire time you thought you were on a journey of magic metaphor and bird ID apps, but there’s something you’ve missed, something we’ve all missed. Let’s take one step back, take off the blindfold, and with our bird’s eye view remember: generative AI writ large is… pretty awful for the environment.
If birding is about expanding our awareness of our surroundings, then I would be remiss in not identifying the proverbial elephant in the room. The need for massive amounts of electricity, water, coal and gas to support data centers used for training generative AI is incredible and terrifying. According to one researcher, a single ChatGPT query uses roughly as much energy as leaving a light bulb on for 20 minutes.
Leaving Merlin aside for a moment, as I do not believe its training model to be a primary culprit for such energy use (looking at you, ChatGPT), I do think this is something that we should be paying attention to from a conservation and environmental lens. Even with more calls to “green” AI infrastructure, the amount of energy required based on the rate of demand by these companies, and subsequently consumers, is only projected to increase over time.
This is the cost of our convenience.
Drawing the Curtains
As Uncle Ben once said to Peter Parker, “With great power comes great responsibility.”
The use of AI in an app like Merlin is powerful and I sincerely believe it to be a useful tool both for recreational birding and for applications of community science and conservation efforts. But like all technology, using it responsibly is the key, something the Cornell Lab of Ornithology expressly addresses on its “Merlin Sound ID best practices” page, which I have copied and pasted below.
“Merlin’s suggestions are just a starting point. You should always independently verify each suggestion before reporting it. Tap a Merlin suggestion to see and hear it in the spectrogram. Compare each suggestion against Merlin’s example recordings. Then tap “Details” and consider the range map, behavior, and habitat description of the bird that is vocalizing–does it seem like a good fit? You’ll also want to consider the seasonality of the species (try Merlin bar charts under Explore Birds). If possible, try to see the bird making the sound to confirm the ID. Like all birders, Merlin can make mistakes. If you’re not confident that Merlin’s suggestion is correct, or if you have not considered it independently, don’t report it to eBird. (Do not report whatever Merlin says without considering it first!).”
As for AI as a whole, it asks us to pick a card. Sometimes the card we pick is called efficiency, sometimes cost-effective, sometimes novel, and at its best, socially and environmentally progressive (see Birdcast for AI-powered Bird Migration mapping). We are then told to examine the card, “our card” for a moment, just long enough to remember it. Holding it in our hands, we feel empowered, as if we have made a conscious choice, as if we have agency in impacting the outcome of the trick. Meanwhile the deck is being cut, and the magician is memorizing the positions of other cards. We hand the card back. A shuffle occurs. We are reassured by the shuffle. And then, we are asked the question, “is this your card?”.
At this point, there are several responses to consider. 1. No this is not my card, it is just one of many cards I was offered, I did not ask for this. 2. Wait…do the trick again, I want to be astounded. 3. It is my card, how did you do that?
And with that I leave you with fewer answers and more questions. What kind of magic is worth the power to summon and for what end? What are the social and environmental implications of casting such spells? What happens when magic is so strong it renders us speechless? What does it mean if we stop asking?
Ryan Nakano is GGBA’s Director of Communications and an occasional casual birder.