I got an email inviting me to beta test the Google AI audiobook service. It basically converts ebooks you have uploaded to Google Books as a publisher into audiobooks. They have many different voices to choose from: women and men, different accents/nationalities, etc. By the way, through the nature of metadata, Google themselves will probably see this, so: Screw You, Google. Learn how to tell the difference between a dialect and a disability. (For everyone else: I’m Baltimorean, which is why I said that.)
Admittedly, some of the voices sound really, really human-like, such as “Madison”. As long as there’s no character dialogue, I would most likely believe at first listen that it was a human being – a bit of a bored one, but a human being all the same. With character dialogue, it becomes really obvious that it isn’t.
Now, I have actual human narrators for my works. All my works that are in audio are narrated by living people. I like tech (tho I should probably like it a little less, because I spent so much time trying to edit out the breaths of my poor narrator, Soraya Butler, on Dreamer … because I was used to AI voices and thought “Oh, noes, is it okay if people can hear a narrator breathe normally?” Yes, yes it is. It very is.) but human narration is not going away any time soon. The thing about human narrators is that they can inflect, carry tone and express emotion in ways that an AI simply can’t grasp. I’ve worked with AI, and human emotions are hard to replicate, especially emotional tone. (And the AI doesn’t get emotion much either, especially when the emotion plays a role in decision making.) And it showed here in my samplings of the audiobook AIs. The AI got the narration down fantastically … as long as you want a calm, mostly unfeeling voice.
This audiobook idea might be good for people who are putting up works that don’t really require much emotion and inflection, like non-fiction perhaps. For fiction, especially speculative fiction, which is what I do, it might not be too helpful, outside of helping catch remaining sneaky typos as you read your book along with the spoken word. Self-interruptions, multi-character interruptions, trailing off, things like that are not really handled well by the AI. If a character is winding up from anger, the AI will 1000% not convey that. Everything is pretty, welp, flat for the most part. If you have calm prose, this AI route is the route for you.
Unless you use Findaway Voices for distro. Findaway Voices explicitly says in its contract that it will not distribute any AI-narrated works whatsoever, and that they do indeed check. So that means the AI audiobook would only live on Google Audiobooks … which Findaway already distributes to. That’s for Findaway and Google to sort out.
Speaking of contracts, Google’s AI contract isn’t 100% “free and clear”. I would have to read it more, but basically, they own the voice while you own the words. So, if there is any dispute, they can snatch the voice, pretty much taking down your book. What if Google wants to censor the book somehow? AI uses deep machine learning to say the correct words correctly, so it “knows” what it’s saying. What if you write a book and there’s a passage about Uighur people? Even if it is just a plain ol’ Uighur character sitting in a park eating ice cream, with no mention whatsoever of genocide or oppression? Google has already been caught being sneaky about this stuff. The example that stands out in my mind is from when Hong Kong was taken over by China: if you put “I’m sad for Hong Kong” into Google Translate to translate into Chinese, it would literally say “I’m happy for Hong Kong”. The two words “happy”/“sad” are not similar in Pinyin (romanized Chinese) or in either traditional or simplified Chinese characters. But there would have been no way for a person to spot that unless they knew both English and Chinese. Someone at Google had put into the coding, “when someone types in this, put out that”. It was only changed back to the accurate Chinese phrase once people pointed it out en masse.
All AI, algorithms, deep machine learning, all code everywhere is just plain 1s and 0s. All they do is execute orders, with no independent thinking whatsoever. If you type in the code, “every time someone says ‘hello’, jump three times and chirp”, that’s exactly what the code is going to pump out. Unless you made an error in the code somewhere, the tech is going to jump three times and chirp when someone says “hello”. That isn’t because the tech is an English speaker by nature but because someone told the tech, “when you hear this particular string of sound, this is how you react”. It could be Japanese, it could be Swahili, whatever. It is up to the person coding, not the technology itself.
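To make that concrete, here is a tiny sketch in Python of the “hello” rule. Everything in it is made up for illustration: the `RULES` table, the `react` function and the trigger words are my own hypothetical names, not anything from Google or any real voice product. It only shows the idea that the reaction comes from whatever the coder typed in, in whatever language the coder picked.

```python
# Hypothetical rule table: the coder decides which strings trigger which actions.
# Nothing here "understands" English, Japanese or Swahili; it only matches text.
RULES = {
    "hello": ["jump", "jump", "jump", "chirp"],
    "konnichiwa": ["jump", "jump", "jump", "chirp"],  # same reaction, different trigger
    "jambo": ["jump", "jump", "jump", "chirp"],
}


def react(heard):
    """Return the scripted actions for what was heard, or an empty list if no rule matches."""
    key = heard.strip().lower()  # normalize spacing and capitalization before matching
    return RULES.get(key, [])


if __name__ == "__main__":
    print(react("hello"))    # the scripted reaction from the table
    print(react("goodbye"))  # no rule was written, so nothing happens
```

Change one line in the table and the “behavior” changes with it, which is the whole point: the tech does what the rule says, not what it “thinks”.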
This means that if you write a book about Black issues and Google feels like suppressing it because, who knows why, maybe because clearly no one at Google really likes reading the book Algorithms of Oppression by Safiya Noble, or for whatever random reason floats across their brain, their AI voice is going to be told either “don’t say these parts”, “skip these passages/chapters” or “say something else” and, most importantly, “make it look seamless”. The average audiobook listener is not reading along with the book in hand. Just like the English/Chinese example above, it only works if you don’t know you’re being tricked. It wouldn’t be difficult to mod things up from Google’s side. As long as you’re not checking what they’re doing, it flies.
Also, who knows, it could get the account, publisher or author flagged/shadow-banned without them knowing it. Or passed over to governments and institutions who are being nosy for really nefarious (and usually oppressive) reasons. Because, remember, the AI “knows” what it is saying. That can be useful: if people try to upload, say, “The Beautyful Ones Are Not Yet Born” as an audiobook and thus steal royalties, the AI can point it out. But it can also flag the book on the back-end as “Talks about Black issues in a way that makes anti-Black people moody, caution alert”.
AI is very good at deciphering human words but it still kind of screws up when it comes to our inflections, accents, etc. (Again, screw you, Google. Accents are not disabilities. Sorry everyone, just gotta throw that in there.) It’s not as easy to figure out which works could be a “problem” work when human narrators are involved, but when you feed a literal text into the AI and let the AI pump out whatever it pumps out? Wow, so much easier, looking at it from a “nefarious coder” perspective.
Will I be using AI for my works? Maybe just to run a last check on the print versions, listening to the words while reading along with the book so I can catch remaining typos. But I’m not publishing the AI versions and I’m not going to give the AI that much help where I don’t need to. It’s just going to be so that I can give my human narrator a cleaner script and get rid of the last typos that somehow eluded me. A faceless tool that will be replaced the second I find a better alternative, in other words.