Writing in Yorùbá on the internet has come a long way. I remember my undergraduate days, and my first computer, and my inability to combine subdots like ọ and ẹ with tone marks, as in ọ́ and ẹ̀. Microsoft (or is it Unicode?) didn’t seem to care enough about the language.
That’s what I thought, in any case. I was just upset that I couldn’t write some things properly. We had to make do with the possibility of ambiguity. But over time, I realized that the problem wasn’t limited to Yorùbá. African languages in general seemed to get a raw deal.
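The combination mentioned above, a subdot plus a tone mark on one vowel, can be sketched with Python's standard `unicodedata` module. This is only an illustration, assuming the text is stored as a base letter followed by combining marks; the variable names are mine, not taken from any tool mentioned in this thread.

```python
import unicodedata

def code_points(text):
    """List the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]

# Two ways the same vowel might be typed: dot first, or tone mark first.
dot_then_tone = "o\u0323\u0301"  # o + COMBINING DOT BELOW + COMBINING ACUTE ACCENT
tone_then_dot = "\u00f3\u0323"   # precomposed o-acute + COMBINING DOT BELOW

# NFC normalization puts the marks in canonical order, so both spellings
# end up as the same sequence of code points.
a = unicodedata.normalize("NFC", dot_then_tone)
b = unicodedata.normalize("NFC", tone_then_dot)

print(a == b)
print(code_points(a))
```

Unicode has no single precomposed character for this vowel, so even the normalized form stays a letter plus a combining mark, which is part of why fonts and keyboards had to support combining marks for it to display and type correctly.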
For the big tech giants that had then begun to spring up, it was more important to focus on the languages that made money for the business. These languages, as I later realized, weren’t even always the most populous. They were just those that provided returns.
Imagine realizing, for instance, that the number of speakers of Swedish, Norwegian, and Danish all together just about equals half or so of the population of Yorùbá speakers. Yet Apple’s Siri exists in each of these languages, and doesn’t in Yorùbá.
And so, instead of waiting for the tech giants to care about the speakers of Yorùbá, for instance, I thought it was important to get like-minded individuals who have the skills to create tools in the language to just do it, hoping that the tools will find their users over time.
That’s why we created the tone-marking software, which is free. That’s why we did ttsyoruba.com, which is (I believe) the first free speech synthesis in the language. And that’s why we work to create these solutions to fill a gap that seems to get bigger every day.
I realized the problem of the absence of enough Yorùbá data on the web in early 2015 when I started reading up on the process of creating synthetic speech. Most tools used to create text to speech use a corpus of texts taken from around the web. With English, it’s not a problem.
The reason Google Translate (which uses machine learning and neural networks) gives more accurate translations in languages like English is that it has had enough material to use in training the machine.
Until the BBC started full web service in Igbo, Yorùbá, and Pidgin, it was hard to find a single website with fully readable text in the language. There were very, very few. And so, garbage in, garbage out: the result of efforts to create automatic tools is frustration.
Couple that with the fact that there haven’t been many new books published in these languages since, perhaps, the late eighties when the economy went down the drain. So getting content to digitize, to mitigate the problem, becomes a big and expensive problem.
So, our efforts are geared towards resetting the status quo. Allow people to write in the language on the web >> provide a large corpus over time >> create tools that make the language even easier to use (voice tools, artificial intelligence, etc.) >> revitalize the language.
And make these tools open and available so that they can be used by others in other language communities in Nigeria and across the continent. Hence making YorubaName.com open source on GitHub: github.com/yorubaname
By the way, contrary to the instructions/suggestions in our journalistic style guide for BBC journalists in the Yorùbá/Igbo service, the text on the website today is still not always diacritized, so the headache hasn’t gone away.
So if we can create automatic diacritizers, then their editors will also need to do less work. And others who want to write the language without worrying about the diacritics can do so. The research into this is ongoing, and we’re seeking volunteers with L1 competence in Yorùbá.
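To make the diacritizer problem concrete, here is a minimal sketch of its inverse: stripping marks from correctly written text. It is not our software, just an illustration using Python's standard `unicodedata` module; the function names and the sample words are assumptions chosen for the example. A diacritizer has to undo this step, and the sketch shows why that is hard: several differently marked spellings collapse into one plain string.

```python
import unicodedata

def strip_diacritics(text):
    """Remove all combining marks (subdots and tone marks) from text."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return unicodedata.normalize("NFC", "".join(kept))

def training_pairs(sentences):
    """Pair each marked sentence with its stripped form: (input, target)."""
    return [(strip_diacritics(s), unicodedata.normalize("NFC", s))
            for s in sentences]

# Four spellings that differ only in their marks (illustrative sample).
words = [
    "oko",
    "o\u0323ko\u0323",        # subdots only
    "o\u0323ko\u0323\u0301",  # subdots + acute on the last vowel
    "o\u0323ko\u0323\u0300",  # subdots + grave on the last vowel
]

plain_forms = {strip_diacritics(w) for w in words}
print(plain_forms)  # every spelling collapses to the same plain string
```

Pairs like the ones `training_pairs` returns are the kind of data an automatic diacritizer could learn from, which is why fully marked text on the web matters so much.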
Ah, I just found @daanvanesch is following this account, which is nice, because (if I remember correctly) he supervised my work in creating the diacritical markers for Nigerian languages on what eventually became Google’s Gboard in 2015/2016.
Google has been, I believe, one of the most responsive companies to the issues in language representation on the internet. Being able to balance a profit motive with a social good is one of its strong suits. The end result benefits both the company and the languages in question.
Thread by Kọ́lá Túbọ̀sún (African Language Digital Activism)