SHARIAsource @SHARIAsource
Matthew Miller of the University of Maryland on "Developments in Arabic Script Optical Character Recognition: Technical Advances and Organizational Barriers" #WIDH2018
With credit given to @sarahsavant1 @maximromanov @SHARIAsource and many others #WIDH2018
Benjamin Kiessling’s Kraken was the starting point. It does not require character segmentation, which eliminates a common source of error in Arabic and Persian OCR. The model was trained on a thousand lines of training data for a given typeface, eventually reaching 98.7% accuracy. #WIDH2018
Scan quality was not as important as initially hypothesized for reaching this accuracy. #WIDH2018
But was it generalizable to other typefaces? If you train a model, you can get very accurate digital texts. #WIDH2018
Worked with JSTOR to create an Arabic pilot using the publicly available al-Abhath journal (going back to 1948). Accuracy rate on Arabic-script-only text: 98.46% #WIDH2018
For Persian: 98.62% #WIDH2018
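The thread does not say how these accuracy figures were computed. A common convention for OCR is character accuracy, that is, one minus the edit distance between the OCR output and a hand-corrected transcription, divided by the length of the transcription. The sketch below assumes that definition; the function names `edit_distance` and `character_accuracy` are illustrative and are not taken from Kraken or from the project.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance, counted in Unicode code points."""
    # prev[j] is the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the OCR output
                curr[j - 1] + 1,     # extra character in the OCR output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: 1 - edit_distance / len(reference)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return 1.0 - edit_distance(reference, hypothesis) / len(reference)
```

Because Python strings are sequences of code points, each Arabic diacritic counts as its own character here, so a dropped vowel mark lowers the score just as a wrong letter does.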
A detailed accuracy study found persistent problems in Arabic and Persian OCR: diacritics, white space, Arabic in Persian texts, doubling of letters, multi-language failure, ligatures, skewed lines, and footnotes. #WIDH2018
Line segmentation is also an issue; a fix is under way. #WIDH2018
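One way to see how much of an error count comes from diacritics alone is to compare lines again after removing the short-vowel and related marks. This is a minimal sketch, assuming that "diacritics" here means the Arabic marks in the Unicode range U+064B to U+0652 (fathatan through sukun); the helper names are hypothetical and the accuracy study itself may have categorized errors differently.

```python
# Arabic tashkil: fathatan, dammatan, kasratan, fatha, damma, kasra,
# shadda, sukun (U+064B..U+0652). Assumed scope of "diacritics".
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove the marks listed above, leaving base letters untouched."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


def differs_only_in_diacritics(reference: str, hypothesis: str) -> bool:
    """True if the two lines differ, but match once the marks are removed."""
    return (reference != hypothesis
            and strip_diacritics(reference) == strip_diacritics(hypothesis))
```

For example, a fully vowelled word in the transcription and the same word without vowel marks in the OCR output would be flagged as a diacritics-only mismatch, while a wrong base letter would not.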
Collaborating with @SHARIAsource to create an interface for the Arabic OCR geared towards scholars and researchers. Includes a number of useful features, such as robust version control. #WIDH2018
Building a robust scholarly corpus faces organizational barriers: it needs sustainable funding and topic editors. #WIDH2018
Thread on work of @M_T_Miller of @RoshanInstitute above. #WIDH2018
Keep an eye out for updates regarding this Arabic OCR project and the accompanying interface (with @M_T_Miller @maximromanov @sarahsavant1 @SHARIAsource) #WIDH2018