Announcing Surya OCR 2! It uses a new architecture and improves on v1 in every way:
- OCR with automatic language detection for 93 languages (no more specifying languages!)
- More accurate on old/noisy documents
- 20% faster
- Basic English handwriting support
Find Surya here - .
Surya OCR 2 is more accurate across all document types. It also compares favorably to Tesseract and Google Cloud OCR. The benchmarking script is in the repo.
My earlier benchmark compared mainly clean documents, so I made a new noisy document benchmark to compare v2 and v1. This was created from tapuscorpus by @Alix_Tz. Again, language is not hinted.
v2 is 20% faster than v1. I tested on an A10 GPU with batch size 525 for v2 and 300 for v1, which use about the same VRAM (20GB).
This is despite having twice as many active inference parameters!
There's now very basic support for English handwriting, with improvements to come soon.
As you can see, there are mistakes, but this is a good step forward. It's hard to find handwriting data, so please let me know if you have any you can share!
Surya OCR 2 uses a custom architecture with a swin transformer encoder. I really wanted to use a non-autoregressive model, but it's hard to beat the kv cache baseline.
Instead, I used a shallow decoder to get the benefits of autoregression, but with much faster processing.
Unlike the original model, which used an MoE, v2 no longer requires you to specify the language of a document. Languages are determined automatically.
You can still give language hints to improve accuracy - up to 4 languages can be hinted.
See the hosted version and on-prem options at datalab.to.
The model is trained from scratch, so it's okay for commercial usage. There are some restrictions if your company is over $5M in revenue or funding raised (see link for details).
I'm hiring people to work with me on Surya and other models! We'll be training and open sourcing models with task-specific architectures.
I'm still testing some approaches to OCR, and will hopefully have more updates soon, including:
- Improved speed, especially on CPU
- Better handwriting support
- Math support
This model is built on amazing open source work. Thank you to everyone whose open contributions made it possible, especially the @huggingface transformers team and the swin transformer authors.
I just released new Surya layout and text detection models:
- 30% faster on GPU, 4x faster on CPU, 12x faster on MPS
- Accuracy very slightly better
- When I merge this into marker, it will be 15% faster on GPU, 3x on CPU, 7x on MPS
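The gap between the component speedups and the end-to-end marker speedups follows Amdahl's law. A quick back-of-the-envelope sketch (the detection time fractions below are my own inference from the stated numbers, not figures from the thread):

```python
def overall_speedup(p, s):
    """Amdahl's law: overall speedup when a component taking fraction p
    of total runtime becomes s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

# GPU: detection gets 30% faster (s = 1/0.7). Marker ending up ~15%
# faster overall implies detection is about half of marker's GPU time:
print(round(overall_speedup(0.5, 1 / 0.7), 2))   # 1.18, i.e. runtime drops to ~0.85x

# CPU: detection 4x faster; a 3x overall speedup implies detection
# dominates (~8/9 of runtime): (1/9) + (8/9)/4 = 1/3 exactly.
print(overall_speedup(8 / 9, 4))                 # 3.0

# MPS: detection 12x faster; 7x overall implies p = 72/77 of runtime.
print(round(overall_speedup(72 / 77, 12), 2))    # 7.0
```

The takeaway: the closer detection is to being the whole pipeline on a backend, the more of its speedup survives in marker's end-to-end numbers.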
I used a modified version of efficientvit from MIT (github.com/mit-han-lab/ef…), which was then adapted by @wightmanr. I made some small modifications, including adding a segmentation head. Thanks so much for the architecture/code!
I didn't change the training data much, but the new models do allow for higher resolution (since there's no global softmax attention), so benchmark scores are slightly better.
Surya was trained on a diverse set of documents, including scientific papers. It works with every language that I've tried.
It should work with good quality scanned documents as well due to image augmentation.
Text detection is step 1 in building a GPU-accelerated OCR model that is more accurate than Tesseract. Step 2 is to build the text recognition system - I'll be working on that in the next couple of weeks.
I'm excited to ship marker - a pdf to markdown converter that is 10x faster than nougat, more accurate outside arXiv, and has low hallucination risk. Marker is optimized for throughput, like converting LLM pretraining data.
1/ In this thread, I'll discuss @LambdaSchool, a bootcamp that charges 17% of your pre-tax income for up to 2 years (ISA).
tl;dr Lambda is much more expensive than the average bootcamp, and has similar outcomes. 75% of Lambda students could pay an avg of $9k less elsewhere.
2/ First, outcomes.
85.9% of Lambda graduates get a job within 180 days, with a median 60k salary.
A survey across multiple bootcamps found that 79% of all bootcamp grads were employed within 120 days, with a median 65k salary.
3/ Students at Lambda pay 17% of their pre-tax income for the 24 months in which they make more than $4,166 a month ($50k a year). The ISA expires after 60 months.
76% of employed Lambda grads get a first job paying over $50k. Assuming no salary increases, their tuition averages $23.2k.