Vik Paruchuri
Founder @dataquestio, where you can learn data science interactively online. Worked at @edXOnline and @StateDept.
Jan 12 13 tweets 4 min read
Announcing surya - a multilingual text line detection model for documents. It gives you accurate line-level bboxes and column breaks.

Find it here: github.com/VikParuchuri/s…
Surya was trained on a diverse set of documents, including scientific papers. It works with every language that I've tried.

It should work with good quality scanned documents as well due to image augmentation.
Nov 30, 2023 22 tweets 6 min read
I'm excited to ship marker - a pdf to markdown converter that is 10x faster than nougat, more accurate outside arXiv, and has low hallucination risk. Marker is optimized for throughput, like converting LLM pretrain data.

Find it here: github.com/VikParuchuri/m…
Nougat is an amazing model, but is slow and hallucination-prone (1.5% of pages in arXiv, 5%+ outside) due to autoregressive decoding.

Marker converts and cleans text incrementally. It uses 4 models - column detector, layout detector, nougat, postprocessor. It OCRs if needed.
Oct 15, 2019 16 tweets 4 min read
1/ In this thread, I'll discuss @LambdaSchool, a bootcamp that charges 17% of your pre-tax income for up to 2 years (ISA).

tl;dr Lambda is much more expensive than the average bootcamp, and has similar outcomes. 75% of Lambda students could pay an avg of $9k less elsewhere.

2/ First, outcomes.

85.9% of Lambda graduates get a job within 180 days, with a median 60k salary.

A survey across multiple bootcamps found that 79% of all bootcamp grads were employed within 120 days, with a median 65k salary.
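
For scale, here is a rough sketch of what the ISA terms in 1/ imply at the median graduate salary quoted above. It assumes the graduate earns exactly the 60k median for the full 2 years, and it ignores any payment cap or minimum income threshold, since none is stated here. The function name `isa_total_cost` is made up for illustration.

```python
def isa_total_cost(annual_salary: int, percent_of_income: int, years: int) -> int:
    """Total paid under an income share agreement, in whole dollars.

    Simplifying assumptions (not from the thread): constant salary,
    payments made for the full term, no cap and no income floor.
    """
    # Integer arithmetic avoids floating-point rounding on the percentage.
    return annual_salary * percent_of_income * years // 100


# 17% of pre-tax income for 2 years at the 60k median salary.
lambda_cost = isa_total_cost(annual_salary=60_000, percent_of_income=17, years=2)
print(lambda_cost)  # 20400
```

Under these assumptions a median-salary graduate would pay about $20.4k in total. A lower salary or a shorter repayment period would reduce that figure.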