Shubham Saboo
May 9, 2024
Document OCR in 90+ languages using Python (100% free, and it runs offline once the model weights are downloaded):
For document OCR, we'll use the open-source toolkit Surya.

Python toolkit that does:
• OCR in 90+ languages
• Line-level text detection in any language
• Layout analysis (table, image detection)
• Reading order detection
• Outperforms Tesseract and is on par with Google Cloud Vision, according to the project's own benchmarks
1. Install the Python library

Run the following command from your terminal: pip install surya-ocr

You'll need Python 3.9+ and PyTorch.
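Before installing, it can help to confirm the two prerequisites named above. The sketch below is not part of Surya; the helper names (python_ok, package_installed, missing_prerequisites) are made up for illustration. It only checks the interpreter version and whether the torch module can be found.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the thread


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def package_installed(module_name):
    """Return True when a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_prerequisites():
    """List human-readable problems; an empty list means you are ready to install."""
    problems = []
    if not python_ok():
        problems.append("Python 3.9+ is required")
    if not package_installed("torch"):
        problems.append("PyTorch is not installed (pip install torch)")
    return problems
```

Calling missing_prerequisites() returns an empty list when both requirements are met. Otherwise it returns a short message for each missing piece.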
2. Try out the OCR (text detection) boilerplate code
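The original code screenshot is not available, so here is a minimal sketch of what the OCR call may look like. It assumes the import paths and the run_ocr signature from the Surya README for the 0.4.x releases that were current when this thread was posted. Later releases reorganized the API, so check the README for the version you installed. The wrapper run_surya_ocr and the helper lines_to_text are illustrative names, not part of Surya.

```python
def run_surya_ocr(image_path, langs=("en",)):
    """Run Surya OCR on one image and return its recognized text lines.

    ASSUMPTION: import paths and the run_ocr signature follow the Surya
    README for the 0.4.x releases; newer versions may differ.
    """
    # Imported lazily so lines_to_text below works without Surya installed.
    from PIL import Image
    from surya.ocr import run_ocr
    from surya.model.detection import segformer
    from surya.model.recognition.model import load_model
    from surya.model.recognition.processor import load_processor

    image = Image.open(image_path)
    det_processor, det_model = segformer.load_processor(), segformer.load_model()
    rec_model, rec_processor = load_model(), load_processor()

    # run_ocr takes a list of images and one language list per image.
    results = run_ocr([image], [list(langs)], det_model, det_processor,
                      rec_model, rec_processor)
    return results[0].text_lines


def lines_to_text(text_lines, min_confidence=0.0):
    """Join recognized lines into plain text, dropping low-confidence lines.

    ASSUMPTION: each line exposes .text and (optionally) .confidence.
    Lines without a confidence score are always kept.
    """
    kept = []
    for line in text_lines:
        confidence = getattr(line, "confidence", None)
        if confidence is None or confidence >= min_confidence:
            kept.append(line.text)
    return "\n".join(kept)
```

With Surya installed, print(lines_to_text(run_surya_ocr("page.png"), min_confidence=0.5)) would print the recognized text for a hypothetical file page.png. The first run downloads the model weights, so it needs an internet connection once.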
3. Try out the built-in interactive demo with Streamlit.

Install Streamlit (pip install streamlit), then run this command in your terminal: surya_gui
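If you prefer to start the demo from a Python script, here is a small sketch that looks for the surya_gui launcher on your PATH and runs it. The function names are illustrative and not part of Surya. The only assumption taken from the thread is that the command is called surya_gui.

```python
import shutil
import subprocess


def find_demo_command(executable="surya_gui"):
    """Return the full path to the demo launcher, or None if it is not on PATH."""
    return shutil.which(executable)


def launch_demo(executable="surya_gui"):
    """Start the interactive demo and block until the server is stopped (Ctrl+C)."""
    path = find_demo_command(executable)
    if path is None:
        raise FileNotFoundError(
            f"{executable!r} not found on PATH; install surya-ocr and streamlit first"
        )
    return subprocess.run([path], check=True)
```

Calling launch_demo() raises FileNotFoundError with an install hint when the launcher is missing, instead of failing with a less helpful shell error.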
🌟 Support the open-source project: github.com/VikParuchuri/s…

8000+ readers receive my AI newsletter every day and you are not one of them. Join @_unwind_ai to stay on top of the latest AI developments and tools: unwindai.substack.com
