Tyler (@tyler_agg), Oct 7, 10 tweets
there's so much content on how to build AI agents, but no one ever talks about the data engineering pipelines that support them

here's a thread going over the basics of data engineering:

the goal of data engineering:

- extract data from various sources
- transform it into a structured format
- load it into a data warehouse like Snowflake

and this structured data is often used as context for AI systems to make personalized recommendations

why this matters for AI:

to build personalized AI systems, you need clean, structured data

the more structured and labeled it is, the more granular and accurate the context we can retrieve for an AI system

step 1: extract data from sources

where is your data coming from?

- Google Sheets (pull via API)
- websites (web scraping)
- third-party APIs
- existing databases

this part depends entirely on what data you're trying to collect
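
a minimal sketch of the extract step, assuming two made-up sources: a CSV export (like what you'd download from a Google Sheet) and a JSON payload (like what a third-party API might return). the function names, field names and sample data are all hypothetical, and a real pull would go through each source's own API client:

```python
import csv
import io
import json

# Hypothetical sample data standing in for real sources.
SHEET_CSV = "customer,plan\nacme,pro\nglobex,free\n"
API_JSON = '{"results": [{"customer": "initech", "plan": "pro"}]}'


def extract_from_csv(text, source):
    """Parse CSV text into a list of dicts, tagging each row with its source."""
    reader = csv.DictReader(io.StringIO(text))
    return [{**row, "source": source} for row in reader]


def extract_from_api(payload, source):
    """Parse a JSON payload and pull records out of its 'results' key."""
    data = json.loads(payload)
    return [{**row, "source": source} for row in data.get("results", [])]


def extract_all():
    """Collect raw records from every source into one list."""
    return extract_from_csv(SHEET_CSV, "sheet") + extract_from_api(API_JSON, "api")


raw_records = extract_all()
```

tagging every record with where it came from makes it easier to trace bad rows back to a source later.
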
step 2: transform the raw data

raw data is messy and unstructured - contract text, website content, whatever

you need to:

- clean up missing values
- add structure to make it consistent
- standardize formats across different sources
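
a rough sketch of what those three cleanup tasks can look like in plain Python. the rows, column names and the "unknown" default are assumptions for illustration, not a rule for how missing values should be handled:

```python
from datetime import datetime

# Hypothetical raw rows with the usual problems: gaps, stray whitespace,
# inconsistent casing, and two different date formats.
RAW_ROWS = [
    {"customer": " Acme ", "plan": "PRO", "signup": "2024-01-15"},
    {"customer": "globex", "plan": "", "signup": "01/20/2024"},
    {"customer": "", "plan": "free", "signup": "2024-02-01"},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(value):
    """Try each known format and return an ISO date string, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def transform(rows):
    """Drop rows missing a customer, fill missing plans, standardize formats."""
    cleaned = []
    for row in rows:
        customer = row["customer"].strip().lower()
        if not customer:
            continue  # no key to identify the row, so skip it
        cleaned.append({
            "customer": customer,
            "plan": row["plan"].strip().lower() or "unknown",
            "signup": parse_date(row["signup"].strip()),
        })
    return cleaned


clean_rows = transform(RAW_ROWS)
```
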
AI is often used in transformation pipelines as well

an example I’ve done: extracting entities from contract text like specific clauses, start/end dates, party names

raw text obviously isn't in a clean tabular format

so we use AI to pull information from unstructured documents and put it into a tabular format

(think of a Google Sheet with a column for each clause and key entity we want to collect)
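
a sketch of that idea, with heavy assumptions: `fake_llm` is a stand-in that returns a canned reply so the example runs without any model, and the field list and prompt wording are made up. in a real pipeline you'd swap in whatever model client you use:

```python
import json

# Hypothetical columns we want for every contract.
FIELDS = ["party_a", "party_b", "start_date", "end_date", "termination_clause"]

PROMPT_TEMPLATE = (
    "Extract these fields from the contract and reply with JSON only, "
    "using null for anything not present: {fields}\n\nContract:\n{text}"
)


def fake_llm(prompt):
    """Stand-in for a real model call; returns a canned JSON reply."""
    return json.dumps({
        "party_a": "Acme Corp",
        "party_b": "Globex LLC",
        "start_date": "2024-01-01",
        "end_date": "2024-12-31",
    })


def extract_entities(contract_text, llm=fake_llm):
    """Ask the model for the fields, then force the reply into fixed columns."""
    prompt = PROMPT_TEMPLATE.format(fields=", ".join(FIELDS), text=contract_text)
    reply = json.loads(llm(prompt))
    # Keep only the expected columns; anything the model omitted becomes None.
    return {field: reply.get(field) for field in FIELDS}


row = extract_entities("This agreement between Acme Corp and Globex LLC ...")
```

forcing every reply into the same fixed set of columns is what makes the output tabular: each contract becomes one row, even when the model leaves something out.
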
step 3: load the data into a data warehouse

push the cleaned, structured data into Snowflake or a similar warehouse

and now you have tables of organized data that you can easily query and use as context
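
a minimal sketch of the load step. big assumption: it uses Python's built-in `sqlite3` as a local stand-in for a warehouse so it runs anywhere, and it is not how you'd connect to Snowflake. the table and column names are hypothetical:

```python
import sqlite3

# Hypothetical cleaned rows coming out of the transform step.
CLEAN_ROWS = [
    {"customer": "acme", "plan": "pro", "signup": "2024-01-15"},
    {"customer": "globex", "plan": "free", "signup": "2024-01-20"},
]


def load(conn, rows):
    """Create the table if needed and upsert each row by customer."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers ("
        "customer TEXT PRIMARY KEY, plan TEXT, signup TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customers (customer, plan, signup) "
        "VALUES (:customer, :plan, :signup)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
load(conn, CLEAN_ROWS)
load(conn, CLEAN_ROWS)  # running it twice should not create duplicates

# Once loaded, the data can be queried like any other table.
pro_customers = conn.execute(
    "SELECT customer FROM customers WHERE plan = ?", ("pro",)
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

keying the table on `customer` and replacing on conflict means a re-run of the pipeline overwrites rows instead of duplicating them.
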
the tools involved:

- Python for scripting
- AWS for cloud infrastructure
- Airflow for workflow orchestration
- Snowflake for data warehousing
- dbt for data transformations

there’s an entire tech stack that comes with this shit and that’s why data engineers get paid good af

note as well why reliability is critical af with data pipelines:

if it breaks anywhere here, every AI system built on top of it is cooked

bad data here = broken systems downstream
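
one way to guard against that, sketched with made-up names: retry a flaky step a few times, then validate the batch before it moves downstream. the retry count, the required fields and the `flaky_extract` source are all assumptions for illustration:

```python
import time


class ValidationError(Exception):
    """Raised when a batch fails a data quality check."""


def with_retries(step, attempts=3, delay_seconds=0.0):
    """Call step(), retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == attempts:
                raise  # out of retries, let the failure surface
            time.sleep(delay_seconds)


def validate(rows, required=("customer", "plan")):
    """Fail loudly if the batch is empty or any row lacks a required field."""
    if not rows:
        raise ValidationError("batch is empty")
    for i, row in enumerate(rows):
        missing = [f for f in required if not row.get(f)]
        if missing:
            raise ValidationError(f"row {i} is missing {missing}")
    return rows


# Hypothetical flaky source: fails twice, then succeeds.
calls = {"count": 0}


def flaky_extract():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source unavailable")
    return [{"customer": "acme", "plan": "pro"}]


rows = validate(with_retries(flaky_extract))
```

the point of failing loudly in `validate` is that a stopped pipeline is easier to notice and fix than bad rows quietly flowing into every system downstream.
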
to recap data engineering basics:

- extract data from various sources (APIs, scraping, databases)
- transform raw data into a structured format (clean, standardize, extract entities)
- load into a data warehouse
- build reliably with error handling and monitoring
