Ergest Xheblati
Data + Business | Author: Minimum Viable SQL Patterns | Newsletter: Data Patterns (see links below)
Apr 4, 2023 4 tweets 1 min read
It feels like a lot of founders approach startups very pragmatically: they see a problem and go after it without thinking about the bigger picture.

Many of them are after status and glory (the CEO / Founder title) and don't care much about the problem. Others are too myopic in their approach to the problem: they go after local optima instead of the global optimum.

This is true for a lot of data startups. The local optimum might be “messy data” or “fast / easy reporting,” while globally the customer wants to succeed with data.
Apr 1, 2023 4 tweets 2 min read
I was trying to create a course on advanced SQL patterns and decided to use ChatGPT to help me with the structure and projects.

It worked. In fact, it worked so well that I got discouraged and shelved the whole project.

Here's what happened:

I started by giving ChatGPT the schema… twitter.com/i/web/status/1…

What it came up with both delighted and discouraged me. You can see why:

Project 1: Top Users by Activity and Reputation
In this project, students will analyze the users and their activities on Stack Overflow, such as asking questions, answering, and commenting. They will use… twitter.com/i/web/status/1…
Mar 25, 2023 6 tweets 1 min read
Why do we do data analysis in companies?

That’s such a foundational question that no one bothers asking, because the answer is often “because everyone else is doing it and we don’t want to be left behind.”

Can we get to a satisfactory answer?

Let’s see: what is the goal of any company? In most cases, it’s to make money. Simple enough.

In some cases this objective is punted to the future in favor of growth. Get big first, dominate the market, then enjoy the spoils.

Initially, some basic data is needed for accounting and finance.
May 17, 2022 10 tweets 4 min read
I didn’t really get Kimball’s dimensional modeling (DM) until I read an early article by him comparing DM to the ER model.

In the process, I also answered the question of whether dimensional modeling is still valid today.

kimballgroup.com/1997/08/a-dime…

Early in the article, Kimball states that ER models are nearly impossible for business users to query and use, implying that DMs serve as the report delivery mechanism.

Today's equivalent of a DM is the wide table, which is still a valid structure for reporting and BI tools to consume.
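To make the DM-to-wide-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module: a tiny star schema (one fact table, two dimension tables) is pre-joined into a single wide table that a BI tool could query without joins. All table and column names (`fct_orders`, `dim_customer`, `dim_product`, `wide_orders`) and the sample rows are hypothetical illustrations, not taken from Kimball's article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical): descriptive attributes to filter and group by
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name  TEXT, category TEXT);

-- Fact table (hypothetical): foreign keys to the dimensions plus numeric measures
CREATE TABLE fct_orders (
    order_id     INTEGER,
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    amount       REAL
);

INSERT INTO dim_customer VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (20, 'Gadget', 'Hardware');
INSERT INTO fct_orders   VALUES (100, 1, 10, 2, 20.0), (101, 2, 20, 1, 15.0), (102, 1, 20, 3, 45.0);

-- Wide table: the star schema pre-joined into one denormalized table
CREATE TABLE wide_orders AS
SELECT f.order_id, f.quantity, f.amount,
       c.customer_name, c.region,
       p.product_name, p.category
FROM fct_orders AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
JOIN dim_product  AS p ON p.product_key  = f.product_key;
""")

# A typical report query against the wide table needs no joins
revenue_by_region = dict(conn.execute(
    "SELECT region, SUM(amount) FROM wide_orders GROUP BY region ORDER BY region"
).fetchall())
print(revenue_by_region)
```

The trade-off is redundancy: `region` and `category` are repeated on every order row, which is usually acceptable for read-heavy reporting but not for a system of record.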
May 17, 2022 4 tweets 1 min read
Finally started reading Bill Kent’s Data and Reality. It’s a shame this book is out of print and not very popular.

These are some of my favorite quotes:

At its core, modeling is representing (mapping) reality in the digital world, so it is not very precise. Our maps of reality are incomplete by design; all models are wrong, but some are useful.
May 15, 2022 4 tweets 1 min read
Data modeling is easy to understand but really hard to put into practice. Why?

Because it’s fundamentally an opinionated design process. The core purpose of data modeling is to reduce redundancy in data so that updates, inserts, and deletes are easy to perform and cause no anomalies or inconsistencies.

That seems easy, right?

Let’s take a very basic example.
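One possible basic example (a sketch, not necessarily the one from the original thread) shows the update anomaly that redundancy causes. The tables `orders_flat`, `customers`, and `orders` and their rows are hypothetical, and SQLite via Python's sqlite3 module stands in for a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized design (hypothetical): customer_email is repeated on every order row
conn.executescript("""
CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                          customer_email TEXT, amount REAL);
INSERT INTO orders_flat VALUES (1, 7, 'ana@old.example', 10.0),
                               (2, 7, 'ana@old.example', 25.0);
""")

# Update anomaly: changing the email on only one row leaves the data inconsistent
conn.execute("UPDATE orders_flat SET customer_email = 'ana@new.example' WHERE order_id = 1")
flat_emails = {row[0] for row in conn.execute(
    "SELECT customer_email FROM orders_flat WHERE customer_id = 7")}
print(flat_emails)  # two different emails now exist for the same customer

# Normalized design (hypothetical): the email is stored once, in customers
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(customer_id), amount REAL);
INSERT INTO customers VALUES (7, 'ana@old.example');
INSERT INTO orders VALUES (1, 7, 10.0), (2, 7, 25.0);
""")

# A single-row update is now enough; every order sees the new email through the join
conn.execute("UPDATE customers SET email = 'ana@new.example' WHERE customer_id = 7")
normalized_emails = {row[0] for row in conn.execute(
    "SELECT c.email FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id")}
print(normalized_emails)
```

The opinionated part is deciding which facts deserve their own table; here, treating "customer" as a separate entity is what removes the anomaly.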
May 11, 2022 6 tweets 2 min read
Here's my attempt at calculating an approximate cost per query in Snowflake:
github.com/ergest/sql4fpl…

It assumes that compute was spread evenly among the queries that ran during each hour.
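The even-spread assumption reduces to simple arithmetic: each query's approximate cost is the hour's metered credits divided by the number of queries that ran in that hour. Below is a minimal sketch in SQLite via Python; `metering` and `queries` are hypothetical, simplified stand-ins for Snowflake's hourly warehouse metering and query history data, not the actual views or the linked repo's code, and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for hourly warehouse metering and query history
CREATE TABLE metering (warehouse TEXT, hour_start TEXT, credits_used REAL);
CREATE TABLE queries  (query_id TEXT, warehouse TEXT, start_time TEXT);

INSERT INTO metering VALUES ('WH_A', '2022-05-10 09:00:00', 2.0),
                            ('WH_A', '2022-05-10 10:00:00', 1.0);
INSERT INTO queries VALUES ('q1', 'WH_A', '2022-05-10 09:05:00'),
                           ('q2', 'WH_A', '2022-05-10 09:40:00'),
                           ('q3', 'WH_A', '2022-05-10 09:55:00'),
                           ('q4', 'WH_A', '2022-05-10 10:15:00');
""")

# Even-spread assumption: approx_credits = credits_used / (queries in that hour)
sql = """
WITH queries_by_hour AS (
    -- Truncate each query's start time to the hour it ran in
    SELECT query_id, warehouse,
           strftime('%Y-%m-%d %H:00:00', start_time) AS hour_start
    FROM queries
),
hourly_counts AS (
    -- Count queries per warehouse per hour
    SELECT warehouse, hour_start, COUNT(*) AS n_queries
    FROM queries_by_hour
    GROUP BY warehouse, hour_start
)
SELECT q.query_id, m.credits_used * 1.0 / h.n_queries AS approx_credits
FROM queries_by_hour AS q
JOIN hourly_counts AS h
  ON h.warehouse = q.warehouse AND h.hour_start = q.hour_start
JOIN metering AS m
  ON m.warehouse = q.warehouse AND m.hour_start = q.hour_start
ORDER BY q.query_id
"""
cost_per_query = dict(conn.execute(sql).fetchall())
print(cost_per_query)
```

Multiplying `approx_credits` by your contracted price per credit would turn it into a currency figure. The weakness of the assumption is visible in the data: a one-second query and a long-running one in the same hour get the same share.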

Explanations below:

The spine table is one of my favorite patterns. You lay out the "tracks" for the query and conform the data to fit them.

Here I'm creating one row per day per hour because warehouse_metering is only available hourly.
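Here is a minimal sketch of the spine pattern in SQLite via Python, assuming a hypothetical sparse `metering` table and a fixed one-day time range chosen for illustration: a recursive CTE lays out one row per hour, and the data is left-joined onto it so that hours with no metering still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical hourly metering data with a gap (no row for 10:00)
CREATE TABLE metering (hour_start TEXT, credits_used REAL);
INSERT INTO metering VALUES ('2022-05-10 09:00:00', 2.0),
                            ('2022-05-10 11:00:00', 0.5);
""")

sql = """
WITH RECURSIVE spine(hour_start) AS (
    -- Lay out the "tracks": one row per hour, whether or not data exists
    SELECT '2022-05-10 08:00:00'
    UNION ALL
    SELECT datetime(hour_start, '+1 hour')
    FROM spine
    WHERE hour_start < '2022-05-10 12:00:00'
)
-- Conform the data to the spine; hours without metering show up as 0 credits
SELECT s.hour_start, COALESCE(m.credits_used, 0) AS credits_used
FROM spine AS s
LEFT JOIN metering AS m ON m.hour_start = s.hour_start
ORDER BY s.hour_start
"""
rows = conn.execute(sql).fetchall()
for hour_start, credits in rows:
    print(hour_start, credits)
```

Because the spine drives the `LEFT JOIN`, gaps in the source data become explicit zero rows instead of silently disappearing, which is what makes hour-by-hour comparisons reliable. Extending the range to multiple days only changes the start and end bounds.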
May 5, 2022 10 tweets 2 min read
When modeling data in the warehouse, choosing the right concepts and entities to model is the hardest part.

Here’s a quick introduction to the process:

Start with the business processes first.

At the heart of any organization lie the processes that enable people to create and deliver value in the form of products and services.

Optimizing these processes requires you to measure their effectiveness.
Apr 22, 2022 9 tweets 2 min read
I’ve been an analytics engineer for the majority of my career. Even if the title is relatively new, the role has always existed in one way, shape, or form.

This is why I believe it’s here to stay:

As an analytics engineer, your job is to design and build usable, well-documented data models for analysts and data scientists to use.

These models serve as the building blocks on top of which metrics and KPIs are defined and the core business dashboards are built.
Jan 8, 2022 10 tweets 2 min read
I’ve been writing SQL for ~15 years. I’ve seen hundreds of thousands of lines of code.

Over time I developed a set of patterns and best practices I always come back to when writing queries.

This is my attempt to decode them 👇👇👇

Rule 1: Always use CTEs

When writing a complex query, it’s a good idea to break it down into smaller components. As tempting as it might be to solve the query in one step, don’t.

CTEs make your query easier to write and maintain in the future.
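As an illustration of the rule, here is a minimal sketch (run in SQLite via Python) where each CTE does one small, named step and the final SELECT just stitches them together. The `users` and `posts` tables and their rows are hypothetical, loosely shaped like Q&A-site data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical sample data
CREATE TABLE users (user_id INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER, post_type TEXT, score INTEGER);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO posts VALUES (1, 1, 'question', 5), (2, 1, 'answer', 10),
                         (3, 2, 'answer', 3),   (4, 2, 'answer', 9);
""")

sql = """
WITH answers AS (
    -- Step 1: keep only the rows we care about
    SELECT user_id, score
    FROM posts
    WHERE post_type = 'answer'
),
answer_stats AS (
    -- Step 2: aggregate to one row per user
    SELECT user_id, COUNT(*) AS n_answers, SUM(score) AS total_score
    FROM answers
    GROUP BY user_id
)
-- Step 3: enrich with user attributes and rank
SELECT u.display_name, s.n_answers, s.total_score
FROM answer_stats AS s
JOIN users AS u ON u.user_id = s.user_id
ORDER BY s.total_score DESC
"""
result = conn.execute(sql).fetchall()
print(result)
```

A practical benefit of this layout is debugging: you can temporarily end the query with `SELECT * FROM answers` (or any other CTE) to inspect a single step in isolation.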