Tweet

Ismael Ghalimi

Jan 14 • 26 tweets • 6 min read

The implementation of our new SELECT transform triggered many fascinating discussions at #STOIC. Among them, one was trying to answer the following question: should we encourage users to use SELECT as much as possible, or should we steer them toward simpler Tabula transforms?

For reference purposes, here are Tabula transforms:

By design, Tabula transforms are easier to use and easier to distribute. On the flip side, SQL is an industry standard, and much, much more powerful from a functional and semantic standpoint. But it's also much more difficult to use, and very challenging to distribute.

This creates an interesting dilemma. Here is how we think about it: on one hand, users who are new to SQL will start with Tabula transforms, because they are simpler, and benefit from a point-and-click UI. On the other hand, SQL developers will naturally gravitate toward SELECT.

Therefore, we need to invest both on the UI for our Tabula transforms and our graphical SQL query editor (the UI for the SELECT transform). And we should establish a very progressive path from the former to the latter.

In other words, we must turn #STOIC into the very best SQL learning tool that has ever been brought to market. And to do so, we must create the very best SQL query designer.

To do so, we must start by reducing the gap between Tabula and SQL. This is something that we have been working on for about two years now, and we have developed a pretty good idea of what needs to be done.

For example, our transform used to select columns in a table is called PICK, and it has two modes: DROP and KEEP. The former is a SQL keyword used to drop columns, therefore we should put more emphasis on it.

By the same token, we do not want to multiply the number of transforms that we present to our end user. We now have 23 of them, and we probably want to keep them at that level.

@olivier_gaucher

A solution was recently suggested by @olivier_gaucher: use PICK as main transform, but display it as DROP or KEEP when used in the data journey. That way, we have the best of both worlds.

And the same can be done with the FILTER transform, which is used to filter rows in or out. In the data journey, we could display it as PRESERVE or DELETE, depending on whether data is filtered in (PRESERVE) or out (DELETE), knowing that DELETE is a SQL statement to delete rows.

Doing so will turn most of our Tabula transforms as proto-SQL statements, most of them using a single clause. Most importantly, it will introduce our SQL-illiterate users to many of the core concepts of SQL, in a very safe and progressive manner.

From there, we need to give these users a gentle on-ramp to the infinite power of SQL, and to do so, we need to show them that they're actually producing SQL without knowing it.

A naive approach to this challenge would be some kind of "View SQL", which would show the user the SQL code produced by a single transform or a complete data journey. Unfortunately, that would not work so well, because it would show a very complex and deeply-nested SQL statement.

A better approach would consist in outlining a set of consecutive transforms in a data journey that could be collapsed into a single SELECT transform.

For example, if you have a FILTER transform followed by a few COMPUTE transforms, then complemented with a GROUPBY transform, you could easily collapse all these into a single SELECT transform with SELECT, FROM, WHERE, and GROUP BY clauses.

This outline could be displayed as a colored vertical margin on the left side of the transforms in the data journey, with a prompt at the bottom inviting the user to collapse these transforms into a single SELECT.

Mousing over the prompt would preview the resulting SELECT transform, making the link between the collapsed transform and the SELECT transform's clauses quite obvious.

This design would serve two main purposes: first, it would introduce our SQL-illiterate user to the power of SELECT; second, it would allow our user to design SELECT transforms incrementally, by using simpler Tabula transforms step by step.

Initially, our user will probably keep using Tabula transforms before collapsing them into a single SELECT transform. But after a while, there is a good chance that our user will start using the SELECT transform directly, because it will simplify and accelerate their workflow.

In other words, this progressive design will create two opportunities for epiphany: first when our user discover what SELECT does; second when they decide to use SELECT directly.

Good product design creates progressive paths to epiphanies.

With that in mind, let's review where we are today, and where we want to go next. On the Tabula front, we have many years of experience refining our data journey. In its current state, it is working really well, and very little work has to be done there.

All we have to do is to fine tune the point-and-click forms for our 22 transforms other than SELECT. This is very incremental work that will happen on its own, without too much effort.

Things are quite different on the SELECT front though. There, we're just getting started, and we have a lot to learn. But we can also stand on the shoulders of giants, because the topic of SQL query design is a well-researched one.

@bbonnotte_

But before I cover this topic on a separate thread, let me thank @bbonnotte_ for his seminal ideas about Tabula vs. SQL. What was presented on this thread is a direct result of his awesome contributions.

https://twitter.com/ghalimi/status/1614244954657927169

See you there:

https://twitter.com/ghalimi/status/1614244954657927169

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @ghalimi

Ismael Ghalimi

@ghalimi

Jan 15

https://twitter.com/ghalimi/status/1614266907666055169

Yesterday, I described a version of our upcoming SQL query designer that focused on making it easier to compose SQL queries, while preserving SQL's hierarchical structure. Today, I want to explore an alternative path.

https://twitter.com/ghalimi/status/1614266907666055169

Instead of taking a SQL-oriented approach, I want to take a task-oriented approach. And I want to free myself from SQL's hierarchical structure, while still producing a well-formed SQL query in the end.

The idea is similar to what Excel's pivot table is doing: give the user a relatively simple graphical user interface to pivot a table across multiple dimensions, and produce a query from it (be it a SQL or a DAX query).

Read 61 tweets

Ismael Ghalimi

@ghalimi

Jan 15

One of #STOIC's most useful features is its signature Summary Charts, which are these bar charts displayed at the top of every column in a table. They work really well, unless your table has very few rows. Here is how we'll improve them for some interesting corner cases.

Relative Baseline in Bar Plots

When a table has less than 50 rows, we replace the Histogram displayed for a numerical column with a Bar Plot visualizing discrete values, but we keep 0 as baseline. We should use MIN or MAX instead, as we do for cell summaries below.

Bar Plot with All Equal Values

When a Bar Plot is produced for a set of values that are all equal, we would want to know that at a glance. To do so, the length of bars will be reduced by 50%, while keeping the plot horizontally centered.

Read 18 tweets

Ismael Ghalimi

@ghalimi

Jan 15

@berkeleygfx

Things that I love about @berkeleygfx's design philosophy:

Emergent over prescribed aesthetics.
Expose state and inner workings.
Dense, not sparse.
Performance is design.
Beauty emerges automatically without deliberation.
Do not infantilize users.

"Emergent over prescribed aesthetics."

Our core UI design was totally emergent. We did not follow any trends. We just tried to render as much qualitative and quantitative information as possible, in a well-organized fashion. Aesthetics were a mere by-product.

"Expose state and inner workings."

That's exactly what we do when we expose the "Runtime Boundary" on the left panel: Lambda functions upstream, EC2 monostore downstream.

Read 7 tweets

Ismael Ghalimi

@ghalimi

Jan 15

One of the main factors that is holding back the design of good data tools for less technical users is our lack of trust in their ability to understand complex mathematical concepts like logarithmic scales or percentiles.

While it is true that the vast majority of such users are not yet familiar or comfortable with these concepts today, they could learn about them over time, and we need to design user interfaces that can facilitate this learning.

This is why I am such a strong proponent of what I call progressive user interfaces. These are user interfaces that use advanced concepts surreptitiously. These concepts are there right in front of the user, yet might not be noticed at first, and don't get in the way.

Read 5 tweets

Ismael Ghalimi

@ghalimi

Jan 14

https://twitter.com/ghalimi/status/1614244954657927169

This thread introduced our upcoming SQL query designer and its two modes: graphical and textual. This other thread will cover the graphical mode in more detail, reviewing each clause of SQL's SELECT statement.

https://twitter.com/ghalimi/status/1614244954657927169

@duckdb

As a baseline, we will use this @duckdb documentation for SQL's SELECT:

duckdb.org/docs/sql/state…

This gives us the following clauses:

WITH
SELECT
FROM
USING
WHERE
GROUP BY
WINDOW
ORDER BY
LIMIT

Read 53 tweets

Ismael Ghalimi

@ghalimi

Jan 14

@duckdb

This thread introduced the SQL vs. Tabula discussion. This other thread will cover our SQL query designer. This tool will focus on @duckdb's SQL dialect, but it will be designed to work with virtually any SQL cloud database (@SnowflakeDB, @databricks, etc).

https://twitter.com/ghalimi/status/1614234050981658624

There are many ways to design a SQL query designer, and to design a good one, two critical questions must be answered first: who is the audience, and what is their working environment?

For #STOIC, we must serve two very different user groups:

1. Users who are new to SQL
2. Users who are SQL experts

Read 34 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Ismael Ghalimi

People who liked this thread also liked...

Try unrolling a thread yourself!

More from @ghalimi

Ismael Ghalimi

Ismael Ghalimi

Ismael Ghalimi

Ismael Ghalimi

Ismael Ghalimi

Ismael Ghalimi

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!