12,399 views

Chris Saxon

@ChrisRSaxon

, 38 tweets, 8 min read

My Authors

Woooo! Another milestone reached!

Thanks all for joining me 😃

To celebrate, I'll post an (Oracle) #SQL tip for each retweet this gets (up to some limit)

Go!

NULL is neither equal to nor not equal to anything

So

select * from ... where anything = NULL

ALWAYS returns no rows

This can lead to other unexpected outcomes where NULLs are possible

For example:

WHERE c1 NOT IN ( ... )

Returns no rows if any values in the list are NULL

To avoid this, usually you want NOT EXISTS instead of NOT NULL

NULL also impacts some aggregate functions, for example:

SELECT COUNT ( column_with_nulls )

returns the number of non-null rows for that column

This can be less than the number of rows in the table!

While we're discussing COUNT:

COUNT (*) means count all the rows in the group

This IS THE SAME as COUNT (1) or COUNT ( mandatory_column )

The Oracle optimizer rewrites COUNT (1) -> COUNT (*)

Back on NULLs

NVL is Oracle-specific with 2 args, returning the 2nd if the 1st NULL

COALESCE is ISO standard and can take many args, returning the 1st that's NOT NULL

NVL always evaluates both args; COALESCE uses short-circuiting

But NVL can give better plans in some cases!

NULL brings lots of complications and is best avoided

That said, it's almost always better to use NULL instead of choosing a "magic value" to represent missing/not applicable; e.g. -1

Magic values bring similar issues to NULL, but are specific to your app!

SQL is designed so statements read like an English sentence:

"SELECT weight FROM applies WHERE colour = 'red'"
"Get the weight of the apples that are red"

But the order the database processes queries is:

FROM/JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY

But the "order of processing" rule above is more complicated!

The optimizer can take your query and rearrange it to another statement that's logically the same

This means that the database IS NOT GUARANTEED to process subqueries first/last; it may be converted to a join!

On the optimizer

It can use constraints to eliminate unnecessary work

e.g. if you join a parent/child table
AND there's an FK child -> parent
AND the FK cols are all NOT NULL
AND you only access columns from the child

The optimizer can "remove" the parent table from the plan!

So if you want:

- Better data quality
- Faster #SQL statements

-> Create and enable the relevant constraints on your tables!

But can't I enforce the constraints in my application?

Well yes you can IF:

- You like writing (LOTS) of code
- Are prepared to spend LOTS of time testing they work in multi-user environments
- Like building a bottleneck into your app ;)

Me: I prefer ALTER TABLE ... ADD

Missing a UNIQUE constraint and have lots of duplicate rows?

Remove them with:

delete <tab>
where rowid not in (
select min(rowid)
from <tab>
group by <list of cols that should have unique values>
)

Want to write SLOOOOOOW #SQL processes?

Put it inside a loop:

FOR ... LOOP
<SQL that processes 1-2 row/iteration
END LOOP;

Want FAST SQL?

Ditch the loop. Have ONE statement that processes MANY rows

You want one execution -> many rows
NOT
Many executions -> one row each

@MarkusWinand

@MarkusWinand

"My SQL processes lots of rows but is still slow"

Learn how to read execution plans and how indexes work

For the former, see blogs.oracle.com/oraclemagazine…

And for the latter use-the-index-luke.com by @MarkusWinand is a fantastic resource

@OracleSQLDev

@OracleSQLDev

Need to export data as CSV?

In @OracleSQLDev or @oraclesqlcl run:

set sqlformat csv
spool <file>
select ...
spool off

Other formats like JSON, HTML, supported too

If you want it to go faster, save the above in a SQL script with

set term off
set array 1000

at the top

If you have SELECT *, the database returns all the visible columns in the table

So adding/removing columns changes the shape of the result set

This is likely to lead to errors!

Only select the columns you need

Only selecting the columns you need has further benefits that:

- Less data are sent over the network
- It increases the chance of the optimizer being able to do an index-only scan

Both of which can make your SQL faster

Analytic/window functions enable you to calculate running totals

The window allows you to control which rows are included

rows between 7 preceding and 3 following

sums up to 11 rows, the 7 before the current and 3 after; this is very powerful!

You can number rows with:

ROW_NUMBER - unique/row
RANK - ties have same value; gaps according to Olympic ranking
DENSE_RANK - ties have same value; no gaps

Want sub- and grand totals for aggregations in a GROUP BY?

Check out ROLLUP & CUBE

ROLLUP - produce subtotals working right-to-left through the columns
CUBE - produce subtotals for every possible combination of the columns

Normalization is a key concept for table design

This matters whether you store data in relational tables, JSON, or something else!

Understand the trade-offs you make if your tables aren't in 5NF

For a quick primer on forms 1-5 see bkent.net/Doc/simple5.htm

SQL Injection is STILL one of the most common vulnerabilities

If you're constructing SQL using string concatenation, you're almost certainly vulnerable

Use bind variables!

More on security:

Database users have full privileges on their own objects; meaning they can do nasty things like drop tables!

ALWAYS connect your application to the database with a user that owns NO tables. Grant (least) privs to the tables it needs in another user

ALWAYS

On security

Often DBAs, deployment apps, etc. need to connect as the schema owner. This creates password sharing issues etc.

Proxy users are a great way to overcome this

Each person/app has their own user & credentials; but with the ability to connect as the table owner

Views are a great way to create reusable SQL statements

The cycle of view use seems to go

1 We don't use views
2 Views are cool
3 Views are awesome, use them everywhere
4 Views on views on views
5 Debugging is hard, performance sucks
6/1 We don't use views

Try and stay at 2 ;)

If you do use views and do INS/UPD/DEL against them, use:

create view ...
select ... where ...
WITH CHECK OPTION

This ensures if you write a row that you get an exception, instead of it "disappearing" because it doesn't match the WHERE clause

Another NULL oddity:
Back on NULL

constraint check ( c > 0 )

Allows you to INSERT null

But

select ... where c > 0

Won't return it! Why?

1 Constraints reject rows => FALSE
2 Selects return rows => TRUE

NULL > 0 => UNKNOWN so not false for 1 and not true for 2!

If you have

FROM t1
LEFT JOIN t2
ON ...
WHERE t2.col = ...

You've turned the outer join to inner!

If you need to filter the inner table, do so in the JOIN:

FROM t1
LEFT JOIN t2
ON ...
AND t2.col = ...

Sometimes with outer joins you want to do "data densification"

e.g. for every date, show every customer and the value of their orders. Even if they made no orders!

This is easy with a partitioned outer join

FROM dates
LEFT JOIN customers PARTITION BY ( cust_id )

The WITH clause is a great way to create named subqueries you can use later in the query:

WITH q1 as ( ... ),
q2 as ( ... )
SELECT ... FROM q1 ... q2

This is MUCH easier to understand & debug than subqueries

I often use this as a way to make large queries easier to work with

The value:

'02-MAY-2020'

Is NOT A DATE!

It's a string that looks like a date

If you want a date value use:

date'2020-05-02'
to_date ( '02-MAY-2020', 'DD-MON-YYYY' )

Date literals are of the form YYYY-MM-DD
With to_date, passing the value and the appropriate format

You can't store a DATE with the format "DD-MON-YYYY". Or any other format for that matter

The database stores dates in an internal format

The display format is controlled by the client via the NLS settings

Be explicit with:

to_char ( dt, 'FMT')

Store datetime values in either a DATE or TIMESTAMP

NOT VARCHAR2
NOT NUMBER

Date values => DATE or TIMESTAMP

Unlike ISO dates, Oracle Database date always include a time component

If you want a day with no time, set the time to midnight

And stop people storing other times with this constraint:

check ( date_col = trunc ( date_col ) )

If you want to guarantee that a query returns rows in a particular order, you MUST add an ORDER BY

And if you want this order to be deterministic, ensure this includes columns with unique values (enforced by a PK/UK ;)

GROUP BY != ORDER BY

The optimizer *might* use a sorting operation to do the grouping

But this isn't guaranteed!

GROUP BY might appear to sort, but if you need rows in a certain order see my previous tweet - USE ORDER BY!

If you see:

MERGE JOIN CARTESIAN

in an execution plan, it's usually a sign of a missing join

But this *can* be the best way to join; provided the first returns just one row

Enjoying this thread?

Try unrolling a thread yourself!

Enjoying this thread?

Try unrolling a thread yourself!

Related hashtags

Embed code for your website

Did Thread Reader help you today?