Thanks all for joining me 😃
To celebrate, I'll post an (Oracle) #SQL tip for each retweet this gets (up to some limit)
Go!
So
select * from ... where anything = NULL
ALWAYS returns no rows
For example:
WHERE c1 NOT IN ( ... )
Returns no rows if any values in the list are NULL
To avoid this, usually you want NOT EXISTS instead of NOT NULL
SELECT COUNT ( column_with_nulls )
returns the number of non-null rows for that column
This can be less than the number of rows in the table!
COUNT (*) means count all the rows in the group
This IS THE SAME as COUNT (1) or COUNT ( mandatory_column )
The Oracle optimizer rewrites COUNT (1) -> COUNT (*)
NVL is Oracle-specific with 2 args, returning the 2nd if the 1st NULL
COALESCE is ISO standard and can take many args, returning the 1st that's NOT NULL
NVL always evaluates both args; COALESCE uses short-circuiting
But NVL can give better plans in some cases!
That said, it's almost always better to use NULL instead of choosing a "magic value" to represent missing/not applicable; e.g. -1
Magic values bring similar issues to NULL, but are specific to your app!
"SELECT weight FROM applies WHERE colour = 'red'"
"Get the weight of the apples that are red"
But the order the database processes queries is:
FROM/JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
The optimizer can take your query and rearrange it to another statement that's logically the same
This means that the database IS NOT GUARANTEED to process subqueries first/last; it may be converted to a join!
It can use constraints to eliminate unnecessary work
e.g. if you join a parent/child table
AND there's an FK child -> parent
AND the FK cols are all NOT NULL
AND you only access columns from the child
The optimizer can "remove" the parent table from the plan!
- Better data quality
- Faster #SQL statements
-> Create and enable the relevant constraints on your tables!
Well yes you can IF:
- You like writing (LOTS) of code
- Are prepared to spend LOTS of time testing they work in multi-user environments
- Like building a bottleneck into your app ;)
Me: I prefer ALTER TABLE ... ADD
Remove them with:
delete <tab>
where rowid not in (
select min(rowid)
from <tab>
group by <list of cols that should have unique values>
)
Put it inside a loop:
FOR ... LOOP
<SQL that processes 1-2 row/iteration
END LOOP;
Want FAST SQL?
Ditch the loop. Have ONE statement that processes MANY rows
You want one execution -> many rows
NOT
Many executions -> one row each
Learn how to read execution plans and how indexes work
For the former, see blogs.oracle.com/oraclemagazine…
And for the latter use-the-index-luke.com by @MarkusWinand is a fantastic resource
In @OracleSQLDev or @oraclesqlcl run:
set sqlformat csv
spool <file>
select ...
spool off
Other formats like JSON, HTML, supported too
If you want it to go faster, save the above in a SQL script with
set term off
set array 1000
at the top
So adding/removing columns changes the shape of the result set
This is likely to lead to errors!
Only select the columns you need
- Less data are sent over the network
- It increases the chance of the optimizer being able to do an index-only scan
Both of which can make your SQL faster
The window allows you to control which rows are included
rows between 7 preceding and 3 following
sums up to 11 rows, the 7 before the current and 3 after; this is very powerful!
Check out ROLLUP & CUBE
ROLLUP - produce subtotals working right-to-left through the columns
CUBE - produce subtotals for every possible combination of the columns
This matters whether you store data in relational tables, JSON, or something else!
Understand the trade-offs you make if your tables aren't in 5NF
For a quick primer on forms 1-5 see bkent.net/Doc/simple5.htm
If you're constructing SQL using string concatenation, you're almost certainly vulnerable
Use bind variables!
Database users have full privileges on their own objects; meaning they can do nasty things like drop tables!
ALWAYS connect your application to the database with a user that owns NO tables. Grant (least) privs to the tables it needs in another user
ALWAYS
Often DBAs, deployment apps, etc. need to connect as the schema owner. This creates password sharing issues etc.
Proxy users are a great way to overcome this
Each person/app has their own user & credentials; but with the ability to connect as the table owner
The cycle of view use seems to go
1 We don't use views
2 Views are cool
3 Views are awesome, use them everywhere
4 Views on views on views
5 Debugging is hard, performance sucks
6/1 We don't use views
Try and stay at 2 ;)
create view ...
select ... where ...
WITH CHECK OPTION
This ensures if you write a row that you get an exception, instead of it "disappearing" because it doesn't match the WHERE clause
Back on NULL
constraint check ( c > 0 )
Allows you to INSERT null
But
select ... where c > 0
Won't return it! Why?
1 Constraints reject rows => FALSE
2 Selects return rows => TRUE
NULL > 0 => UNKNOWN so not false for 1 and not true for 2!
FROM t1
LEFT JOIN t2
ON ...
WHERE t2.col = ...
You've turned the outer join to inner!
If you need to filter the inner table, do so in the JOIN:
FROM t1
LEFT JOIN t2
ON ...
AND t2.col = ...
e.g. for every date, show every customer and the value of their orders. Even if they made no orders!
This is easy with a partitioned outer join
FROM dates
LEFT JOIN customers PARTITION BY ( cust_id )
WITH q1 as ( ... ),
q2 as ( ... )
SELECT ... FROM q1 ... q2
This is MUCH easier to understand & debug than subqueries
I often use this as a way to make large queries easier to work with
'02-MAY-2020'
Is NOT A DATE!
It's a string that looks like a date
If you want a date value use:
date'2020-05-02'
to_date ( '02-MAY-2020', 'DD-MON-YYYY' )
Date literals are of the form YYYY-MM-DD
With to_date, passing the value and the appropriate format
The database stores dates in an internal format
The display format is controlled by the client via the NLS settings
Be explicit with:
to_char ( dt, 'FMT')
NOT VARCHAR2
NOT NUMBER
Date values => DATE or TIMESTAMP
If you want a day with no time, set the time to midnight
And stop people storing other times with this constraint:
check ( date_col = trunc ( date_col ) )
And if you want this order to be deterministic, ensure this includes columns with unique values (enforced by a PK/UK ;)
The optimizer *might* use a sorting operation to do the grouping
But this isn't guaranteed!
GROUP BY might appear to sort, but if you need rows in a certain order see my previous tweet - USE ORDER BY!
MERGE JOIN CARTESIAN
in an execution plan, it's usually a sign of a missing join
But this *can* be the best way to join; provided the first returns just one row