i'll nitpick two details, though :-)
the 512k upper bound for disk IO size isn't properly a #sqlserver limit, although it *is* the limit on most systems: it's the upper bound imposed by the disk adapter.
some disk adapters - most fibre channel HBAs and some others, too - allow configuration of the maximum disk transfer size, and can be set higher than 512kb.
in a vmware vm, the max transfer size of the pvscsi vhba is 512k and can't be changed.
the LSI vHBA has a max transfer size of 32 mb!
~~
PVSCSI and Large IO’s
2014 March 14 blogs.vmware.com/vsphere/2014/0…
while the LSI vHBA's larger max transfer size (used as long as the physical host also has the larger max transfer size) vs pvscsi can be advantageous for some workloads, i still don't recommend the LSI vHBA for typical #sqlserver workloads.
the LSI vHBA has a nonconfigurable device queue depth of 32 and a nonconfigurable adapter queue depth of 128.
the pvscsi vHBA has a default device queue depth of 64 and a default adapter queue depth of 256.
~~
No one likes queues
2011 March 4 yellow-bricks.com/2011/03/04/no-…
Large-scale workloads with intensive I/O patterns might require queue depths significantly greater than Paravirtual SCSI default values (2053145)
2015 December 16 kb.vmware.com/s/article/2053…
my recommendation to use the pvscsi vHBA for #sqlserver workloads in #vmware, 512k max transfer size notwithstanding, is because a disk queue length higher than 32 (which forces IOs into the wait queue when device queue depth is 32) is much more common for #sqlserver than a read > 512kb.
and if there are more than 4 "disk devices" in the vmware vm attached to a vHBA, the aggregate service queue length (up to, but not over, the disk device queue depth per device) sent to the adapter will exceed 128 (and use the LSI adapter wait queue) more often than a read > 512kb would occur.
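that aggregate-queue arithmetic can be sketched in a few lines. this is a toy model, not vmware's scheduler; the queue depths come from the tweets above, and the device counts are illustrative:

```python
# adapter queue overflow sketch, using the queue depths from this thread
LSI_DEVICE_QD, LSI_ADAPTER_QD = 32, 128
PVSCSI_DEVICE_QD, PVSCSI_ADAPTER_QD = 64, 256  # pvscsi defaults; configurable

def overflows(adapter_qd, device_qd, n_devices, per_device_outstanding):
    # in-service IO is capped at device_qd per device; anything past
    # adapter_qd has to wait in the adapter queue
    in_service = n_devices * min(per_device_outstanding, device_qd)
    return in_service > adapter_qd

# 5 busy disks, each keeping 32 IOs in flight
print(overflows(LSI_ADAPTER_QD, LSI_DEVICE_QD, 5, 32))        # True: 160 > 128
print(overflows(PVSCSI_ADAPTER_QD, PVSCSI_DEVICE_QD, 5, 32))  # False: 160 <= 256
```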
anyway. #sqlserver can issue reads for contiguous hobt extents larger than 512kb *if* the storage adapter allows it.
This last part is tricky. Let's talk about the barebones reason readahead is important.
~~
"This is why read ahead is important, you get more data from the disk into SQL Server faster and with fewer IOs."
Fast IO is good. And if disk IO size for a given disk-bound workload can be increased by a factor of 4 while avg service time only increases by 2 (assuming the same level of outstanding IO), bytes/s could be doubled.
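back-of-the-envelope version of that claim (numbers are illustrative, same outstanding IO on both sides):

```python
# toy throughput model: bytes/s = (io_size * outstanding IOs) / avg service time
def throughput_bytes_per_sec(io_size_kb, outstanding, service_time_ms):
    # each outstanding IO completes one io_size_kb read per service_time_ms
    return (io_size_kb * 1024 * outstanding) / (service_time_ms / 1000.0)

base   = throughput_bytes_per_sec(128, 8, 4)  # 128k reads, 4 ms avg service time
bigger = throughput_bytes_per_sec(512, 8, 8)  # 4x the IO size, 2x the service time

print(bigger / base)  # 2.0 - bytes/s doubles
```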
But... the most important part of readahead isn't how fast, or how large. it's "ahead". just get it in the buffer pool *before* the query thread asks for it, and the pageio_latch wait will be avoided.
OS or filesystem readahead has to function without the benefit of a SQL query plan :-)
So, sequential patterns of access within a file are detected and IO sizes increase in response to continued sequential access of a file.
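a toy sketch of that detect-and-ramp behavior, assuming a simple "double the window on each sequential hit" policy (real filesystem readahead heuristics are more involved):

```python
# toy filesystem-style readahead: detect sequential reads, ramp the
# readahead window in response, reset on a non-sequential access
class ReadaheadState:
    def __init__(self, start_kb=64, max_kb=512):
        self.start_kb = start_kb
        self.max_kb = max_kb
        self.window_kb = start_kb
        self.next_offset = None  # offset we'd expect next if access is sequential

    def on_read(self, offset_kb, size_kb):
        if offset_kb == self.next_offset:
            # sequential hit: double the readahead window, up to the cap
            self.window_kb = min(self.window_kb * 2, self.max_kb)
        else:
            self.window_kb = self.start_kb  # random access: start over small
        self.next_offset = offset_kb + size_kb
        return self.window_kb

ra = ReadaheadState()
sizes = [ra.on_read(off, 64) for off in (0, 64, 128, 192)]
print(sizes)  # window ramps: [64, 128, 256, 512]
```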
#sqlserver on the other hand, benefits from a query plan. Even if there has been sequential access of the data file, #sqlserver won't be tricked into reading parts of the data file that aren't according to plan :-)
imagine a 10 vcpu vm, with 1 TB vRAM and Max Server Memory set to 800 GB. Only one DOP 8 query runs at a time.
Because a query plan was generated before the query started, the database engine knows which indexes the plan will scan in full. if those indexes were 50 gb total in size, those entire indexes could be pre-warmed into the database cache...
as fast as the reads can occur, with as many outstanding read IOs at a time as the queues will allow. And as long as the index pages are in cache before the query gets there... no pageio latch waits.
#sqlserver doesn't have the luxury to read everything from a query plan into cache as fast as possible when the query starts executing, though.
in part because there's no guarantee that'll be the only query.
wouldn't it be horrible if #sqlserver read your query's fullscan indexes into database cache, only for Bob from accounting to have *his* queries indexes read into cache too... with *Bob's* indexes pushing yours out of cache before the query was done with them?!? Re-read city.
this is why #sqlserver readahead has a limited target, updated over time, based on the progress made on the scan over time.
no benefit in reading it all in ahead of time *fast* if, by the time the query workers get there, it's gone. so readahead won't pull *too* far ahead of query worker progress for the operator the readahead belongs to.
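the "limited target" idea reduces to something like this. a toy model, not the actual #sqlserver readahead algorithm; the page numbers and lead cap are made up:

```python
# toy model: readahead target limited by scan progress, so pages aren't
# read in so far ahead that they could be evicted before the worker arrives
def readahead_target(worker_page, max_lead_pages, index_last_page):
    # only read up to max_lead_pages beyond the worker's current position
    return min(worker_page + max_lead_pages, index_last_page)

# worker at page 1,000 of a 500,000-page index, lead capped at 128 pages:
# readahead stops at page 1,128 rather than racing to the end of the index
print(readahead_target(1_000, 128, 500_000))  # 1128
```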
ok, that's enough tweets about this.
some of these details are unnecessary for understanding or tuning the majority of systems and workloads.
but sometimes you'll find an exceptional workload, or an exceptional system, that'll require an exceptional explanation.
just like that, i'm probably done thinking about scalability strategies for #sqlserver, and back to investigating latch timeouts on very busy systems.
here's the thing. the number of mdmp files for a given large-memory system can be pretty small, especially if an Availability Group is involved.
while a dump is in progress, #sqlserver memory must remain in a constant, consistent state. the more memory, the longer the dump may take (there are other un-enumerated factors).
but, Availability Group lease timeout can only be set as high as 100 seconds currently.
484 #sqlserver batch mode on rowstore queries. DOP 8. run consecutively by a single session. each query is a fullscan of a different single table, grouping by a CASE statement on a non-indexed INT/BIGINT/NUMERIC data type column. producing one or two result rows of 1 column each.
based on observation of the first experiment with these queries, in which the queries were ordered alphabetically by the name of the only table in the FROM, i reordered the queries based on decreasing pages:rows ratio. that gives the evident decreasing pages/sec trend :-)
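the reordering itself is just a sort on the ratio. a sketch with hypothetical table names and counts (the real experiment used 484 tables):

```python
# hypothetical reordering of the scan queries by decreasing pages:rows ratio
tables = [
    {"name": "t_alpha",   "pages": 40_000, "rows": 1_000_000},
    {"name": "t_bravo",   "pages": 90_000, "rows": 1_500_000},
    {"name": "t_charlie", "pages": 10_000, "rows": 2_000_000},
]
ordered = sorted(tables, key=lambda t: t["pages"] / t["rows"], reverse=True)
print([t["name"] for t in ordered])  # ['t_bravo', 't_alpha', 't_charlie']
```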
a closeup of the first 10 minutes shows the considerable variation even though the downward trend is evident.
wow
~~
Oracle in Talks to Buy Cerner
An agreement, which could potentially be worth $30 billion, would rank as biggest ever for software giant
2021 December 16 wsj.com/articles/oracl…
i don't know if i've ever heard a story about Oracle in talks to buy a company with an ending other than Oracle *buying* that company.
oh. i forgot about the TikTok debacle, mentioned by the Financial Times here.
so that's *one* story about Oracle not buying.
~~
Oracle nears deal to buy health IT company Cerner for $30bn
2021 December 16 ft.com/content/9bf806…
indubitably
~~
"To conduct tactical monitoring, we must also consider the complexities of virtualization, consolidation and concurrency alongside the activities of multiple CPU cores."
"... for capacity planning, this assumption is mostly harmless.
However, this assumption is reckless for tactical monitoring..."
Startling?
~~
We invited an AI to debate its own ethics in the Oxford Union – what it said was startling
2021 December 10 theconversation.com/we-invited-an-…
Startlingly... ridiculous!
"The good news is that you don’t have to build your own AI team. You can outsource your AI work to experts in the field, which can help you make the most of technology and ensure that you don’t fall victim to the same AI problems as your competitors."