15 Dec, 66 tweets, 11 min read
i don't use #Windows Resource Monitor very often.
it doesn't seem to like me.
first the right-hand disk graphs froze and stopped updating, even though the sqlservr numbers kept updating.
then i tried to change graph size/position, and whatever you see in the lower screen cap happened. Image
not a vm-wide memory problem. no paging space used. Image
not a vm-wide cpu problem. 67% is the peak - got batch mode workers on 32 of 48 vcpus and they're keeping things cooking. Image
so the disk graph displays again. but it's a still, point-in-time graph. the sqlservr numbers are updating about once per second; the graph never updates. Image
i wonder if there's a problem with the Windows physicaldisk or logicaldisk WMI. that might show up in event log.
maybe process disk counters in WMI are still ok, but PhysicalDisk or LogicalDisk are borked.
oh. Windows update is doing its thing.
no logicaldisk or physicaldisk numbers came through in perfmon for quite some time, either.
but the disk counters in the process object kept getting updated.
it's over now.
but my disk bytes/s are too low now, not what i expected.
i think the pending updates are bogging down performance. thank goodness i'm free to just install and reboot.
wish there was a way to tie performance impact to a pending update without an exhausting investigation.

i don't have time for that. too many other investigations.
it looks like something in this set of Windows updates really harshed my groove.

both in #sqlserver performance, and observability of performance since at least the disk counters were unavailable for quite a while.
ok, the RDP connection has gone from silently failing to timing out now. it's making progress on the updates.
omigosh. it's worse after the install and reboot. both the process disk numbers and the disk graph are frozen now. Image
hm. clock in the rdp still says 11:47, it hasn't updated in 4 minutes either.
i'm starting to get the hint.
This might be an interaction between Carbon Black and the KB4589208 install, which failed with Error 0x80070002.
my time on this system is a temporary gift, i didn't realize Carbon Black was on here.
ok, 2 DOP 8 BMoR queries. Looking much better now. Image
1.5 billion - 2 billion disk bytes/s is the range for two of these DOP 8 BMoR queries run simultaneously. i've got ~485 of these queries in the stack so this can run continuously and keep churning the bpool. Let's see how 3 and 4 queries concurrently look.
Four DOP 8 queries doesn't give much more bytes/s disk throughput. just more tightly grouped at 1.8 billion bytes/s.
it's a thread scheduling problem. i'm purposely avoiding tables large enough for full scan of pk/ci to be disfavored.
and all of these queries are BMoR pk/ci/heap full scans.
so i have 4 sessions, each running fairly fast dop 8 bmor queries in succession. each dop 8 query decides its thread placement on its own. my cpu utilization is ranging from 48% to 67%. Four DOP 8 BMoR queries gives me 32 BMoR workers churning through their part of full scans.
each of those workers could drive a vcpu to near 100% utilization by themselves. So 32 of 48 vcpus with a BMoR worker doing fullscan; system should stay at ~65-67% cpu utilized *if* each worker is on its own vcpu.
but when some of the parallel workers for session 55 double up on a scheduler with parallel workers for session 52, CPU utilization drops below 65%.
i could micromanage this and get the parallel workers for each DOP 8 query on their own scheduler (for up to six DOP 8 queries on this 48 vcpu VM).
But that's a pain. Can also assume a range of doubling up, and then over-schedule concurrent queries to compensate.
that would mean a "full rack" on this 48 vcpu vm wouldn't be six DOP 8 queries, but 9-12. Maybe 9 DOP 8 queries will be enough to get one BMoR parallel worker on every scheduler.
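the doubling-up effect can be sketched with a toy simulation. the assumption here is mine, not from the thread: each DOP 8 query drops its 8 workers on 8 schedulers chosen uniformly at random. SQL Server's actual placement is load-based, not random, so this likely overstates collisions a bit — it's just to get a feel for the numbers.

```python
import random

SCHEDULERS = 48
WORKERS_PER_QUERY = 8

def distinct_schedulers(n_queries, trials=10_000):
    """Average number of schedulers holding at least one parallel worker.

    Toy assumption: each DOP 8 query places its 8 workers on 8 distinct
    schedulers chosen uniformly at random.
    """
    total = 0
    for _ in range(trials):
        occupied = set()
        for _ in range(n_queries):
            occupied.update(random.sample(range(SCHEDULERS), WORKERS_PER_QUERY))
        total += len(occupied)
    return total / trials

for n in (4, 6, 9, 12):
    print(f"{n} queries -> ~{distinct_schedulers(n):.1f} of {SCHEDULERS} schedulers busy")
```

under that assumption, four concurrent queries keep only ~25 of 48 schedulers busy on average, and even 9 leave several idle — which is why over-scheduling to 9-12 helps but still may not pack every vcpu.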
what the heck!!? i almost missed this entirely.
just a moment ago my 4 BMoR dop 8 queries were up over 6 billion bytes/s.
So the range for 4 DOP 8 BMoR queries is 1.8 billion - 6 billion bytes/s. Image
that's way too big of a range for this tool.
my first pass was really ugly; i was doing something with VARCHAR or NVARCHAR columns in these rowstore tables. problem was the string column might be in off-row storage. Even though readahead gives good throughput for in-row data, off-row storage pages don't readahead.
or, if off-row storage pages *do* readahead, they weren't using readahead in my query :-)
so i was more selective. that first pass, i just grabbed really large tables. This pass, i made sure they weren't so large as to be disfavored. And i only used tables that had no off-row storage pages. but enough rows to ensure BMoR.
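the selection filter, sketched in python. the dict rows stand in for per-table numbers you could pull from sys.dm_db_partition_stats; the table names and the upper bound are hypothetical, though 131,072 rows is the documented minimum for a table to even be considered for batch mode on rowstore.

```python
# hypothetical per-table stats, standing in for sys.dm_db_partition_stats rows
tables = [
    {"name": "t_narrow",  "row_count": 9_000_000,     "lob_used_page_count": 0},
    {"name": "t_strings", "row_count": 50_000_000,    "lob_used_page_count": 120},
    {"name": "t_huge",    "row_count": 2_000_000_000, "lob_used_page_count": 0},
]

MIN_ROWS = 131_072           # enough rows to be eligible for BMoR
MAX_ROWS = 1_000_000_000     # made-up cap: above this, assume the
                             # pk/ci full scan gets disfavored

picked = [t["name"] for t in tables
          if t["lob_used_page_count"] == 0              # no off-row storage pages
          and MIN_ROWS <= t["row_count"] <= MAX_ROWS]   # big, but not too big
print(picked)
```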
7 billion disk bytes/s for four DOP 8 BMoR queries.
And... this isn't super-duper hardware. this is the "almost time to cycle it out of the datacenter" hardware. Image
well, i'm glad i looked into the Windows updates and got them squared away (ish) for now.
this level of activity wasn't even possible on this system just a little while ago.
another clean reboot, all updates installed, no more eligible updates.
fired up logman to csv for perfmon counters bc Resource Monitor CPU and disk visualizations are not synchronized with each other, and that bugs me.
ok, power run of all queries at DOP 8 one after the other. 484 queries. made it through 186 of them since 12:59; about 14 minutes.
my math is bad. it really is 484 queries, but the pace so far has been 183 completed in 15 minutes.
300 of the queries in 23 minutes.
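a quick pace check on those two readings:

```python
# projected power-run duration from the two pace readings above
total_queries = 484
for done, minutes in ((183, 15), (300, 23)):
    rate = done / minutes
    print(f"{done} in {minutes} min = {rate:.1f} q/min,"
          f" projected {total_queries / rate:.0f} min for all {total_queries}")
```

(the run ended up finishing in 35 minutes — a bit quicker than either projection suggests.)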

i need at least one more full power run of the 484 queries before i take a close look, anyway.
because the queries which ran while the bpool was ramping up had different runtime conditions than those while the bpool was at "mature" size.
at 30 minutes, ~400 DOP 8 queries have completed.
35 minutes for the first power run.
time to theorize why these full scans have such varying bytes/s rate in a power run.
i think it comes down to rows/page. After a tipping point, additional rows/page for this query probably brings down the bytes/s rate for the query. Even at DOP 8.
so to get a subset of these rowstore queries with a tighter bytes/s range and an increased average bytes/s, i may have to get rid of tables that exceed a certain number of rows per page in the PK/CI/heap.
i think the power run that had bpool rampup at the beginning will do slightly better than the "all mature bpool" power run.
and that just might come down to contention bringing new pages into the bpool when there is next to no free memory, vs no contention during bpool rampup.
ah, no real conclusions to be made about that right now.
this second power run also finished in 35 minutes. Too little difference to analyze for that purpose. but two runs close enough to each other for me to whittle down the count of queries and bring up avg disk bytes/s :-)
oh. i've maybe made a mistake in assigning a table/query to a session in multi-session tests.
no two sessions will overlap execution of the same query, but i didn't do much to prevent a session arriving at a query for table A while table A is still in cache from a prior query.
this is the kind of thing one might see if the various sessions were stagger-started and maybe separated by 10 items each in the list. but over time drew closer and closer and suddenly were right next to each other.
then the follower would get higher cache hit ratio due to the leader's prior query.
rats.
let's see. if the list is sufficiently long i think it will be enough to always grab the next assignment based on the assignment most recently given.
i was making assumptions about "most recently given" based on placement of the sessions in the task list. instead i actually need to store and look up the most recent assignment time for the 484 table queries in the list.
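a sketch of that fix, with hypothetical names: hand out the least-recently-assigned table from one shared tracker, instead of each session walking its own list position. single-threaded sketch only — the real multi-session version would need this behind a lock, or stored in a table.

```python
import itertools

class Assigner:
    """Hand out the table whose last assignment is oldest (LRU)."""

    def __init__(self, tables):
        # never-assigned tables sort first
        self.last_assigned = {t: 0 for t in tables}
        self._tick = itertools.count(1)   # monotonic stand-in for a timestamp

    def next_table(self):
        table = min(self.last_assigned, key=self.last_assigned.get)
        self.last_assigned[table] = next(self._tick)
        return table

a = Assigner(["table_a", "table_b", "table_c"])
print([a.next_table() for _ in range(5)])
# with no contention this just round-robins, so no session lands on a
# table still warm in cache from the immediately preceding query
```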
there's the two power runs. although the shape is chaotic, the echoes between the two power runs can be seen. Image
these graphs are with a 5 second interval. Image
i'll sort the tables in ascending order of rows/page for the pk/ci/heap, which i think will give me a power run for a single session with bytes/s declining over time.
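the ordering step, sketched with made-up numbers — rows/page is just row count over page count for the pk/ci/heap:

```python
# hypothetical tables, same row count but different row widths
tables = [
    {"name": "wide_rows",   "row_count": 10_000_000, "pages": 2_500_000},
    {"name": "medium_rows", "row_count": 10_000_000, "pages": 250_000},
    {"name": "narrow_rows", "row_count": 10_000_000, "pages": 25_000},
]

for t in tables:
    t["rows_per_page"] = t["row_count"] / t["pages"]

# ascending rows/page: the widest rows (most pages, most bytes per row)
# scan first, so disk bytes/s should start high and decline over the run
run_order = sorted(tables, key=lambda t: t["rows_per_page"])
print([t["name"] for t in run_order])
```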
for the prior two power runs, this was surprising. there's 8 Windows volumes with the main database data files, and two volumes with tempdb data files. a lot less in current disk queue length than i was expecting. Image
314 queries done.
sorting based on ascending rows per page seems to have worked well to give me an avg disk bytes/s that declines over time.
there are still some surprises in there, though.
because it's only 1 session, the ratio of plan compile time to plan execution time is quite important. because disk IO during plan compile won't be very high.
so this run started at 2 billion bytes/s according to perfmon, and now it has settled below 1 billion bytes/s. i imagine it'll stay below 1 billion bytes/s now till it's done.
under 800 million bytes/s now.
400 queries complete, 84 to go.
under 600 million bytes/s.
40 queries to go.
seems like my prediction fits better at the low throughput end than the high throughput end.
my prediction wasn't perfect but that's pretty nice, innit? Image
and the queue length Image
didn't show this yet.
workload was basically 100% readahead. Image
this suggests to me that the readahead target is not sufficient for a DOP 8 BMoR query.
At DOP 4 i expect the queries to take a bit longer, but the waits may be more hidden because readahead might be enough to hide the waits more often.
ack. i forgot to share the graph 🤣 Image
yeah, with the current readahead, scaling DOP 4 BMoR queries might be more productive than scaling DOP 8 BMoR queries when the goal is to max disk throughput.
what i'm really trying to do is stress the buffer pool with the highest rate of traffic i can.
turns out that is largely the same as trying to max disk read throughput 🤣
uh-oh. what if readahead already accounts for DOP of the scan operator?

then i'd expect everything to stretch by 2 for the DOP 4 queries compared to DOP 8. twice as long at half throughput. same number of total waits, same total amount of wait time, accumulated at half speed.
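that stretch-by-2 expectation, spelled out in toy numbers (entirely hypothetical):

```python
# if readahead depth already scales with the scan operator's DOP,
# halving DOP should halve throughput and double elapsed time,
# leaving total bytes read -- and total wait time -- unchanged
dop8 = {"dop": 8, "elapsed_s": 60, "bytes_per_s": 2_000_000_000}
dop4 = {"dop": 4,
        "elapsed_s": dop8["elapsed_s"] * 2,          # twice as long
        "bytes_per_s": dop8["bytes_per_s"] // 2}     # at half throughput

# same total bytes read either way
assert dop8["elapsed_s"] * dop8["bytes_per_s"] == \
       dop4["elapsed_s"] * dop4["bytes_per_s"]
print(dop4)
```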
