approximately 15 minutes into the test.
if this #sqlserver vm becomes unresponsive, it'll be ~15 minutes from now.
well, that was unexpected. The prior screenshot was from 10:45 am. it only took 8 minutes - the vm became unresponsive at 10:53 am
remote restart command issued at 10:55...
in this test, when the remote restart commands don't fail outright, they take several minutes to complete.
CPU is near 100% utilized. Almost all privilege time. Available memory hovers around 200 mb.
FZPL memory hovers near zero mb.
200 mb imo is too small of a low water mark.
As of 11:00 am the remote shutdown command has returned. but shutdown of that vm has not yet completed.
in order to ensure a more reponsive remote shutdown under memory crisis, either the low water mark for available memory must be increased from 200 mb, or a low watermark of FZPL memory must be imlemented.
Windows shutdown of that vm has still not completed at 11:10 am.
it is known that a sufficiently low #sqlserver [Max Server Memory] on this vm with 64 gb vRAM will avoid memory crisis with this workload of 4 concuurent BMoR queries.
i'm trying to locate that tipping point.
as of 11:20, remote shutdown of the vm has not yet completed.
the remote shutdown command returned over 20 minutes ago.
i *could* ask a colleague to reset the vm, and re-gain access about a minute afterward.
but i want to understand the liability in such cases.
in the cloud, especially with PaaS offerings, service recovery options will be more limited. And i gotta be ready for that.
35 minutes after remote shutdown command issued.
30 minutes after remote shutdown command returned.
remote shutdown has not yet completed.
A low watermarmark of 200 mb Available Memory probable works just fine for most systems and workloads.
So perhaps expose configuration of the LWM. Like AIX does.
45 minutes after remote shutdown command issued.
40 minutes after remote shutdown command returned.
remote shutdown has not yet completed.
hm. actually i think that 200 mb Available Memory LWM *is* exposed in a Windows registry setting. Or at least used to be...
just left this feedback.
~~
i visited this page -- techcommunity.microsoft.com/t5/windows-ser…
originally published Aug 27, 2007.
re-published Apr 04, 2019.
no indication if HeapDecommitFreeBlockThreshold Windows registry key is still applicable, or in what conditions.
but since this key seems to have a goal of reducing fragmentation in the pagefile, even if it *is* still active in Windows 10+, i don't think it will offer any mitigation to what i'm currently studying.
but i find a relevant or adjacent registry key, and start pulling the thread... hoping to find what i'm looking for.
meanwhile the remote shutdown command was issued 1 hour 13 minutes ago, and returned 1 hour 8 minutes ago. Remote shutdown has not yet completed.
sooner or later i always come back to this stellar series from @markrussinovich.
this is not only the first in the series, but at first glance most relevant to my investigation.
~~
Pushing the Limits of Windows: Physical Memory
2008 July 21 techcommunity.microsoft.com/t5/windows-blo…
here's one way you can tell that post is from 2008.
~~
"the 64-bit Windows Server 2008 Datacenter can handle a whopping 2TB. There aren't many 2TB systems out there, but the Windows Server Performance Team knows of a couple, including one they had in their lab at one point."
"The maximum 2TB limit of 64-bit Windows Server 2008 Datacenter doesn't come from any implementation or hardware limitation, but Microsoft will only support configurations they can test. As of the release of Windows Server 2008, the largest system available anywhere was 2TB..."
imo the former 2 TB limit was based on sound policy.
dang. that first post is still awesome. but for me, it's main benefit is to remind me how joyous i was once i no longer had to work with 32 bit windows or 32 bit #sqlserver 😂. nothing in it relevant to this investigation.
• • •
Missing some Tweet in this thread? You can try to
force a refresh
the #sqlserver production and perf/scale test systems i work with have recommended configs based on models and observations by my colleagues. those recommendations are solid :-)
but not every non-prod vm needs to have production-level perf/scale.
today i was asked about a non-prod #sqlserver instance used as a first stop for report-writers and developers. For schema exploration, ensuring queries are correct logically.
Two vcpus, 8 gb vRAM.
That's *way* smaller than any production or perf/scale test #sqlserver system i've worked with. production schema, but small test data set. #sqlserver 2019.
i completely understand the value of such a resource. still, at first i was a bit frightened :-)
so my ideas for tomorrow
- DisablePagingExecutive
- LowMemoryThreshold = 256 mb
after testing those options individually in hopes of making remote shutdown more responsive, continue testing lower values of Max Server Memory, looking for test completion w/o error.
i've been batting this problem around for *four* years now. time to make some serious headway.
comes a time when "just keep setting [Max Server Memory] lower if the vm becomes unresponsive" is no longer a palatable strategy.
in this post, the Windows response to low virtual memory is described. but in my tests, virtual memory is not low. Paging space utilizations plateaus ~12% in these tests. #sqlserver with LPIM allocates bpool via AWE, so not eligible for paging.
specifically of interest to me is the Windows Available Memory low water mark, at which point physical memory contents start paging out to pagefile. that's 200 mb, and i want to know if configurable.
och. the stuff i'm looking at today has so many variables and so much variability its about to do me in.
i'm trying to help optimize work in a workload group with a small footprint of concurrent queries and workers while other workload groups are active.
the gray inside the red-dashed-line boxes - that's what i'm trying to help out. without doing unnecessary damage to the other workload groups.
might be a long haul.
its hard to pick out the CPU utilization for just that workload group from 3 am to 6 am.
and the waits it experiences aren't all that easy to eliminate (memory_allocation_ext waits).
there's some good stuff in here about #sqlserver soft-NUMA which applies to autosoftNUMA
~~
Understanding Non-uniform Memory Access
2012 October 4 docs.microsoft.com/en-us/previous…
"Memory nodes are created based on hardware NUMA and therefore not impacted by soft-NUMA."
Thumbs up! soft-numa nodes must be fully contained in memory nodes. so memory nodes enforce something on soft-numa, but not the other way 'round.
"Soft-NUMA does not provide memory to CPU affinity."
Thumbs up, i think. i guess i would probably have said that memory-to-cpu affinity is based on memory node rather than soft-NUMA node.