in this post, the Windows response to low virtual memory is described. but in my tests, virtual memory is not low. Paging space utilizations plateaus ~12% in these tests. #sqlserver with LPIM allocates bpool via AWE, so not eligible for paging.
specifically of interest to me is the Windows Available Memory low water mark, at which point physical memory contents start paging out to pagefile. that's 200 mb, and i want to know if configurable.
increasing that low water mark *won't* necessarily resolve the #sqlserver issue. But it *can* improve the responsiveness of a remote shutdown command.
it is npw 1 hour, 45 minutes from the time the remote shutdown command returned. shutdown is not yet complete.
nothing in this second post that describes the 200 mb LWM for Available Memory.
"nonpaged pool virtual memory is assigned physical memory. Common system data structures stored in nonpaged pool include the kernel [, process & thread objects,] synchronization objects like mutexes, semaphores and events, references to files... and I/O request packets (IRPs)"
this "Pushing the Limits of Windows: Paged and Nonpaged Pool" snippet doesn't help with the 200 mb Available Memory LWM, but does help explain how #sqlserver AWE locked pages can lead to an unresponsive system when only ~12% of paging file is in use...
"There are three performance counters that indicate pool usage:
- Pool nonpaged bytes
- Pool paged bytes (virtual size of paged pool – some may be paged out)
- Pool paged resident bytes (physical size of paged pool)"
note: still no way to account for which process ID owns what footprint in pagefile. even the -l enhancement to the Handle utility only shows "pagefile-backed virtual memory segments", not necessarily which segments *have* been sent to and reside in pagefile.
Alas, nothing in post 3 that illuminates the 200mb Available Memory LWM. i'll link the remaining posts in the series...
here's post 4.
~~
Pushing the Limits of Windows: Processes and Threads @markrussinovich
2009 July 5 techcommunity.microsoft.com/t5/windows-blo…
so... nothing in the "Pushing the Limits" series to illuminate the Available Memory lower water mark that kicks in use of pagefile. In a way, that's not surprising. The series was focused on upper bounds, and upper bound.
But, as physical memory use approaches the physical memory limit on a given system, suddenly the LWM at which to engage page stealing becomes relevant.
And when that LWM is not sufficiently high*, protracted system shutdown may result.
*can also be important that the difference between LWM and HWM, when page stealing disengages from "emergency" measures, be sufficiently large. imo LWM and HWM may require adjust in this case.
Windows very likely calls these marks something other than "low water mark" and "high water mark" -- that's almost certainly a factor in my failing to find them so far.
gotta go with the terminology i know, though. till i learn something else.
hm. rdp connection to the test vm currently silently fails. it's either in a very ate stage of shutdown, or an earl stage of startup.
RDP is still silently failing over 20 minurws later. after entering credentials "Securing remote connection" silently fails. so this is one of the last stages of a *very* long shutdown.
four hours and ten minutes after the remote shutdown command returned.
i have my reasons to believe either that's not the LWM i am interested in... or the value is 200 mb rather than 96 mb.
the Amazon documentation links to this Microsoft page, which mentions low memory notification, but does not specify a threshold for that notification. docs.microsoft.com/en-us/sql/rela…
anyway, i really like this graph. nearly 100% privileged time. Available memory hovering near 200 mb. FZPL hovering near 0. Paging space used plateau near 12%.
i've called this condition memory thrashing before... dunno if that's right.
used to see this condition on HP-UX if the minimum and maximum percentages for filesystem cache were set very close to each other, but not equal to each other. (not necessarily use of paging spce on disk... ++
just the memory amount hovering very tightly (in that case the amount of filesystem cache) and excessively high system/privileged CPU time.
hm. @SQLPoolBoy also mentions the 96 mb available memory LowMemoryResourceNotification threshold in this 2012 April 3 blog post... sqlskills.com/blogs/jonathan…
i guess if the system has triggered page stealing because below 200 mb available memory but NOT LowMemoryResourceNotification because above 96 mb... that could indeed lead to this trouble,
8 hours 15 minute since the remote shutdown was requested.
8 hours 10 minutes since remote shutdown returned without error.
Shutdown is still not complete.
hm. remember the DisablePagingExecutive Windows registry key? Overall i don't think it'll change much in this situation... but i bet it'll make shutdown faster :-)
hm.
on this page, it says WIndows 8 and above no longer requires DisablePagingExecutive for stackwalking.
but... it doesn't say DisablePagingExecutive is without effect on WIndows 8 and above...
~~
Kernel Trace Control API Reference
2017 May 5 docs.microsoft.com/en-us/windows-…
In this 2013 post, @JoeSqlPFE references enabling the DisablePagingExecutive registry key for stackwalking...
~~
Troubleshooting SQL Server High CPU usage using Xperf
Joe McTaggart
2013 March 19 techcommunity.microsoft.com/t5/core-infras…
Aha! Here's a more recent DisablePagingExecutive reference, without the dread "We're no longer updating this content regularly" warning on the WS2003 page.
~~
perhaps it's still disabled by default on servers as @joe_elway mentions...
~~
Memory Page Combining
2014 January 7 aidanfinn.com/?p=15875
in this post, @MSSQLWIKI discusses possibility of alternate values from default for LowMemoryThreshold registry setting.
~~
Max server memory – Do I need to configure?
2013 April 22 mssqlwiki.com/tag/how-to-set…
• • •
Missing some Tweet in this thread? You can try to
force a refresh
the #sqlserver production and perf/scale test systems i work with have recommended configs based on models and observations by my colleagues. those recommendations are solid :-)
but not every non-prod vm needs to have production-level perf/scale.
today i was asked about a non-prod #sqlserver instance used as a first stop for report-writers and developers. For schema exploration, ensuring queries are correct logically.
Two vcpus, 8 gb vRAM.
That's *way* smaller than any production or perf/scale test #sqlserver system i've worked with. production schema, but small test data set. #sqlserver 2019.
i completely understand the value of such a resource. still, at first i was a bit frightened :-)
so my ideas for tomorrow
- DisablePagingExecutive
- LowMemoryThreshold = 256 mb
after testing those options individually in hopes of making remote shutdown more responsive, continue testing lower values of Max Server Memory, looking for test completion w/o error.
i've been batting this problem around for *four* years now. time to make some serious headway.
comes a time when "just keep setting [Max Server Memory] lower if the vm becomes unresponsive" is no longer a palatable strategy.
och. the stuff i'm looking at today has so many variables and so much variability its about to do me in.
i'm trying to help optimize work in a workload group with a small footprint of concurrent queries and workers while other workload groups are active.
the gray inside the red-dashed-line boxes - that's what i'm trying to help out. without doing unnecessary damage to the other workload groups.
might be a long haul.
its hard to pick out the CPU utilization for just that workload group from 3 am to 6 am.
and the waits it experiences aren't all that easy to eliminate (memory_allocation_ext waits).
there's some good stuff in here about #sqlserver soft-NUMA which applies to autosoftNUMA
~~
Understanding Non-uniform Memory Access
2012 October 4 docs.microsoft.com/en-us/previous…
"Memory nodes are created based on hardware NUMA and therefore not impacted by soft-NUMA."
Thumbs up! soft-numa nodes must be fully contained in memory nodes. so memory nodes enforce something on soft-numa, but not the other way 'round.
"Soft-NUMA does not provide memory to CPU affinity."
Thumbs up, i think. i guess i would probably have said that memory-to-cpu affinity is based on memory node rather than soft-NUMA node.