Hastening process cleanup

One of the fundamental invariants of computing is that, regardless of how much memory is installed in a system, it is never enough.

This is especially true of systems with tight performance constraints, where every page of memory is allocated and in use, making it difficult to find more when it is badly needed. One way to make more memory available is to kill one or more processes, freeing their resources for other users. But that often does not work as quickly or reliably as users would like. In an attempt to improve the situation, Suren Baghdasaryan has proposed the addition of a system call named process_mrelease(). 

Systems running mixed workloads, where some tasks are more important than others, are not uncommon. If the system is being run near its maximum capacity, the relatively unimportant tasks may end up using memory that is needed by the more important work, at which point it might be better if the unimportant processes went away. Such systems often run process managers that will kill off the low-priority processes in these situations; perhaps the most widespread example of this pattern is Android, which will kill background apps if the available memory is insufficient for whatever is running in the foreground. Cloud-computing systems will also kill low-priority, best-effort workloads if their memory is needed by more important work. 

Killing a process should, in principle, make its memory immediately available for other users. In the real world, though, things are not so simple. The killed process is, itself, responsible for cleaning up and freeing its resources, a task that is carried out in kernel context. If, however, the killed process finds itself blocked in an uninterruptible sleep, that cleanup work could be delayed indefinitely. There are other factors that can slow down the freeing of memory, including how busy the relevant CPU is and whether that CPU is running in a slow, low-power state. 

When this happens, the system has paid the cost of killing the process (which was presumably doing something useful) without receiving the benefits from that action. Unfortunately, those benefits tend to be needed urgently; the system would not be killing processes otherwise. Delays in process cleanup can have immediate and visible effects on the higher-priority workloads; these can include jerky response on a handset or a delay in the delivery of a cat video to an impatient viewer. 

This problem was encountered years ago in the context of the system's out-of-memory (OOM) killer, which is the kernel's last-resort response when memory runs out. Back in 2015, the development of the OOM reaper addressed this problem by taking the memory cleanup work out of the dying process's hands and making it the responsibility of a separate kernel thread. That made OOM killing significantly more robust, with the ability to free memory quickly even if the chosen process is not able to exit immediately. 

Full verston 


The number of callers of process_mrelease() is likely to be small, but it seems there will be some situations where it will be a useful tool to have.