Just when I thought I finally understood the memory manager, this comes along: a behavior I can’t understand or explain. Let’s see if anyone can solve the puzzle.
You have a machine, let’s say with 8 GB of RAM. You only run a small number of processes on that machine and you want to add another service.
You run free -m and it looks like this:
             total       used       free     shared    buffers     cached
Mem:          7980       4814       3165          0          1       4593
-/+ buffers/cache:        220       7759
Swap:         1759          0       1759
And you think: hey, no problem, plenty of free memory available. 220 MB of “real used” memory and 7759 MB of “freeable” memory. Let’s bring it on.
You start your application (I use memhog from numactl), which eats 4 GB of RAM, and you get an OOM.
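A minimal sketch of the test, assuming memhog from the numactl package is installed:
memhog 4g    # allocates and touches 4 GB of anonymous memory
# ... and the OOM killer fires, even though free reported roughly 7.7 GB as freeable.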
Who knows why?
(The solution is attached as a base64-encoded block.) To get a peek, simply execute the block in a shell.
cat << EOF | openssl base64 -d
VGhlIHByb2JsZW0gbGllcyBpbiB0aGUgZmFjdCB0aGF0IHRlbXBmcyBpbiAvZGV2
L3NobSBpcyBmdWxsLgpXaHkgdGhlIGNvbnRlbnQgb2YgL2Rldi9zaG0vIGlzIGNv
bnNpZGVyZWQgY2FjaGUgYW5kIG5vdCB1c2VkIGlzIGJleW9uZCBtZS4gQnV0IGl0
IGlzIHRoaXMgd2F5LgoK
EOF
I would still love an explanation why it is that way.
I had the same behaviour on F18.
Did you open a ticket on Bugzilla?
BR
/f
Hi FHornain,
I don’t think it is a bug, because this behavior occurs on everything from RHEL5 to F18 and F19 Alpha. But maybe it is a bug. I have searched BZ for this but found nothing yet.
I would say that it is a bug; tmpfs cannot be reclaimed, so it should not be counted as buffers/cache.
This is true, and obvious when you consider it: The pages in tmpfs are cached file pages just like everything else in cache. What’s different is there’s no backing store to flush them to or retrieve them from. So you can think of tmpfs as a normal filesystem where the disk always responds to all read requests with zeroes and always responds to write requests by throwing the data away and returning success. Consequently tmpfs cache pages can’t be evicted because they can’t be read back consistently.
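A quick way to see this (a rough sketch; the file name and size are arbitrary, and dropping caches requires root):
free -m                                                # note the "cached" column
dd if=/dev/zero of=/dev/shm/filler bs=1M count=2048    # write 2 GB into tmpfs
free -m                                                # "cached" grows by about 2 GB
sync; echo 3 > /proc/sys/vm/drop_caches                # as root: drop all reclaimable caches
free -m                                                # the 2 GB is still there, it cannot be evicted
rm /dev/shm/filler                                     # only removing the file gives the memory back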
The other issue with considering cache “freeable” is that dirty cache pages have to be laundered before freeing, and most commands don’t make a distinction. It usually takes some pretty intense IO workload to have enough dirty cache pages that you care, but it does happen.
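A rough way to watch dirty cache pages being laundered (the file path and size are only examples, and the exact numbers depend on how quickly writeback kicks in):
dd if=/dev/zero of=/tmp/dirty-demo bs=1M count=512    # fills the page cache with dirty pages
grep -E 'Dirty|Writeback' /proc/meminfo               # dirty pages waiting to be written back
sync                                                  # force writeback (the "laundering")
grep -E 'Dirty|Writeback' /proc/meminfo               # should drop back toward zero
rm /tmp/dirty-demo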
Try editing /etc/sysctl.conf and add:
vm.overcommit_ratio=100
vm.overcommit_memory=2
Reboot and try again.
With the above, the system will only allocate up to RAM + swap.
If the OS can’t allocate the memory, it informs the calling program, i.e. malloc will return an error.
By default, Linux will allocate a practically unlimited amount of memory. It assumes you are not going to use it. If you do in fact use it (write to it) and there is not enough memory or swap available to satisfy your request, *then* the OOM killer gets called and nukes semi-random processes.
The above change says: don’t allocate memory you don’t have (more than RAM + swap). I believe this is the more traditional Unix way of allocating memory.
I might be wrong, but that is how I understand it. Give it a go and see what happens.
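A hedged sketch of trying those values at runtime (as root; sysctl -w changes the live settings, while the sysctl.conf entries make them persistent):
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=100
grep -E 'CommitLimit|Committed_AS' /proc/meminfo    # CommitLimit should now be about RAM + swap
# With strict overcommit, an allocation that would push Committed_AS past CommitLimit
# fails immediately (malloc returns NULL) instead of triggering the OOM killer later.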
No need to reboot. After changing sysctl.conf, just activate it with sysctl -p /etc/sysctl.conf.
Though, if those settings restrict your running environment too much, you might as well have rebooted the machine 🙂 (same as deactivating a swap partition).