I should add that this isn’t the first time this has happened, but it is the first time since I reduced the RAM allocation for PostgreSQL in the configuration file. I was sure that was the problem, but I guess not. It’s been almost a full week without any usage spikes or service interruptions of this kind, but all of a sudden, my RAM and CPU are maxing out again at regular intervals. When this occurs, the instance is unreachable until the issue resolves itself, which seems to take 5-10 minutes.
On the seven-day graph, the usage spikes only show up today, and they are far above my idle usage.
I thought the issue was something to do with Lemmy periodically fetching some sort of remote data and slamming the database, which is why I reduced the RAM allocation for PostgreSQL to 1.5 GB instead of the full 2 GB. As you can see in the above graph, my idle resource utilization is really low. Since it’s probably cut off from the image, I’ll add that my disk utilization is currently 25-30%. Everything seemed to be in order for basically an entire week, but this problem showed up again.
Does anyone know what is causing this? Clearly, something is happening that is loading the server more than usual.
Here’s an update. I set up atop on my VPS and waited until the issue occurred again. Here’s the atop log from the event.
ATOP - ip-172-31-7-27 2023/07/22 18:40:02 ----------------- 10m0s elapsed
PRC | sys 9m49s | user 12.66s | #proc 134 | #zombie 0 | #exit 3 |
CPU | sys 99% | user 0% | irq 0% | idle 0% | wait 0% |
MEM | tot 957.1M | free 49.8M | buff 0.1M | slab 95.1M | numnode 1 |
SWP | tot 0.0M | free 0.0M | swcac 0.0M | vmcom 2.4G | vmlim 478.6M |
PAG | numamig 0 | migrate 0 | swin 0 | swout 0 | oomkill 0 |
PSI | cpusome 63% | memsome 99% | memfull 88% | iosome 99% | iofull 0% |
DSK | xvda | busy 100% | read 461505 | write 171 | avio 1.30 ms |
DSK | xvda1 | busy 100% | read 461505 | write 171 | avio 1.30 ms |
NET | transport | tcpi 2004 | tcpo 1477 | udpi 9 | udpo 11 |
NET | network | ipi 2035 | ipo 1521 | ipfrw 20 | deliv 2015 |
NET | eth0 ---- | pcki 2028 | pcko 1500 | si 4 Kbps | so 1 Kbps |

PID    SYSCPU  USRCPU  VGROW  RGROW   RDDSK   WRDSK  CPU  CMD
41     5m17s   0.00s   0B     0B      0B      0B     53%  kswapd0
1      21.87s  0.00s   0B     -80.0K  1.2G    0B     4%   systemd
21681  20.28s  0.00s   0B     4.0K    4.2G    0B     3%   lemmy
435    18.00s  0.00s   0B     392.0K  163.1M  0B     3%   snapd
21576  17.20s  0.00s   0B     0B      4.2G    0B     3%   pict-rs
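The culprit seems to be kswapd0, the kernel thread that reclaims memory when free RAM runs low. It would normally move memory to swap space, but there is no swap space here, so the only thing it can do is drop file-backed pages that then have to be read back from disk. That would fit the 100% disk busy and the huge read counts in the log.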
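To put a rough number on the pressure in that log, you can compare `vmcom` (committed virtual memory, 2.4G) against `tot` (physical memory, 957.1M). Here's a minimal Python sketch of that arithmetic. It assumes atop's size suffixes are binary units (1M = 1024K), and the helper names `parse_size` and `commit_ratio` are made up for illustration, not part of atop.

```python
import re

# Multipliers for atop-style size suffixes. Assumption: binary units.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a size such as '957.1M', '2.4G' or '-80.0K' to bytes."""
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([BKMGT])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def commit_ratio(vmcom: str, total: str) -> float:
    """Return committed virtual memory divided by physical memory."""
    return parse_size(vmcom) / parse_size(total)


if __name__ == "__main__":
    # Values copied from the SWP and MEM lines of the atop log.
    ratio = commit_ratio("2.4G", "957.1M")
    print(f"committed / physical = {ratio:.2f}")
```

For the values in the log, the ratio comes out above 2.5, meaning the processes have been promised well over twice the RAM that physically exists, with no swap to absorb the difference.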
I set memory swappiness to 0 on the system for now; I’ll check if that makes a difference.
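If you want to confirm what the kernel is actually using, here's a small Python sketch that reads the current swappiness value and the amount of configured swap. It assumes a Linux system where `/proc/sys/vm/swappiness` and `/proc/meminfo` are readable; the function names `read_swappiness` and `swap_total_kib` are hypothetical helpers, not an existing tool.

```python
from pathlib import Path


def read_swappiness(path: str = "/proc/sys/vm/swappiness") -> int:
    """Return the current vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def swap_total_kib(meminfo_text: str) -> int:
    """Extract the SwapTotal value (in kB as reported) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:             0 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found in meminfo text")


if __name__ == "__main__":
    print("vm.swappiness =", read_swappiness())
    print("SwapTotal (kB) =", swap_total_kib(Path("/proc/meminfo").read_text()))
```

If SwapTotal is 0, as the SWP line in the atop log suggests, there is nothing to swap to, so I wouldn't expect the swappiness setting on its own to change much.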
Tbh, I haven’t really had this issue in a few weeks. I’m tempted to think it’s usage-related, and could possibly indicate that my memory allocation for the DB is still too high.
Depending on your timezone, it’s possibly a peak in traffic from the US: an overlap of July 4th, the Reddit userbase jumping in, and the recent surge in shitposting about… sigh… beans.