Just before Christmas, we ran into monitoring issues with two Debian host nodes in our virtual cluster. All guest virtual machines were running perfectly fine, but the nodes had stopped reporting virtual machine usage to our billing platform.
On investigation, we noticed that pvscan, vgscan and lvscan each took 60+ seconds to complete, and our reporting cron job has to run these commands.
We tested LVM by running the commands in verbose mode and noticed that each one waited for a long time at a single step.
pvscan -vv
vgscan -vv
lvscan -vv
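To put a number on the delay for each scan, a small timing wrapper can help. This is only a sketch: elapsed_seconds is a hypothetical helper name, it measures whole seconds only, and it discards the command's output.

```shell
# Print how many whole seconds a command takes to run.
# elapsed_seconds is a hypothetical helper, not part of LVM.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1      # run the command, discarding its output
    end=$(date +%s)
    echo $((end - start))
}
```

As root on a node, it could be used as: for cmd in pvscan vgscan lvscan; do echo "$cmd: $(elapsed_seconds "$cmd")s"; done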
We also noticed that /etc/lvm/archive/ contained tens of thousands of files, whereas our other nodes had fewer than 1,000.
After further reading, we found that this was caused by an intermittent route to one of our iSCSI SANs with multipath configured.
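One way to compare the archive size across nodes is to count the files with find rather than a shell glob, which keeps working however many files there are. A minimal sketch, where count_files is a hypothetical helper name:

```shell
# Count regular files under a directory without shell glob expansion.
# count_files is a hypothetical helper name.
count_files() {
    find "$1" -type f | wc -l
}
```

On a node this would be called as: count_files /etc/lvm/archive/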
Here is how we fixed it
As we do not use LVM snapshots for restore purposes, we do not need LVM archiving.
We edited /etc/lvm/lvm.conf and changed the line
archive = 1
to
archive = 0
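If the same change has to be made on several nodes, it can be scripted. The sketch below is an assumption rather than what we ran: disable_lvm_archive is a hypothetical helper, it relies on GNU sed (as shipped with Debian), and it only matches a line that contains nothing but "archive = 1" and surrounding whitespace. A copy of the original file is kept with a .bak suffix.

```shell
# Switch "archive = 1" to "archive = 0" in an lvm.conf-style file.
# disable_lvm_archive is a hypothetical helper; it assumes GNU sed and
# that the setting sits on its own line with no trailing comment.
disable_lvm_archive() {
    conf="$1"
    # -i.bak edits in place and keeps the original as <conf>.bak
    sed -i.bak -E \
        's/^([[:space:]]*)archive[[:space:]]*=[[:space:]]*1[[:space:]]*$/\1archive = 0/' \
        "$conf"
}
```

It would be called as: disable_lvm_archive /etc/lvm/lvm.conf
Lines such as archive_dir or commented-out settings are left untouched, because the pattern requires "archive" to be followed directly by the equals sign.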
We restarted the lvm service by running
And then deleted all files within the archive folder with the following command
find /etc/lvm/archive/ -type f -delete
If you simply try to run
rm -rf *
within the folder, you will encounter an "Argument list too long" error. The shell expands * into every file name in the folder, and the resulting command line exceeds the system limit on argument length.
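The limit in question can be inspected with getconf, and xargs offers another way to stay under it by splitting the file list into batches. A minimal sketch, assuming find and xargs support -print0 and -0 (as the GNU versions on Debian do); purge_files is a hypothetical helper name:

```shell
# Show the limit, in bytes, on the combined size of the arguments and
# environment that can be passed to a single command.
getconf ARG_MAX

# Delete every regular file under a directory, passing the names to rm
# in batches that stay below that limit.
# purge_files is a hypothetical helper name.
purge_files() {
    find "$1" -type f -print0 | xargs -0 rm -f --
}
```

This has the same effect as the find -delete command above, and is mainly useful where find lacks the -delete option.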