I’ve been using VMs for test environments for the past several months. One of these VMs is a CentOS Linux server. In this particular VM the system clock advances at a rather brisk pace. Last week, I think, it believed it was already the end of March, even though I had set the clock correctly just a few weeks earlier. NTPD is running in this instance, but even it can’t seem to keep the racing clock under control.
I have been meaning to look into it, but as this is a test environment it is often put on the back burner. Today, while catching up on one of my CentOS mailing lists, I stumbled across the issue. It looks like the 2.6 kernel is not officially supported by some versions of VMware. Here is the informative part of the email post that helped shed some light (credit to Aleksandar Milivojevic):
In current versions of VMware (for example ESX 2.5.x), 2.6 kernels are not yet officially supported. What you described is one of the problems with 2.6 kernels and VMware. Add “clock=pit” kernel option (in grub.conf or lilo.conf, whichever boot loader you use), don’t use NTP to sync time, install vmware-tools onto each guest and enable time synchronization in them (by default it is off). It should keep time in your guests under some control. The problem is mostly because 2.6 kernels are much stricter in watching the frequency source selected for clock, and they also increased the frequency of interrupts requested from it from 100Hz to 1000Hz (one global + one per CPU, or something like that). This frequency is compile time kernel option (it is hard coded into the kernel, can’t be changed once kernel is compiled). Furthermore, frequency of interrupts increases with number of processor cores (so if each of your guests is configured with two virtual CPUs, it’s 3000 interrupts per second per 2.6 guest, compared to only 300 per 2.4 guest). With many guests running on busy box, VMware might not be able to generate all needed virtual interrupts for 2.6 guest operating systems, and you get clock problems you are having. There’s a code in clock code in 2.6 kernel that attempts to correct for missed/skipped interrupts. However under VMware it tends to overcorrect and your clock starts gaining time fast, like you described. This is classic problem you’ll encounter with current versions of VMware and guests running 2.6 kernel. It should be corrected in VMware ESX 3.x (which should also have official support for 2.6 kernels).
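To sanity-check the interrupt numbers in that email, here is a quick back-of-the-envelope sketch in Python. It assumes the rule of thumb from the quote (one global timer plus one per CPU, "or something like that"), so treat it as an illustration of the arithmetic rather than a description of how the kernel really schedules timer interrupts. The function name and its parameters are my own invention for this example.

```python
def timer_interrupts_per_second(hz: int, virtual_cpus: int) -> int:
    """Rough estimate of timer interrupts a guest requests each second.

    Assumption (taken from the quoted email, not verified against kernel
    source): one global timer plus one timer per virtual CPU, each
    firing `hz` times per second.
    """
    return hz * (1 + virtual_cpus)


# A 2.4 guest (100 Hz) versus a 2.6 guest (1000 Hz), both with two virtual CPUs.
guest_24 = timer_interrupts_per_second(hz=100, virtual_cpus=2)
guest_26 = timer_interrupts_per_second(hz=1000, virtual_cpus=2)

print(f"2.4 guest: {guest_24} interrupts/s")
print(f"2.6 guest: {guest_26} interrupts/s")
```

With two virtual CPUs this reproduces the 300 versus 3000 figures from the email, which makes it easier to see how a host running many 2.6 guests could fall behind on delivering virtual timer interrupts.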
I still have a little more reading to do, but I love it when I stumble across a solution to an issue I am experiencing at work. Once again, my time spent on mailing lists pays off!