Showing posts with label Windows Server 2008. Show all posts
Showing posts with label Windows Server 2008. Show all posts

11 Aug 2010

SP1 Joy for Windows 7 and Server 2008 R2

My main issue with Windows 7 and Server 2008 R2 was the removal of the otherwise simple feature of restoring open folders after a reboot. Currently Windows restores the folders in a random cascaded position forcing you to tidy up your desktop every time you boot up. It appears that through pressure from many users Microsoft have listened. Yay!

SP1, to be released in early 2011, will restore this much missed feature that was present in XP and Server 2003. From the documentation:

“SP1 changes the behavior of the ‘Restore previous folders at logon’ function available in the Folder Options Explorer dialog. Prior to SP1, previous folders would be restored in a cascaded position based on the location of the most recently active folder. That behavior changes in SP1 so that all folders are restored to their previous positions.”

Forget dynamic memory and RemoteFX, this is what I’m missing! :-)

25 May 2010

Windows 2008 Hyper-V / VSS / Backup Bug Part III

Good news! We’ve been issued by Microsoft with a public release of the hotfix KB982210, as it will be known. The fix will only work on 2008 R2 and not with any previous releases of the OS.

So how does the fix work? First let me explain the problem more clearly than in previous blog entries. Whenever a device is attached to Windows the Plug & Play Manager creates an entry for that device in the registry. If it’s a USB device, for example, and you unplug it then its entry will remain in the registry so when it’s plugged back in the computer will recognise it and any settings that have previously been set up for it. The same is true for snapshots created by VSS, the Volume Snapshot Service. A snapshot is treated like a device so one a new snapshot is created so too is a registry entry for the device.

Now here’s the problem. The registry entries are not removed – ever. While many users will never have a problem with that there are a number of power users who generate 1000’s of snapshots over a short period of time. For example, in our case where we use Windows Server Backup (WSB) with a backup schedule set to every 30 minutes which includes backing up 14 VHDs used by several VMs on Hyper-V. VSS will create 14 snapshots (one for each VHD) every time a backup is run. That’s 14 snapshots every 30 minutes. That’s 672 a day and over 20,000 per month. See how quickly they mount up and none of the device entries in the registry are being removed.

Severe problems manifest when the host server is rebooted and the registry is processed, analysing tens of thousands of devices, causing the server to look as if it has hung. It freezes for 2 or 3 hours, possibly more if I had let the server carry on taking more backups.

The registry key you need to check in Windows 2008 R2 is:
HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Enum\STORAGE\Volume

There should be 10 to 50 entries in there for a normal healthy machine, depending on how many devices you have attached. On our server we had 28,000 entries!

Now, how does the fix work?

The new hotfix from Microsoft makes a change to the Plug & Play Manager that adds a timestamp (or a tombstone date as they prefer to call it) for each new snapshot device that’s created. This means Windows is now aware of exactly when a snapshot was created and can make a decision to fully remove the device’s entry from the registry after a certain period of time. I’m not sure how long it waits but from our experience it can be counted in minutes rather than days or weeks.

So well done to Microsoft for creating a smart solution to a critical problem. It took a massive amount of effort to get our case brought to the attention of the right person. Before that I had spent weeks working with a Microsoft support team in India by phone and remote desktop trying to explain what the problem was (and no, the problem will not be resolved by re-installing Windows, thank you!). It wasn’t until a Premiere Support case was opened and a Microsoft account manager from the UK got involved who contacted the right technical person that we started to make rapid progress. Microsoft had been aware of the technical issue for a while but our case seemed to have given them the incentive to fully investigate it. And we are grateful that we were finally listened to and some very expensive new servers can finally be put into service.

Cleaning up the registry

The only problem that remains is for other people experiencing this. You need to clean up your registry before applying the hotfix otherwise the freezing symptom will persist. To do this you need a tool from Microsoft called devnodeclean. Sadly this is not available anywhere on the Internet to download based on my Google and Bing searches. Microsoft should be able to email you a copy if you open a support case with them and refer them to KB982210 and this blog entry for good measure. Run devnodeclean without any switches at first to see what it makes of your registry, then use the /r switch to force it to remove the unwanted devices from the registry.

Next you may need to compact your registry if it has become huge, anything over 50MB I would say. Ours peaked at over 450MB. The “system” hive (found in C:\Windows\System32\config\) can be compacted using regchk using the switches /l /c /r /v. Again, chkreg is only available by request from Microsoft and was in fact developed to repair the registry in Windows 2000 but amazingly still works for 2008 R2. Please note that regchk cannot compact a live “system” file. You need to back up your system settings first and run chkreg on a restored copy of the “system” file (restore it to a new folder somewhere else). Then boot up from the Windows setup DVD and enter the recovery console. Rename “system” to “system.old” and copy the compacted system file into the config directory. Then reboot into Windows.

2 Apr 2010

Windows 2008 Hyper-V or Volume Shadow Copy Bug Part II

Following on from the problem I blogged about a couple of weeks ago, this is definitely a bug in Windows. From what we’ve been told it’s an inheritary, built-in limitation with VSS that prevents it from cleaning up after itself once 9,999 snapshots have been created. Therefore it only manifests if your server has created that many snapshots, so the explanation goes.

Now, not many people run snapshots frequently enough to encounter the problem but since our servers are running a backup every 30 minutes and there are 14 VHDs connected to Hyper-V we would run into the problem in just 15 days. That’s the explanation we were given but that doesn’t explain why after cleaning up the registry using devnodeclean and chkreg the registry will immediately start to bloat during the next backup. Hmmm. And we encountered symptoms (freezing) only after the first week – that’s less than 9999 snapshots. Hmm.

I’m hoping the engineers investigating this can come up with a permanent fix very soon.

15 Mar 2010

2008 Server Freeze, Hyper-V or Volume Shadow Copy Bug?

We have been scratching our heads over a very strange problem for the last 4 weeks which causes two new servers to lock up for up to 2 hours after logging on after a reboot. They’re running Windows 2008 R2 with Hyper-V and Windows Server Backup roles installed.

After trying plenty of ideas to eliminate the problem it was pointed out to us by a Microsoft support guy that our System hive file was 343MB in size. It’s only supposed to be 15 to 20MB. I exported it as an ASCII file from regedit and opened it in Notepad. I counted 24,000 entries for VSS Snapshot devices! When Windows boots it tries to process 24,000 devices which causes it to choke killing the server for two hours – although the VMs limp on underneath and the host responds to pings but both the remote and local console is completely frozen.

Example registry entry:

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Class\{533C5B84-EC70-11D2-9505-00C04F79DE2F}\0349]
"InfPath"="volsnap.inf"
"InfSection"="volume_snapshot_install"
"InfSectionExt"=".NTAMD64"
"ProviderName"="Microsoft"
"DriverDateData"=hex:00,80,8c,a3,c5,94,c6,01
"DriverDate"="6-21-2006"
"DriverVersion"="6.1.7600.16385"
"MatchingDeviceId"="storage\\volumesnapshot"
"DriverDesc"="Generic volume shadow copy"

Trying to delete the snapshots using vssadmin from the command prompt threw this error: “Error: Snapshots were found, but they were outside of your allowed context.  Try removing them with the backup application which created them.

So the question is what is causing 1000’s of VSS (volume shadow copy) snapshots to be created? A clue was found in the system event log when Windows Server Backups runs: “Failed to delete the shadow copy (VSS snapshot) set with id '1A1938A0-1590-4BF4-8173-20DF5FD69E36' in the running virtual machine 'MGT01': Unspecified error (0x80004005). (Virtual machine ID A3F941F1-ED7F-48E9-9CD7-CB7C28A6604A)

We’re using Windows Server Backup (WSB) to take incremental backups every 30 minutes for a bare metal restore of the host and its Virtual Machines. That’s 48 backups a day of 14 VHDs for 42 days that the servers have been running for. Do the maths and that comes to 28,000 VSS snapshots. Taking into account that some backups failed to run and we stopped backups for a few hours here and there, this tallies with the 24,000 devices I counted in the registry. Bingo!

So the bottom line is that the VSS writer creates a snapshot for each VHD at backup time but for some reason isn’t deleting the entries from the registry, although it is deleting the actual snapshots otherwise we’d have run out of disk space by now. Everything points to a bug in either the VSS writer or perhaps WSB or Hyper-V. They’re so tightly integrated during the backup process it’s hard to say which of the 3 is the culprit.

Since this problem is reoccurring on two new servers from Dell we are sure this isn’t a one-off freak incident. There is only 1 other similar incident reported on the web and that was a year ago on a HP server using BackupExec with the Hyper-V aware option. I’m waiting for Microsoft to get back to me, although I’ve been warned that even if they admit it’s a bug it could take a long time to produce a fix. We’d love to know why 1000’s of people who use Hyper-V and take frequent backups aren’t experiencing the same problem. There is no other software installed on the host apart from standard Dell drivers. Weird!