16 Aug 2010

Critical CF hotfix must be applied pronto!

Adobe released a security hotfix on 10th August 2010 and classed it as “important”. However, if the security of your ColdFusion server, file system, database and network matters to you even moderately, then you must think of this hotfix as CRITICAL. Just to be clear: this is CRITICAL. An HP security expert has blogged about it, and it also caught the eye of The Register.

Hackers have shown how easy it is to exploit a vulnerability in CF 7, 8 and 9 to gain access to the CF Administrator. Code to perform the hack has been made freely available (I won’t link to it directly, but others have). Mike Bailey tweeted “It works and it’s scary.” Someone else chimed in to show that, with a cheeky bit of JavaScript, you don’t even need to crack the Administrator’s password.

Why is this really bad? Well, once you have access to CF Admin you can run scheduled tasks to access the OS. Someone has kindly(?) written an FAQ explaining how it works and why it’s so bad.

Now that the world knows how to hack it, everyone running CF must now patch their server. Adobe need to hammer home the seriousness of this problem and how critical the hotfix is. “Important” doesn’t stress it enough.

If you have already made the “administrator” directory inaccessible from the Internet, or restricted it by IP address, then you should be safe, but it’s still a good idea to apply this critical hotfix.
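IP protection boils down to a simple rule: Administrator URLs are only served to known addresses. The Python sketch below shows that decision, assuming the Administrator sits at its default /CFIDE/administrator path. The allowlist ranges are hypothetical placeholders, and in practice the rule belongs in your web server or firewall, not in application code.

```python
import ipaddress
import posixpath

# Hypothetical allowlist: substitute your own office or admin workstation ranges.
ADMIN_ALLOWLIST = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]

# Assumption: the ColdFusion Administrator is at its default location.
ADMIN_PREFIX = "/cfide/administrator"


def normalise_path(path: str) -> str:
    """Lower-case the path and collapse duplicate slashes and '..' segments."""
    return "/" + posixpath.normpath(path.lower()).lstrip("/")


def is_request_allowed(path: str, client_ip: str) -> bool:
    """Allow everything except Administrator URLs requested from outside the allowlist."""
    normalised = normalise_path(path)
    is_admin = normalised == ADMIN_PREFIX or normalised.startswith(ADMIN_PREFIX + "/")
    if not is_admin:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ADMIN_ALLOWLIST)
```

Normalising the path first means tricks like `//CFIDE//administrator/` or `/foo/../CFIDE/administrator/` are still caught by the prefix check.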

11 Aug 2010

SP1 Joy for Windows 7 and Server 2008 R2

My main issue with Windows 7 and Server 2008 R2 was the removal of a simple feature: restoring open folders to their previous positions after a reboot. Currently Windows restores the folders in random, cascaded positions, forcing you to tidy up your desktop every time you boot up. It appears that, under pressure from many users, Microsoft have listened. Yay!

SP1, to be released in early 2011, will restore this much-missed feature that was present in XP and Server 2003. From the documentation:

“SP1 changes the behavior of the ‘Restore previous folders at logon’ function available in the Folder Options Explorer dialog. Prior to SP1, previous folders would be restored in a cascaded position based on the location of the most recently active folder. That behavior changes in SP1 so that all folders are restored to their previous positions.”

Forget dynamic memory and RemoteFX, this is what I’m missing! :-)

29 May 2010

ColdFusion Bugtracker Bug

How does one submit a bug about a bug-submitting tool? Below is the ColdFusion Bugtracker after I completed a two-page form.

I tried this twice, logging in again the second time just to make sure I was logged in. Booo. Now I have 2 bugs to report to Adobe!

27 May 2010

Sessions never expire bug in ColdFusion 8.01

A couple of times a year I’ve encountered a strange problem on our ColdFusion servers where sessions mount up and aren’t removed after they should have expired. Just today we had hundreds of sessions left in memory, with all of their session scope variables still there at the end of the day, hours after they should have been deleted. Automated session housekeeping ceased to be.

Two other symptoms, which must surely be related, drew my attention to the problem. Emails stopped being sent: the spool directory filled up without any cfmail files leaving it. And on the website, CF sporadically threw the error “The session is invalid”, which was temporarily resolved by closing the browser and logging into the app again.

Restarting the CF services doesn’t resolve the situation because the service refuses to stop if asked politely. A full reboot is the only way to restore normality with confidence.

You’re probably thinking: what good is complaining now, when 8.01 isn’t the current release and we should upgrade to 9.x? Well, how do we know that 9.x has fixed this problem? Are Adobe even aware of the issue? We could run a 30-day trial of 9.x on a test server, but given how rarely the bug appears we’d have to run it for at least 6 months under a constant load that mimics our production server.

If you have encountered this problem before or know how to fix it then please let me know.

25 May 2010

Windows 2008 Hyper-V / VSS / Backup Bug Part III

Good news! Microsoft have issued us with a public release of the hotfix, which will be known as KB982210. The fix only works on 2008 R2, not on any previous release of the OS.

So how does the fix work? First let me explain the problem more clearly than in previous blog entries. Whenever a device is attached to Windows, the Plug & Play Manager creates an entry for that device in the registry. If it’s a USB device, for example, and you unplug it, its entry remains in the registry, so when it’s plugged back in the computer recognises it along with any settings that were previously set up for it. The same is true for snapshots created by VSS, the Volume Shadow Copy Service. A snapshot is treated like a device, so whenever a new snapshot is created, so too is a registry entry for the device.

Now here’s the problem. The registry entries are not removed, ever. While many users will never have a problem with that, there are a number of power users who generate thousands of snapshots over a short period of time. Take our case: we use Windows Server Backup (WSB) with a schedule that runs every 30 minutes and backs up 14 VHDs used by several VMs on Hyper-V. VSS creates 14 snapshots (one for each VHD) every time a backup runs. That’s 14 snapshots every 30 minutes, 672 a day and over 20,000 per month. See how quickly they mount up, and none of the device entries in the registry are being removed.
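The arithmetic is easy to check. This short Python sketch assumes, as described above, one snapshot per VHD per backup, and a 30-day month.

```python
MINUTES_PER_DAY = 24 * 60


def snapshots_created(days: int, vhds: int = 14, interval_minutes: int = 30) -> int:
    """Snapshots VSS creates if every backup takes one snapshot per VHD."""
    backups_per_day = MINUTES_PER_DAY // interval_minutes  # 48 for a 30-minute schedule
    return vhds * backups_per_day * days


print(snapshots_created(1))   # 672 per day
print(snapshots_created(30))  # 20160 in a 30-day month
```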

Severe problems appear when the host server is rebooted and the registry is processed: Windows analyses tens of thousands of devices, and the server looks as if it has hung. It freezes for 2 or 3 hours, and probably longer had I let the server carry on taking more backups.

The registry key you need to check in Windows 2008 R2 is:

There should be 10 to 50 entries in there for a normal healthy machine, depending on how many devices you have attached. On our server we had 28,000 entries!

Now, how does the fix work?

The new hotfix from Microsoft changes the Plug & Play Manager so that it adds a timestamp (a “tombstone date”, as they prefer to call it) to each new snapshot device that’s created. This means Windows now knows exactly when a snapshot was created and can decide to fully remove the device’s entry from the registry after a certain period of time. I’m not sure how long it waits, but in our experience it’s a matter of minutes rather than days or weeks.
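Conceptually, the tombstone approach is a time-based purge. The Python sketch below illustrates the idea only; it is not Microsoft’s implementation. The SnapshotDevice type and the 10-minute retention window are hypothetical, since I don’t know the real period the hotfix uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SnapshotDevice:
    instance_id: str
    tombstone: datetime  # hypothetical: when the snapshot device entry was created


# Hypothetical retention window; the real period used by the hotfix is unknown.
RETENTION = timedelta(minutes=10)


def purge_expired(devices, now, retention=RETENTION):
    """Split devices into (kept, removed) based on the age of their tombstone date."""
    kept = [d for d in devices if now - d.tombstone < retention]
    removed = [d for d in devices if now - d.tombstone >= retention]
    return kept, removed
```

The point is that without a timestamp there is nothing to compare against, which is why the old entries simply piled up.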

So well done to Microsoft for creating a smart solution to a critical problem. It took a massive amount of effort to get our case brought to the attention of the right person. Before that, I had spent weeks working with a Microsoft support team in India by phone and remote desktop trying to explain what the problem was (and no, the problem will not be resolved by reinstalling Windows, thank you!). It wasn’t until a Premier Support case was opened, and a Microsoft account manager from the UK got involved and contacted the right technical person, that we started to make rapid progress. Microsoft had been aware of the technical issue for a while, but our case seems to have given them the incentive to fully investigate it. We are grateful that we were finally listened to, and that some very expensive new servers can finally be put into service.

Cleaning up the registry

The only problem that remains is for other people experiencing this. You need to clean up your registry before applying the hotfix, otherwise the freezing symptom will persist. To do this you need a tool from Microsoft called devnodeclean. Sadly, judging by my Google and Bing searches, it isn’t available to download anywhere on the Internet. Microsoft should be able to email you a copy if you open a support case with them and refer them to KB982210 (and this blog entry, for good measure). Run devnodeclean without any switches at first to see what it makes of your registry, then use the /r switch to force it to remove the unwanted devices from the registry.

Next, you may need to compact your registry if it has become huge; anything over 50MB, I would say. Ours peaked at over 450MB. The “system” hive (found in C:\Windows\System32\config\) can be compacted using chkreg with the switches /l /c /r /v. Again, chkreg is only available on request from Microsoft; it was actually developed to repair the registry in Windows 2000 but amazingly still works on 2008 R2. Please note that chkreg cannot compact a live “system” file. You need to back up your system settings first and run chkreg on a restored copy of the “system” file (restore it to a new folder somewhere else). Then boot from the Windows setup DVD and enter the recovery console. Rename “system” to “system.old”, copy the compacted system file into the config directory, then reboot into Windows.
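If you’d rather not eyeball file sizes, a quick Python check against my 50MB rule of thumb might look like this. It’s a sketch: point it at the restored copy of the “system” file rather than the live hive, since that’s the copy chkreg will work on. The restore folder in the example is hypothetical.

```python
import os

# The author's rule of thumb: compact the hive if it has grown beyond ~50MB.
THRESHOLD_MB = 50


def hive_size_mb(path: str) -> float:
    """Size of a registry hive file in megabytes."""
    return os.path.getsize(path) / (1024 * 1024)


def needs_compaction(path: str, threshold_mb: float = THRESHOLD_MB) -> bool:
    """True if the hive file is larger than the threshold."""
    return hive_size_mb(path) > threshold_mb


# Example (hypothetical restore folder):
# print(needs_compaction(r"D:\RegRestore\system"))
```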

10 Apr 2010

Nasty Bite From the Apple

Apple have gone bonkers. They have changed their developer license to make sure that Flash can’t run on iPhone, iPod touch and iPad devices. Do they really have it in for Adobe Flash, or are they just trying to hide the possibility that they’re going to launch their own software to compete with Flash? Or maybe their devices would run really slowly if they were to support Flash?

Considering that Flash is on 98% of all desktop computers, and that most of the best and most visited websites use Flash, this is very strange behaviour from Apple.

But you know what, I don’t care! Why? Because I don’t own any Apple kit. I never have, and I’m not sure I ever will. You see, Apple produce fashionable portable devices and I’m not one to follow fashion, especially when something looks far better than it actually performs as far as features are concerned. (I admit the UI is good, but the rest of the world is catching up: seen Windows Phone 7?) And denying i(Phone|Pod|Pad) users the ability to tap into Flash is denying them a fair bit of functionality. But do the users really care? Probably not, since millions of people have already spoken with their wallets.

2 Apr 2010

Windows 2008 Hyper-V or Volume Shadow Copy Bug Part II

Following on from the problem I blogged about a couple of weeks ago, this is definitely a bug in Windows. From what we’ve been told, it’s an inherent, built-in limitation of VSS that prevents it from cleaning up after itself once 9,999 snapshots have been created. Therefore, so the explanation goes, it only manifests once your server has created that many snapshots.

Now, not many people take snapshots frequently enough to encounter the problem, but since our servers run a backup every 30 minutes and there are 14 VHDs connected to Hyper-V, we would run into it in just 15 days. That’s the explanation we were given, but it doesn’t explain why, after we clean up the registry using devnodeclean and chkreg, the registry immediately starts to bloat again during the next backup. Hmm. And we encountered symptoms (freezing) after only the first week, which is well short of 9,999 snapshots. Hmm.
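Here’s the arithmetic behind both points as a Python sketch. It assumes the 9,999 limit we were told about and our schedule of 14 snapshots every 30 minutes.

```python
import math

VSS_LIMIT = 9_999            # the limit as it was explained to us
SNAPSHOTS_PER_DAY = 14 * 48  # 14 VHDs x 48 half-hourly backups = 672

days_to_limit = math.ceil(VSS_LIMIT / SNAPSHOTS_PER_DAY)
first_week = 7 * SNAPSHOTS_PER_DAY

print(days_to_limit)  # 15
print(first_week)     # 4704, well short of 9,999
```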

I’m hoping the engineers investigating this can come up with a permanent fix very soon.

15 Mar 2010

2008 Server Freeze, Hyper-V or Volume Shadow Copy Bug?

We have been scratching our heads for the last 4 weeks over a very strange problem that causes two new servers to lock up for up to 2 hours after logging on following a reboot. They’re running Windows 2008 R2 with the Hyper-V role and the Windows Server Backup feature installed.

After we had tried plenty of ideas to eliminate the problem, a Microsoft support guy pointed out that our System hive file was 343MB in size. It’s only supposed to be 15 to 20MB. I exported it as an ASCII file from regedit, opened it in Notepad, and counted 24,000 entries for VSS snapshot devices! When Windows boots it tries to process all 24,000 devices, which causes it to choke, killing the server for two hours. The VMs limp on underneath and the host responds to pings, but both the remote and local consoles are completely frozen.

Example registry entry:

"DriverDesc"="Generic volume shadow copy"

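Counting 24,000 entries by eye is tedious, so here’s a small Python sketch that counts the DriverDesc line shown above in a regedit export. It assumes the export is either regedit’s default Unicode .reg format (UTF-16 with a byte-order mark) or the older ANSI format, as far as I know the two options regedit offers, and it matches the value case-insensitively.

```python
SHADOW_COPY_LINE = '"driverdesc"="generic volume shadow copy"'


def read_reg_export(path: str) -> str:
    """Read a regedit export, handling the UTF-16 and ANSI formats."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"\xff\xfe"):  # UTF-16 LE byte-order mark
        return data.decode("utf-16")
    return data.decode("latin-1")  # ANSI export; close enough for ASCII content


def count_shadow_copy_devices(reg_text: str) -> int:
    """Count device keys whose DriverDesc marks them as VSS snapshot devices."""
    return sum(
        1 for line in reg_text.splitlines()
        if line.strip().lower() == SHADOW_COPY_LINE
    )
```

Usage would be something like `count_shadow_copy_devices(read_reg_export(r"C:\temp\system.reg"))`, where the export path is hypothetical.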
When I tried to delete the snapshots using vssadmin from the command prompt, it threw this error: “Error: Snapshots were found, but they were outside of your allowed context. Try removing them with the backup application which created them.”

So the question is: what is causing thousands of VSS (volume shadow copy) snapshots to be created? A clue was found in the System event log when Windows Server Backup runs: “Failed to delete the shadow copy (VSS snapshot) set with id '1A1938A0-1590-4BF4-8173-20DF5FD69E36' in the running virtual machine 'MGT01': Unspecified error (0x80004005). (Virtual machine ID A3F941F1-ED7F-48E9-9CD7-CB7C28A6604A)”

We’re using Windows Server Backup (WSB) to take incremental backups every 30 minutes for a bare-metal restore of the host and its virtual machines. That’s 48 backups a day of 14 VHDs for the 42 days the servers have been running. Do the maths and that comes to roughly 28,000 VSS snapshots (48 × 14 × 42 = 28,224). Taking into account that some backups failed to run and that we stopped backups for a few hours here and there, this tallies with the 24,000 devices I counted in the registry. Bingo!

So the bottom line is that the VSS writer creates a snapshot for each VHD at backup time but, for some reason, isn’t deleting the entries from the registry. It is deleting the actual snapshots, otherwise we’d have run out of disk space by now. Everything points to a bug in the VSS writer, WSB or Hyper-V. They’re so tightly integrated during the backup process that it’s hard to say which of the three is the culprit.

Since this problem is recurring on two new servers from Dell, we are sure this isn’t a one-off freak incident. There is only one other similar incident reported on the web, and that was a year ago on an HP server using Backup Exec with the Hyper-V aware option. I’m waiting for Microsoft to get back to me, although I’ve been warned that even if they admit it’s a bug it could take a long time to produce a fix. We’d love to know why the thousands of people who use Hyper-V and take frequent backups aren’t experiencing the same problem. There is no other software installed on the host apart from standard Dell drivers. Weird!