Archive for Frustrations
Yesterday I finally ran into my first failed SBS 2008 install, not one I had picked up from someone else, but one of my own where I’ve managed the source server for years and did all the prep work myself. It was both a good thing and a bad thing: good because we got to finally troubleshoot one where we knew the entire environment up front so we could immediately eliminate several potential causes, bad because we were on a really limited migration timetable for this particular customer and we’ve ended up losing more than just a day on the project.
But the interesting thing about this migration is why it failed, and the blame lies right at the feet of Dell. Short story: the Dell PowerEdge R310 server does not allow you to individually disable the on-board NICs in BIOS. In BIOS, you can choose to disable both NICs, but the individual integrated NIC options include “Enabled,” “Enabled with PXE,” and “Enabled with iSCSI.” No “Disabled” option at all.
I’ll be perfectly honest, this was not a scenario I tested when working on the migration documentation with Microsoft and researching for the SBS 2008 Unleashed book. Since the SBS 2008 requirements list a single NIC only, that’s all the testing that I did. So when I went through with the migration install yesterday, I knew I was taking a bit of a chance, but hoped that since I only had one NIC connected to the network it wouldn’t be a problem.
Well, it was.
During the setup, the SBS portion of the install tried to do a WMI query against DNS for the existing domain. The query succeeded on the connected NIC, but the installer performed the query again on the disconnected NIC and failed. That failure led to the dreaded cascading failure of Exchange and everything else that followed. We were able to get the source server back online quickly since we followed best practices and took a System State Backup immediately prior to launching the SBS 2008 installer on the destination server, but then we faced the dilemma of how to proceed. After discussing the issue with a Microsoft contact, we thought the setup might complete if we connected both NICs to the network, but the better option is to disable the unused NIC in BIOS and do the migration setup the way it’s supposed to be done.
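If you want a quick pre-flight check before launching setup on a multi-NIC box, something like the following would at least flag an adapter that is enabled but unplugged. This is only a sketch: it assumes you have saved the text of `ipconfig /all` from an English-language Windows install, the function name is my own, and it does not reproduce whatever the SBS installer actually does with WMI.

```python
import re

# Adapter blocks in ipconfig output start at column 0 and end with a colon,
# e.g. "Ethernet adapter Local Area Connection 2:"
ADAPTER_HEADER = re.compile(r"^(?P<name>\S.*adapter .+):\s*$")


def disconnected_adapters(ipconfig_text):
    """Return the names of adapters whose block reports 'Media disconnected'."""
    current = None
    found = []
    for line in ipconfig_text.splitlines():
        header = ADAPTER_HEADER.match(line)
        if header:
            current = header.group("name")
        elif current and "Media disconnected" in line:
            found.append(current)
    return found
```

Any adapter this returns is a candidate for the failure described above, so it is worth dealing with before the installer ever runs.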
After a 3.5-hour call with Dell, the verdict is that it basically cannot be done on the R310. Apparently someone decided that an all-or-nothing configuration for the NICs on that particular server was a better solution than letting each NIC be individually disabled, as is done on every other Dell server I’ve worked with for a decade. The issue has been escalated to the engineering team, but we don’t yet know if there will be a fix or how long it will take to get one if it can be done.
Bottom line, I cannot recommend installation of SBS 2008 on a Dell PowerEdge R310 server until (or if) Dell resolves the issue of disabling the NICs individually in BIOS. I can almost guarantee that a migration setup will fail on this platform, but I don’t know if a clean install will have a similar issue or not. If anyone has successfully installed SBS 2008 on an R310 server, I’d love to hear from you. Since this is a relatively new model from Dell, however, I may well be one of the first to attempt this particular configuration. And I hope that our pain can save someone else from the same…
Seems like every so often, we see a rash of these in the community, and it’s always painful when we do. I have no idea what triggers the cycle, or even if it’s anything more than a coincidence, but it seems to be happening again: a rash of posts in various places from people who think that reinstalling IIS on SBS is a good idea, or who are trying to recover from having already done it.
Folks, please don’t do this.
I don’t care that KB 320202 gives instructions on how to remove and reinstall IIS on a server running Exchange. If you scroll down to the bottom of that KB, you might notice that Small Business Server is NOT listed in the “Applies To” section. Yeah, yeah, yeah, I know that most of the time we say that SBS is the “same as” Windows 2003 Server Standard, but in this case we are most decidedly NOT.
SBS has so much more tied in with IIS than just Exchange, so if you did decide to remove/reinstall IIS, you’re going to break a whole bunch of things: Backup, Monitoring and Reporting, Companyweb, Remote Web Workplace, ConnectComputer. KB 320202 doesn’t address those tools at all, just Exchange.
If you end up getting your hands on a box that has already had IIS removed and reinstalled, you might be able to fix several things by reinstalling the Administration Tools in the SBS integrated setup, but even that is going to be a longshot at best.
If you’re reading this before you remove and reinstall IIS, good. STOP NOW! Don’t do it. Troubleshoot the actual errors you’re getting and find and fix the problem. Ignore KB320202. And should you be in one of those rare cases where someone affiliated with Microsoft has suggested that you remove and reinstall IIS, please let me know immediately. Or let Marina know over at smallbizserver.net. Or find someone in the SBS community to get a second opinion from. But DON’T UNINSTALL IIS on your SBS box without getting at least a second or third opinion. Please. Chances are, you’ll deeply regret it if you do.
Ran across one today that hasn’t been documented to death in the ether, so it’s worth sharing. Bottom line, if you install Windows Server 2003 SP2 and have ISA on the box, you dang well better follow KB927695 and disable Receive Side Scaling on your NICs. Even better, don’t do the hack in the registry, just modify the properties of the NIC to disable the setting. Here’s another reason why:
I was working with someone who was having a number of problems on his new SBS box. We got a number of them fixed, then we were trying to join a workstation to the domain. He had tried many variations of this previously, but all had failed. Once we resolved his IIS issues, we decided to see what would happen with the ConnectComputer wizard.
Problem 1: You can’t get to the ConnectComputer wizard. We continually got a “page cannot be found” error when trying to connect to http://server/connectcomputer, just like the Add Client Workstation wizard says to do. We could access the page at http://internalIP/connectcomputer, but this doesn’t always work, either. We were finally able to get the page to load and at least get through the main portion of the wizard using https://server/connectcomputer/
Problem 2: The ConnectComputer wizard encountered errors and could not complete. This happened at the end of the wizard as it was trying to change network settings to initiate the reboot that would join the machine to the domain, etc., etc., etc. We looked in the client-side log for the ConnectComputer wizard (which is in C:\Program Files\Microsoft Windows Small Business Server\Clients\SBSNetSetup.log, by the way) and found the following error in the log:
NetJoinDomain() failed 
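If you would rather not eyeball the whole log, a few lines of Python will pull out the failure lines for you. Treat this as a rough sketch: the marker string comes from the log excerpt above, but the function names and the guess about the file's text encoding are mine.

```python
from typing import Iterable, List, Tuple

# Marker text taken from the log excerpt above; everything else is illustrative.
JOIN_FAILURE_MARKER = "NetJoinDomain() failed"


def find_join_failures(lines: Iterable[str],
                       marker: str = JOIN_FAILURE_MARKER) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line containing the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if marker in line:
            hits.append((number, line.rstrip()))
    return hits


def scan_log(path: str) -> List[Tuple[int, str]]:
    # Assumption: the log is readable as plain text in the default encoding.
    # errors="replace" keeps the scan from aborting on odd bytes.
    with open(path, "r", errors="replace") as handle:
        return find_join_failures(handle)
```

On the workstation you would point `scan_log` at the SBSNetSetup.log path mentioned above and look at the lines just before each hit for context.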
Google found only a few posts about this specific error, mostly having to do with trying to join a workstation over a VPN when ISA is involved. Well, this server had ISA installed, but this is a local workstation and not over a VPN. Also worth noting is that it’s the first workstation to join the domain. But I digress. We followed the advice about turning off Strict RPC checking in ISA (which I regularly forget to do and hate that I have to in the first place) but that had no effect. Just when I was about to punt, I discovered that SP2 had been installed on the box.
Yes, the dreaded Windows 2003 Server SP2. The one that has actually been causing more issues than MS cares to admit right now. And the only reason he installed it when he built the server? Because it was listed in Microsoft Updates through the web, and since it’s up there, it must be safe to install, right? In this case, I certainly wouldn’t have installed it, but that’s just me. Oops, digressing again.
So I reviewed the Official SBS Blog for stuff about SP2 and found the note on the Receive Side Scaling. The server had Broadcom NICs (which have issues in themselves), so I went into the NIC settings through Network Connection Properties and disabled Receive Side Scaling on both the internal and external NIC. Voilà! ConnectComputer not only ran successfully, but we were able to access it through http://server/connectcomputer without SSL.
I’ll be darned if I can understand exactly why changing this setting when SP2 and ISA are on the box had this type of impact on local networking, but as soon as I changed it, everything worked. I liken this to the other bizarre resolution where changing the internal name on a security group allows the Connect to the Internet wizard to run correctly (look down at the last entry in the thread for the real resolution) – can’t explain fully why it works, but it does.
Moral of the story – read everything about SP2 on the SBS blog and even if you think you may not be affected, look at each one of the items listed there. Or don’t put SP2 on any of your boxes just yet. The latter is the direction I’m taking when I have an option.
I just learned the hard way that some Dell on-board PERC RAID controllers, in particular the PERC5/i, do NOT have an audible alarm that will sound when the controller card detects an alertable condition. I just found that one of my servers had a bad drive, but have no idea when the drive went bad, because the alarm I configured in BIOS failed to sound. Only after working with Dell to deal with the failed drive did I find out that this particular on-board PERC controller actually has no alarm, despite what the controller BIOS says.
I don’t rely exclusively on the audible alarm for notification when a drive or array fails, but it is a nice fallback plan, and if you’ve been thinking that your on-board controller supports this, you might want to double-check to make sure. Just because there’s an option there in the BIOS to enable an alarm doesn’t mean that there’s actually an alarm supported.
Just when you thought you weren’t going to get any updates from Microsoft in March of 2007 (some speculated this was a result of the DST fiasco, but maybe not), Microsoft announced on March 13 that Windows 2003 Server SP2 was available. Not only is it available on Windows Update, but two other updates are present as well.
For more information about the SP2 release, see my post on my business blog. There are some interesting gotchas related to SP2, and not just on the SBS platform…
And here you were thinking that your Mac would be immune to the DST problems that seem to be plaguing the rest of the US. Well, not necessarily. Turns out that Microsoft has included updates in Office Update 11.3.3 for Entourage that handle issues related to calendar items and the new DST rules. And if you’re only using Entourage 2004 for POP3 or IMAP accounts, you’re probably going to be OK. But only if you install Update 11.3.3 (at least according to KB924606).
But if you’re using Entourage 2004 to connect to an Exchange server, your calendar may just get a little funky for a little while. According to a blog post from the MS Higher Education West group, this can happen if the Exchange server has not been updated with the patches to fix the DST issue. Where can I learn about these updates, you may be asking? Well, this Microsoft page has information, and http://www.dstpatch.com/ also has update information.
Bottom line, the patch for Exchange is out, but not necessarily universally installed. The Entourage patch is out, as is the OS update from Apple that allows the core Mac OS to handle the new DST laws. But the Outlook patch is not yet available, so expect that until all three players are updated in a particular location there will be some discrepancies about meetings that are scheduled after March 11, 2007.
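To see which meetings are most at risk, it helps to work out the window where patched and unpatched systems disagree in the spring. Under the new US rules DST starts on the second Sunday in March, where the old rules started it on the first Sunday in April. Here is a small sketch of that arithmetic. The function names are mine, and it only covers the spring window, not the matching one in the fall.

```python
from datetime import date, timedelta


def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday of the given month."""
    first = date(year, month, 1)
    # weekday(): Monday is 0, Sunday is 6
    days_to_first_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_sunday + 7 * (n - 1))


def spring_dst_window(year):
    """Return (new_rule_start, old_rule_start) for US DST in the given year."""
    return nth_sunday(year, 3, 2), nth_sunday(year, 4, 1)


def in_spring_dst_window(day):
    """True if patched and unpatched systems disagree about DST on this day."""
    new_start, old_start = spring_dst_window(day.year)
    return new_start <= day < old_start
```

For 2007 the window opens on March 11, so any meeting that `in_spring_dst_window` flags is one worth double-checking by hand until every system involved has been patched.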
My Mac recommendation: go ahead and update to 11.3.3 for Office for the Mac, but be aware of possible DST meeting hiccups until everyone else gets updates.
I’ve been fighting a rash of internal DHCP issues for clients running ISA 2004 SP2 on SBS 2003. Amy Babinchak noted this a while back in her blog, and I’ve been working with her to iron out the issues. One item I ran across this morning seems to tie the problem to a Restricted Access rule. The jury is still out on the why and the root cause of the issue, but I noted one change that I made to one set of rules that allowed DHCP to work correctly on the internal network afterward.
The rules that had been created followed this pattern: Set of domain URLs that the creator of the rule did not want to allow internal workstations to access; default action on the rule was “deny”; applied to all protocols and all users. In one case, I modified the rules to apply to all protocols except those listed, and I included DHCP Request and DHCP Reply in the exceptions list. Once I applied those changes, DHCP started working internally again.
Granted, these rules were high in the order processing, and if they had been lower in the order list, possibly even behind the Protected Network Access rule, we might not have seen the behavior. But something about the way those rules were created seemed to deny internal workstations access to DHCP.
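For anyone who thinks better in code, here is a toy model of ordered, first-match rule processing that shows why both the exceptions list and the rule order matter. It is emphatically not ISA's actual engine. The `Rule` class, the rule names, and the assumption that the deny rule matches DHCP traffic in the first place are all mine, and destination matching is left out on purpose because we still don't know why the real rules caught DHCP.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

DHCP = frozenset({"DHCP Request", "DHCP Reply"})


@dataclass(frozen=True)
class Rule:
    name: str
    action: str                                # "allow" or "deny"
    protocols: Optional[FrozenSet[str]] = None  # None means all protocols
    exceptions: FrozenSet[str] = frozenset()    # "all protocols except those listed"

    def matches(self, protocol):
        if protocol in self.exceptions:
            return False
        return self.protocols is None or protocol in self.protocols


def evaluate(rules, protocol, default="deny"):
    """Walk the rules top to bottom and return (action, rule name) of the first match."""
    for rule in rules:
        if rule.matches(protocol):
            return rule.action, rule.name
    return default, "default"


# Hypothetical rule sets: a deny rule placed above a rule that allows DHCP.
before = [Rule("Blocked sites", "deny"),
          Rule("Internal DHCP", "allow", protocols=DHCP)]
after = [Rule("Blocked sites", "deny", exceptions=DHCP),
         Rule("Internal DHCP", "allow", protocols=DHCP)]
```

In this model `evaluate(before, "DHCP Request")` stops at the deny rule, while `evaluate(after, "DHCP Request")` falls through to the allow rule. Moving the deny rule below the allow rule would have the same effect, which is the point about order.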
I still have other sites with DHCP issues that I’ve only been able to get around by following the instructions in Amy’s post. But there have been some rumblings about denied access rules, so we’re starting down that road first to see what’s happening.
I’ll post updates here as I get them. Stay tuned…
Today I was reminded (again) why using the default controller drivers when doing an SBS setup is a Bad Thing. I was rebuilding a server for a client who needed to install an updated RAID controller card. I got the drivers off the CD that came with the controller and copied them to a floppy disk. I booted the server from the installation CD, pressed F6 to tell setup that I had drivers to load, and selected the proper card drivers from the floppy. As setup continued and started copying files to disk, I got an error that it couldn’t copy a particular file. It was a catalog file for the controller driver, and it was apparently not on the floppy. Well, not to be deterred, I restarted setup, pressed F6 again, but this time when it prompted me about the drivers and let me know that the drivers I had on floppy were newer than what setup had on CD, I opted to let setup use the drivers on CD and continue.
Continue it did. Setup got all the way through both the text mode and GUI portions of install and rebooted the system to load the OS for the first time. Then everything went south. I got the Windows 2003 splash screen for about 2 seconds, and the server rebooted. I’ve seen this behavior before (and I’ll document it in a later post or possibly use it as script material for an IT Horror flick) so I knew it was a problem with the controller driver. I went to the manufacturer web site, downloaded the latest driver disk set and used that for a repair install of the server. The repair install completed successfully, and the server booted normally after that.
Out of curiosity, I compared the driver files on CD to the driver files I downloaded from the web site. They were the same, save for one file – the catalog file that setup complained about being unable to copy.
Just goes to show, again, that you should really grab the latest drivers from the controller manufacturer web site before building a server, especially if you’re working with a “white box” server. I’ve learned my lesson. Again…
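If you want to run the same comparison before you start a build rather than after it blows up, Python's standard `filecmp` module will do it in a few lines. This is a minimal sketch. The function name and dictionary keys are mine, and it only compares the top-level folder of each driver set.

```python
import filecmp


def compare_driver_sets(cd_dir, web_dir):
    """Compare two driver folders and report missing or differing files."""
    result = filecmp.dircmp(cd_dir, web_dir)
    return {
        "only_on_cd": sorted(result.left_only),
        "only_in_download": sorted(result.right_only),
        # dircmp does a shallow comparison: files whose size and timestamp
        # match are treated as identical without reading their contents.
        "differing": sorted(result.diff_files),
    }
```

Anything that shows up under `only_in_download` is a file the vendor ships on the web but left off the CD, which is exactly the kind of gap that bit me here.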