Very esoteric SharePoint engineering note today. We recently had to rebuild a SharePoint 2010 server whose domain environment had undergone some security changes. These changes led to the dreaded "Cannot connect to the configuration database" error screens.
During the recovery, we manually rebuilt the local WSS_ADMIN_WPG, WSS_RESTRICTED_WPG_V4 and WSS_WPG groups. Once these groups were redefined and the service accounts assigned to the proper groups, the SharePoint Products Configuration Wizard ("the grey wizard") was able to complete successfully. One of the things the Wizard does is reset permissions at the appropriate points in the file system, registry, databases, etc. All set, and the site was accessible in the browser.
Except for one nagging detail. Farm solution deployment for new solutions was hanging without ever completing. This is usually a Timer Service problem – you need to make sure the SharePoint 2010 Timer service is running and assigned to the right account (usually the farm account). That was already set. Stopping and restarting the Timer service had no impact – no timer jobs were running anywhere in the environment.
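For anyone who wants to script that first check, here is a minimal Python sketch. It assumes the timer service is registered under the name SPTimerV4 (the usual name on SharePoint 2010, but verify it on your server) and that `sc query` prints its output in English. The helper names are made up for illustration.

```python
import re
import subprocess

# Assumption: SharePoint 2010 registers its timer service as SPTimerV4.
TIMER_SERVICE = "SPTimerV4"


def build_query_command(service=TIMER_SERVICE):
    """Return the sc.exe command line that reports a service's status."""
    return ["sc", "query", service]


def parse_service_state(sc_output):
    """Extract the state name (e.g. RUNNING) from `sc query` output, or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def timer_service_state(service=TIMER_SERVICE):
    """Run `sc query` on the local Windows server and return the parsed state."""
    result = subprocess.run(
        build_query_command(service), capture_output=True, text=True
    )
    return parse_service_state(result.stdout)
```

On the SharePoint server, timer_service_state() should come back as RUNNING. As this story shows, a running service only rules out the simplest cause.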
No problem, we thought. If you run any kind of Bing (or Google) search, you'll find no shortage of other posts on rebuilding the local configuration cache. These can be good techniques, and I won't reprint them here.
http://blogs.msdn.com/b/josrod/archive/2007/12/12/clear-the-sharepoint-configuration-cache-for-timer-job-and-psconfig-errors.aspx
http://wingleungchan.blogspot.com/2011/01/user-profile-service-application-stuck.html
However, this still didn't work. Once the cache was cleared, no new jobs showed up in the cache, and nothing executed. Why not?
Well, let's talk about the local cache. The local cache directory contains XML job descriptions to be executed by the local server, which are dynamically pulled down from the farm configuration database. In normal production, the directory C:\ProgramData\Microsoft\SharePoint\Config\<GUID> is full of GUID-named XML files alongside a cache.ini file.
This one stayed blank. Why? Take a look at that ugly GUID-based file path. It turns out that the directory GUID no longer matched the farm GUID. What's the farm GUID? Well, take a look at the registry key:
HKLM\Software\Microsoft\Shared Tools\Web Server Extensions\14.0\Secure\ConfigDB.
You will see a value there named Id. Copy its GUID and make sure a directory with that name exists under C:\ProgramData\Microsoft\SharePoint\Config, creating it if necessary.
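The comparison between the registry value and the directory name can be scripted. What follows is a rough Python sketch, not a supported tool. The helper names are hypothetical, the registry read only works on Windows with sufficient rights, and the lower-case, brace-free directory name is an assumption about how the cache folder is normally named.

```python
import os
from pathlib import Path

CONFIG_ROOT = Path(r"C:\ProgramData\Microsoft\SharePoint\Config")
CONFIGDB_KEY = (
    r"SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\14.0\Secure\ConfigDB"
)


def normalize_guid(text):
    """Lower-case a GUID string and drop surrounding whitespace and braces."""
    return text.strip().strip("{}").lower()


def read_farm_id():
    """Read the Id value from the ConfigDB registry key (Windows only)."""
    import winreg  # imported lazily so the other helpers work on any OS

    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY  # use the 64-bit view
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONFIGDB_KEY, 0, access) as key:
        value, _value_type = winreg.QueryValueEx(key, "Id")
    return str(value)


def find_cache_dir(config_root, farm_id):
    """Return the subdirectory whose name matches farm_id, or None."""
    root = Path(config_root)
    if not root.is_dir():
        return None
    wanted = normalize_guid(farm_id)
    for entry in root.iterdir():
        if entry.is_dir() and normalize_guid(entry.name) == wanted:
            return entry
    return None


def ensure_cache_dir(config_root, farm_id):
    """Create the GUID-named directory if it is missing and return its path."""
    existing = find_cache_dir(config_root, farm_id)
    if existing is not None:
        return existing
    # Assumption: the folder name is the GUID in lower case without braces.
    target = Path(config_root) / normalize_guid(farm_id)
    target.mkdir(parents=True)
    return target


if __name__ == "__main__" and os.name == "nt":
    farm_id = read_farm_id()
    match = find_cache_dir(CONFIG_ROOT, farm_id)
    print("Farm Id:", farm_id)
    print("Matching cache directory:", match if match else "MISSING")
```

Run as a script, it only reports whether a matching directory exists. Calling ensure_cache_dir(CONFIG_ROOT, read_farm_id()) creates the directory, which is best done deliberately with the Timer service stopped.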
This directory also needs, at a minimum, Full Control for WSS_ADMIN_WPG and Read & Execute for WSS_WPG. Once this is set, the cache rebuild will work, and your timer jobs will start to flow again…
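If you'd rather set those permissions from a script than from the Security tab, one option is to shell out to icacls. This is a sketch under a few assumptions: the function names are hypothetical, the grants are inherited by files and subfolders via (OI)(CI), and the three groups exist as local groups on the server.

```python
import subprocess


def build_icacls_command(cache_dir):
    """Build an icacls command granting the minimum rights described above."""
    return [
        "icacls",
        str(cache_dir),
        "/grant",
        "WSS_ADMIN_WPG:(OI)(CI)F",  # Full Control, inherited by children
        "WSS_WPG:(OI)(CI)RX",       # Read & Execute, inherited by children
    ]


def apply_cache_permissions(cache_dir):
    """Run icacls on the local Windows server; raises if icacls reports failure."""
    subprocess.run(build_icacls_command(cache_dir), check=True)
```

Pass it the full path of the GUID-named directory from an elevated prompt, then restart the Timer service and watch the cache folder fill back up.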