I spent weeks troubleshooting several apps and packages that were stuck in the “In progress” state and were not distributing to the DPs. Looking at despool.log on the primary server, I saw:
This package[MEM01026]'s information hasn't arrived yet for this version . Retry later ...
Created retry instruction for job 000082F9
Despooler failed to execute the instruction, error code = 12
Some were getting stuck only on certain distribution points (DPs), most likely because of some sort of corruption that occurred in transit from the CAS down through the primary servers. Our other apps and packages were fine; only these few were troublesome. Oh, and my Google-fu wasn't getting me many hits. Of course, recreating these objects and blasting them back out to the DPs would probably have been easier. But I wanted to find out how I could selectively reset these packages and resend them only to the DPs that were originally failing, so that we wouldn't incur unnecessary traffic across the network.
So I looked at all the basic app/package properties and validated that everything was set up properly: I checked the source paths, DT settings, distribution settings, content settings, etc. I even tried redistributing, validating, and removing the DPs from the apps/packages I was working on, waiting a bit, and re-adding them. No dice. I even tried changing the source paths, and I could see that the CAS was able to grab the content from the source location, pack it into a PCK file, and replicate the app/package settings to the child primary servers via the sender and DRS. But when it got to the primary servers, the despooler component was unhappy. It wouldn’t process the .sni files properly, even though I could see the TRY files in despoolr.box, and error 12 persisted for these objects. They just kept failing and stayed stuck in a retry state! Ugh!
So I started digging and compared a successful app with a failing one, and I was surprised by what I found in the CAS’s pkgstatus SQL view. The app that had deployed successfully to the DPs had only one row per primary, one for the CAS, and one for each DP it was deployed to, starting with “["Display=\\DP1.jeff.com\"]MSWNET:…”. The bad application had extra rows per primary server, with the CAS’s FQDN in the PkgServer column! And if you looked closely, their “Update times” were older, with a different or old PKID.
UPDATE (12/30/20): This is no longer the case with the current release (2010). The pkgstatus table values have changed significantly, but the fix is essentially the same: it's all about the pkgstatus status, ha!
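For anyone wanting to repeat the comparison, here is a minimal, read-only sketch of the kind of query I mean. It only assumes the pkgstatus object and the id and PkgServer columns mentioned in this post; 'BADPKGID' is a placeholder, and the columns that come back may look different on newer releases.

```sql
-- Read-only: list every status row recorded for one package.
-- 'BADPKGID' is a placeholder; substitute the ID of the package you are investigating.
SELECT *
FROM pkgstatus
WHERE id = 'BADPKGID'
ORDER BY PkgServer;  -- groups rows for the same server next to each other
```

Run it once with the ID of a package that distributed fine and once with the ID of a stuck one, then compare the row counts and the PkgServer values side by side.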
Time to try to fix this!
1. I made certain the bad package or app was removed from the DPs.
2. I then deleted the extra rows by executing the statement below on the CAS and on the primary servers’ DBs, via SSMS. NOTE: MS doesn’t support modifying the DB directly, so be careful and make sure you have a valid backup before doing so!
DELETE FROM pkgstatus where id = 'BADPKGID' and PkgServer like '%DPServer.jeff.com%'
3. Then simply redistribute the bad app or package back to the DPs.
4. Voilà! The bad application got processed by the despooler and deployed to the targeted DPs successfully!
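If you want a safety net around the DELETE in step 2, one option is to wrap it in a transaction and preview the rows first. This is only a sketch of that idea, not something MS documents or supports: it reuses the same placeholder package ID and DP name as above, and it rolls back by default so nothing changes until you deliberately swap in a commit.

```sql
BEGIN TRANSACTION;

-- Preview exactly which rows the filter matches before touching anything.
SELECT *
FROM pkgstatus
WHERE id = 'BADPKGID'
  AND PkgServer LIKE '%DPServer.jeff.com%';

-- Same statement as in step 2, using the same placeholder values.
DELETE FROM pkgstatus
WHERE id = 'BADPKGID'
  AND PkgServer LIKE '%DPServer.jeff.com%';

-- Number of rows the DELETE removed; it should match the preview above.
SELECT @@ROWCOUNT AS RowsDeleted;

-- Undo by default. Replace with COMMIT TRANSACTION only once the count looks right.
ROLLBACK TRANSACTION;
```

Run the whole thing as one batch so the transaction is not left open on the site database, and keep the LIKE pattern as narrow as you can so only the intended server's rows match.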
CMCB, Stuck CM Packages, Despooler Error 12, Stuck CM Apps