[launchd-dev] Stray process with PGID equal to this dead job

Johannes list100 at hoerburger.org
Mon Jul 27 11:26:43 PDT 2009


Sorry for reopening a thread that was already dealt with, but I ran
into a serious problem that I'm not able to work around in any way.

I have a Perl script (let's call it backupfather.pl) that calls a
second Perl script (backupchild.pl) 15 times to initiate rsync-over-ssh
backups from 15 different client hosts.

The caller script knows how to deal with open processes: it keeps them
running for a designated time, kills them if the target host it
fetches the backup from isn't reachable or the backup script is
overtime, and doesn't launch the child script for a specific host if a
process for that host is still running.

Additionally, it knows how to deal with the child processes of
backupchild.pl (rsync and ssh): it builds up PID and parent-PID trees
and kills all processes involved before launching again with the same
arguments.
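
To illustrate the tree handling described above, here is a minimal
sketch in Python. It is not the actual Perl code; the function names
are hypothetical, and it assumes that `ps -axo pid=,ppid=` prints one
"pid ppid" pair per line.

```python
import os
import signal
import subprocess
from collections import defaultdict


def snapshot():
    """Return the raw 'pid ppid' table from ps (assumed output format)."""
    return subprocess.check_output(["ps", "-axo", "pid=,ppid="], text=True)


def parse_ps(text):
    """Turn 'pid ppid' lines into a parent -> [children] map."""
    children = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pid, ppid = int(parts[0]), int(parts[1])
        children[ppid].append(pid)
    return children


def descendants(children, root):
    """List every descendant of root, with children before their parents."""
    order = []
    stack = [root]
    while stack:
        pid = stack.pop()
        for child in children.get(pid, []):
            order.append(child)
            stack.append(child)
    order.reverse()  # parents were recorded first, so reverse the list
    return order


def kill_tree(root):
    """Send SIGTERM to all descendants of root, then to root itself."""
    for pid in descendants(parse_ps(snapshot()), root) + [root]:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # process already gone
```

Killing the leaves (rsync, ssh) before their parents avoids orphans
being reparented while the tree is still being walked.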

So the process management itself is clean.

In order to start the backupchild.pl script 15 times I need to
background the instances. They may run for several hours, so
backupfather.pl has to be free to start the finished backups again
after an hour without starting the ones that are still running.
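
The hourly decision can be sketched roughly as follows, again in
Python rather than the real Perl. The names plan_round and launch, the
3600-second limit in the usage lines, and the single host argument
passed to backupchild.pl are assumptions made for illustration, as is
restarting an overtime host in the same round in which it is killed.

```python
import subprocess


def plan_round(hosts, running, now, max_runtime):
    """Decide which hosts to kill and which to (re)start in this round.

    running maps a host to the start time of its still-alive child.
    """
    to_kill, to_start = [], []
    for host in hosts:
        started = running.get(host)
        if started is None:
            to_start.append(host)      # nothing running: start a backup
        elif now - started > max_runtime:
            to_kill.append(host)       # overtime: kill the old run ...
            to_start.append(host)      # ... and start it again
        # otherwise the backup is still within its time: leave it alone
    return to_kill, to_start


def launch(host):
    """Start one child in the background; Popen returns without waiting."""
    return subprocess.Popen(["perl", "backupchild.pl", host])
```

For example, with hosts a, b and c, where b has been running since
time 0 and c since time 9000, a round at time 10000 with a limit of
3600 seconds kills b, starts a and b, and leaves c running.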

I've tried several ways to get out of the launchctl process group
prison that backupfather.pl is running in.

setpgrp doesn't work on Mac OS X (that way it would be possible to let
it run under root's process group ID).

Forking and exiting in backupchild.pl didn't help either, nor did
creating 15 instances, one per client backup, and launching them as
necessary. As soon as it comes to backgrounding I'm stuck with:

"Stray process with PGID equal to this dead job: PID _pidnumber_ PPID  
_parentpidnumber_ perl"

Any ideas how to get the backupchild.pl processes out of the launchctl  
prison?

Thanks in advance for any help,
Johannes
PS: If you're wondering why I don't use Time Machine for backups: we
used it until AppleFileServer started eating 100% CPU on the server,
rendering the file service unusable for the regular AFP clients, while
SSH at the same time ran at full speed with no problems. So splitting
backup from file service was the way to make the file service usable
again...

