[launchd-dev] Stray process with PGID equal to this dead job

Damien Sorresso dsorresso at apple.com
Tue Jul 28 08:58:00 PDT 2009


On Jul 27, 2009, at 11:26 AM, Johannes wrote:
> Sorry for reopening a thread that was already dealt with, but I've run
> into a serious problem that I can't work around in any way.
>
> I have a Perl script (let's call it backupfather.pl) that calls a
> second Perl script (backupchild.pl) 15 times to initiate rsync-over-ssh
> backups from 15 different client hosts.
>
> The calling script knows how to deal with running processes: it keeps
> them running for a designated time, kills them if the host it fetches
> the backup from isn't reachable or the backup runs over time, and
> doesn't launch the child script for a host whose previous run is still
> going.
>
> Additionally, it knows how to deal with the child processes of
> backupchild.pl (rsync and ssh): it builds PID and parent-PID trees and
> kills every process involved before launching again with the same
> arguments.
>
> So the process management itself is clean.

Why bother creating your own tree? You already have a process tree  
maintained by the kernel. "Clean" process management is having the  
parent kill and reap its children as necessary and allowing the  
semantics of POSIX parent-child relationships to create a chain of  
responsibility.

So if your job sends SIGTERM to its immediate child, that child will,  
in turn, forward the SIGTERM to its children, wait for them to exit  
and then die accordingly.
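The parent's half of that chain can be sketched in C (the job itself is Perl, but these are the same system calls Perl wraps). This is a minimal sketch, not launchd or Perl code, and the helper name terminate_and_reap() is hypothetical. It signals one immediate child and reaps it with waitpid(2); the child is assumed to forward the signal to its own children and reap them the same way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent side of the chain of responsibility: send SIGTERM to one
 * immediate child, then block until the kernel hands back its exit
 * status.  Returns the raw wait status, or -1 if kill() or waitpid()
 * fails.  The child is expected to forward SIGTERM to its own children
 * and reap them before it exits. */
static int terminate_and_reap(pid_t child)
{
    int status;

    /* A pid of 0 or a negative pid would signal a whole process group
     * rather than one child, so refuse those. */
    assert(child > 0);

    if (kill(child, SIGTERM) == -1)
        return -1;

    /* Retry if an unrelated signal interrupts the wait. */
    while (waitpid(child, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Because the kernel already records who is whose parent, nothing here needs a hand-built PID tree: each level only has to know its own direct children.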

> In order to start backupchild.pl 15 times, I need to background the
> instances. They may run for several hours, so backupfather.pl has to
> be free to restart the finished backups after an hour without
> restarting the ones that are still running.
>
> I've tried several ways to get out of the launchctl process group
> prison that backupfather.pl is running in.

"launchctl process group prison"? I think you mean "POSIX standard
behavior". Your child processes inherit their parent's PGID, which here
is equal to your job's PID. launchd has nothing to do with this
behavior; it's enforced by the kernel. launchd complains about this
because your job has attempted to daemonize without calling setsid(2),
setpgrp(2), etc.
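A small C sketch of that inheritance, assuming (as the wording of launchd's message suggests) that the job is a process group leader whose PGID equals its PID. Here setsid(2) stands in for however launchd arranges that, and the function name job_and_worker_pgid() is hypothetical. A "job" process forks a "worker" that never calls setsid() or setpgid(), and the worker reports its PGID back through a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Miniature of a launchd job and its worker.  The "job" makes itself a
 * process group leader with setsid() (a stand-in for however launchd
 * sets this up), then forks a "worker" that never changes its group.
 * The worker sends its PGID back through a pipe.  On success, stores the
 * job's PID and the worker's PGID and returns 0; returns -1 on failure. */
static int job_and_worker_pgid(pid_t *job_pid, pid_t *worker_pgid)
{
    int fds[2], status;
    pid_t job;
    ssize_t n;

    assert(job_pid != NULL && worker_pgid != NULL);
    if (pipe(fds) == -1)
        return -1;

    job = fork();
    if (job == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (job == 0) {
        pid_t worker;

        close(fds[0]);
        if (setsid() == -1)          /* job: PGID becomes its own PID */
            _exit(1);
        worker = fork();
        if (worker == -1)
            _exit(1);
        if (worker == 0) {
            /* worker: no setsid()/setpgid(), so the PGID is inherited */
            pid_t pgid = getpgrp();
            n = write(fds[1], &pgid, sizeof pgid);
            _exit(n == (ssize_t)sizeof pgid ? 0 : 1);
        }
        close(fds[1]);
        if (waitpid(worker, &status, 0) == -1 || !WIFEXITED(status))
            _exit(1);
        _exit(WEXITSTATUS(status));
    }

    close(fds[1]);
    n = read(fds[0], worker_pgid, sizeof *worker_pgid);
    close(fds[0]);
    if (waitpid(job, &status, 0) == -1)
        return -1;
    if (n != (ssize_t)sizeof *worker_pgid || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    *job_pid = job;
    return 0;
}
```

In the original parent, the returned worker PGID equals the job's PID. Had the job exited while the worker was still running, the worker would be exactly the kind of process that launchd's "Stray process with PGID equal to this dead job" message describes.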

> setpgrp doesn't work on Mac OS X (if it did, it would be possible to
> let it run under root's process group ID).

Mac OS X conforms to POSIX, so setpgrp(2) does work. What errno does  
it set when you call it? Are you calling it from the child process  
with 0 as the first argument, or from the parent with the child's PID  
as the first argument?

Also, please read setsid(2).
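A sketch of both calls, made from the child right after fork(2). It uses setpgid(2) rather than setpgrp(), whose C prototype is not the same on every system (Perl's own setpgrp(PID, PGRP) takes two arguments, and Perl's POSIX module provides POSIX::setsid()). The names fork_detached_and_report(), struct child_ids, and enum detach_mode are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum detach_mode {
    DETACH_SETPGID,  /* setpgid(0, 0): new group, same session */
    DETACH_SETSID    /* setsid(): new session and new group */
};

struct child_ids {
    pid_t pid, pgid, sid;
};

/* Fork a child that leaves the parent's process group using `mode`,
 * reports its PID, PGID and session ID through a pipe, and exits.
 * Returns 0 and fills *out on success, -1 on any failure. */
static int fork_detached_and_report(enum detach_mode mode, struct child_ids *out)
{
    int fds[2], status;
    pid_t pid;
    ssize_t n;

    assert(out != NULL);
    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        struct child_ids ids;
        int rc;

        close(fds[0]);
        /* From the child, setpgid(0, 0) means "move the caller into a
         * new group whose ID is the caller's own PID". */
        if (mode == DETACH_SETPGID)
            rc = setpgid(0, 0);
        else
            rc = (setsid() == -1) ? -1 : 0;
        if (rc == -1)
            _exit(1);

        ids.pid = getpid();
        ids.pgid = getpgrp();
        ids.sid = getsid(0);
        n = write(fds[1], &ids, sizeof ids);
        _exit(n == (ssize_t)sizeof ids ? 0 : 1);
    }

    close(fds[1]);
    n = read(fds[0], out, sizeof *out);
    close(fds[0]);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (n != (ssize_t)sizeof *out || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

The parent-side form of the same move is setpgid(child_pid, child_pid); job-control shells commonly call setpgid() in both the parent and the child so the new group exists no matter which process runs first.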

> Forking and exiting in backupchild.pl didn't help either, nor did
> creating 15 instances, one per client backup, and launching them as
> necessary. As soon as it comes to backgrounding, I'm stuck with:
>
> "Stray process with PGID equal to this dead job: PID _pidnumber_  
> PPID _parentpidnumber_ perl"
>
> Any ideas on how to get the backupchild.pl processes out of the
> launchctl prison?


I would recommend making backupchild.pl into a launchd job that your  
main job kicks off by running `launchctl start`. If you need multiple  
instances of the job, you can create them on the fly with `launchctl  
submit`, and launchd will take care of the process management.
-- 
Damien Sorresso
BSD Engineering
Apple Inc.
