[launchd-dev] Stray process with PGID equal to this dead job
Damien Sorresso
dsorresso at apple.com
Tue Jul 28 08:58:00 PDT 2009
On Jul 27, 2009, at 11:26 AM, Johannes wrote:
> Sorry for reopening a thread that was already dealt with, but I ran
> into a serious problem that I'm not able to work around in any way.
>
> I have a perlscript (let's call it backupfather.pl) that calls a
> second perlscript (backupchild.pl) 15 times for initiating rsync
> over ssh backups from 15 different client-hosts.
>
> The callerscript knows how to deal with open processes (keep them
> running for a designated time, kill them if the target host where it
> fetches the backup from isn't reachable or the backup script is
> overtime, don't launch the child script for a specific host if one
> process like that is still running)
>
> Additionally it knows how to deal with the child processes of backupchild.pl
> (rsync and ssh) building up pid and parent pid trees and killing
> all involved before launching again with the same arguments.
>
> So the processmanagement itself is clean.
Why bother creating your own tree? You already have a process tree
maintained by the kernel. "Clean" process management is having the
parent kill and reap its children as necessary and allowing the
semantics of POSIX parent-child relationships to create a chain of
responsibility.
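So if your job sends SIGTERM to its immediate child, that child should,
in turn, forward the SIGTERM to its own children, wait for them to exit
and then exit itself. The kernel does not forward the signal for you;
each process in the chain has to handle SIGTERM and pass it on.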
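
A minimal sketch of that chain of responsibility, written in Python for illustration (the scripts in this thread are Perl, where kill and waitpid play the same roles). The names spawn_worker and terminate_and_reap are hypothetical. In a real job, terminate_and_reap would be called from the job's own SIGTERM handler, and each child would do the same for its own children.

```python
import os
import signal
import time


def spawn_worker():
    """Fork a direct child that idles until signalled; return its PID."""
    pid = os.fork()
    if pid == 0:
        # Child: make sure SIGTERM has its default (terminating) action.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        time.sleep(60)  # stand-in for the real backup work
        os._exit(0)
    return pid


def terminate_and_reap(children):
    """Send SIGTERM to each direct child, then wait for every one of them.

    Returns a dict mapping each PID to the signal number that ended it,
    or 0 if the child exited on its own.
    """
    for pid in children:
        os.kill(pid, signal.SIGTERM)
    results = {}
    for pid in children:
        _, status = os.waitpid(pid, 0)  # reap, so no zombie is left behind
        results[pid] = os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
    return results
```

Because the parent only ever signals and reaps its own direct children, no separate PID/PPID bookkeeping is needed.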
> In order to start the backupchild.pl script 15 times I need to
> background them. They may run several hours so the backupfather.pl
> has to get free to initiate the finished backups again after an
> hour, not initiating the still running ones.
>
> I've tried several ways to get out of the launchctl processgroup
> prison the backupfather.pl is running in.
"launchctl process group prison"? I think you mean "POSIX standard
behavior". Your child processes inherit their parent's PGID, which for
a launchd job equals the job's PID. launchd has nothing to do with this
behavior; it's
enforced by the kernel. launchd complains about this because your
job has attempted to daemonize without calling setsid(2), setpgrp(2),
etc.
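
The inheritance is easy to observe. The sketch below (Python for illustration; the function name is hypothetical) forks a child and has it report its process group ID back through a pipe. Without a call to setsid(2) or setpgrp(2), the child's PGID is the same as the parent's.

```python
import os


def pgids_after_plain_fork():
    """Fork once and return (parent_pgid, child_pgid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report our process group ID, then exit without cleanup.
        os.close(read_fd)
        os.write(write_fd, str(os.getpgrp()).encode())
        os._exit(0)
    os.close(write_fd)
    child_pgid = int(os.read(read_fd, 64))
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return os.getpgrp(), child_pgid
```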
> setpgrp doesn't work on Mac OS X (that way it would be possible to
> let it run under the root's processgroups ID).
Mac OS X conforms to POSIX, so setpgrp(2) does work. What errno does
it set when you call it? Are you calling it from the child process
with 0 as the first argument, or from the parent with the child's PID
as the first argument?
Also, please read setsid(2).
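
For reference, a sketch of what calling setsid(2) from the child looks like, again in Python for illustration (Perl offers the equivalent through POSIX::setsid). The function name is hypothetical. The child reports its new PGID, its session ID, and the errno from setsid (0 on success), which is the value to check if the call appears not to work.

```python
import os


def fork_into_new_session():
    """Fork a child that calls setsid().

    Returns (child_pid, child_pgid, child_sid, errno), where errno is 0
    if setsid() succeeded in the child.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        err = 0
        try:
            # New session and new process group, both led by this child.
            # Fails with EPERM if the caller is already a group leader.
            os.setsid()
        except OSError as exc:
            err = exc.errno
        report = "%d %d %d" % (os.getpgrp(), os.getsid(0), err)
        os.write(write_fd, report.encode())
        os._exit(0)
    os.close(write_fd)
    pgid, sid, err = (int(x) for x in os.read(read_fd, 128).split())
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child
    return pid, pgid, sid, err
```

After a successful setsid(), the child's PGID equals its own PID, so it no longer shares the job's process group.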
> forking and exiting the backupchild.pl didn't help either, nor did
> creating 15 instances for each client backup and launching them as
> necessary. As soon as it comes to backgrounding I'm stuck in:
>
> "Stray process with PGID equal to this dead job: PID _pidnumber_
> PPID _parentpidnumber_ perl"
>
> Any ideas how to get the backupchild.pl processes out of the
> launchctl prison?
I would recommend making backupchild.pl into a launchd job that your
main job kicks off by running `launchctl start`. If you need multiple
instances of the job, you can create them on the fly with `launchctl
submit`, and launchd will take care of the process management.
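
A sketch of how the main job might build one `launchctl submit` invocation per host, in Python for illustration. The label prefix, script path and host name are assumptions, not taken from the original setup. As far as I recall from launchctl(1), a submitted job is kept alive by launchd until it is removed with `launchctl remove <label>`, so check the man page before relying on this for one-shot runs.

```python
import subprocess


def submit_command(host,
                   script="/usr/local/bin/backupchild.pl",   # assumed path
                   label_prefix="com.example.backup"):       # assumed prefix
    """Build the argv for submitting one backup job for `host`."""
    label = "%s.%s" % (label_prefix, host)
    # Form: launchctl submit -l <label> -- <command> [args...]
    return ["launchctl", "submit", "-l", label, "--",
            "/usr/bin/perl", script, host]


def submit_backup(host):
    """Hand the per-host job to launchd (only meaningful on Mac OS X)."""
    subprocess.run(submit_command(host), check=True)
```

Giving each host its own label lets launchd track every instance separately, so the main job does not have to background anything itself.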
--
Damien Sorresso
BSD Engineering
Apple Inc.