I have a previously-cronned job that I'm moving to launchd on my new 10.5 server. It runs every 10 minutes, and causes the following entry to show up in the syslog:
Feb 21 15:43:45 server1 com.apple.launchd[1] (edu.buffalo.hwi.raidutil[43977]): Stray process with PGID equal to this dead job: PID 43983 PPID 1 sendmail
The script the launchd job runs calls sendmail at the end of it, so my guess is that the script finishes while sendmail is still running, and then launchd has to clean up after the script exits, so it tries to kill the sendmail process. However, my emails are all going through and seemingly working properly.
Additionally, I specified the AbandonProcessGroup <true/> key in the plist so it wouldn't try cleaning up the other processes spawned. Shouldn't this take care of the log message?
Geoff Franks
Sr. Systems Administrator
Hauptman Woodward Institute
On Feb 21, 2008, at 12:52 PM, Geoff Franks wrote:
I have a previously-cronned job that I'm moving to launchd on my new 10.5 server. It runs every 10 minutes, and causes the following entry to show up in the syslog:
Feb 21 15:43:45 server1 com.apple.launchd[1] (edu.buffalo.hwi.raidutil[43977]): Stray process with PGID equal to this dead job: PID 43983 PPID 1 sendmail
The script the launchd job runs calls sendmail at the end of it, so my guess is that the script finishes while sendmail is still running, and then launchd has to clean up after the script exits, so it tries to kill the sendmail process. However, my emails are all going through and seemingly working properly.
That is probably just luck.
Additionally, I specified the AbandonProcessGroup <true/> key in the plist so it wouldn't try cleaning up the other processes spawned. Shouldn't this take care of the log message?
Nope. We hope that you'll someday update the script to either wait for descendant processes to finish, or alternatively, properly daemonize them.
davez
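The first option Dave mentions, waiting for descendant processes, might look like the following in a shell script. This is only a minimal sketch, not Geoff's actual script: slow_helper and the temporary file are hypothetical stand-ins for a slow background step such as a mail submission.

```shell
#!/bin/bash
# Hypothetical stand-in for a slow helper, e.g. a mail submission.
slow_helper() {
    sleep 1
    echo "delivered" > "$1"
}

outfile=$(mktemp)

# Start the helper in the background.
slow_helper "$outfile" &

# Block until every background child of this shell has exited, so that
# nothing started here is still running when the script itself ends.
wait

result=$(cat "$outfile")
rm -f "$outfile"
echo "helper finished: $result"
```

Note that wait only covers direct children of the shell. It cannot help when a child starts its own helper and then exits without waiting for it.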
On 2/21/08 6:31 PM, "Dave Zarzycki" <zarzycki@apple.com> wrote:
Nope. We hope that you'll someday update the script to either wait for descendant processes to finish, or alternatively, properly daemonize them.
davez
This is a 10 line bash script that I ran as a cron job previously. It emailed on failures. Why would I daemonize something like that?
Geoff Franks
Sr. Systems Administrator
Hauptman Woodward Institute
On Feb 22, 2008, at 6:15 AM, Geoff Franks wrote:
This is a 10 line bash script that I ran as a cron job previously. It emailed on failures. Why would I daemonize something like that?
Unless you're consciously backgrounding sendmail, the bug is more likely in the sendmail program itself.
davez
On 2/22/08 10:45 AM, "Dave Zarzycki" <zarzycki@apple.com> wrote:
Unless you're consciously backgrounding sendmail, the bug is more likely in the sendmail program itself.
davez
/usr/sbin/raidutil list status | /usr/bin/mail -s "RAID: `hostname`" <myaddress>
Is all I'm calling. I've had reports of a few other people on the MacEnterprise list with similar issues regarding postfix/sendmail and launchd jobs. Should I file a bug report for this?
Geoff Franks
Sr. Systems Administrator
Hauptman Woodward Institute
On Feb 22, 2008, at 8:42 AM, Geoff Franks wrote:
/usr/sbin/raidutil list status | /usr/bin/mail -s "RAID: `hostname`" <myaddress>
Is all I'm calling. I've had reports of a few other people on the MacEnterprise list with similar issues regarding postfix/sendmail and launchd jobs. Should I file a bug report for this?
Yes, please.
http://bugreport.apple.com/
davez
At 11:42 -0500 22/2/08, Geoff Franks wrote:
/usr/sbin/raidutil list status | /usr/bin/mail -s "RAID: `hostname`" <myaddress>
Is all I'm calling. I've had reports of a few other people on the MacEnterprise list with similar issues regarding postfix/sendmail and launchd jobs. Should I file a bug report for this?
Geoff, did you ever file a bug about this? I got asked about this on another mailing list and I pretty much worked out what's going on (see below). If you did file a bug, I'd like to add some info to the bug. I looked, and couldn't find it. The following is in the context of <x-man-page://8/periodic>, but the basics should also apply to your situation.
1. launchd starts the "periodic" script
2. In this config, "periodic" runs <x-man-page://1/mail> to deliver the report
3. "mail" ends up invoking <x-man-page://1/sendmail> to do the real work
4. "mail" does not /wait/ for "sendmail" to complete
5. "mail" quits
6. "periodic" quits
7. launchd cleans up the job
If "sendmail" has not quit at this point, launchd garbage collects it and you get this message (and no mail).
One potential workaround is to set the "verbose" environment variable. This causes "mail" to wait for the "sendmail" to terminate before it quits. I'm not sure how that would play in the context of "periodic".
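Applied to Geoff's one-liner, the workaround might look like the sketch below. It relies on mail's -v flag, which the mail man page documents as equivalent to setting the "verbose" option; whether that fully avoids the stray-process message under launchd is not confirmed here. The recipient address is a placeholder (the original post elides the real one), and the existence check is added only so the sketch does nothing on machines without these tools.

```shell
#!/bin/sh
# Placeholder recipient; the original post does not give the real address.
recipient="admin@example.com"
subject="RAID: $(hostname)"

# -v puts mail in verbose mode, the same as setting its "verbose" option.
send_report() {
    /usr/bin/mail -v -s "$subject" "$recipient"
}

# Only attempt delivery where the tools from the original post exist.
if [ -x /usr/sbin/raidutil ] && [ -x /usr/bin/mail ]; then
    /usr/sbin/raidutil list status | send_report
    status="sent"
else
    status="skipped"
    echo "raidutil or mail not found; nothing sent"
fi
```

Verbose mode also makes mail print delivery details, so under launchd that extra output would end up wherever the job's standard output is directed.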
S+E -- Quinn "The Eskimo!" <http://www.apple.com/developer/> Apple Developer Relations, Developer Technical Support, Core OS/Hardware
On Feb 21, 2008, at 12:52 PM, Geoff Franks wrote:
Feb 21 15:43:45 server1 com.apple.launchd[1] (edu.buffalo.hwi.raidutil[43977]): Stray process with PGID equal to this dead job: PID 43983 PPID 1 sendmail
NetAuthAgent and something mysteriously called "Locum" die this exact same way constantly on 10.5.2, usually right before LAN traffic grinds to a halt for no apparent reason. Looking at the chunk of code that prints these messages I've never been able to figure out *exactly* what set of conditions precipitates them (I have an idea, but that idea doesn't always match what I see going on), but they happen a lot, even with shipping Apple code. So often that I'd be in favor of a more verbose error message if such an idea were ever kicked around the office.
At 15:59 -0800 21/2/08, Nathan Duran wrote:
NetAuthAgent and something mysteriously called "Locum" die this exact same way constantly on 10.5.2, usually right before LAN traffic grinds to a halt for no apparent reason.
Locum is a privileged tool that Finder uses when it needs to do privileged file system operations.
/System/Library/PrivateFrameworks/DesktopServicesPriv.framework/Versions/A/Resources/Locum
I have no idea why you're seeing this message for Locum and NetAuthAgent, but it's obvious that something has gone terribly wrong. Regardless, it's not relevant to Geoff's original question.
Looking at the chunk of code that prints these messages I've never been able to figure out *exactly* what set of conditions precipitates them (I have an idea, but that idea doesn't always match what I see going on), but they happen a lot, even with shipping Apple code. So often that I'd be in favor of a more verbose error message if such an idea were ever kicked around the office.
The following section of TN2083 "Daemons and Agents" gives a bunch of background material.
<http://developer.apple.com/technotes/tn2005/tn2083.html#SECCAREFULWITHTHATFORKEUGENE>
Dave can fill in the specific details, but here's the general gist of things...
When launchd starts a job, it puts the process in a unique process group. When that process dies, launchd SIGTERMs the process group to 'garbage collect' any processes that the main process might have launched. The idea is as follows:
o If the main process launches a helper tool, the helper tool won't daemonise itself. Thus, it will be in the same process group, and thus it will be killed by the SIGTERM. This is generally a good thing because it means that, the next time launchd starts the main process, the main process starts with a clean slate.
o If the main process starts some other process that's meant to run /after/ the main process, the other process /must/ daemonise itself. That puts the other process in its own process group, and launchd will leave it alone.
It's likely that Geoff's launchd job is starting sendmail but not waiting for it to run to completion. Thus, launchd is trying to garbage collect sendmail. This is bad. Frankly, it's just luck that it works at all.
S+E -- Quinn "The Eskimo!" <http://www.apple.com/developer/> Apple Developer Relations, Developer Technical Support, Core OS/Hardware
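The process group distinction described above can be observed from a bash script. The sketch below does not involve launchd and is not real daemonisation: it uses bash's job control (set -m) purely as a convenient way to put one background job into its own process group, and pgid_of is a hypothetical helper built on ps.

```shell
#!/bin/bash
# Print the process group ID of a PID, stripped of padding.
pgid_of() {
    ps -o pgid= -p "$1" | tr -d ' '
}

script_pgid=$(pgid_of $$)

# Without job control, a background child stays in the script's process group.
sleep 5 &
helper_pid=$!
helper_pgid=$(pgid_of "$helper_pid")

# With job control on, bash gives each background job its own process group.
set -m
sleep 5 &
detached_pid=$!
set +m
detached_pgid=$(pgid_of "$detached_pid")

echo "script   pgid: $script_pgid"
echo "helper   pgid: $helper_pgid"
echo "detached pgid: $detached_pgid"

# Clean up both background processes.
kill "$helper_pid" "$detached_pid" 2>/dev/null
```

The first child reports the same PGID as the script, which is the situation that triggers the "Stray process with PGID equal to this dead job" message if the child outlives its parent. The second reports a PGID of its own. A process meant to outlive the job would need to daemonise properly, as TN2083 describes, rather than rely on this trick.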
On Feb 22, 2008, at 1:54 AM, Quinn wrote:
I have no idea why you're seeing this message for Locum and NetAuthAgent, but it's obvious that something has gone terribly wrong. Regardless, it's not relevant to Geoff's original question.
It's happening to a lot of people, so I'd agree that many things are terribly wrong with 10.5.2. However, at times when your users and the people who sign your paychecks are one and the same, the ability to determine whether or not a given error message is your fault or Apple's can become surprisingly relevant.
I believe that launchd's numerous undocumented features and behaviors coupled with its oftentimes cryptically terse function names and error messages ("Workaround Bonjour" stall anyone?) are responsible for a lot of people resisting the "use it or else" mantra, thereby slowing its adoption rate both internally and externally. I've seen invalid Apple-shipped plists, Apple-shipped daemons that don't cooperate with launchd at all (try keeping vpnd alive) and plenty of stuff like this:
launchctl: Please convert the following to launchd: /etc/mach_init.d/chum.plist
launchctl: Please convert the following to launchd: /etc/mach_init.d/dashboardadvisoryd.plist
launchctl: Please convert the following to launchd: /etc/mach_init.d/pilotfish.plist
Did these guys just not get the memo, or was the memo difficult to get through?
If those "Stray process" messages conveyed the whole of what had transpired in easily digestible terms, Geoff probably wouldn't have had a question to begin with. Here's a rather extreme example from another department that certainly leaves nothing to the imagination:
"This application is trying to draw a very large combo box, 145 points tall. Vertically resizable combo boxes are not supported, but it happens that 10.4 and previous drew something that looked kind of sort of okay. The art in 10.5 does not break up in a way that supports that drawing. To avoid breaking existing apps, NSComboBox in 10.5 will use the 10.4 art for large combo boxes, but it won't exactly match the rest of the system. This application should be revised to stop using large combo boxes. This warning will appear once per app launch."
It's just a hunch, but I'd bet nobody's asking what the heck that means on their list ;)
participants (4)
- Dave Zarzycki
- Geoff Franks
- Nathan Duran
- Quinn