[launchd-dev] Hanging system calls with non-system users

Axel Luttgens AxelLuttgens at swing.be
Sat Mar 2 08:42:46 PST 2013


Going a bit further with the topics of my previous post. ;-)

As hinted in the log excerpt, it could well be that Dovecot invokes getpwuid(3), directly or indirectly:

> 	dovecot[97622] <Info>: master: Dovecot v2.1.14 starting up (core dumps disabled)
> 	[...]
> 	dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : Using permissions from /_Data/Mailstores/100016/mboxes: mode=0700 gid=-1
> 	com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) <Error>: getpwuid("100016") failed
> 	com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) <Notice>: Job failed to exec(3). Setting up event to tell us when to try again: 3: No such process
> 	com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) <Notice>: Job failed to exec(3) for weird reason: 3

In this precise case (opening a POP connection for the first time), this seems to be an indirect call to getpwuid(3). Anyway, keeping pseudomaster.plist and pseudomaster.c unchanged, I quickly modified pseudochild.c so that it now calls getpwuid(3).

When loading pseudomaster.plist for the first time, this is what gets written to the logs:

	ALMba.local pseudomaster[66978] <Notice>: Master: started.
	ALMba.local pseudomaster[66980] <Notice>: Child: forked.
	ALMba.local pseudochild[66980] <Notice>: Pseudochild: started.
	ALMba com.apple.launchd[1] (com.apple.launchd.peruser.100018[66981]) <Error>: getpwuid("100018") failed
	ALMba com.apple.launchd[1] (com.apple.launchd.peruser.100018[66981]) <Notice>: Job failed to exec(3). Setting up event to tell us when to try again: 3: No such process
	ALMba com.apple.launchd[1] (com.apple.launchd.peruser.100018[66981]) <Notice>: Job failed to exec(3) for weird reason: 3

and the pseudochild process just hangs.

Note the process number 66981, which is not pseudochild's (66980), as if there were an attempt to spawn a subprocess. And it is a very elusive one: for example, there is no way to catch it with execsnoop or similar tools. Even launchd appears somewhat lost.

Subsequent unloads/reloads of pseudomaster.plist always end with a hanging pseudochild process, but no longer produce those com.apple.launchd.peruser messages.
In fact, to get those messages back, one has to remove the job bearing the label "com.apple.launchd.peruser.100018".

And this is without mentioning the directories /var/log/com.apple.launchd.peruser.100018 and /var/db/launchd/com.apple.launchd.peruser.100018 created under such circumstances; moreover, those directories seem to persist across reboots...

So, all of this looks quite similar to the problems encountered with Dovecot; the fact that the stack of the hung pop process ends around a gethostbyname call could thus just be a red herring.

On the other hand, Dovecot manages to go a bit further than my pseudochild.c code: with a uid/gid pair such as 100018/20, Dovecot's pop3 process doesn't hang and performs without a glitch, while pseudochild stubbornly ends up in a stuck state.
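
To compare the two cases more precisely, it may help to log the credentials the process actually ends up with after the setgid()/setgroups()/setuid() sequence. A minimal sketch, assuming a helper named describe_credentials (a hypothetical name, not part of pseudochild.c) whose output would then be passed to syslog():

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of pseudochild.c: writes the real and
 * effective uid/gid and the number of supplementary groups into buf.
 * Returns the length of the text, or -1 if buf is too small.
 */
static int describe_credentials(char *buf, size_t len)
{
	int n = snprintf(buf, len,
	    "uid=%lu euid=%lu gid=%lu egid=%lu ngroups=%d",
	    (unsigned long)getuid(), (unsigned long)geteuid(),
	    (unsigned long)getgid(), (unsigned long)getegid(),
	    getgroups(0, NULL));	/* size 0: only the count is returned */

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

Calling it once right after setuid() in each test run would show whether the 100018/100018 and 100018/20 cases really differ only by the gid.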

Currently, I'm in a wtf? mood...
I would really appreciate some hints, some explanations for all those phenomena...

TIA,
Axel



pseudochild.c
=============
#include <syslog.h>
#include <unistd.h>
#include <sys/errno.h>
#include <sys/types.h>
#include <pwd.h>
#include <uuid/uuid.h>

int   main( int argc, const char * argv[]) 
{
	struct passwd * pw;
	gid_t gidset[1];

	uid_t uid = 100018;
	gid_t gid = 100018;
	
	gidset[0] = gid;

	syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: started.");
	
	if (setgid(gid) != 0)
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setgid() failed.");
		_exit(1);
	}
	if (setgroups(1, gidset) != 0)
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setgroups() failed.");
		_exit(1);
	}
	if (setuid(uid) != 0)
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setuid() failed.");
		_exit(1);
	}
	
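	/* getpwuid() returns NULL both when no entry exists and on error;
	   errno is only set in the latter case, hence the reset beforehand. */
	errno = 0;
	pw = getpwuid(uid);
	if ( pw != NULL )
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: getpwuid() succeeded");
		_exit(0);
	}
	else
	{
		/* An errno of 0 here means "no such entry" rather than a lookup error. */
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: getpwuid() failed with errno %i", errno);
		_exit(1);
	}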
}
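
As a possible refinement of the lookup step above, getpwuid_r(3) returns the error number directly instead of going through errno, which makes "no such entry" easier to tell apart from a failed lookup. A minimal sketch; lookup_uid and its return convention are hypothetical, and whether it would hang in the same way as getpwuid() under launchd has not been checked:

```c
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of pseudochild.c.
 * Returns  1 if a passwd entry exists for uid,
 *          0 if the lookup worked but found no entry,
 *         -1 if the lookup itself failed.
 * If err is not NULL, it receives getpwuid_r()'s return value
 * (0 on success, an errno-style number otherwise).
 */
static int lookup_uid(uid_t uid, int *err)
{
	struct passwd pwd;
	struct passwd *result = NULL;
	char buf[4096];		/* assumed large enough; ERANGE is reported otherwise */
	int rc;

	rc = getpwuid_r(uid, &pwd, buf, sizeof buf, &result);
	if (err != NULL)
		*err = rc;
	if (rc != 0)
		return -1;
	return (result != NULL) ? 1 : 0;
}
```

In pseudochild.c, the three outcomes could each get their own syslog() message, so that a missing directory entry for uid 100018 would show up explicitly in the logs.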



