[launchd-dev] Hanging system calls with non-system users

Axel Luttgens AxelLuttgens at swing.be
Mon Feb 25 02:29:55 PST 2013


Hello,

I'm encountering a problem that seems to be launchd-related, hence my post here.
Beside Dovecot's case, I tend to believe this could be of interest for everyone devising daemons.

Dovecot may be configured with pure "virtual" users, i.e. users with an uid/gid pair unknown in the system; this allows to fetch those users from, say, a database while providing increased security (virtual user A should not be able to access the mailbox of virtual user B).

On Mac OS X (10.8.2 here), with Dovecot 2.1.14, this works up to a certain point; for example, connecting as "user1" (100016/100016) for the first time, one gets this in the logs:

	dovecot[97622] <Info>: master: Dovecot v2.1.14 starting up (core dumps disabled)
	dovecot[97624] <Info>: pop3-login: Login: user=<user1>, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, mpid=97632, secured, session=<rziWFXnW0gB/AAAB>
	dovecot[97624] <Debug>: pop3(user1): Debug: Effective uid=100016, gid=100016, home=/_Data/Mailstores/100016
	dovecot[97624] <Debug>: pop3(user1): Debug: Home dir not found: /_Data/Mailstores/100016
	dovecot[97624] <Debug>: pop3(user1): Debug: fs: root=/_Data/Mailstores/100016/mboxes, index=, control=, inbox=/_Data/Mailstores/100016/mboxes/inbox, alt=
	dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : /_Data/Mailstores/100016/mboxes doesn't exist yet, using default permissions
	dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : Using permissions from /_Data/Mailstores/100016/mboxes: mode=0700 gid=-1
	dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : /_Data/Mailstores/100016/mboxes/inbox doesn't exist yet, using default permissions
	dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : Using permissions from /_Data/Mailstores/100016/mboxes: mode=0700 gid=-1
	dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : /_Data/Mailstores/100016/mboxes/inbox doesn't exist yet, using default permissions
	dovecot[97624] <Debug>: pop3(user1): Debug: Namespace : Using permissions from /_Data/Mailstores/100016/mboxes: mode=0700 gid=-1
	dovecot[97624] <Debug>: pop3(user1): Debug: In my_hostdomain: 100016 100016 100016 100016
	com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) <Error>: getpwuid("100016") failed
	com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) <Notice>: Job failed to exec(3). Setting up event to tell us when to try again: 3: No such process
	com.apple.launchd[1] (com.apple.launchd.peruser.100016[97633]) <Notice>: Job failed to exec(3) for weird reason: 3

and the pop3 process then hangs forever; but many things have nevertheless been done for user1, especially the creation of needed files/directories.

Subsequent connections as "user1" (after having restarted Dovecot and sent KILL to the pop3 process) systematically end with hanging pop3 processes, but without those com.apple.launchd.peruser messages in the logs anymore.

Debugging has been quite a challenge (at least, with my poor skills), since all of this becomes quite elusive with the silent interposition of launchd; but it is now almost certain that the hangs occur on a call to gethostbyname() from the code of the pop3 executable.
I suspect this to be more general and to be liable to occur with other system calls as well (probably those from LibInfo).

So, I've devised the pieces of code at the end of this message so as to provide a skeleton of how Dovecot's master process spawns its children, for example a child pop daemon upon a client connection on port 110.

They really are a skeleton, as Dovecot performs much more initializations (pre-exec() and post-exec() setups in the comments below).

Yet, they may be viewed as a proof of concept: a child process running as a virtual user may invoke a system call such as gethostbyname() without encountering any problem; in the logs:

	pseudomaster[11946] <Notice>: Master: started.
	pseudomaster[11947] <Notice>: Child: forked.
	Unknown[11947] <Notice>: Pseudochild: started.
	Unknown[11947] <Notice>: Pseudochild: gethostbyname() returned almba.local
	pseudomaster[11946] <Notice>: Master: child done.
	pseudomaster[11946] <Notice>: Master: looping.
	pseudomaster[11949] <Notice>: Child: forked.
	Unknown[11949] <Notice>: Pseudochild: started.
	Unknown[11949] <Notice>: Pseudochild: gethostbyname() returned almba.local
	pseudomaster[11946] <Notice>: Master: child done.
	pseudomaster[11946] <Notice>: Master: looping.
	[...]

So, the question is to know what Dovecot may do so as to trigger hangs of gethostbyname().
Note that the problem disappear with Dovecot as soon as one makes use of a "real" gid, such as the one of group "staff": a "semi-virtual" user with 100016/20 may connect without any problem; this may perhaps provide a clue about the kind of settings Dovecot makes use of.

Conversely, what kind of additional settings should I consider in my skeleton so as to trigger similar hangs in the child process?

But more generally, what exactly is the rationale for having launchd entering the game?

Many thanks in advance,
Axel


pseudomaster.c
==============
#include <syslog.h>
#include <unistd.h>
#include <stdlib.h>

int   main( int argc, const char * argv[]) 
{
	pid_t pid;
	int retstatus;
	char * dummyargs[] = { NULL };

	syslog(LOG_NOTICE|LOG_MAIL, "Master: started.");

	for (;;)
	{
		pid = fork();
	
		if (pid < 0)
		{
			syslog(LOG_NOTICE|LOG_MAIL, "Master: fork() failed.");
		}
		else if (pid == 0)
		{
			syslog(LOG_NOTICE|LOG_MAIL, "Child: forked.");

			// Pre-exec() setups, e.g. wrt file descriptors.
		
			if (execv("/_COD/pseudochild", dummyargs) < 0)
			{
				syslog(LOG_NOTICE|LOG_MAIL, "Child: execv() failed.");
				_exit(1);
			}
		}

		waitpid(-1, & retstatus, 0);
		syslog(LOG_NOTICE|LOG_MAIL, "Master: child done.");
		sleep(5);
		syslog(LOG_NOTICE|LOG_MAIL, "Master: looping.");
	}
}


pseudochild.c
=============
#include <syslog.h>
#include <netdb.h>
#include <unistd.h>
#include <stdlib.h>

int   main( int argc, const char * argv[]) 
{
	struct hostent *hent;
	gid_t gidset[1];

	uid_t uid = 100016;
	gid_t gid = 100016;
	
	gidset[0] = gid;

	syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: started.");
	
	if (setgid(gid) != 0)
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setgid() failed.");
		_exit(1);
	}
	if (setgroups(1, gidset) != 0)
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setgroups() failed.");
		_exit(1);
	}
	if (setuid(uid) != 0)
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: setuid() failed.");
		_exit(1);
	}

	// Other post-exec() setups
	
	hent = gethostbyname("ALMba.local");
	if ( hent != NULL )
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: gethostbyname() returned %s\n", hent->h_name);
		return(0);
	}
	else
	{
		syslog(LOG_NOTICE|LOG_MAIL, "Pseudochild: gethostbyname() failed.");
		return(1);
	}
}


pseudomaster.plist
==================
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>ServiceDescription</key>
	<string>Master test</string>
	<key>Label</key>
	<string>pseudomaster</string>
	<key>Disabled</key>
	<true/>
	<key>KeepAlive</key>
	<true/>
	<key>ProgramArguments</key>
	<array>
			<string>/_COD/pseudomaster</string>
	</array>
</dict>
</plist>




More information about the launchd-dev mailing list