[launchd-dev] launch_msg(): Socket is not connected error in Leopard

Tue Feb 16 01:39:27 PST 2010

At 19:04 +0530 9/2/10, Arun wrote:
>This daemon writes multiple plist files in /Library/LaunchAgents and 
>tries to load the same for all the users currently logged in.  The 
>loading of the agents is failing with "launch_msg(): Socket is not 
>connected".

Before reading this response, you /really/ need to read and 
understand TN2083 "Daemons and Agents".  It describes the background 
to this problem, and you won't be able to understand my response 
without understanding the key points of that technote first.

<http://developer.apple.com/mac/library/technotes/tn2005/tn2083.html>

You can't just run an agent from an arbitrary context.  There are 
multiple instances of launchd on the system, and each instance can be 
running multiple contexts.  When you run launchctl, you end up 
talking to one specific context, and that context is determined by 
two things:

A. if you run launchctl as root (specifically, the RUID must be 0), 
you always end up talking to the global context managed by the root 
launchd (PID 1)

B. otherwise, the context is determined by the current Mach bootstrap 
namespace (as described in detail in TN2083)

This presents a nasty chicken-and-egg problem.  You can't load an 
agent in a particular context unless you're already running in that 
context.  This is a well-known gotcha, and it's being tracked by us 
as <rdar://problem/5476420>.

The only fully supported solution to this problem is to force a 
restart.  If that's not acceptable, you have to break the problem 
down as follows:

o upgrade -- There /is/ a reasonable way to handle the upgrade 
scenario.  Most agents are running on behalf of a daemon.  In that 
case you can overwrite the agent on disk and then tell the daemon to 
signal all of its agents to quit.  They will be relaunched by their 
respective launchd instances, this time running the new code.  You have to be 
a little careful, but this approach works reasonably well.

o uninstall -- You can handle uninstall in much the same way as you 
handle upgrade.  You have your daemon tell its connected agents about 
the uninstall.  The agents can then run launchctl to unload the job 
from their specific context.

o first install -- There isn't a good way to handle the first install 
case.  You can start the agent in the GUI context that you're running 
in, but starting agents in other existing GUI contexts is tricky.  In 
general I'd recommend you /not/ try to solve this.
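
The upgrade and uninstall steps above could be sketched in Python 
roughly as follows.  This is only an illustration, not something 
launchd provides: it assumes the daemon already tracks the PIDs of 
its connected agents, it uses SIGTERM as the "quit" message (the 
actual mechanism is up to you), and the function names are made up. 
Whether launchd relaunches an agent after it quits depends on that 
job's plist.

```python
import os
import signal


def signal_agents_to_quit(agent_pids, kill=os.kill):
    """Upgrade: ask each known agent to quit so it restarts on the new code.

    agent_pids is whatever list the daemon keeps of its connected agents
    (an assumption; the daemon has to track this itself).
    The kill parameter exists only so the function can be exercised
    without signalling real processes.
    Returns the PIDs that were actually signalled.
    """
    signalled = []
    for pid in agent_pids:
        try:
            kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            # The agent has already exited; nothing to do for this one.
            continue
        signalled.append(pid)
    return signalled


def unload_argv(plist_path):
    """Uninstall: the command an agent runs, in its own context, to unload its job."""
    return ["launchctl", "unload", plist_path]
```

An agent that is told about the uninstall could pass 
unload_argv("/Library/LaunchAgents/com.example.agent.plist") (a 
hypothetical path) to subprocess.call before it exits.  Nothing 
comparable exists for the first install case.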

To expand on this a little, consider the common case of an installer 
running in the foreground user's context.  From that context, it can 
load the agent for that particular user.  OTOH, non-foreground GUI 
users will have to log out and log back in to pick up the agent. If 
that's a problem (for example, you're installing security software), 
force a restart in this case.
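
As a rough sketch of the installer case, assuming a Python installer 
step, a hypothetical plist path, and a made-up function name: load 
the agent only when the real UID is not 0, because a launchctl run as 
root would talk to the global context instead (point A, above).

```python
import os
import subprocess


def load_agent_in_current_context(plist_path, ruid=None, run=subprocess.call):
    """Load an agent plist into the launchd context this process runs in.

    ruid defaults to the real UID of the calling process.  The run
    parameter exists only so the function can be exercised without
    invoking launchctl.  Returns launchctl's exit status.
    """
    if ruid is None:
        ruid = os.getuid()
    if ruid == 0:
        # As root, launchctl always targets the global context managed by
        # the root launchd, so the agent would not land in the user's context.
        raise RuntimeError("running as root: launchctl would target the global context")
    return run(["launchctl", "load", plist_path])
```

Called from the foreground user's session as 
load_agent_in_current_context("/Library/LaunchAgents/com.example.agent.plist"), 
this covers that one user only; other logged-in GUI users still need 
to log out and back in.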

Keep in mind that multiple GUI logins via fast user switching is an 
edge case that 90% of your users will never encounter.

However, if you're installing from a daemon you can't directly load 
any agents.  If restarting isn't an acceptable solution, you have to 
stray into the stuff we really don't support.  Specifically, you 
can talk to a given launchd context by switching your bootstrap 
namespace to match that of a process running in that context.  You 
can find out all the GUI login sessions using the techniques from 
QA1133 "Determining console user login status".

<http://developer.apple.com/qa/qa2001/qa1133.html>

You can then use launchctl's bsexec command to run commands in the 
context associated with a process in the GUI context.

For example, if you're running as root, and there's a logged in GUI 
user called "apple" whose "loginwindow" process is PID 3073, you can 
run a copy of launchctl in their context with the following command:

# launchctl bsexec 3073 chroot -u apple / launchctl list

[Note that the chroot doesn't actually affect the root directory in 
this command; I'm using it solely to switch the user ID.  In real 
software it would be better to create a tiny helper tool to handle 
this task.]
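
To tie the two steps together, here is a rough Python sketch that 
picks out loginwindow processes and builds the bsexec command shown 
above.  It is not the QA1133 technique: scanning the output of 
"ps -axo pid=,user=,comm=" is a shortcut assumed here for 
illustration, and it also assumes ps reports each loginwindow as 
owned by the logged-in user.  The function names are made up.

```python
import os


def find_loginwindow_sessions(ps_output):
    """Return (pid, user) pairs for loginwindow processes.

    ps_output is the text printed by `ps -axo pid=,user=,comm=`
    (an assumed way of listing processes, not the QA1133 approach).
    """
    sessions = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # blank or malformed line
        pid, user, command = fields
        # comm may be a full path; compare only the executable name.
        if os.path.basename(command) == "loginwindow":
            sessions.append((int(pid), user))
    return sessions


def bsexec_argv(pid, user, command):
    """Build: launchctl bsexec <pid> chroot -u <user> / <command...>"""
    return ["launchctl", "bsexec", str(pid), "chroot", "-u", user, "/"] + list(command)
```

Run as root, subprocess.call(bsexec_argv(3073, "apple", ["launchctl", 
"list"])) would issue the same command as the example above.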

There are two gotchas with this approach:

o It's /so/ not supported.  It works on current systems (10.5.x and 
10.6.x), and is likely to be compatible with 10.6.x software updates, 
but I can't predict how well it will work beyond that.

o It's very hard to target the context used by pre-login launchd 
agents.  The issue here is that those agents run as root so, when you 
try to adopt the user ID associated with your agents, you end up 
talking to the global context (see point A, above).

Just to repeat, there's only one way to solve the "first install 
problem" that's guaranteed to be compatible in the long term: force a 
restart.  Any other solution is probably going to require you to 
revisit this issue in the future (hopefully to adopt a nice, 
well-supported API, eh Damien? :-).

S+E
-- 
Quinn "The Eskimo!"                    <http://www.apple.com/developer/>
Apple Developer Relations, Developer Technical Support, Core OS/Hardware