quartz-wm goes crazy on macbook air
Hi Jeremy,
I am on 2.6.1, xorg-server 1.9.5 and 10.6.7 - most of the time everything is normal, except just now quartz-wm went crazy - taking all cpu time - I had to kill it - no message on the console, not sure if the sample helps. It happens just once :-)
ciao christof
Sampling process 842 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling quartz-wm (pid 842) every 1 millisecond
Call graph:
    2313 Thread_8344 DispatchQueue_1: com.apple.main-thread (serial)
      2313 start
        2313 main
          2313 CFRunLoopRun
            2313 CFRunLoopRunSpecific
              2313 __CFRunLoopRun
                2313 __CFRunLoopDoSources0
                  2313 __CFSocketPerformV0
                    2313 __CFSocketDoCallback
                      2313 x_input_run
                        2313 XPending
                          2313 _XEventsQueued
                            2313 _XIOError
                              2313 x_init_error_handler
                                2313 exit
                                  2313 __cxa_finalize
                                    2313 dyld::runTerminators(void*)
                                      2313 ImageLoaderMachO::doTermination(ImageLoader::LinkContext const&)
                                        2313 __KerberosInternal_krb5int_mutex_alloc
                                          2313 0x7fff5fbfd290
                                            2313 _sigtramp
                                              2313 signal_handler
                                                2313 exit
                                                  2313 __cxa_finalize
                                                    2313 __tcf_0
                                                      2313 __spin_lock
    2313 Thread_8345 DispatchQueue_2: com.apple.libdispatch-manager (serial)
      2313 start_wqthread
        2313 _pthread_wqthread
          2313 _dispatch_worker_thread2
            2313 _dispatch_queue_invoke
              2313 _dispatch_mgr_invoke
                2313 kevent
    2313 Thread_8347: com.apple.CFSocket.private
      2313 thread_start
        2313 _pthread_start
          2313 __CFSocketManager
            2313 select$DARWIN_EXTSN
    2313 Thread_19389
      2313 start_wqthread
        2313 __spin_lock
Total number in stack (recursive counted multiple, when >=5):
Sort by top of stack, same collapsed (when >= 5):
    __spin_lock 4626
    kevent 2313
    select$DARWIN_EXTSN 2313
Sample analysis of process 842 written to file /dev/stdout
-- public key www.hfph.mwn.de/~chwolf/ch.wolf.asc
Please respect my privacy and do not make my contact information available to third parties.
This email is UNCLASSIFIED.
The "going crazy" is not a bug in quartz-wm. What version of the OS are you on?
I'm curious how quartz-wm got into that state though. It's in an error handler, so it didn't like something it got from the server.
On Apr 7, 2011, at 9:16 AM, Jeremy Huddleston wrote:
The "going crazy" is not a bug in quartz-wm. What version of the OS are you on?
And you already answered that. I read too fast. Please file a bug report at http://bugreport.apple.com ... since you said this only happened once, it's not much use asking you to reproduce on Lion, but if you do have access to the Lion seed, it would be interesting to know if the issue is still there...
_______________________________________________ Xquartz-dev mailing list Xquartz-dev@lists.macosforge.org http://lists.macosforge.org/mailman/listinfo.cgi/xquartz-dev
Is the function named signal_handler part of quartz-wm? Because it should be calling _exit, not exit. Though that's just a followup bug I suppose.
-- Pelle Johansson
On Apr 7, 2011, at 9:41 AM, Pelle Johansson wrote:
Is the function named signal_handler part of quartz-wm? Because it should be calling _exit, not exit. Though that's just a followup bug I suppose.
Ah, good call. That is our handler. That still doesn't explain:
1) Why are we in x_init_error_handler while in CFRunLoop?
2) Why did ImageLoaderMachO::doTermination barf?
As for us moving over to _exit(2) or _Exit(3) rather than exit(3): that would fix one of the problems, but it's actually one of the least offensive of the signal_handler's crimes. It looks like we're calling some *very* non-reentrant routines from within the SIGINT and SIGTERM signal handler, such as various calls into Xlib to make the window state safe for us to exit. Yuck! At least SIGHUP (for reloading preferences) is safe.
On Apr 7, 2011, at 10:02 AM, Jeremy Huddleston wrote:
As for us moving over to _exit(2) or _Exit(3) rather than exit(3): that would fix one of the problems, but it's actually one of the least offensive of the signal_handler's crimes. It looks like we're calling some *very* non-reentrant routines from within the SIGINT and SIGTERM signal handler, such as various calls into Xlib to make the window state safe for us to exit. Yuck! At least SIGHUP (for reloading preferences) is safe.
Ok, well I have a fix for the reentrancy snafu. It'll be in 2.6.2_beta1 (probably tomorrow or early next week).
--Jeremy
On Apr 7, 2011, at 9:16 AM, Jeremy Huddleston wrote:
The "going crazy" is not a bug in quartz-wm. What version of the OS are you on?
I'm curious how quartz-wm got into that state though. It's in an error handler, so it didn't like something it got from the server.
So this is a bit confusing. The x_init_error_handler is only the error handler for a short period of time *before* we enter the CFRunLoop. Furthermore, it is set via XSetErrorHandler, not XSetIOErrorHandler ... so we should never be calling into x_init_error_handler from XIOError, and we should never be calling into that handler from within the CFRunLoop.
This is a very puzzling backtrace... are you sure there is nothing in /var/log/system.log that would be helpful? I would expect to see, "quartz-wm: another window manager is running; exiting" because that is printed by x_init_error_handler before the exit...
An XIOError usually means that the server stopped responding. Did this happen as you were exiting X11?
On Apr 7, 2011, at 11:46 AM, Jeremy Huddleston wrote:
So this is a bit confusing. The x_init_error_handler is only the error handler for a short period of time *before* we enter the CFRunLoop. Furthermore, it is set via XSetErrorHandler, not XSetIOErrorHandler ... so we should never be calling into x_init_error_handler from XIOError, and we should never be calling into that handler from within the CFRunLoop.
This is a very puzzling backtrace...
The above sounds like "sample" misattributing a code address to the wrong symbol because the right symbol has been stripped. Unfortunately, it takes some sleuthing in the assembly code to figure out the real stack trace, if it's even possible.
Regards,
Ken
On Apr 7, 2011, at 2:20 PM, Ken Thomases wrote:
The above sounds like "sample" misattributing a code address to the wrong symbol because the right symbol has been stripped. Unfortunately, it takes some sleuthing in the assembly code to figure out the real stack trace, if it's even possible.
Right. But without a stackshot, I don't really know the offsets from those symbols, so I'm stuck with intuition. We certainly are in the CFRunLoop, so I trust that. Furthermore, x_init_error_handler is very small, and there is another unstripped symbol *right* after it in the assembly, so I trust that as well... but both can't be correct... there's always a possibility of stack corruption, but I don't see any real evidence of that.
I'm just going to chalk this backtrace up to a bizarre mystery and move on. The reentrancy issue is fixed, and if the instigating bug comes back, we will hopefully have a better backtrace to work from.
--Jeremy
When you killed quartz-wm, that should've generated a crash report (in ~/Library/Logs/DiagnosticReports). Could you send that to me as well?
Thanks
participants (4)
- Christof Wolf
- Jeremy Huddleston
- Ken Thomases
- Pelle Johansson