[Xquartz-dev] Help requested debugging rgl under XQuartz

Duncan Murdoch murdoch.duncan at gmail.com
Sat Feb 27 02:52:54 PST 2021


I asked the developer of the R quartz() device code for some help, and 
he thinks it's related to a conflict between event loops.

My bug only occurs when R is running in the terminal.  Before starting 
rgl, R has no Cocoa event loop.  I imagine XQuartz starts one when I 
initialize things.  Then the quartz() device initializes Cocoa and 
starts its own event loop, and somehow the two event loops clash.

When R is running its GUI R.app, it already has an event loop and the 
bug doesn't happen.

I have a workaround:  making sure quartz() starts first.  I think I'll 
be satisfied with that.

Duncan Murdoch




On 25/02/2021 12:31 a.m., Jeremy Huddleston Sequoia wrote:
> Yeah, gldAttachDrawable does gpuiReleaseDrawable,
> IOAccelGLContextClearDrawable, and then returns the error on its
> error-out paths.
> 
> gldAttachDrawable takes the drawable type as the second argument. The 
> types are:
> 
> none (0)
> pbuffer  (90)
> window (80)
> offscreen (53)
> fullscreen (54)
> 
> 
> When gldAttachDrawable() is called with type none, it looks to me like 
> it should just do its error out path (release / clear / return error). 
>   I don't see how it would call gfxIODataBindSurface in that case.  Of 
> course, I'm not necessarily looking at the exact same implementation 
> as is on your system.  Can you provide the output of `image list` from 
> lldb, so I can see the UUIDs of the various dylibs to look at the exact 
> source version?
> 
> In any event, we need to figure out why we're getting a type of none 
> into gldAttachDrawable.
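
For reference while reading register dumps, the drawable type codes listed above can be written as a small C lookup. This is only an illustrative sketch: the identifiers (demo_drawable_type, demo_drawable_type_name) are made up for this note and are not taken from any Apple header. Only the numeric values come from the list above.

```c
#include <stddef.h>

/* Numeric values as quoted in this thread; the identifiers are hypothetical. */
enum demo_drawable_type {
    DEMO_DRAWABLE_NONE       = 0,
    DEMO_DRAWABLE_OFFSCREEN  = 53,
    DEMO_DRAWABLE_FULLSCREEN = 54,
    DEMO_DRAWABLE_WINDOW     = 80,   /* 0x50 */
    DEMO_DRAWABLE_PBUFFER    = 90
};

/* Returns a printable name for a raw type value, or NULL if the value
 * is not one of the five listed types. */
static const char *demo_drawable_type_name(unsigned int raw)
{
    switch (raw) {
    case DEMO_DRAWABLE_NONE:       return "none";
    case DEMO_DRAWABLE_OFFSCREEN:  return "offscreen";
    case DEMO_DRAWABLE_FULLSCREEN: return "fullscreen";
    case DEMO_DRAWABLE_WINDOW:     return "window";
    case DEMO_DRAWABLE_PBUFFER:    return "pbuffer";
    default:                       return NULL;
    }
}
```

With this, demo_drawable_type_name(0x50) gives "window" (the value seen when things work) and demo_drawable_type_name(0) gives "none" (the value seen after quartz() has been called).
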
> 
> gliAttachDrawableWithOptions takes a type and passes it straight 
> through, so it's not that.
> 
> CGLSetSurface just takes a context, connection id, window ID, and 
> surface ID.
> 
> The context is passed straight from the input to xp_attach_gl_context, 
> and the xp_surface_id input to xp_attach_gl_context maps to the wid / 
> sid passed to CGLSetSurface.
> 
> ---
> 
> Would you be able to reduce this issue to a very small X11 + GLX 
> application that I could use to debug deeper myself?
> 
>> On Feb 24, 2021, at 12:38, Duncan Murdoch <murdoch.duncan at gmail.com>
>> wrote:
>>
>> On 24/02/2021 3:10 p.m., Duncan Murdoch wrote:
>>> The only call it makes is to libGFXShared.dylib`gfxIODataBindSurface,
>>> and when it returns from that it jumps to the error exit.
>>> Inside that function, it checks whether a pointer is non-null, then uses
>>> it to jump to libGPUSupportMercury.dylib`gldAttachDrawable.
>>> In gldAttachDrawable, it looks like it is detecting something wrong,
>>> then it calls gpuiReleaseDrawable, IOAccelGLContextClearDrawable, and
>>> then returns the 0x2715 = 10005 = kCGLBadDrawable value.
>>> I don't have the source (do I?), and I don't know the argument passing
>>> conventions.  Can you tell me how type would be passed in?
>>
>> I've found some info that says the 2nd argument would be passed in 
>> RSI.  If that's the case, then what I'm seeing is the following:
>>
>> - When things are working properly, the value 0x50 = 80 is passed in 
>> several times.
>>
>> - After calling quartz(), the value is 0, and the error is triggered.
>>
>> Duncan Murdoch
>>
>>> I don't think we get to either of the other functions.
>>> Duncan Murdoch
>>> On 24/02/2021 12:59 p.m., Jeremy Huddleston Sequoia wrote:
>>>> IOAccelGLContextClearDrawable is called on the error-out path of 
>>>> that function, so yeah, we need to see how we got there.
>>>>
>>>> enum32_t gldAttachDrawable(GLDContext ctx, enum32_t type, const 
>>>> GLDDrawable drawable, bitfield32_t options, GLTDimensions *size_ret)
>>>>
>>>> Can you tell me what the type is here?
>>>>
>>>> Do we get to IOAccelGLContextSetDrawable()?  If so, what does it return?
>>>> Do we get to gpulUpdateDrawableDepth()?  If so, what does it return?
>>>>
>>>>
>>>>> On Feb 24, 2021, at 09:46, Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>> wrote:
>>>>>
>>>>> Yes, I did get it wrong.  It looks like the error was detected 
>>>>> before the call to IOAccelGLContextClearDrawable and the stack 
>>>>> checking.  I'll see if I can figure out where.
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>> On 24/02/2021 12:11 p.m., Duncan Murdoch wrote:
>>>>>> I don't see any calls to __stack_chk_fail .  It's possible I
>>>>>> misinterpreted what was going on after the 
>>>>>> IOAccelGLContextClearDrawable
>>>>>> call.  I'll take another look.
>>>>>> Duncan Murdoch
>>>>>> On 24/02/2021 11:41 a.m., Jeremy Huddleston Sequoia wrote:
>>>>>>> __stack_chk_guard is part of stack protector.
>>>>>>>
>>>>>>> If it's not liking the value in __stack_chk_guard, it means the stack
>>>>>>> was smashed.
>>>>>>>
>>>>>>> When this is detected, the compiler runtime should
>>>>>>> call __stack_chk_fail() if implemented or abort if not.  Given that
>>>>>>> we're not crashing, I wonder if there's a handler somewhere that 
>>>>>>> ends up
>>>>>>> causing us to return the bad value instead of crashing.
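
As a rough illustration of the check being discussed, here is a hand-rolled simulation in C. It is not what the compiler actually emits: demo_guard stands in for __stack_chk_guard, and struct demo_frame and demo_copy_and_check are hypothetical names used only for this sketch. A real stack protector keeps the canary in the function's stack frame and calls __stack_chk_fail() on a mismatch; this sketch just reports the mismatch through its return value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for __stack_chk_guard; the real guard is set up by the runtime. */
static const uintptr_t demo_guard = (uintptr_t)0x5aa5c3b7u;

/* Simulated frame: a local buffer followed by the canary slot. */
struct demo_frame {
    char      buf[8];
    uintptr_t canary;
};

/* Copies len bytes into the simulated frame without checking against
 * buf's size, then performs the epilogue-style comparison.
 * Returns 1 if the canary is intact, 0 if it was overwritten. */
static int demo_copy_and_check(const void *src, size_t len)
{
    struct demo_frame frame;

    memset(&frame, 0, sizeof frame);
    frame.canary = demo_guard;

    /* Clamp to the whole struct so the demo itself stays well-defined. */
    if (len > sizeof frame)
        len = sizeof frame;
    memcpy(&frame, src, len);

    return frame.canary == demo_guard;
}
```

Copying 8 bytes fills buf exactly and leaves the canary alone, so the check passes. Copying sizeof(struct demo_frame) bytes of 'A' overwrites the canary, and the check fails.
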
>>>>>>>
>>>>>>> Can you break on __stack_chk_fail and see if that gives us 
>>>>>>> anything useful?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Feb 24, 2021, at 06:26, Duncan Murdoch
>>>>>>>> <murdoch.duncan at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Tracing in with lldb, it appears to be this sequence of calls 
>>>>>>>> leading
>>>>>>>> to the 10005 error value:
>>>>>>>>
>>>>>>>> r
>>>>>>>>   * frame #0: 0x00007fff5afc19e0
>>>>>>>> libGPUSupportMercury.dylib`gldAttachDrawable + 1
>>>>>>>>     frame #1: 0x00007fff4467f396 
>>>>>>>> GLEngine`gliAttachDrawableWithOptions
>>>>>>>> + 251
>>>>>>>>     frame #2: 0x00007fff4465d9f5
>>>>>>>> OpenGL`___lldb_unnamed_symbol40$$OpenGL + 972
>>>>>>>>     frame #3: 0x00007fff446618e2
>>>>>>>> OpenGL`___lldb_unnamed_symbol59$$OpenGL + 82
>>>>>>>>     frame #4: 0x00007fff44661c29 OpenGL`CGLSetSurface + 330
>>>>>>>>     frame #5: 0x00007fff70c6ca63
>>>>>>>> libXplugin.1.dylib`xp_attach_gl_context + 95
>>>>>>>>     frame #6: 0x0000000108590dee 
>>>>>>>> libGL.1.dylib`surface_make_current + 206
>>>>>>>>     frame #7: 0x000000010858df6a
>>>>>>>> libGL.1.dylib`apple_glx_make_current_context + 1274
>>>>>>>>     frame #8: 0x0000000108574579 
>>>>>>>> libGL.1.dylib`applegl_bind_context + 185
>>>>>>>>     frame #9: 0x000000010856237e 
>>>>>>>> libGL.1.dylib`MakeContextCurrent + 414
>>>>>>>>     frame #10: 0x00000001085621d9 libGL.1.dylib`glXMakeCurrent + 41
>>>>>>>>
>>>>>>>>
>>>>>>>> The libGPUSupportMercury.dylib`gldAttachDrawable function calls
>>>>>>>>
>>>>>>>> IOAccelGLContextClearDrawable
>>>>>>>>
>>>>>>>> then does some sort of check of __stack_chk_guard and doesn't like
>>>>>>>> what it sees, and sets the error.
>>>>>>>>
>>>>>>>> Does this give any hint about what's wrong, or a way to fix it?
>>>>>>>>
>>>>>>>> Duncan Murdoch
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 23/02/2021 4:31 p.m., Duncan Murdoch wrote:
>>>>>>>>> On 23/02/2021 3:47 p.m., Jeremy Huddleston Sequoia wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Feb 23, 2021, at 06:14, Duncan Murdoch
>>>>>>>>>>> <murdoch.duncan at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 23/02/2021 12:47 a.m., Jeremy Huddleston Sequoia wrote:
>>>>>>>>>>>>> On Feb 22, 2021, at 14:38, Duncan Murdoch
>>>>>>>>>>>>> <murdoch.duncan at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've made a little bit of progress.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The message "error: xp_attach_gl_context returned: 2" comes 
>>>>>>>>>>>>> from the
>>>>>>>>>>>>> Mesa routine surface_make_current, which calls 
>>>>>>>>>>>>> xp_attach_gl_context.
>>>>>>>>>>>>>   I haven't found where xp_attach_gl_context is defined.
>>>>>>>>>>>> xp_attach_gl_context is in libXplugin (check Xplugin.h in 
>>>>>>>>>>>> the SDK).
>>>>>>>>>>>> 2 is XP_BadValue, which is returned if cgl_ctx is NULL... so I'd
>>>>>>>>>>>> suggest looking into why mesa is calling 
>>>>>>>>>>>> xp_attach_gl_context with a
>>>>>>>>>>>> NULL context.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, that's helpful.  The context is not NULL, so I need 
>>>>>>>>>>> to think
>>>>>>>>>>> of other ways it could be "bad".
>>>>>>>>>>
>>>>>>>>>> Ok, well xp_attach_gl_context is just a wrapper around 
>>>>>>>>>> CGLSetSurface(),
>>>>>>>>>> which is an internal function to do exactly what we're trying 
>>>>>>>>>> to do
>>>>>>>>>> here.  If it returns any error, xp_attach_gl_context returns 
>>>>>>>>>> bad value.
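
The behaviour described here explains why the "returned: 2" message carries so little information. A minimal sketch in C, assuming nothing about the real libXplugin source: demo_attach, the demo_set_surface_fn callback type, and the stub functions are all hypothetical, and the success value 0 is an assumption. Only the value 2 for XP_BadValue is taken from this thread.

```c
/* Hypothetical stand-ins; only the value 2 (XP_BadValue) is from the thread. */
enum {
    DEMO_XP_SUCCESS   = 0,   /* assumed success value */
    DEMO_XP_BAD_VALUE = 2
};

typedef int demo_cgl_error;                      /* 0 means no error */
typedef demo_cgl_error (*demo_set_surface_fn)(void);

/* Mimics a wrapper that hides the underlying error: any nonzero result
 * from the inner call is reported as the same "bad value" code. */
static int demo_attach(demo_set_surface_fn set_surface)
{
    demo_cgl_error err = set_surface();
    return err == 0 ? DEMO_XP_SUCCESS : DEMO_XP_BAD_VALUE;
}

/* Stubs standing in for different outcomes of the inner call. */
static demo_cgl_error demo_ok(void)           { return 0; }
static demo_cgl_error demo_bad_drawable(void) { return 10005; }
static demo_cgl_error demo_other_error(void)  { return 1; }
```

Both demo_attach(demo_bad_drawable) and demo_attach(demo_other_error) return 2, so the caller cannot tell the failures apart. That is why the return value of the inner call has to be read in the debugger.
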
>>>>>>>>>>
>>>>>>>>>> Are you able to capture this in the debugger and figure out 
>>>>>>>>>> what the
>>>>>>>>>> return value from CGLSetSurface() is?  That will tell us what the
>>>>>>>>>> underlying CGLError is, which might help shed some light on this.
>>>>>>>>> I believe it's returning  0x0000000000002715 when there's an error.
>>>>>>>>> That's 10005, kCGLBadDrawable.  So now I need to find out what 
>>>>>>>>> happened
>>>>>>>>> to the drawable.
>>>>>>>>> This feels like progress!  Thanks again.
>>>>>>>>> Duncan
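
The conversion from the register value to an error name can be sketched in C as follows. The constants are local copies of a few CGLError values (as declared in Apple's CGLTypes.h) so that the sketch does not need the macOS headers. The DEMO_ prefix and the function name demo_cgl_error_name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of selected CGLError values; DEMO_ names are made up here. */
#define DEMO_kCGLNoError      0u
#define DEMO_kCGLBadContext   10004u
#define DEMO_kCGLBadDrawable  10005u
#define DEMO_kCGLBadValue     10008u

/* Takes the raw 64-bit value read from the return register and returns
 * the matching error name, or NULL for values not covered by this sketch.
 * CGLError is a 32-bit enum, so only the low 32 bits are examined. */
static const char *demo_cgl_error_name(uint64_t raw)
{
    uint32_t code = (uint32_t)raw;

    switch (code) {
    case DEMO_kCGLNoError:     return "kCGLNoError";
    case DEMO_kCGLBadContext:  return "kCGLBadContext";
    case DEMO_kCGLBadDrawable: return "kCGLBadDrawable";
    case DEMO_kCGLBadValue:    return "kCGLBadValue";
    default:                   return NULL;
    }
}
```

Here demo_cgl_error_name(0x0000000000002715) gives "kCGLBadDrawable", since 0x2715 is 10005 in decimal.
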
>>>>>>>>>>
>>>>>>>>>>> Here's what I see with LIBGL_DIAGNOSTIC=1.  For a successful 
>>>>>>>>>>> open,
>>>>>>>>>>>
>>>>>>>>>>>> rgl.open()
>>>>>>>>>>> function is no-op
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_context.c:205
>>>>>>>>>>> apple_glx_create_context(4295810496): 
>>>>>>>>>>> apple_glx_create_context: ac
>>>>>>>>>>> 0x100a10a00 ac->context_obj 0x107cdce00
>>>>>>>>>>> 2021-02-23 08:23:00.041711-0500 R[45754:1283995]
>>>>>>>>>>> apple_glx_create_context: ac 0x100a10a00 ac->context_obj 
>>>>>>>>>>> 0x107cdce00
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_drawable.c:342
>>>>>>>>>>> apple_glx_drawable_create(4295810496): 
>>>>>>>>>>> apple_glx_drawable_create: new
>>>>>>>>>>> drawable 0x107ce0e00
>>>>>>>>>>> 2021-02-23 08:23:00.042235-0500 R[45754:1283995]
>>>>>>>>>>> apple_glx_drawable_create: new drawable 0x107ce0e00
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:154
>>>>>>>>>>> create_surface(4295810496): create_surface: created a surface for
>>>>>>>>>>> drawable 0x600066 with uid 621
>>>>>>>>>>> 2021-02-23 08:23:00.044773-0500 R[45754:1283995] create_surface:
>>>>>>>>>>> created a surface for drawable 0x600066 with uid 621
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:69
>>>>>>>>>>> surface_make_current(4295810496): surface_make_current:
>>>>>>>>>>> ac->context_obj 0x107cdce00 s->surface_id 9
>>>>>>>>>>> 2021-02-23 08:23:00.044839-0500 R[45754:1283995] 
>>>>>>>>>>> surface_make_current:
>>>>>>>>>>> ac->context_obj 0x107cdce00 s->surface_id 9
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:89
>>>>>>>>>>> surface_make_current(4295810496): surface_make_current: drawable
>>>>>>>>>>> 0x600066
>>>>>>>>>>> 2021-02-23 08:23:00.045680-0500 R[45754:1283995] 
>>>>>>>>>>> surface_make_current:
>>>>>>>>>>> drawable 0x600066
>>>>>>>>>>> ... (more lines deleted)
>>>>>>>>>>>
>>>>>>>>>>> After I run quartz(), I see this:
>>>>>>>>>>>
>>>>>>>>>>>> rgl.open()
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_context.c:205
>>>>>>>>>>> apple_glx_create_context(4295810496): 
>>>>>>>>>>> apple_glx_create_context: ac
>>>>>>>>>>> 0x10262bb00 ac->context_obj 0x1058c4800
>>>>>>>>>>> 2021-02-23 08:23:35.666675-0500 R[45754:1283995]
>>>>>>>>>>> apple_glx_create_context: ac 0x10262bb00 ac->context_obj 
>>>>>>>>>>> 0x1058c4800
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_drawable.c:342
>>>>>>>>>>> apple_glx_drawable_create(4295810496): 
>>>>>>>>>>> apple_glx_drawable_create: new
>>>>>>>>>>> drawable 0x107648000
>>>>>>>>>>> 2021-02-23 08:23:35.667040-0500 R[45754:1283995]
>>>>>>>>>>> apple_glx_drawable_create: new drawable 0x107648000
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:154
>>>>>>>>>>> create_surface(4295810496): create_surface: created a surface for
>>>>>>>>>>> drawable 0x6000c9 with uid 629
>>>>>>>>>>> 2021-02-23 08:23:35.669119-0500 R[45754:1283995] create_surface:
>>>>>>>>>>> created a surface for drawable 0x6000c9 with uid 629
>>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:69
>>>>>>>>>>> surface_make_current(4295810496): surface_make_current:
>>>>>>>>>>> ac->context_obj 0x1058c4800 s->surface_id 13
>>>>>>>>>>> 2021-02-23 08:23:35.669195-0500 R[45754:1283995] 
>>>>>>>>>>> surface_make_current:
>>>>>>>>>>> ac->context_obj 0x1058c4800 s->surface_id 13
>>>>>>>>>>> error: xp_attach_gl_context returned: 2
>>>>>>>>>>> Debug     ../src/glx/applegl_glx.c:60
>>>>>>>>>>> applegl_bind_context(4295810496): applegl_bind_context: error YES
>>>>>>>>>>> 2021-02-23 08:23:35.669834-0500 R[45754:1283995] 
>>>>>>>>>>> applegl_bind_context:
>>>>>>>>>>> error YES
>>>>>>>>>>>
>>>>>>>>>>> and then I get my own messages from the failure of 
>>>>>>>>>>> glXMakeCurrent().
>>>>>>>>>>>   As far as I can see, everything appears fine until the call to
>>>>>>>>>>> xp_attach_gl_context.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Everything looks very similar up to the failure of
>>>>>>>>>>> xp_attach_gl_context.  Any idea why the value returned a few lines
>>>>>>>>>>> earlier from apple_glx_create_context() should be a bad value?
>>>>>>>>>>>
>>>>>>>>>>> Duncan Murdoch
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>>
>> _______________________________________________
>> Xquartz-dev mailing list
>> Xquartz-dev at lists.macosforge.org
>> https://lists.macosforge.org/mailman/listinfo/xquartz-dev
>>
> 
> 
> 


