[Xquartz-dev] Help requested debugging rgl under XQuartz

Jeremy Huddleston Sequoia jeremyhu at apple.com
Wed Feb 24 21:31:49 PST 2021


Yeah, on its error-out paths gldAttachDrawable calls gpuiReleaseDrawable and IOAccelGLContextClearDrawable, and then returns the error.

gldAttachDrawable takes the drawable type as the second argument.  The types are:

none (0)
pbuffer  (90)
window (80)
offscreen (53)
fullscreen (54)
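When stepping in lldb, a small lookup helper can turn the raw register value into one of these names. This is a sketch only. The numeric values come from the list above, but the table and the function names (gld_type_name, gld_type_value) are hypothetical and not part of any Apple API. It assumes the x86-64 System V calling convention, where the second integer argument arrives in RSI and a 32-bit enum32_t occupies only the low half of that register.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Drawable types for gldAttachDrawable()'s second (enum32_t type) argument,
 * as listed in this thread. The names below are hypothetical debugging
 * helpers, not part of any Apple SDK. */
struct gld_type_entry {
    uint32_t value;
    const char *name;
};

static const struct gld_type_entry gld_types[] = {
    { 0,  "none" },
    { 90, "pbuffer" },
    { 80, "window" },
    { 53, "offscreen" },
    { 54, "fullscreen" },
};

#define GLD_NTYPES (sizeof gld_types / sizeof gld_types[0])

/* Decode a raw register value, e.g. RSI at a breakpoint on gldAttachDrawable.
 * enum32_t is 32 bits wide, so only the low half of the register counts. */
const char *gld_type_name(uint64_t reg)
{
    const uint32_t v = (uint32_t)reg;
    for (size_t i = 0; i < GLD_NTYPES; i++) {
        if (gld_types[i].value == v)
            return gld_types[i].name;
    }
    return "unknown";
}

/* Reverse lookup by name, e.g. to pick the value for a breakpoint condition.
 * Returns -1 for an unrecognized name. */
long gld_type_value(const char *name)
{
    for (size_t i = 0; i < GLD_NTYPES; i++) {
        if (strcmp(gld_types[i].name, name) == 0)
            return (long)gld_types[i].value;
    }
    return -1;
}
```

Given the values Duncan reports further down, gld_type_name(0x50) gives "window" (the working calls) and gld_type_name(0) gives "none" (the failing call after quartz()). In lldb, a conditional breakpoint along the lines of `breakpoint set -n gldAttachDrawable -c '$rsi == 0'` should stop only on the failing call.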


When gldAttachDrawable() is called with type none, it looks to me like it should just take its error-out path (release / clear / return error).  I don't see how it would call gfxIODataBindSurface in that case.  Of course, I'm not necessarily looking at the exact same implementation as is on your system.  Can you provide the output of `image list` from lldb, so I can see the UUIDs of the various dylibs and look at the exact source version?

In any event, we need to figure out why we're getting a type of none into gldAttachDrawable.

gliAttachDrawableWithOptions takes a type and passes it straight through, so it's not that.

CGLSetSurface just takes a context, connection id, window ID, and surface ID.

The context is passed straight from the input to xp_attach_gl_context, and the xp_surface_id input to xp_attach_gl_context maps to the wid / sid passed to CGLSetSurface.
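Whatever fails further down, CGLSetSurface returns it as a CGLError, and xp_attach_gl_context turns any such error into XP_BadValue (2). The raw value Duncan read back, 0x2715, is decimal 10005, which is kCGLBadDrawable. On macOS, CGLErrorString() from <OpenGL/OpenGL.h> converts a CGLError to text. The sketch below is a standalone table instead, for decoding values copied out of a debugger on any machine. The values are transcribed from the SDK's <OpenGL/CGLTypes.h> and are worth re-checking against your SDK. The function name cgl_error_name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* CGLError codes from <OpenGL/CGLTypes.h>: kCGLNoError is 0 and the
 * error codes run consecutively from 10000 (kCGLBadAttribute). */
static const char *const cgl_error_names[] = {
    "kCGLBadAttribute",    /* 10000 */
    "kCGLBadProperty",     /* 10001 */
    "kCGLBadPixelFormat",  /* 10002 */
    "kCGLBadRendererInfo", /* 10003 */
    "kCGLBadContext",      /* 10004 */
    "kCGLBadDrawable",     /* 10005 = 0x2715 */
    "kCGLBadDisplay",      /* 10006 */
    "kCGLBadState",        /* 10007 */
    "kCGLBadValue",        /* 10008 */
    "kCGLBadMatch",        /* 10009 */
    "kCGLBadEnumeration",  /* 10010 */
    "kCGLBadOffScreen",    /* 10011 */
    "kCGLBadFullScreen",   /* 10012 */
    "kCGLBadWindow",       /* 10013 */
    "kCGLBadAddress",      /* 10014 */
    "kCGLBadCodeModule",   /* 10015 */
    "kCGLBadAlloc",        /* 10016 */
    "kCGLBadConnection",   /* 10017 */
};

/* Hypothetical helper: decode a return value read from RAX. CGLError is an
 * int-sized enum, so only the low 32 bits of the register are used. */
const char *cgl_error_name(uint64_t reg)
{
    const uint32_t v = (uint32_t)reg;
    const size_t count = sizeof cgl_error_names / sizeof cgl_error_names[0];

    if (v == 0)
        return "kCGLNoError";
    if (v >= 10000 && v - 10000 < count)
        return cgl_error_names[v - 10000];
    return "unknown CGLError";
}
```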

---

Would you be able to reduce this issue to a very small X11 + GLX application that I could use to debug deeper myself?

> On Feb 24, 2021, at 12:38, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> 
> On 24/02/2021 3:10 p.m., Duncan Murdoch wrote:
>> The only call it makes is to libGFXShared.dylib`gfxIODataBindSurface,
>> and when it returns from that it jumps to the error exit.
>> Inside that function, it checks whether a pointer is non-null, then uses
>> it to jump to libGPUSupportMercury.dylib`gldAttachDrawable.
>> In gldAttachDrawable, it looks like it is detecting something wrong,
>> then it calls gpuiReleaseDrawable, IOAccelGLContextClearDrawable, and
>> then returns the 0x2715 = 10005 = kCGLBadDrawable value.
>> I don't have the source (do I?), and I don't know the argument passing
>> conventions.  Can you tell me how type would be passed in?
> 
> I've found some info that says the 2nd argument would be passed in RSI.  If that's the case, then what I'm seeing is the following:
> 
> - When things are working properly, the value 0x50 = 80 is passed in several times.
> 
> - After calling quartz(), the value is 0, and the error is triggered.
> 
> Duncan Murdoch
> 
>> I don't think we get to either of the other functions.
>> Duncan Murdoch
>> On 24/02/2021 12:59 p.m., Jeremy Huddleston Sequoia wrote:
>>> IOAccelGLContextClearDrawable is called on the error-out path of that function, so yeah, we need to see how we got there.
>>> 
>>> enum32_t gldAttachDrawable(GLDContext ctx, enum32_t type, const GLDDrawable drawable, bitfield32_t options, GLTDimensions *size_ret)
>>> 
>>> Can you tell me what the type is here?
>>> 
>>> Do we get to IOAccelGLContextSetDrawable()?  If so, what does it return?
>>> Do we get to gpulUpdateDrawableDepth()?  If so, what does it return?
>>> 
>>> 
>>>> On Feb 24, 2021, at 09:46, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>> 
>>>> Yes, I did get it wrong.  It looks like the error was detected before the call to IOAccelGLContextClearDrawable and the stack checking.  I'll see if I can figure out where.
>>>> 
>>>> Duncan Murdoch
>>>> 
>>>> On 24/02/2021 12:11 p.m., Duncan Murdoch wrote:
>>>>> I don't see any calls to __stack_chk_fail .  It's possible I
>>>>> misinterpreted what was going on after the IOAccelGLContextClearDrawable
>>>>> call.  I'll take another look.
>>>>> Duncan Murdoch
>>>>> On 24/02/2021 11:41 a.m., Jeremy Huddleston Sequoia wrote:
>>>>>> __stack_chk_guard is part of stack protector.
>>>>>> 
>>>>>> If it's not liking the value in __stack_chk_guard, it means the stack
>>>>>> was smashed.
>>>>>> 
>>>>>> When this is detected, the compiler runtime should
>>>>>> call __stack_chk_fail() if implemented or abort if not.  Given that
>>>>>> we're not crashing, I wonder if there's a handler somewhere that ends up
>>>>>> causing us to return the bad value instead of crashing.
>>>>>> 
>>>>>> Can you break on __stack_chk_fail and see if that gives us anything useful?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Feb 24, 2021, at 06:26, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>>>>> 
>>>>>>> Tracing in with lldb, it appears to be this sequence of calls leading
>>>>>>> to the 10005 error value:
>>>>>>> 
>>>>>>> r
>>>>>>>   * frame #0: 0x00007fff5afc19e0 libGPUSupportMercury.dylib`gldAttachDrawable + 1
>>>>>>>     frame #1: 0x00007fff4467f396 GLEngine`gliAttachDrawableWithOptions + 251
>>>>>>>     frame #2: 0x00007fff4465d9f5 OpenGL`___lldb_unnamed_symbol40$$OpenGL + 972
>>>>>>>     frame #3: 0x00007fff446618e2 OpenGL`___lldb_unnamed_symbol59$$OpenGL + 82
>>>>>>>     frame #4: 0x00007fff44661c29 OpenGL`CGLSetSurface + 330
>>>>>>>     frame #5: 0x00007fff70c6ca63 libXplugin.1.dylib`xp_attach_gl_context + 95
>>>>>>>     frame #6: 0x0000000108590dee libGL.1.dylib`surface_make_current + 206
>>>>>>>     frame #7: 0x000000010858df6a libGL.1.dylib`apple_glx_make_current_context + 1274
>>>>>>>     frame #8: 0x0000000108574579 libGL.1.dylib`applegl_bind_context + 185
>>>>>>>     frame #9: 0x000000010856237e libGL.1.dylib`MakeContextCurrent + 414
>>>>>>>     frame #10: 0x00000001085621d9 libGL.1.dylib`glXMakeCurrent + 41
>>>>>>> 
>>>>>>> 
>>>>>>> The libGPUSupportMercury.dylib`gldAttachDrawable function calls
>>>>>>> IOAccelGLContextClearDrawable, then does some sort of check of
>>>>>>> __stack_chk_guard, doesn't like what it sees, and sets the error.
>>>>>>> 
>>>>>>> Does this give any hint about what's wrong, or a way to fix it?
>>>>>>> 
>>>>>>> Duncan Murdoch
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 23/02/2021 4:31 p.m., Duncan Murdoch wrote:
>>>>>>>> On 23/02/2021 3:47 p.m., Jeremy Huddleston Sequoia wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Feb 23, 2021, at 06:14, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 23/02/2021 12:47 a.m., Jeremy Huddleston Sequoia wrote:
>>>>>>>>>>>> On Feb 22, 2021, at 14:38, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I've made a little bit of progress.
>>>>>>>>>>>> 
>>>>>>>>>>>> The message "error: xp_attach_gl_context returned: 2" comes from the
>>>>>>>>>>>> Mesa routine surface_make_current, which calls xp_attach_gl_context.
>>>>>>>>>>>>   I haven't found where xp_attach_gl_context is defined.
>>>>>>>>>>> xp_attach_gl_context is in libXplugin (check Xplugin.h in the SDK).
>>>>>>>>>>> 2 is XP_BadValue, which is returned if cgl_ctx is NULL... so I'd
>>>>>>>>>>> suggest looking into why mesa is calling xp_attach_gl_context with a
>>>>>>>>>>> NULL context.
>>>>>>>>>> 
>>>>>>>>>> Thanks, that's helpful.  The context is not NULL, so I need to think
>>>>>>>>>> of other ways it could be "bad".
>>>>>>>>> 
>>>>>>>>> Ok, well xp_attach_gl_context is just a wrapper around CGLSetSurface(),
>>>>>>>>> which is an internal function to do exactly what we're trying to do
>>>>>>>>> here.  If it returns any error, xp_attach_gl_context returns bad value.
>>>>>>>>> 
>>>>>>>>> Are you able to capture this in the debugger and figure out what the
>>>>>>>>> return value from CGLSetSurface() is?  That will tell us what the
>>>>>>>>> underlying CGLError is, which might help shed some light on this.
>>>>>>>> I believe it's returning  0x0000000000002715 when there's an error.
>>>>>>>> That's 10005, kCGLBadDrawable.  So now I need to find out what happened
>>>>>>>> to the drawable.
>>>>>>>> This feels like progress!  Thanks again.
>>>>>>>> Duncan
>>>>>>>>> 
>>>>>>>>>> Here's what I see with LIBGL_DIAGNOSTIC=1.  For a successful open,
>>>>>>>>>> 
>>>>>>>>>>> rgl.open()
>>>>>>>>>> function is no-op
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_context.c:205 apple_glx_create_context(4295810496): apple_glx_create_context: ac 0x100a10a00 ac->context_obj 0x107cdce00
>>>>>>>>>> 2021-02-23 08:23:00.041711-0500 R[45754:1283995] apple_glx_create_context: ac 0x100a10a00 ac->context_obj 0x107cdce00
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_drawable.c:342 apple_glx_drawable_create(4295810496): apple_glx_drawable_create: new drawable 0x107ce0e00
>>>>>>>>>> 2021-02-23 08:23:00.042235-0500 R[45754:1283995] apple_glx_drawable_create: new drawable 0x107ce0e00
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:154 create_surface(4295810496): create_surface: created a surface for drawable 0x600066 with uid 621
>>>>>>>>>> 2021-02-23 08:23:00.044773-0500 R[45754:1283995] create_surface: created a surface for drawable 0x600066 with uid 621
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:69 surface_make_current(4295810496): surface_make_current: ac->context_obj 0x107cdce00 s->surface_id 9
>>>>>>>>>> 2021-02-23 08:23:00.044839-0500 R[45754:1283995] surface_make_current: ac->context_obj 0x107cdce00 s->surface_id 9
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:89 surface_make_current(4295810496): surface_make_current: drawable 0x600066
>>>>>>>>>> 2021-02-23 08:23:00.045680-0500 R[45754:1283995] surface_make_current: drawable 0x600066
>>>>>>>>>> ... (more lines deleted)
>>>>>>>>>> 
>>>>>>>>>> After I run quartz(), I see this:
>>>>>>>>>> 
>>>>>>>>>>> rgl.open()
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_context.c:205 apple_glx_create_context(4295810496): apple_glx_create_context: ac 0x10262bb00 ac->context_obj 0x1058c4800
>>>>>>>>>> 2021-02-23 08:23:35.666675-0500 R[45754:1283995] apple_glx_create_context: ac 0x10262bb00 ac->context_obj 0x1058c4800
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_drawable.c:342 apple_glx_drawable_create(4295810496): apple_glx_drawable_create: new drawable 0x107648000
>>>>>>>>>> 2021-02-23 08:23:35.667040-0500 R[45754:1283995] apple_glx_drawable_create: new drawable 0x107648000
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:154 create_surface(4295810496): create_surface: created a surface for drawable 0x6000c9 with uid 629
>>>>>>>>>> 2021-02-23 08:23:35.669119-0500 R[45754:1283995] create_surface: created a surface for drawable 0x6000c9 with uid 629
>>>>>>>>>> Debug     ../src/glx/apple/apple_glx_surface.c:69 surface_make_current(4295810496): surface_make_current: ac->context_obj 0x1058c4800 s->surface_id 13
>>>>>>>>>> 2021-02-23 08:23:35.669195-0500 R[45754:1283995] surface_make_current: ac->context_obj 0x1058c4800 s->surface_id 13
>>>>>>>>>> error: xp_attach_gl_context returned: 2
>>>>>>>>>> Debug     ../src/glx/applegl_glx.c:60 applegl_bind_context(4295810496): applegl_bind_context: error YES
>>>>>>>>>> 2021-02-23 08:23:35.669834-0500 R[45754:1283995] applegl_bind_context: error YES
>>>>>>>>>> 
>>>>>>>>>> and then I get my own messages from the failure of glXMakeCurrent().
>>>>>>>>>>   As far as I can see, everything appears fine until the call to
>>>>>>>>>> xp_attach_gl_context.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Everything looks very similar up to the failure of
>>>>>>>>>> xp_attach_gl_context.  Any idea why the value returned a few lines
>>>>>>>>>> earlier from apple_glx_create_context() should be a bad value?
>>>>>>>>>> 
>>>>>>>>>> Duncan Murdoch
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
> 
> 
> _______________________________________________
> Xquartz-dev mailing list
> Xquartz-dev at lists.macosforge.org
> https://lists.macosforge.org/mailman/listinfo/xquartz-dev
> 


