NULL deref + Double Py_DECREF in pyexpat ExternalEntityParserCreate() #144984

@raminfp

Bug report

Bug description:

When ExternalEntityParserCreate() hits an error path (allocation failure), Py_DECREF(new_parser) is called on a partially-initialized object. This has two problems:

  1. new_parser->handlers is NULL (set to 0 at line 1094), so the dealloc chain xmlparse_dealloc -> xmlparse_clear -> clear_handlers dereferences NULL and crashes with SEGV.
  2. new_parser->parent holds a strong reference to self, so Py_DECREF(new_parser) already decrements self via Py_CLEAR(parent) in dealloc. The explicit Py_DECREF(self) on the next line is then a second, excess decrement.

All three error paths in the function have the same issue (lines 1100-1101, 1106-1107, 1120-1121).

Modules/pyexpat.c, pyexpat_xmlparser_ExternalEntityParserCreate_impl():

Py_INCREF(self);                                    // line 1082

new_parser->parent = (PyObject *)self;              // line 1093
new_parser->handlers = 0;                           // line 1094 - NULL

// error path (e.g. buffer malloc failure):
if (new_parser->buffer == NULL) {
    Py_DECREF(new_parser);                          // line 1100 => dealloc
                                                    //   clear_handlers(NULL) => SEGV
                                                    //   Py_CLEAR(parent) => Py_DECREF(self) [1st]
    Py_DECREF(self);                                // line 1101 [2nd, over-decrement]
    return PyErr_NoMemory();
}
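The reference-count arithmetic of that error path can be traced with a small toy model. This is only a sketch: ToyObject, incref, decref, dealloc and both error_path_* functions are hypothetical names made up for illustration, not CPython code, and the "single release" variant is just one possible shape of a fix, not necessarily what gets adopted upstream. It models only the over-decrement (problem 2), not the NULL handlers dereference.

```python
class ToyObject:
    """Stand-in for a PyObject: tracks only a reference count and a parent."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1      # a new object starts with one reference
        self.parent = None   # models new_parser->parent
        self.freed = False


def incref(obj):
    obj.refcnt += 1


def decref(obj):
    obj.refcnt -= 1
    if obj.refcnt == 0:
        dealloc(obj)


def dealloc(obj):
    # Models Py_CLEAR(parent) on the dealloc path: the strong reference
    # stored in `parent` is released when the child is destroyed.
    obj.freed = True
    parent, obj.parent = obj.parent, None
    if parent is not None:
        decref(parent)


def error_path_as_reported(self_obj):
    incref(self_obj)                      # Py_INCREF(self)
    new_parser = ToyObject("new_parser")
    new_parser.parent = self_obj          # parent now owns that reference
    decref(new_parser)                    # dealloc releases parent: 1st decrement
    decref(self_obj)                      # explicit Py_DECREF(self): 2nd decrement
    return self_obj.refcnt


def error_path_single_release(self_obj):
    # Hypothetical alternative: dealloc is the only place that releases
    # the reference stored in `parent`.
    incref(self_obj)
    new_parser = ToyObject("new_parser")
    new_parser.parent = self_obj
    decref(new_parser)
    return self_obj.refcnt


caller_owned = ToyObject("self")          # the caller holds exactly one reference
print(error_path_as_reported(caller_owned), caller_owned.freed)     # 0 True

caller_owned = ToyObject("self")
print(error_path_single_release(caller_owned), caller_owned.freed)  # 1 False
```

In the first run the object the caller still holds ends up with a count of zero and is torn down, which is the over-decrement described above. In the second run the caller's single reference survives.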

Reproducer: the following script uses _testcapi.set_nomemory() to force an allocation failure inside ExternalEntityParserCreate():

import xml.parsers.expat
import _testcapi

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True

_testcapi.set_nomemory(1, 10)
sub = parser.ExternalEntityParserCreate(None)
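Because the script above takes the whole interpreter down on an affected build, it can be convenient to run it in a child process and look at the exit status instead. The wrapper below is a minimal sketch of that idea; run_reproducer and REPRODUCER are hypothetical names, and it assumes the interpreter being tested ships the _testcapi module (otherwise the child simply fails with an ImportError).

```python
import subprocess
import sys

# Same statements as the reproducer, passed to a child interpreter via -c.
REPRODUCER = """
import xml.parsers.expat
import _testcapi

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True

_testcapi.set_nomemory(1, 10)
sub = parser.ExternalEntityParserCreate(None)
"""


def run_reproducer(python=sys.executable, timeout=60):
    """Run the reproducer in a child interpreter.

    Returns (returncode, stderr) so the caller can decide what happened.
    """
    result = subprocess.run(
        [python, "-c", REPRODUCER],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    code, err = run_reproducer()
    print("exit status:", code)
    print(err)
```

The exit status alone does not distinguish every case: a crash, a cleanly raised MemoryError, and a missing _testcapi all give a nonzero status, so the captured stderr is what tells them apart.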

With an ASAN build (./configure --with-address-sanitizer --with-pydebug):

./python pyexpat.py

AddressSanitizer:DEADLYSIGNAL
=================================================================
==43632==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7d907b981b73 bp 0x7ffecf69fa50 sp 0x7ffecf69fa30 T0)
==43632==The signal is caused by a READ memory access.
==43632==Hint: address points to the zero page.
    #0 0x7d907b981b73 in clear_handlers Modules/pyexpat.c:2497
    #1 0x7d907b981c2c in xmlparse_clear Modules/pyexpat.c:1548
    #2 0x7d907b989554 in xmlparse_dealloc Modules/pyexpat.c:1562
    #3 0x5c347de9f287 in _Py_Dealloc Objects/object.c:3271
    #4 0x7d907b981994 in Py_DECREF Include/refcount.h:403
    #5 0x7d907b9878fd in pyexpat_xmlparser_ExternalEntityParserCreate_impl Modules/pyexpat.c:1106
    #6 0x7d907b987fe5 in pyexpat_xmlparser_ExternalEntityParserCreate Modules/clinic/pyexpat.c.h:313
    #7 0x5c347ddfb9f2 in method_vectorcall_FASTCALL_KEYWORDS_METHOD Objects/descrobject.c:381
    #8 0x5c347dddbd0c in _PyObject_VectorcallTstate Include/internal/pycore_call.h:136
    #9 0x5c347dddbdff in PyObject_Vectorcall Objects/call.c:327
    #10 0x5c347e059777 in _Py_VectorCallInstrumentation_StackRefSteal Python/ceval.c:769
    #11 0x5c347e069714 in _PyEval_EvalFrameDefault Python/generated_cases.c.h:1817
    #12 0x5c347e0a0961 in _PyEval_EvalFrame Include/internal/pycore_ceval.h:118
    #13 0x5c347e0a0cc7 in _PyEval_Vector Python/ceval.c:2132
    #14 0x5c347e0a0f7d in PyEval_EvalCode Python/ceval.c:680
    #15 0x5c347e1a4631 in run_eval_code_obj Python/pythonrun.c:1366
    #16 0x5c347e1a4977 in run_mod Python/pythonrun.c:1469
    #17 0x5c347e1a58ac in pyrun_file Python/pythonrun.c:1294
    #18 0x5c347e1a86e2 in _PyRun_SimpleFileObject Python/pythonrun.c:518
    #19 0x5c347e1a898e in _PyRun_AnyFileObject Python/pythonrun.c:81
    #20 0x5c347e1fd936 in pymain_run_file_obj Modules/main.c:410
    #21 0x5c347e1fdba3 in pymain_run_file Modules/main.c:429
    #22 0x5c347e1ff3a1 in pymain_run_python Modules/main.c:691
    #23 0x5c347e1ffa37 in Py_RunMain Modules/main.c:772
    #24 0x5c347e1ffc23 in pymain_main Modules/main.c:802
    #25 0x5c347e1fffa8 in Py_BytesMain Modules/main.c:826
    #26 0x5c347dc66675 in main Programs/python.c:15
    #27 0x7d907c42a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #28 0x7d907c42a47a in __libc_start_main_impl ../csu/libc-start.c:360
    #29 0x5c347dc665a4 in _start (/home/raminfp/Projects/cpython/python+0x2ed5a4) (BuildId: a8bec32f918132a019758b07f370e97d9e763c6f)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV Modules/pyexpat.c:2497 in clear_handlers
==43632==ABORTING

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux
