Skip to content

[Python] Set memory policy to "strict"#13593

Open
guitargeek wants to merge 2 commits intoroot-project:masterfrom
guitargeek:pyroot_memory_policy_1
Open

[Python] Set memory policy to "strict"#13593
guitargeek wants to merge 2 commits intoroot-project:masterfrom
guitargeek:pyroot_memory_policy_1

Conversation

@guitargeek
Copy link
Contributor

@guitargeek guitargeek commented Sep 4, 2023

The ROOT Python interfaces have many memory leaks, which is a major pain point for people using it for long-running scripts in batch jobs.

One source of memory leaks was indentified to be the "heuristic memory policy" of cppyy. This means that cppyy assumes that every non-const pointer member function argument was interpreted as the object taking ownership if the argument.

For examle, take the non-owning RooLinkedList container. It has a RooLinkedList::Add(RooAbsArg *arg) method. ROOT wrongly assumes that this means the RooLinkedList takes ownership of arg, and it drops the ROOT ownership. Nobody feels responsible for deleting the object anymore, and there is a memory leak or arg.

That particular leak was reported in this forum post:
https://root-forum.cern.ch/t/memory-leak-in-fits/56249

Function parameters of type T * are very common in ROOT, and only rarely do they imply ownership transfer. So changing the memory policy to "strict" would surely fix also many other memory leaks that are not reported so far. In fact, upstream cppyy doesn't even enable this heuristic memory policy anymore by default! So moving ROOT also to the strict memory policy closes the gap between ROOT and cppyy.

The potential drawback of this change are crashes in usercode if memory is not properly managed. But these problems should either be fixed by:

  • the user

  • dedicated pythonizations for these methods to manage shared ownership via Python reference counters (i.e., setting the parameter as an attribute of the object that the member function was called on)

This follows up on PR #4294, in particular it reverts guitargeek@3a12063.

Note that owning TCollection, TSeqCollection, and TList instances are Pythonized to preserve the old behavior of dropping Python ownership of added elements, as that was the case where the memory heuristic was correct.

@guitargeek guitargeek self-assigned this Sep 4, 2023
@guitargeek guitargeek changed the title [PyROOT] Set memory policy to strict [PyROOT] Set memory policy to "strict" Sep 4, 2023
@guitargeek guitargeek force-pushed the pyroot_memory_policy_1 branch from 7b2132c to a2fe3d5 Compare September 4, 2023 13:06
@guitargeek guitargeek force-pushed the pyroot_memory_policy_1 branch from a2fe3d5 to f3aa24f Compare September 4, 2023 16:26
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@root-project root-project deleted a comment from phsft-bot Sep 4, 2023
@guitargeek guitargeek force-pushed the pyroot_memory_policy_1 branch from f3aa24f to b147011 Compare September 4, 2023 22:48
@guitargeek guitargeek force-pushed the pyroot_memory_policy_1 branch 2 times, most recently from b7f0c8f to 4eadd14 Compare September 5, 2023 08:40
@root-project root-project deleted a comment from phsft-bot Sep 5, 2023
@root-project root-project deleted a comment from phsft-bot Sep 5, 2023
@root-project root-project deleted a comment from phsft-bot Feb 12, 2024
@guitargeek guitargeek force-pushed the pyroot_memory_policy_1 branch from 2c1dc5d to fe2b100 Compare February 12, 2024 17:26
@root-project root-project deleted a comment from phsft-bot Feb 12, 2024
@root-project root-project deleted a comment from phsft-bot Feb 12, 2024
@guitargeek
Copy link
Contributor Author

Hi @vepadulano, thanks a lot for your review! I have updated the PR and the associated roottest PR

@guitargeek guitargeek force-pushed the pyroot_memory_policy_1 branch from fe2b100 to 819ccd4 Compare August 27, 2024 10:31
@root-project root-project deleted a comment from phsft-bot Aug 27, 2024
@couet couet removed their request for review December 4, 2024 09:01
@guitargeek guitargeek force-pushed the pyroot_memory_policy_1 branch from 819ccd4 to bb5d158 Compare January 2, 2025 08:36
@dpiparo dpiparo closed this Jan 2, 2025
@dpiparo dpiparo reopened this Jan 2, 2025
@dpiparo
Copy link
Member

dpiparo commented Jan 2, 2025

restarted all builds due to the issues with the server serving files for the tests.

Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm coming back to this old PR, I still agree with my previous statement that this is the right direction, but I see more potential misunderstandings and issues creeping up. I'm thinking about approaches we could employ to mitigate them, e.g. by running also these changes in CMSSW CI, or by thinking about more Pythonizations to include (e.g. in TList), or by thinking about if we can add more warnings for users in situations that lead to dangling pointers.

Also, a question that becomes more important now, are all of these changes still required in light of the recent and future improvements to the cppyy we use?

Comment on lines +98 to +101
**Note:** You can change back to the old policy by calling
`ROOT.SetMemoryPolicy(ROOT.kMemoryHeuristics)` after importing ROOT, but this
should be only used for debugging purposes and this function might be removed
in the future!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this perhaps explicit at usage by also raising a warning from the function when calling it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We learned from the ROOT workshop that our users also prefer clear deprecation and removal schedules. Should we aim for removing ROOT.SetMemoryPolicy in 6.42? If people really want to change the policy, they can still use the Cppyy API after the removal of the ROOT wrapper though, so there is an escape hatch that makes the removal seem a bit less drastic. Of course we don't mention that in the release notes 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a clear deprecation and removal warning to the release notes and to ROOT.SetMemoryPolicy() in the latest iteration of this PR.

@guitargeek
Copy link
Contributor Author

Hi @vepadulano, thanks for coming back to this! Let's try to make progress then and merge this soon 🙂

I'm coming back to this old PR, I still agree with my previous statement that this is the right direction, but I see more potential misunderstandings and issues creeping up. I'm thinking about approaches we could employ to mitigate them, e.g. by running also these changes in CMSSW CI, or by thinking about more Pythonizations to include (e.g. in TList), or by thinking about if we can add more warnings for users in situations that lead to dangling pointers.

Good points. I'll address them once we see where we stand with a fresh CI run.

Also, a question that becomes more important now, are all of these changes still required in light of the recent and future improvements to the cppyy we use?

The upcoming cppyy rebase on libinterop is more happening in the backend, and we have no frontend development lined up that change the behavior of these memory heuristics. But what we should consider is that it's better to not change both backend and frontend too much in one release, because it makes potential regressions harder to fix. If we assume that the backend upgrade will hit in 6.42, then this means the best time to update the memory policy is now for 6.40. Otherwise, it would be wiser to wait for 6.44, but this is quite far away.

@guitargeek
Copy link
Contributor Author

Hi @vepadulano, I have suggested a TList Pythonization now that might address your concerns.

The proposal is that when adding to owning TCollections (checked with IsOwner()), the ownership is always relinquished on the Python side, just like the old memory heuristics implied.

So that means that the behavior change is only for non-owning TCollections, which now add borrowed references instead of letting the elements leak when added from Python.

Is that an acceptable compromise? (no need to rush an answer, I still have to fix test failures with Python 3.14 for Fedora Rawhide 🙂 )

Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I like the direction, I left a few more comments/questions.

@guitargeek
Copy link
Contributor Author

So from the latest changes I guess that the default behaviour of TList is that it owns its elements, do you agree?

That's true only for lists returned by this create_tseqcollection function in the test, where it called SetOwner(True) inside. So one has to call SetOwner(False) afterwards to get a non-owning collection.

I agree that this is confusing. I'll make some improvement suggestions.

@guitargeek
Copy link
Contributor Author

Okay I have figured out a simpler way to manage the object ownership in these T(Seq)Collection tests, which is flexible enough so it can also be spun of into a separate PR:

@guitargeek
Copy link
Contributor Author

Hi @vepadulano! This PR is greatly simplified now, because I factored the technical changes beyond the one line that changes the memory policy out into separate PRs that are merged now.

Once our own CI is green, I'll request a CMSSW test run and then we can proceed here from my side.

Clear the TList at the end of the `test_delitem` unit test in
tseqcollection_itemaccess.py to make sure the contained list elements
are still alive when the list is cleared. Otherwise, the elements might
be deleted before the containing list, depending on the order or garbage
collection.
The ROOT Python interfaces have many memory leaks, which is a major pain
point for people using it for long-running scripts in batch jobs.

One source of memory leaks was indentified to be the "heuristic memory
policy" of cppyy. This means that cppyy assumes that every non-const
pointer member function argument was interpreted as the object taking
ownership if the argument.

For examle, take the non-owning RooLinkedList container. It has a
`RooLinkedList::Add(RooAbsArg *arg)` method. ROOT wrongly assumes that
this means the RooLinkedList takes ownership of arg, and it drops the
ROOT overship. Nobody feels responsible for deleting the object
anymore, and there is a memory leak or `arg`.

That particular leak was reported in this forum post:
https://root-forum.cern.ch/t/memory-leak-in-fits/56249

Function parameters of type `T *` are very common in ROOT, and only
rarely do they imply ownership transfer. So changing the memory policy
to "strict" would surely fix also many other memory leaks that are not
reported so far. In fact, upstream cppyy doesn't even enable this
heuristic memory policy anymore by default! So moving ROOT also to the
strict memory policy closes the gap between ROOT and cppyy.

The potential drawback of this change are crashes in usercode if memory
is not properly managed. But these problems should either be fixed by:

  * the user

  * dedicated pythonizations for these methods to manage shared
    ownership via Python reference counters (i.e., setting the parameter
    as an attribute of the object that the member function was called
    on)

This follows up on PR root-project#4294, in particular it reverts 3a12063.

Note that owning **TCollection**, **TSeqCollection**, and **TList**
instances are Pythonized to preserve the old behavior of dropping Python
ownership of added elements, as that was the case where the memory
heuristic was correct.
@guitargeek
Copy link
Contributor Author

Hi @smuzaffar, could you please test this PR with CMSSW? It's a potentially fragile change of behavior in our Python interface. Thanks!

Copy link
Member

@vepadulano vepadulano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Although the changes are much fewer now, I would still run the CI with a clean build and with the Python wheel workflow.

Comment on lines +90 to +94
msg = """ROOT.SetHeuristicMemoryPolicy() is deprecated and will be removed in ROOT 6.42.
Since ROOT 6.40, the heuristic memory policy is disabled by default, and with
ROOT 6.42 it won't be possible to re-enable it with ROOT.SetHeuristicMemoryPolicy(),
which was only meant to be used during a transition period and will be removed.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should not force a too short deprecation period, following multiple suggestions from the ROOT workshop. 2 ROOT releases should probably be the minimum, i.e. 6.44

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we need to keep track of when this should be removed either internally or by a condition to turn the warning into an error based on the ROOT version here.

```Python
obj = ROOT.MyClass()
my_instance.foo(obj)
del obj # C++ may still hold a pointer: dangling reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more explicit here, it's not clear what the problem is in practice.

Suggested change
del obj # C++ may still hold a pointer: dangling reference
del obj # C++ may still hold a pointer: dangling reference. When the Python object is deleted, the memory associated with the C++ object is also freed. But the C++ side may want to delete the object as well, for instance if the destructor of the class of `my_instance` also calls `delete obj`.


1. **Keep the Python object alive**

Assign the object to a Python variable that lives at least as long as the C++ reference.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I would show an example of a Pythonization lifeline


after importing ROOT.

This option is intended **for debugging only and will be removed in ROOT 6.42**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I would be more relaxed with the deprecation period.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants