Skip to content

[gui] reenable fit panel testing and fix nullptr access#21882

Open
ferdymercury wants to merge 10 commits intoroot-project:masterfrom
ferdymercury:patch-20
Open

[gui] reenable fit panel testing and fix nullptr access#21882
ferdymercury wants to merge 10 commits intoroot-project:masterfrom
ferdymercury:patch-20

Conversation

@ferdymercury
Copy link
Copy Markdown
Collaborator

@ferdymercury ferdymercury commented Apr 10, 2026

This Pull request:

Changes or fixes:

enables old test based on Makefiles by migrating to CMake

step towards #9090

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

@ferdymercury ferdymercury changed the title [asimage] prevent pointing to invalid memory [gui,asimage] prevent pointing to invalid memory Apr 10, 2026
Comment thread graf2d/asimage/src/TASImage.cxx Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 10, 2026

Test Results

    22 files      22 suites   3d 7h 50m 19s ⏱️
 3 839 tests  3 838 ✅  1 💤 0 ❌
75 777 runs  75 759 ✅ 18 💤 0 ❌

Results for commit a7898cf.

♻️ This comment has been updated with latest results.

Comment thread gui/gui/src/TGGC.cxx Outdated
@ferdymercury ferdymercury marked this pull request as ready for review April 13, 2026 08:55
@linev
Copy link
Copy Markdown
Member

linev commented Apr 13, 2026

I checkout your branch and try to run test.
There are many issues with normal GUI mode.

  1. Last two tests TestTree2D and TestTreeND do not work for me - while fit panel does not have entries like "gaus2d" or "gausND".
  2. Also running test in batch possible only with root -b --web=off UnitTesting.cxx. Produced test executable crashed much earlier because TGClient::Instance() returns nullptr
  3. When run inside root, one sees lot of errors about missing images.

Comment thread graf2d/asimage/src/TASImage.cxx Outdated
@ferdymercury
Copy link
Copy Markdown
Collaborator Author

  • Last two tests TestTree2D and TestTreeND do not work for me - while fit panel does not have entries like "gaus2d" or "gausND".

Yes, same here, I had deactivated (commented) locally those old tests for the moment and was focusing on whether it was running at all without crashing.

  • Also running test in batch possible only with root -b --web=off UnitTesting.cxx. Produced test executable crashed much earlier because TGClient::Instance() returns nullptr

Weird, where? I see a different behavior on Linux. It just crashes when deleting the TGTooltip at shutdown for me.

  • When run inside root, one sees lot of errors about missing images.

Same here, that's what I would say "expected", a lot of errors in batch mode of not being able to load icons, but no crashes.

@linev
Copy link
Copy Markdown
Member

linev commented Apr 14, 2026

Weird, where? I see a different behavior on Linux. It just crashes when deleting the TGTooltip at shutdown for me.

I mean when I run compiled executable.

  • When run inside root, one sees lot of errors about missing images.

Same here, that's what I would say "expected", a lot of errors in batch mode of not being able to load icons, but no crashes.

This is most challenging part - try to understand if there double deletion or just wrong pointer released.
Because of lot of errors I cannot exclude incomplete initialization of some GUI classes

@linev
Copy link
Copy Markdown
Member

linev commented Apr 14, 2026

@ferdymercury @bellenot

I probably found the reason why it crashes.
Seems to be it is wrongly set flag TGLabel::fHasOwnFont.
In batch it is incorrect and TGLabel destructor tries to free TGGC object which does not belong to the label.

Also side problem is that in TGGC::TGGC(GCValues_t *values) constructor ref counter not always set correctly.

Fixing this two problems makes test running for me

@linev
Copy link
Copy Markdown
Member

linev commented Apr 14, 2026

First step is #21907

@ferdymercury
Can you extract your changes in TASImage to separate PR.

After such two steps we will be able to enable this and may be other GUI tests in CI

@ferdymercury ferdymercury changed the title [gui,asimage] prevent pointing to invalid memory [gui] reenable fit panel testing and fix nullptr access Apr 14, 2026
Comment thread gui/gui/src/TGLabel.cxx Outdated
@linev
Copy link
Copy Markdown
Member

linev commented Apr 15, 2026

@ferdymercury

After #21907 is merged you can try rebase your code and see if now fit panel test can be enabled.

@linev
Copy link
Copy Markdown
Member

linev commented Apr 15, 2026

@ferdymercury

Please remove all changes in gui classes from this PR.

And at the end of UnitTesting.cxx one should have:

int UnitTesting()
{
   gROOT->SetWebDisplay("off");

   FitEditorUnitTesting fUT;

   return fUT.UnitTesting();
}

// The main function. It is VERY important that it is run using the
// TApplication.
int main(int argc, char** argv)
{
   TApplication theApp("App",&argc,argv);

   // force creation of client
   if (!TGClient::Instance())
      new TGClient();

   int ret = UnitTesting();

   theApp.Terminate();

   return ret;
}

And please check TestTree1D. It fails while reference values for fir parameters do not match.

Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
result += MakeTest("TestTree1D.........", &FitEditorUnitTesting::TestTree1D);

result += MakeTest("TestTree2D.........", &FitEditorUnitTesting::TestTree2D);
// result += MakeTest("TestTree2D.........", &FitEditorUnitTesting::TestTree2D);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented-out lines

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of coming back / reenabling these failing tests once we get the current test up and running on all platforms.
(It just needs the definition of a missing function or functor gausNd)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please mark them with a comment explaining that

Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
@linev
Copy link
Copy Markdown
Member

linev commented Apr 15, 2026

We have several platforms failing.

For Windows probably we should force batch mode. One need to add -b flag when start executable.

But several linux and mac platforms need to be investigated deeper

Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
Comment thread gui/fitpanel/test/UnitTesting.cxx Outdated
@bellenot
Copy link
Copy Markdown
Member

So adding gROOT->SetBatch(kTRUE); as:

int main(int argc, char** argv)
{
   gROOT->SetBatch(kTRUE);
   TApplication theApp("App",&argc,argv);

Gives:

C:\root-dev\build\x64\patch-20>ctest -VV -C RelWithDebInfo -R test-fitpanel-UnitTesting
UpdateCTestConfiguration  from :C:/root-dev/build/x64/patch-20/DartConfiguration.tcl
Parse Config file:C:/root-dev/build/x64/patch-20/DartConfiguration.tcl
Test project C:/root-dev/build/x64/patch-20
Constructing a list of tests
Ignore test: test-tcollex
Ignore test: roottest-root-rint-TabCom
Ignore test: roottest-root-rint-BackslashNewline
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 137
    Start 137: test-fitpanel-UnitTesting

137: Test command: "C:\Program Files\CMake\bin\cmake.exe" "-DCMD=C:/root-dev/build/x64/patch-20/gui/fitpanel/test/RelWithDebInfo/fitPanelUnitTesting.exe" "-DSYS=C:/root-dev/build/x64/patch-20" "-P" "C:/root-dev/git/patch-20/cmake/modules/RootTestDriver.cmake"
137: Working Directory: C:/root-dev/build/x64/patch-20/gui/fitpanel/test
137: Environment variables:
137:  ROOT_HIST=0
137: Test timeout computed to be: 1500
137: Info in <Minuit2>: MnSeedGenerator Computing seed using NumericalGradient calculator
137: Info in <Minuit2>: MnSeedGenerator Evaluated function and gradient in 24.8 ╬╝s
137: Info in <Minuit2>: MnSeedGenerator Initial state: FCN =       60.85761878 Edm =       2.042788606 NCalls =     23
137: Info in <Minuit2>: MnSeedGenerator Initial state
137:   Minimum value : 60.85761878
137:   Edm           : 2.042788606
137:   Internal parameters:     [    -0.8645583492      45.84274927     -13.32134506      13.80774266              0.2                1]
137:   Internal gradient  :     [     0.4275838346     0.4487206721     0.8008641365    -0.6674368442      53.51638712      153.2778371]
137:   Internal covariance matrix:
137: [[     0.30404743              0              0              0              0              0]
137:  [              0     0.16270541              0              0              0              0]
137:  [              0              0    0.027215553              0              0              0]
137:  [              0              0              0      1.9959967              0              0]
137:  [              0              0              0              0   0.0011646905              0]
137:  [              0              0              0              0              0  0.00016346709]]]
137: Info in <Minuit2>: VariableMetricBuilder Start iterating until Edm is < 2e-05 with call limit = 1780
137: Info in <Minuit2>: VariableMetricBuilder    0 - FCN =       60.85761878 Edm =       2.042788606 NCalls =     23
137: Info in <Minuit2>: VariableMetricBuilder    1 - FCN =       59.09847664 Edm =      0.1924298902 NCalls =     40
137: Info in <Minuit2>: VariableMetricBuilder    2 - FCN =       58.95990622 Edm =     0.01887091831 NCalls =     54
137: Info in <Minuit2>: VariableMetricBuilder    3 - FCN =        58.9375889 Edm =    0.009530429137 NCalls =     68
137: Info in <Minuit2>: VariableMetricBuilder    4 - FCN =       58.93183994 Edm =    0.001565170756 NCalls =     82
137: Info in <Minuit2>: VariableMetricBuilder    5 - FCN =       58.92877036 Edm =   0.0004383416895 NCalls =     96
137: Info in <Minuit2>: VariableMetricBuilder    6 - FCN =       58.92841786 Edm =    9.08224474e-07 NCalls =    110
137: Info in <Minuit2>: MnHesse Done after 47.5 ╬╝s
137: Info in <Minuit2>: VariableMetricBuilder After Hessian
137: Info in <Minuit2>: VariableMetricBuilder    7 - FCN =       58.92841786 Edm =   3.179417721e-05 NCalls =    150
137: Info in <Minuit2>: VariableMetricBuilder Tolerance not sufficient, continue minimization; Edm 3.17942e-05 Required 2e-05
137: Info in <Minuit2>: VariableMetricBuilder    8 - FCN =       58.92838606 Edm =   4.484031494e-13 NCalls =    163
137: Info in <Minuit2>: VariableMetricBuilder Stop iterating after 335.2 ╬╝s
137: Warning in <TASImage::GetMask>: No image
137: Warning in <TASImage::GetMask>: No image
137: Info in <Minuit2>: MnSeedGenerator Computing seed using NumericalGradient calculator
137: Info in <Minuit2>: MnSeedGenerator Evaluated function and gradient in 46 ╬╝s
137: Info in <Minuit2>: MnSeedGenerator Initial state: FCN =       213.8105565 Edm =       1.003103411 NCalls =      9
137: Info in <Minuit2>: MnSeedGenerator Initial state
137:   Minimum value : 213.8105565
137:   Edm           : 1.003103411
137:   Internal parameters:     [                1                2]
137:   Internal gradient  :     [     -6.856749983     -2.602027129]
137:   Internal covariance matrix:
137: [[           0.08              0]
137:  [              0    0.037103663]]]
137: Info in <Minuit2>: VariableMetricBuilder Start iterating until Edm is < 1e-05 with call limit = 1345
137: Info in <Minuit2>: VariableMetricBuilder    0 - FCN =       213.8105565 Edm =       1.003103411 NCalls =      9
137: Info in <Minuit2>: VariableMetricBuilder    1 - FCN =       212.8488423 Edm =    0.004903858932 NCalls =     14
137: Info in <Minuit2>: VariableMetricBuilder    2 - FCN =       212.8432651 Edm =   2.384901655e-05 NCalls =     20
137: Info in <Minuit2>: VariableMetricBuilder    3 - FCN =       212.8432411 Edm =   1.008891819e-08 NCalls =     25
137: Info in <Minuit2>: MnHesse Done after 53.5 ╬╝s
137: Info in <Minuit2>: VariableMetricBuilder After Hessian
137: Info in <Minuit2>: VariableMetricBuilder    4 - FCN =       212.8432411 Edm =    1.04445246e-08 NCalls =     37
137: Info in <Minuit2>: VariableMetricBuilder Stop iterating after 250.1 ╬╝s
137: Error in <TList::Clear>: A list is accessing an object (000002B4451CD940) already deleted (list name = TList)
137:
137: **STARTING TFitEditor Unit Tests**
137:
137: TestHistogramFit.............OK
137: TestGSLFit...................OK
137: TestUpdate...................OK
137: TestGraph....................OK
137: TestGraphError...............OK
137: TestGraph2D..................OK
137: TestGraph2DError.............OK
137: TestUpdateTree...............OK
137: i: 1, j: 0, e: 1, diff 0.239877
137: i: 2, j: 0, e: 1, diff 0.0332284
137: TestTree1D...................FAILED
137:
137: Remember to also check outputUnitTesting.txt for more detailed information
137:
1/1 Test #137: test-fitpanel-UnitTesting ........   Passed    1.04 sec

The following tests passed:
        test-fitpanel-UnitTesting

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   1.70 sec

But it's weird, we have 137: TestTree1D...................FAILED and 1/1 Test #137: test-fitpanel-UnitTesting ........ Passed 1.04 sec

@ferdymercury
Copy link
Copy Markdown
Collaborator Author

Error in TList::Clear: A list is accessing an object (000002B4451CD940) already deleted (list name = TList)
137:

I also saw this error on Linux, which I think is why the script sometimes crashes with free or munmap_chunk error.

@bellenot
Copy link
Copy Markdown
Member

OK, got it, it's the TCanvas fListOfSignals in TCanvas::Close() when emitting Closed() in (TQObject::EmitVA()). Looking how to fix that

@bellenot
Copy link
Copy Markdown
Member

BTW, the error is also here in non-batch mode. Can you check on Linux (or any non-Windows OS)?

@bellenot
Copy link
Copy Markdown
Member

Can you try the following:

diff --git a/gui/fitpanel/src/TFitEditor.cxx b/gui/fitpanel/src/TFitEditor.cxx
index 74a089bbdc9..99237d8d787 100644
--- a/gui/fitpanel/src/TFitEditor.cxx
+++ b/gui/fitpanel/src/TFitEditor.cxx
@@ -1256,7 +1256,8 @@ void TFitEditor::CloseWindow()

 void TFitEditor::Terminate()
 {
-   TQObject::Disconnect("TCanvas", "Closed()");
+   if (HasConnection("DoNoSelection()"))
+      TQObject::Disconnect("TCanvas", "Closed()");
    delete fgFitDialog;
    fgFitDialog = 0;
 }

@linev
Copy link
Copy Markdown
Member

linev commented Apr 16, 2026

@ferdymercury

Sorry, I push my commit into wrong origin.
I just experimenting with your changes on Windows and hit compilation problem.
I submit these changes as separate PR.
Please restore your commits again

@linev
Copy link
Copy Markdown
Member

linev commented Apr 16, 2026

I run on Windows fitPanelUnitTesting.exe -b and it only report TestTree1D FAILED, but does not crash.

Comment thread gui/fitpanel/test/CMakeLists.txt Outdated
Co-authored-by: Sergey Linev <S.Linev@gsi.de>
@ferdymercury
Copy link
Copy Markdown
Collaborator Author

We're still getting the free-invalid-pointer on Unix

@linev
Copy link
Copy Markdown
Member

linev commented Apr 16, 2026

We're still getting the free-invalid-pointer on Unix

There are only two platforms failing - ubuntu2404 and mac14 - which is already grate.
Nobody says that it will be easy. We never tried test gui classes in batch before

@ferdymercury
Copy link
Copy Markdown
Collaborator Author

There are only two platforms failing - ubuntu2404 and mac14 - which is already grate. Nobody says that it will be easy. We never tried test gui classes in batch before

Indeed :) It's great.
This started all with removing an old legacy Makefile and has evolved into something bigger hehe

Comment thread gui/fitpanel/test/CMakeLists.txt Outdated
ferdymercury and others added 3 commits April 16, 2026 16:53
@linev
Copy link
Copy Markdown
Member

linev commented Apr 21, 2026

@ferdymercury

Did you manage to reproduce failures?
Probably you can try to fully reproduce environment with docker - you can find some info in builder logs.

But I have other idea about ROOT gui testing.
Now we try to enable gui test in true batch mode.
But may be we can try other direction?

Since a while I using xvfb to run JSROOT tests in batch.
It is because of gl module in node.js which requires DISPLAY and some basic X11 functions.
But we also can use xvfb-run for ROOT gui tests.
It provides good X11 emulation and probably better approach for us.
Then we can also try to enable stressGUI tests in CI.
Execution can look like:

xvb-run -a -s "-ac -screen 0 1280x1024x24" ./UnitTesting

So root application will recognize display and will run in normal GUI mode.
Should we try this?

@ferdymercury
Copy link
Copy Markdown
Collaborator Author

ferdymercury commented Apr 21, 2026

Did you manage to reproduce failures?

I also see 100% reproducible stack-smashing when I run in interpreted mode rather than compiled mode. (Batch Or Non-Batch).
Always in line 2238 of TFitEditor.cxx

Valgrind says:

==720346== Process terminating with default action of signal 6 (SIGABRT)
==720346==    at 0x52CF9FC: __pthread_kill_implementation (pthread_kill.c:44)
==720346==    by 0x52CF9FC: __pthread_kill_internal (pthread_kill.c:78)
==720346==    by 0x52CF9FC: pthread_kill@@GLIBC_2.34 (pthread_kill.c:89)
==720346==    by 0x527B475: raise (raise.c:26)
==720346==    by 0x5261884: abort (abort.c:100)
==720346==    by 0x52C2676: __libc_message (libc_fatal.c:156)
==720346==    by 0x536F619: __fortify_fail (fortify_fail.c:26)
==720346==    by 0x536F5E5: __stack_chk_fail (stack_chk_fail.c:24)
==720346==    by 0x2F3E2BE5: TFitEditor::DoFit() (TFitEditor.cxx:2238)
==720346==    by 0x29297F9D: ???
==720346==    by 0x29295CDC: ???
==720346==    by 0x2929519D: ???
==720346==    by 0x29294A2D: ???
==720346==    by 0x4FE7049: ???

I seldom also see those munmap_chunk or free invalid pointer failures locally when in compiled mode. Not very often, it's a bit random. It seems that it only happens if the program is fast (goes away if I put debugging points). If I recall correctly, it was in the same place, when deleting the TGToolTip that then calls ~TGGC

With valgrind ROOT build Release Mode. I get:

Processing /opt/root_src/gui/fitpanel/test/UnitTesting.cxx+...
==849553== Syscall param msync(start) points to unaddressable byte(s)
==849553==    at 0x5107B57: msync (msync.c:25)
==849553==    by 0x6822C85: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6711171: cling_runtime_internal_throwIfInvalidPointer (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x29FBA04B: ???
==849553==    by 0x678F697: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6714692: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x67150A6: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6806AC1: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x6815FFA: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x68178E8: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x67FF34F: ??? (in /tmp/rbld/lib/libCling.so)
==849553==    by 0x65EC42B: ??? (in /tmp/rbld/lib/libCling.so)
==849553==  Address 0x5475018 is 0 bytes after a block of size 72 alloc'd
==849553==    at 0x4849013: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==849553==    by 0x4A999DD: TStorage::ObjectAlloc(unsigned long) (in /tmp/rbld/lib/libCore.so)
==849553==    by 0x4A325EF: _GLOBAL__sub_I_TQConnection.cxx (in /tmp/rbld/lib/libCore.so)
==849553==    by 0x400647D: call_init.part.0 (dl-init.c:70)
==849553==    by 0x4006567: call_init (dl-init.c:33)
==849553==    by 0x4006567: _dl_init (dl-init.c:117)
==849553==    by 0x40202C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
==849553==    by 0x2: ???
==849553==    by 0x1FFF0004CE: ???
==849553==    by 0x1FFF0004DD: ???
==849553==    by 0x1FFF0004E0: ???

so something might be off when initializing TQConnection ?

xvfb-run: I also use that for my application, so sounds good.
But I think we should first try a bit more with valgrind to see if we can fix the current failure.

@ferdymercury
Copy link
Copy Markdown
Collaborator Author

@linev my proposal would be to merge this 'as is' now (squashed). Disabling the 1D tree test seems to avoid the free/munmap/stack smashing issue in the fit editor. One can then later go back into this TTree issue interaction with the TFitEditor, it's marked as TODO.
At least we have enabled 60% of tests that were no longer running, so it's better than nothing :)

@linev
Copy link
Copy Markdown
Member

linev commented Apr 21, 2026

At least we have enabled 60% of tests that were no longer running, so it's better than nothing :)

Yes, it sounds reasonable.

Lets wait for 6.40 branching and will continue.
But idea with xvfb-run also can be investigated.

@ferdymercury ferdymercury added this to the 6.42.00 milestone Apr 21, 2026
@ferdymercury
Copy link
Copy Markdown
Collaborator Author

But idea with xvfb-run also can be investigated.

Yep.

xvfb-run root UnitTesting.cxx

also fails with *** stack smashing detected ***: terminated

whereas xvfb-run root UnitTesting.cxx+ seems ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants