Track external accumulators in tracer instead of using SparkInfo values #10553
Draft
charlesmyu wants to merge 1 commit into master
Benchmarks
Startup
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 60 metrics, 11 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.069 s) : 0, 1068747
Total [baseline] (10.877 s) : 0, 10877012
Agent [candidate] (1.072 s) : 0, 1071665
Total [candidate] (10.947 s) : 0, 10947384
section appsec
Agent [baseline] (1.242 s) : 0, 1242403
Total [baseline] (11.105 s) : 0, 11104793
Agent [candidate] (1.239 s) : 0, 1238697
Total [candidate] (11.013 s) : 0, 11013326
section iast
Agent [baseline] (1.241 s) : 0, 1240914
Total [baseline] (11.166 s) : 0, 11166215
Agent [candidate] (1.233 s) : 0, 1233413
Total [candidate] (11.314 s) : 0, 11314238
section profiling
Agent [baseline] (1.192 s) : 0, 1192055
Total [baseline] (11.016 s) : 0, 11015633
Agent [candidate] (1.193 s) : 0, 1193069
Total [candidate] (10.926 s) : 0, 10926312
gantt
title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.211 ms) : 0, 1211
crashtracking [candidate] (1.202 ms) : 0, 1202
BytebuddyAgent [baseline] (630.407 ms) : 0, 630407
BytebuddyAgent [candidate] (631.711 ms) : 0, 631711
AgentMeter [baseline] (29.14 ms) : 0, 29140
AgentMeter [candidate] (29.266 ms) : 0, 29266
GlobalTracer [baseline] (258.376 ms) : 0, 258376
GlobalTracer [candidate] (259.238 ms) : 0, 259238
AppSec [baseline] (33.007 ms) : 0, 33007
AppSec [candidate] (33.397 ms) : 0, 33397
Debugger [baseline] (63.938 ms) : 0, 63938
Debugger [candidate] (64.774 ms) : 0, 64774
Remote Config [baseline] (623.525 µs) : 0, 624
Remote Config [candidate] (629.253 µs) : 0, 629
Telemetry [baseline] (9.851 ms) : 0, 9851
Telemetry [candidate] (9.183 ms) : 0, 9183
Flare Poller [baseline] (6.018 ms) : 0, 6018
Flare Poller [candidate] (6.155 ms) : 0, 6155
section appsec
crashtracking [baseline] (1.196 ms) : 0, 1196
crashtracking [candidate] (1.191 ms) : 0, 1191
BytebuddyAgent [baseline] (660.33 ms) : 0, 660330
BytebuddyAgent [candidate] (658.016 ms) : 0, 658016
AgentMeter [baseline] (11.975 ms) : 0, 11975
AgentMeter [candidate] (11.995 ms) : 0, 11995
GlobalTracer [baseline] (258.71 ms) : 0, 258710
GlobalTracer [candidate] (258.255 ms) : 0, 258255
AppSec [baseline] (168.177 ms) : 0, 168177
AppSec [candidate] (167.759 ms) : 0, 167759
Debugger [baseline] (66.817 ms) : 0, 66817
Debugger [candidate] (66.46 ms) : 0, 66460
Remote Config [baseline] (671.681 µs) : 0, 672
Remote Config [candidate] (660.821 µs) : 0, 661
Telemetry [baseline] (9.388 ms) : 0, 9388
Telemetry [candidate] (9.401 ms) : 0, 9401
Flare Poller [baseline] (3.728 ms) : 0, 3728
Flare Poller [candidate] (3.634 ms) : 0, 3634
IAST [baseline] (25.387 ms) : 0, 25387
IAST [candidate] (25.327 ms) : 0, 25327
section iast
crashtracking [baseline] (1.206 ms) : 0, 1206
crashtracking [candidate] (1.182 ms) : 0, 1182
BytebuddyAgent [baseline] (802.838 ms) : 0, 802838
BytebuddyAgent [candidate] (795.852 ms) : 0, 795852
AgentMeter [baseline] (11.348 ms) : 0, 11348
AgentMeter [candidate] (11.334 ms) : 0, 11334
GlobalTracer [baseline] (248.652 ms) : 0, 248652
GlobalTracer [candidate] (247.611 ms) : 0, 247611
AppSec [baseline] (33.467 ms) : 0, 33467
AppSec [candidate] (35.325 ms) : 0, 35325
Debugger [baseline] (67.399 ms) : 0, 67399
Debugger [candidate] (66.485 ms) : 0, 66485
Remote Config [baseline] (545.028 µs) : 0, 545
Remote Config [candidate] (544.189 µs) : 0, 544
Telemetry [baseline] (8.703 ms) : 0, 8703
Telemetry [candidate] (8.694 ms) : 0, 8694
Flare Poller [baseline] (3.466 ms) : 0, 3466
Flare Poller [candidate] (3.511 ms) : 0, 3511
IAST [baseline] (27.153 ms) : 0, 27153
IAST [candidate] (26.985 ms) : 0, 26985
section profiling
crashtracking [baseline] (1.189 ms) : 0, 1189
crashtracking [candidate] (1.181 ms) : 0, 1181
BytebuddyAgent [baseline] (681.844 ms) : 0, 681844
BytebuddyAgent [candidate] (682.574 ms) : 0, 682574
AgentMeter [baseline] (8.55 ms) : 0, 8550
AgentMeter [candidate] (8.61 ms) : 0, 8610
GlobalTracer [baseline] (216.071 ms) : 0, 216071
GlobalTracer [candidate] (216.458 ms) : 0, 216458
AppSec [baseline] (32.719 ms) : 0, 32719
AppSec [candidate] (32.857 ms) : 0, 32857
Debugger [baseline] (67.175 ms) : 0, 67175
Debugger [candidate] (67.463 ms) : 0, 67463
Remote Config [baseline] (625.542 µs) : 0, 626
Remote Config [candidate] (635.346 µs) : 0, 635
Telemetry [baseline] (8.965 ms) : 0, 8965
Telemetry [candidate] (9.071 ms) : 0, 9071
Flare Poller [baseline] (3.832 ms) : 0, 3832
Flare Poller [candidate] (3.754 ms) : 0, 3754
ProfilingAgent [baseline] (100.414 ms) : 0, 100414
ProfilingAgent [candidate] (99.836 ms) : 0, 99836
Profiling [baseline] (101.002 ms) : 0, 101002
Profiling [candidate] (100.416 ms) : 0, 100416
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.065 s) : 0, 1065499
Total [baseline] (8.72 s) : 0, 8719840
Agent [candidate] (1.07 s) : 0, 1070364
Total [candidate] (8.757 s) : 0, 8757404
section iast
Agent [baseline] (1.229 s) : 0, 1229421
Total [baseline] (9.374 s) : 0, 9373703
Agent [candidate] (1.232 s) : 0, 1231501
Total [candidate] (9.363 s) : 0, 9363281
gantt
title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.211 ms) : 0, 1211
crashtracking [candidate] (1.217 ms) : 0, 1217
BytebuddyAgent [baseline] (627.579 ms) : 0, 627579
BytebuddyAgent [candidate] (633.164 ms) : 0, 633164
AgentMeter [baseline] (29.087 ms) : 0, 29087
AgentMeter [candidate] (29.316 ms) : 0, 29316
GlobalTracer [baseline] (257.55 ms) : 0, 257550
GlobalTracer [candidate] (258.005 ms) : 0, 258005
AppSec [baseline] (33.031 ms) : 0, 33031
AppSec [candidate] (32.95 ms) : 0, 32950
Debugger [baseline] (65.271 ms) : 0, 65271
Debugger [candidate] (64.388 ms) : 0, 64388
Remote Config [baseline] (631.745 µs) : 0, 632
Remote Config [candidate] (632.461 µs) : 0, 632
Telemetry [baseline] (10.562 ms) : 0, 10562
Telemetry [candidate] (10.706 ms) : 0, 10706
Flare Poller [baseline] (4.489 ms) : 0, 4489
Flare Poller [candidate] (3.772 ms) : 0, 3772
section iast
crashtracking [baseline] (1.208 ms) : 0, 1208
crashtracking [candidate] (1.197 ms) : 0, 1197
BytebuddyAgent [baseline] (794.652 ms) : 0, 794652
BytebuddyAgent [candidate] (794.916 ms) : 0, 794916
AgentMeter [baseline] (11.317 ms) : 0, 11317
AgentMeter [candidate] (11.31 ms) : 0, 11310
GlobalTracer [baseline] (247.38 ms) : 0, 247380
GlobalTracer [candidate] (247.923 ms) : 0, 247923
AppSec [baseline] (33.164 ms) : 0, 33164
AppSec [candidate] (31.633 ms) : 0, 31633
Debugger [baseline] (66.051 ms) : 0, 66051
Debugger [candidate] (69.152 ms) : 0, 69152
Remote Config [baseline] (548.871 µs) : 0, 549
Remote Config [candidate] (535.661 µs) : 0, 536
Telemetry [baseline] (8.601 ms) : 0, 8601
Telemetry [candidate] (8.479 ms) : 0, 8479
Flare Poller [baseline] (3.392 ms) : 0, 3392
Flare Poller [candidate] (3.492 ms) : 0, 3492
IAST [baseline] (27.062 ms) : 0, 27062
IAST [candidate] (26.846 ms) : 0, 26846
Load
Parameters
See matching parameters
Summary
Found 4 performance improvements and 2 performance regressions! Performance is the same for 11 metrics, 19 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (18.435 ms) : 18238, 18632
. : milestone, 18435,
appsec (18.754 ms) : 18565, 18942
. : milestone, 18754,
code_origins (17.827 ms) : 17646, 18007
. : milestone, 17827,
iast (18.683 ms) : 18493, 18872
. : milestone, 18683,
profiling (18.458 ms) : 18272, 18644
. : milestone, 18458,
tracing (18.859 ms) : 18667, 19051
. : milestone, 18859,
section candidate
no_agent (18.972 ms) : 18777, 19167
. : milestone, 18972,
appsec (18.727 ms) : 18535, 18918
. : milestone, 18727,
code_origins (18.668 ms) : 18478, 18857
. : milestone, 18668,
iast (17.836 ms) : 17656, 18017
. : milestone, 17836,
profiling (18.732 ms) : 18544, 18920
. : milestone, 18732,
tracing (17.625 ms) : 17450, 17800
. : milestone, 17625,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (1.175 ms) : 1163, 1186
. : milestone, 1175,
iast (3.223 ms) : 3176, 3269
. : milestone, 3223,
iast_FULL (5.7 ms) : 5644, 5756
. : milestone, 5700,
iast_GLOBAL (3.512 ms) : 3455, 3569
. : milestone, 3512,
profiling (2.07 ms) : 2052, 2089
. : milestone, 2070,
tracing (1.827 ms) : 1811, 1844
. : milestone, 1827,
section candidate
no_agent (1.185 ms) : 1173, 1197
. : milestone, 1185,
iast (3.106 ms) : 3064, 3149
. : milestone, 3106,
iast_FULL (6.052 ms) : 5989, 6115
. : milestone, 6052,
iast_GLOBAL (3.475 ms) : 3419, 3531
. : milestone, 3475,
profiling (2.064 ms) : 2044, 2084
. : milestone, 2064,
tracing (1.843 ms) : 1827, 1859
. : milestone, 1843,
Dacapo
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (1.47 ms) : 1458, 1481
. : milestone, 1470,
appsec (3.758 ms) : 3536, 3980
. : milestone, 3758,
iast (2.258 ms) : 2188, 2328
. : milestone, 2258,
iast_GLOBAL (2.293 ms) : 2223, 2363
. : milestone, 2293,
profiling (2.104 ms) : 2047, 2161
. : milestone, 2104,
tracing (2.054 ms) : 2000, 2109
. : milestone, 2054,
section candidate
no_agent (1.465 ms) : 1454, 1477
. : milestone, 1465,
appsec (3.725 ms) : 3506, 3944
. : milestone, 3725,
iast (2.246 ms) : 2177, 2315
. : milestone, 2246,
iast_GLOBAL (2.29 ms) : 2221, 2360
. : milestone, 2290,
profiling (2.099 ms) : 2042, 2156
. : milestone, 2099,
tracing (2.047 ms) : 1993, 2100
. : milestone, 2047,
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~e413d1d9cf, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (15.497 s) : 15497000, 15497000
. : milestone, 15497000,
appsec (14.671 s) : 14671000, 14671000
. : milestone, 14671000,
iast (18.213 s) : 18213000, 18213000
. : milestone, 18213000,
iast_GLOBAL (17.808 s) : 17808000, 17808000
. : milestone, 17808000,
profiling (14.972 s) : 14972000, 14972000
. : milestone, 14972000,
tracing (14.585 s) : 14585000, 14585000
. : milestone, 14585000,
section candidate
no_agent (15.673 s) : 15673000, 15673000
. : milestone, 15673000,
appsec (15.003 s) : 15003000, 15003000
. : milestone, 15003000,
iast (18.349 s) : 18349000, 18349000
. : milestone, 18349000,
iast_GLOBAL (18.184 s) : 18184000, 18184000
. : milestone, 18184000,
profiling (14.984 s) : 14984000, 14984000
. : milestone, 14984000,
tracing (14.568 s) : 14568000, 14568000
. : milestone, 14568000,
What Does This Do
Updates the metrics in the `_dd.spark.sql_plan` meta field to use distributions calculated from individual task metrics, rather than the naively summed metrics provided by the `StageInfo` objects from Spark. `StageInfo` sums every accumulator, even though a sum may not make sense for certain Spark SQL metrics (e.g. avg hash probes per key for aggregation operations). Instead, we should accumulate those values ourselves into distribution metrics and emit them accordingly.

Currently in the UI, this is only used in one place (the Spark SQL metrics in the DJM product), so we're not too worried about changing the format here. UI update to follow.
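As a rough illustration of why a plain sum can mislead, here is a minimal Java sketch; it is not the code in this PR. The class `AccumulatorDistributions`, its method names, and the sample values are all hypothetical, and it keeps raw per-task values in a list for simplicity, which a real tracer would likely replace with a compact histogram or sketch structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: collects per-task accumulator values instead of only a running sum. */
public class AccumulatorDistributions {
  // accumulator ID -> values reported by individual tasks (assumed to be longs)
  private final Map<Long, List<Long>> valuesByAccumulator = new HashMap<>();

  /** Record the value that one finished task reported for one accumulator. */
  public void onTaskEnd(long accumulatorId, long taskValue) {
    valuesByAccumulator.computeIfAbsent(accumulatorId, id -> new ArrayList<>()).add(taskValue);
  }

  /** The naive aggregate: adds up every task's value. */
  public long sum(long accumulatorId) {
    long total = 0;
    for (long v : sorted(accumulatorId)) {
      total += v;
    }
    return total;
  }

  public long min(long accumulatorId) {
    return sorted(accumulatorId).get(0);
  }

  public long max(long accumulatorId) {
    List<Long> s = sorted(accumulatorId);
    return s.get(s.size() - 1);
  }

  /** Lower median of the recorded task values. */
  public long median(long accumulatorId) {
    List<Long> s = sorted(accumulatorId);
    return s.get((s.size() - 1) / 2);
  }

  // Returns a sorted copy of the recorded values; fails if nothing was recorded.
  private List<Long> sorted(long accumulatorId) {
    List<Long> values = valuesByAccumulator.get(accumulatorId);
    if (values == null || values.isEmpty()) {
      throw new IllegalArgumentException("no values for accumulator " + accumulatorId);
    }
    List<Long> copy = new ArrayList<>(values);
    Collections.sort(copy);
    return copy;
  }

  public static void main(String[] args) {
    AccumulatorDistributions d = new AccumulatorDistributions();
    // Three tasks report an "average"-style metric for the same (made-up) accumulator ID 7.
    d.onTaskEnd(7L, 1L);
    d.onTaskEnd(7L, 1L);
    d.onTaskEnd(7L, 2L);
    System.out.println("sum=" + d.sum(7L));
    System.out.println("min=" + d.min(7L) + " median=" + d.median(7L) + " max=" + d.max(7L));
  }
}
```

With these sample values the sum is larger than anything a single task reported, while min, median and max stay within the range the tasks actually saw, which is the kind of task-level view the description is after.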
Motivation
We'd like accurate metrics for Spark SQL operations that can reflect task-level characteristics as a distribution. This brings us more in line with what is shown in the Spark UI.

Additional Notes
We can't get rid of the original map that tracks accumulators to stages, as we still use it to associate Spark SQL operations with stages. However, we no longer need to store the entire accumulator, and can instead store a simple map of accumulator ID to stage ID. This will be done in a follow-up PR: #10645
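The slimmer mapping mentioned above could look something like the following sketch. The class `AccumulatorStageIndex` and its methods are hypothetical names chosen for illustration, and the ID types (`long` accumulator IDs, `int` stage IDs) are assumptions; the follow-up PR may structure this differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical sketch: remembers only which stage an accumulator belongs to. */
public class AccumulatorStageIndex {
  // accumulator ID -> stage ID; no reference to the accumulator object is kept
  private final Map<Long, Integer> stageByAccumulator = new HashMap<>();

  /** Associate an accumulator with the stage that reported it. */
  public void register(long accumulatorId, int stageId) {
    stageByAccumulator.put(accumulatorId, stageId);
  }

  /** Look up the stage for an accumulator, or empty if it was never registered. */
  public OptionalInt stageFor(long accumulatorId) {
    Integer stageId = stageByAccumulator.get(accumulatorId);
    return stageId == null ? OptionalInt.empty() : OptionalInt.of(stageId);
  }
}
```

Keeping only the two IDs is enough to link a Spark SQL operation to its stage, and it avoids holding on to accumulator objects for longer than necessary.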
Contributor Checklist
- Assign the `type:` and (`comp:` or `inst:`) labels in addition to any other useful labels
- Avoid using `close`, `fix`, or any linking keywords when referencing an issue. Use `solves` instead, and assign the PR milestone to the issue

Jira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting `/merge`. `/merge -c` cancels the queue request. `/merge -f --reason "reason"` skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.