Ignore DOCTYPE inside a multi-line comment body #36948

Open
junhyeong9812 wants to merge 1 commit into spring-projects:main from junhyeong9812:fix/xmlvalidationmodedetector-doctype-in-comment

junhyeong9812 commented Jun 18, 2026

Overview

XmlValidationModeDetector peeks at the start of an XML document to decide between
DTD- and XSD-based validation by looking for a DOCTYPE declaration, while skipping
any DOCTYPE text that appears inside XML comments.

Problem

The comment-skipping logic does not fully account for the multi-line "in comment"
state. consumeCommentTokens(String) short-circuits a line that contains neither a
start (<!--) nor an end (-->) marker by returning it unchanged:

```java
int indexOfStartComment = line.indexOf(START_COMMENT);
if (indexOfStartComment == -1 && !line.contains(END_COMMENT)) {
    return line;
}
```

When the parser is already inside a multi-line comment (this.inComment == true),
such a line is the body of the comment, yet it is returned verbatim as content. If
that body line contains the text DOCTYPE, hasDoctype(content) matches and the
document is misdetected as DTD-based.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  See the DOCTYPE notes for legacy configs
-->
<beans .../>
```

detectValidationMode() returns VALIDATION_DTD for this XSD document instead of
VALIDATION_XSD.
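The failure can be reproduced with a small, self-contained model of the line-by-line scan. This is a simplified sketch, not Spring's XmlValidationModeDetector: the class CommentScanModel, its fixEarlyReturn flag, and the "DTD"/"XSD" string results are hypothetical, lines are passed as a List<String> rather than an InputStream, the encoding fallback is omitted, and the comment-stripping loop is written from scratch rather than copied from Spring. Only the early return mirrors the excerpt above.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of a comment-aware DOCTYPE scan.
 * Not Spring's implementation; names and return values are illustrative.
 */
final class CommentScanModel {

    private static final String START_COMMENT = "<!--";
    private static final String END_COMMENT = "-->";
    private static final String DOCTYPE = "DOCTYPE";

    private final boolean fixEarlyReturn; // true = apply the proposed fix
    private boolean inComment;            // inside an unterminated <!-- ... ?

    public CommentScanModel(boolean fixEarlyReturn) {
        this.fixEarlyReturn = fixEarlyReturn;
    }

    /** "DTD" if DOCTYPE appears outside comments before the first element, else "XSD". */
    public String detect(List<String> lines) {
        this.inComment = false;
        for (String line : lines) {
            String content = consumeCommentTokens(line);
            if (content.isBlank()) {
                continue; // only whitespace or comment data on this line
            }
            if (content.contains(DOCTYPE)) {
                return "DTD";
            }
            if (hasOpeningTag(content)) {
                return "XSD"; // first element reached without a DOCTYPE
            }
        }
        return "XSD";
    }

    /** Strips comment text from a line, carrying the "in comment" state across lines. */
    String consumeCommentTokens(String line) {
        if (!line.contains(START_COMMENT) && !line.contains(END_COMMENT)) {
            // The early return under discussion: without the fix, a comment
            // body line is returned verbatim and treated as content.
            return (this.fixEarlyReturn && this.inComment) ? "" : line;
        }
        StringBuilder result = new StringBuilder();
        String rest = line;
        while (!rest.isEmpty()) {
            if (this.inComment) {
                int end = rest.indexOf(END_COMMENT);
                if (end == -1) {
                    break; // remainder of the line is comment data
                }
                this.inComment = false;
                rest = rest.substring(end + END_COMMENT.length());
            }
            else {
                int start = rest.indexOf(START_COMMENT);
                if (start == -1) {
                    result.append(rest); // remainder is real content
                    break;
                }
                result.append(rest, 0, start);
                this.inComment = true;
                rest = rest.substring(start + START_COMMENT.length());
            }
        }
        return result.toString();
    }

    /** True for an element start such as "<beans" (not "<?xml" or "<!"), unless in a comment. */
    private boolean hasOpeningTag(String content) {
        if (this.inComment) {
            return false;
        }
        int i = content.indexOf('<');
        return i > -1 && i + 1 < content.length() && Character.isLetter(content.charAt(i + 1));
    }

    public static void main(String[] args) {
        List<String> doc = List.of(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<!--",
                "  See the DOCTYPE notes for legacy configs",
                "-->",
                "<beans/>");
        System.out.println("original: " + new CommentScanModel(false).detect(doc));
        System.out.println("fixed:    " + new CommentScanModel(true).detect(doc));
    }
}
```

Running main prints "original: DTD" and then "fixed:    XSD": the unpatched early return lets the comment body reach the DOCTYPE check, while the patched one turns it into empty content.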

This is the residual case left by the fix for #27915 (DOCTYPE in a comment): that fix
covers comments whose markers sit on the same line, but a marker-less body line of a
multi-line comment still leaks through the early return above. Note the asymmetry:
hasOpeningTag(String) already guards with `if (this.inComment) return false;`, but
the comment-consumption path does not.

Fix

Honor the "in comment" state in the early return so a comment body line is treated as
empty content rather than leaking out:

```java
if (indexOfStartComment == -1 && !line.contains(END_COMMENT)) {
    // If we are inside a multi-line comment, the entire line is comment
    // data and must not be treated as content.
    return (this.inComment ? "" : line);
}
```
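Because the patched early return depends only on the line and the current parse state, its behavior can be checked in isolation. The sketch below copies the condition from the fix into a hypothetical static helper, earlyReturn(line, inComment), which returns null where the real method would instead continue into marker-aware parsing; that null convention is an assumption of this sketch, not part of the PR.

```java
final class EarlyReturnSketch {

    static final String START_COMMENT = "<!--";
    static final String END_COMMENT = "-->";

    /**
     * Hypothetical helper mirroring the patched early return.
     * Returns the line's content when the short-circuit applies, or null when
     * the line contains a comment marker and needs full comment parsing.
     */
    static String earlyReturn(String line, boolean inComment) {
        int indexOfStartComment = line.indexOf(START_COMMENT);
        if (indexOfStartComment == -1 && !line.contains(END_COMMENT)) {
            // Inside a multi-line comment the whole line is comment data.
            return (inComment ? "" : line);
        }
        return null; // marker present: falls through to marker-aware parsing
    }

    public static void main(String[] args) {
        String body = "  See the DOCTYPE notes for legacy configs";
        System.out.println("[" + earlyReturn(body, true) + "]");  // comment body -> empty
        System.out.println("[" + earlyReturn(body, false) + "]"); // ordinary content -> unchanged
        System.out.println(earlyReturn("-->", true));             // marker present -> null
    }
}
```

The same marker-less line yields empty content or itself depending only on inComment, which is the "may be empty" case the method's contract describes.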

This brings consumeCommentTokens in line with its documented contract ("returns the
remaining content, which may be empty since the supplied content might be all comment
data ... takes the current 'in comment' parsing state into account"). A regression
fixture (xsdWithDoctypeInMultiLineCommentBody.xml) is added to the existing
parameterized xsdDetection test.

Note

If you'd prefer a belt-and-suspenders approach, a symmetric inComment guard could
also be added to hasDoctype(String) (mirroring the existing guard in
hasOpeningTag(String)) so that a DOCTYPE match is never honored while inside a
comment, regardless of how the content was produced. I kept this PR to the single
root-cause change to stay minimal, but I'm happy to add that guard as well if you
think it's worthwhile.
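For illustration, a symmetric guard might look like the following. DoctypeGuardSketch is a hypothetical stand-in holding only the inComment flag; hasOpeningTag reproduces the guard pattern described above, and hasDoctype receives the same check. The PR itself intentionally leaves hasDoctype unchanged.

```java
final class DoctypeGuardSketch {

    private static final String DOCTYPE = "DOCTYPE";

    /** Stand-in for the detector's multi-line "in comment" parse state. */
    boolean inComment;

    /** Hypothetical guarded variant: never honor a DOCTYPE match inside a comment. */
    boolean hasDoctype(String content) {
        if (this.inComment) {
            return false;
        }
        return content.contains(DOCTYPE);
    }

    /** Same guard pattern as described for hasOpeningTag(String). */
    boolean hasOpeningTag(String content) {
        if (this.inComment) {
            return false;
        }
        int openTagIndex = content.indexOf('<');
        return (openTagIndex > -1 && content.length() > openTagIndex + 1 &&
                Character.isLetter(content.charAt(openTagIndex + 1)));
    }
}
```

One edge case may be worth a test if this route is taken: the flag is read after the line's comments have been consumed, so a line with a real DOCTYPE followed by an unterminated <!-- would likely leave inComment set, and the guarded hasDoctype would then skip a genuine declaration on that line.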

Related Issues

XmlValidationModeDetector peeks at the start of an XML document to choose
between DTD- and XSD-based validation, skipping any DOCTYPE that appears
inside an XML comment. consumeCommentTokens short-circuited a line with no
start or end comment marker by returning it unchanged, even while already
inside a multi-line comment. Such a body line was then treated as content,
so a literal "DOCTYPE" word in the comment body caused an XSD document to be
misdetected as DTD-based.

Honor the "in comment" parse state in that early return so a comment body
line is treated as empty content, completing the fix for gh-27915, which only
covered comment markers on the same line.

Signed-off-by: junhyeong9812 <pickjog@gmail.com>

spring-projects-issues added the status: waiting-for-triage label (Jun 18, 2026)
sbrannen self-assigned this (Jun 19, 2026)
sbrannen added the in: core and type: bug labels and removed the status: waiting-for-triage label (Jun 19, 2026)
sbrannen added this to the 7.0.9 milestone (Jun 19, 2026)