Ignore DOCTYPE inside a multi-line comment body #36948
Open
junhyeong9812 wants to merge 1 commit into
XmlValidationModeDetector peeks at the start of an XML document to choose between DTD- and XSD-based validation, skipping any DOCTYPE that appears inside an XML comment.

consumeCommentTokens short-circuited a line with no start or end comment marker by returning it unchanged, even while already inside a multi-line comment. Such a body line was then treated as content, so a literal "DOCTYPE" word in the comment body caused an XSD document to be misdetected as DTD-based.

Honor the "in comment" parse state in that early return so a comment body line is treated as empty content, completing the fix for gh-27915, which only covered comment markers on the same line.

Signed-off-by: junhyeong9812 <pickjog@gmail.com>
Overview
XmlValidationModeDetector peeks at the start of an XML document to decide between DTD- and XSD-based validation by looking for a DOCTYPE declaration, while skipping any DOCTYPE text that appears inside XML comments.
Problem
The comment-skipping logic does not fully account for the multi-line "in comment" state. consumeCommentTokens(String) short-circuits a line that contains neither a start (<!--) nor an end (-->) marker by returning it unchanged.
When the parser is already inside a multi-line comment (this.inComment == true), such a line is the body of the comment, yet it is returned verbatim as content. If that body line contains the text DOCTYPE, hasDoctype(content) matches and the document is misdetected as DTD-based.
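To make the leak concrete, here is a minimal, self-contained sketch. It is a simplified model of the behaviour described above, not the actual Spring source: the class name LeakyDetectorSketch, the Mode enum, the method bodies, and the sample document are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical, simplified model of the pre-fix behaviour; not Spring's code.
class LeakyDetectorSketch {

    enum Mode { DTD, XSD }

    private boolean inComment;

    // Returns the non-comment content of a line and updates the inComment state.
    String consumeCommentTokens(String line) {
        if (!line.contains("<!--") && !line.contains("-->")) {
            // Early return described in the PR: the line is handed back unchanged,
            // even when we are already inside a multi-line comment.
            return line;
        }
        StringBuilder content = new StringBuilder();
        int pos = 0;
        while (pos < line.length()) {
            if (inComment) {
                int end = line.indexOf("-->", pos);
                if (end == -1) {
                    break; // rest of the line is still comment text
                }
                inComment = false;
                pos = end + 3;
            } else {
                int start = line.indexOf("<!--", pos);
                if (start == -1) {
                    content.append(line.substring(pos));
                    break;
                }
                content.append(line, pos, start);
                inComment = true;
                pos = start + 4;
            }
        }
        return content.toString();
    }

    boolean hasOpeningTag(String content) {
        if (inComment) {
            return false;
        }
        int i = content.indexOf('<');
        return i > -1 && content.length() > i + 1 && Character.isLetter(content.charAt(i + 1));
    }

    Mode detect(List<String> lines) {
        inComment = false;
        for (String line : lines) {
            String content = consumeCommentTokens(line);
            if (content.isBlank()) {
                continue;
            }
            if (content.contains("DOCTYPE")) {
                return Mode.DTD;
            }
            if (hasOpeningTag(content)) {
                break;
            }
        }
        return Mode.XSD;
    }

    public static void main(String[] args) {
        // Made-up XSD-style document whose comment body mentions the word DOCTYPE.
        List<String> xsdDocument = List.of(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<!--",
                "  This file has no DOCTYPE declaration.",
                "-->",
                "<beans xmlns=\"http://www.springframework.org/schema/beans\">",
                "</beans>");
        System.out.println(new LeakyDetectorSketch().detect(xsdDocument)); // prints DTD
    }
}
```

In this model the third line has no comment marker, so it comes back verbatim and the DOCTYPE check fires before the root element is ever reached.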
detectValidationMode() returns VALIDATION_DTD for such an XSD document instead of VALIDATION_XSD.
This is the residual case left by the fix for #27915 (DOCTYPE in a comment): that fix covers comments whose markers sit on the same line, but a marker-less body line of a multi-line comment still leaks through the early return above. Note the asymmetry: hasOpeningTag(String) already guards with if (this.inComment) return false;, but the comment-consumption path does not.
Fix
Honor the "in comment" state in the early return so a comment body line is treated as
empty content rather than leaking out.
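The change can be sketched roughly as follows. This is an illustrative stand-in, not the PR's diff: the class name FixedCommentConsumerSketch and the loop-based marker handling are assumptions, and only the early return reflects the fix as described in the text.

```java
// Hypothetical, simplified model of the fixed early return; not Spring's code.
class FixedCommentConsumerSketch {

    private boolean inComment;

    // Returns the non-comment content of a line and updates the inComment state.
    String consumeCommentTokens(String line) {
        if (!line.contains("<!--") && !line.contains("-->")) {
            // A marker-less line inside a multi-line comment is comment body,
            // so it contributes no content.
            return inComment ? "" : line;
        }
        StringBuilder content = new StringBuilder();
        int pos = 0;
        while (pos < line.length()) {
            if (inComment) {
                int end = line.indexOf("-->", pos);
                if (end == -1) {
                    break; // rest of the line is still comment text
                }
                inComment = false;
                pos = end + 3;
            } else {
                int start = line.indexOf("<!--", pos);
                if (start == -1) {
                    content.append(line.substring(pos));
                    break;
                }
                content.append(line, pos, start);
                inComment = true;
                pos = start + 4;
            }
        }
        return content.toString();
    }

    public static void main(String[] args) {
        FixedCommentConsumerSketch consumer = new FixedCommentConsumerSketch();
        for (String line : new String[] {"<!--", "  no DOCTYPE here", "-->", "<beans>"}) {
            System.out.println("[" + consumer.consumeCommentTokens(line) + "]");
        }
        // prints [], [], [] and [<beans>], one per line
    }
}
```

With the state check in place, the body line yields an empty string, so a caller that skips blank content never sees the DOCTYPE word.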
This matches the documented contract of consumeCommentTokens ("returns the remaining content, which may be empty since the supplied content might be all comment data ... takes the current 'in comment' parsing state into account"). A regression fixture (xsdWithDoctypeInMultiLineCommentBody.xml) is added to the existing parameterized xsdDetection test.
Note
If you'd prefer a belt-and-suspenders approach, a symmetric inComment guard could also be added to hasDoctype(String) (mirroring the existing guard in hasOpeningTag(String)) so that a DOCTYPE match is never honored while inside a comment, regardless of how the content was produced. I kept this PR to the single root-cause change to stay minimal, but I'm happy to add that guard as well if you think it's worthwhile.
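For reference, the optional guard could look something like the sketch below. The class name GuardedDoctypeSketch and the package-private inComment field are assumptions for illustration only; this is not code from the PR.

```java
// Hypothetical illustration of the optional hasDoctype guard; not Spring's code.
class GuardedDoctypeSketch {

    // Parse state that the comment-consumption step would normally maintain.
    boolean inComment;

    boolean hasDoctype(String content) {
        if (inComment) {
            return false; // never honor a DOCTYPE match while inside a comment
        }
        return content.contains("DOCTYPE");
    }
}
```

One trade-off to weigh, at least in this simplified model: the flag reflects the state at the end of the current line, so a line that holds a real DOCTYPE declaration and then opens a comment would be suppressed as well.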
Related Issues