Skip to content

Commit

Permalink
Fix performance issue caused by using repeated > characters inside …
Browse files Browse the repository at this point in the history
…`<!DOCTYPE name [<!ENTITY>]>` (#175)

A `<` is treated as a string delimiter. 
In certain cases, if `<` is used in succession, read and match are
repeated, which slows down the process. Therefore, the following is used
to read ahead to a specific part of the string in advance.
  • Loading branch information
Watson1978 authored Jul 16, 2024
1 parent a79ac8b commit 67efb59
Show file tree
Hide file tree
Showing 2 changed files with 13 additions and 2 deletions.
8 changes: 6 additions & 2 deletions lib/rexml/parsers/baseparser.rb
Original file line number Diff line number Diff line change
Expand Up @@ -124,11 +124,15 @@ class BaseParser
}

module Private
INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
# Terminal requires two or more letters.
INSTRUCTION_TERM = "?>"
COMMENT_TERM = "-->"
CDATA_TERM = "]]>"
DOCTYPE_TERM = "]>"
# Read to the end of DOCTYPE because there is no proper ENTITY termination
ENTITY_TERM = DOCTYPE_TERM

INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
Expand Down Expand Up @@ -313,7 +317,7 @@ def pull_event
raise REXML::ParseException.new( "Bad ELEMENT declaration!", @source ) if md.nil?
return [ :elementdecl, "<!ELEMENT" + md[1] ]
elsif @source.match("ENTITY", true)
match = [:entitydecl, *@source.match(Private::ENTITYDECL_PATTERN, true).captures.compact]
match = [:entitydecl, *@source.match(Private::ENTITYDECL_PATTERN, true, term: Private::ENTITY_TERM).captures.compact]
ref = false
if match[1] == '%'
ref = true
Expand Down
7 changes: 7 additions & 0 deletions test/parse/test_entity_declaration.rb
Original file line number Diff line number Diff line change
Expand Up @@ -32,5 +32,12 @@ def test_empty
<!ENTITY> ]> <r/>
DETAIL
end

def test_gt_linear_performance
seq = [10000, 50000, 100000, 150000, 200000]
assert_linear_performance(seq, rehearsal: 10) do |n|
REXML::Document.new('<!DOCTYPE rubynet [<!ENTITY rbconfig.ruby_version "' + '>' * n + '">')
end
end
end
end

0 comments on commit 67efb59

Please sign in to comment.