This repository has been archived by the owner on Sep 29, 2023. It is now read-only.

Further clarify the rationale of this workaround based on research
rayluo committed Mar 2, 2018
1 parent 7a8dc45 commit 562239e
Showing 1 changed file with 10 additions and 4 deletions.
14 changes: 10 additions & 4 deletions adal/wstrust_response.py
@@ -63,15 +63,21 @@ def findall_content(xml_string, tag):
>>> findall_content("<ns0:foo> what <bar> ever </bar> content </ns0:foo>", "foo")
[" what <bar> ever </bar> content "]
Motivation:
Usually we would use an XML parser to extract the data by XPath.
However, the ElementTree in Python will implicitly normalize the output
by "hoisting" the inner inline namespaces into the outmost element.
The result will be a semantically equivalent XML snippet,
but not fully identical to the original one.
-While this shouldn't become a problem,
-in practice it could potentially confuse a picky recipient.
-Introducing this helper, based on Regex, which will return raw content as-is.
+While this effect shouldn't become a problem in all other cases,
+it does not seem to fully comply with the Exclusive XML Canonicalization spec
+(https://www.w3.org/TR/xml-exc-c14n/), and it voids the SAML token signature.
+The SAML signature algorithm needs the "XML -> C14N(XML) -> Signed(C14N(XML))" order.
+The binary extension lxml is probably the right way to solve this
+(https://stackoverflow.com/questions/22959577/python-exclusive-xml-canonicalization-xml-exc-c14n)
+but here we use this workaround, based on regex, to return raw content as-is.
"""
# \w+ is good enough for https://www.w3.org/TR/REC-xml/#NT-NameChar
pattern = r"<(?:\w+:)?%(tag)s(?:[^>]*)>(.*)</(?:\w+:)?%(tag)s" % {"tag": tag}
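
The docstring's motivation can be reproduced with a small sketch that contrasts an ElementTree round trip with the regex-based extraction. The `pattern` line is copied from the diff; the `re.findall(..., re.DOTALL)` call is an assumption, since the rest of the function body is not shown here, and the `RAW` document with its `urn:example:*` namespaces is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def findall_content(xml_string, tag):
    """Return the raw text found between <tag ...> and </tag>, with any prefix."""
    # Pattern copied from the diff above.
    pattern = r"<(?:\w+:)?%(tag)s(?:[^>]*)>(.*)</(?:\w+:)?%(tag)s" % {"tag": tag}
    # Assumption: the collapsed remainder of the function applies the pattern
    # roughly like this; DOTALL lets the content span multiple lines.
    return re.findall(pattern, xml_string, re.DOTALL)


# Hypothetical input: the inner element declares its own namespace inline.
RAW = (
    '<a:outer xmlns:a="urn:example:a">'
    '<b:inner xmlns:b="urn:example:b">text</b:inner>'
    '</a:outer>'
)

# Round trip through ElementTree: parse, then serialize again.
round_tripped = ET.tostring(ET.fromstring(RAW), encoding="unicode")

# Regex extraction: the inner markup comes back exactly as it was written.
raw_inner = findall_content(RAW, "outer")[0]

if __name__ == "__main__":
    print(round_tripped)  # namespace declarations now sit on the outer element
    print(raw_inner)      # inline xmlns:b declaration is still in place
```

In the round-tripped string the inline `xmlns:b` declaration is gone from the inner element and its namespace URI is declared on the outer start tag instead, with prefixes chosen by ElementTree. The regex result is a verbatim substring of the input, which is the property the workaround relies on when the extracted token is forwarded with its signature.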