Further clarify the rationale of this workaround based on research

AzureAD · Mar 2, 2018 · 562239e
1 parent 7a8dc45
Showing 1 changed file with 10 additions and 4 deletions.
diff --git a/adal/wstrust_response.py b/adal/wstrust_response.py
@@ -63,15 +63,21 @@ def findall_content(xml_string, tag):
     >>> findall_content("<ns0:foo> what <bar> ever </bar> content </ns0:foo>", "foo")
     [" what <bar> ever </bar> content "]
 
+    Motivation:
+
     Usually we would use an XML parser to extract the data by XPath.
     However, the ElementTree in Python will implicitly normalize the output
     by "hoisting" the inner inline namespaces into the outermost element.
     The result will be a semantically equivalent XML snippet,
     but not fully identical to the original one.
-    While this shouldn't become a problem,
-    in practice it could potentially confuse a picky recipient.
-
-    Introducing this helper, based on Regex, which will return raw content as-is.
+    While this effect shouldn't become a problem in other cases,
+    it does not seem to fully comply with the Exclusive XML Canonicalization spec
+    (https://www.w3.org/TR/xml-exc-c14n/), and so it voids the SAML token signature.
+    The SAML signature algorithm needs the "XML -> C14N(XML) -> Signed(C14N(XML))" order.
+
+    The binary extension lxml is probably the right way to solve this
+    (https://stackoverflow.com/questions/22959577/python-exclusive-xml-canonicalization-xml-exc-c14n),
+    but here we use this workaround, based on regex, to return the raw content as-is.
     """
     # \w+ is good enough for https://www.w3.org/TR/REC-xml/#NT-NameChar
     pattern = r"<(?:\w+:)?%(tag)s(?:[^>]*)>(.*)</(?:\w+:)?%(tag)s" % {"tag": tag}
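
The motivation in the docstring can be illustrated with a small sketch. It is not part of the commit: findall_raw, reserialize_child and SAMPLE are hypothetical names, the sample XML is made up, and the re.DOTALL flag is an assumption, since the diff does not show how the pattern is applied. The sketch compares the raw slice returned by the regex with what ElementTree produces after a parse and re-serialize round trip.

```python
import re
import xml.etree.ElementTree as ET


def findall_raw(xml_string, tag):
    """Hypothetical stand-in for findall_content(): raw inner text of <tag>."""
    # Same pattern as in the diff above. re.DOTALL is an assumption here,
    # so that "." also spans newlines inside the element.
    pattern = r"<(?:\w+:)?%(tag)s(?:[^>]*)>(.*)</(?:\w+:)?%(tag)s" % {"tag": tag}
    return re.findall(pattern, xml_string, re.DOTALL)


def reserialize_child(xml_string):
    """Parse with ElementTree, then serialize the first child of the root."""
    root = ET.fromstring(xml_string)
    return ET.tostring(root[0], encoding="unicode")


# Made-up envelope: the inner element declares its own inline namespace.
SAMPLE = '<a:foo xmlns:a="urn:a"><b:bar xmlns:b="urn:b">x</b:bar></a:foo>'

raw = findall_raw(SAMPLE, "foo")[0]
reparsed = reserialize_child(SAMPLE)

print("raw:     ", raw)       # exact slice of SAMPLE, untouched
print("reparsed:", reparsed)  # equivalent XML, but not character-identical
```

The raw slice is a substring of the input, so a signature computed over the original token should still verify. The re-serialized form names the same element but is not guaranteed to match the original character for character, which is the concern the docstring raises.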