XSLT output and restricted encoding in Whidbey

Author
17 Nov 2004 5:20 PM
Lionel Fourquaux
In .NET 1.1, System.Xml.Xsl.XslTransform cannot directly output a document
in an encoding that cannot represent all the characters used (e.g. writing
us-ascii for compatibility and converting every non-ASCII character to a
character reference). This can be worked around by defining a TextWriter
that does the conversion, but that is not very elegant, because the encoding
information must be duplicated outside the XSLT file, and it breaks
compatibility with MSXML, which handles this case.
Can you tell me whether .NET 2.0 fixes this?
Thanks in advance!
--
  Lionel Fourquaux
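
For illustration, the workaround described above (a text writer that converts unencodable characters to character references) can be sketched as follows. This is Python rather than .NET, chosen only to show the behaviour; it is not the XslTransform API, and the sample markup is made up.

```python
import io

# Binary sink standing in for the output file or stream.
sink = io.BytesIO()

# Text writer that encodes to us-ascii and replaces every character the
# encoding cannot represent with a numeric character reference (&#N;).
writer = io.TextIOWrapper(
    sink, encoding="us-ascii", errors="xmlcharrefreplace", newline=""
)

# Sample serialized output (hypothetical), containing non-ASCII characters.
writer.write('<p title="café">naïve €</p>')
writer.flush()

result = sink.getvalue()
print(result)
```

Note that the target encoding is named here, outside the stylesheet, which is exactly the duplication the post objects to.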

Author
22 Nov 2004 4:54 PM
Stuart Celarier
Lionel,

I am not sure your question has anything to do with XSLT per se, but XML
generally. When you specify an encoding on an XML document, you are saying
that the document contents conform to that encoding.

I think this is covered in the XML specification [1]:

"It is a fatal error if an XML entity is determined (via default, encoding
declaration, or higher-level protocol) to be in a certain encoding but
contains byte sequences that are not legal in that encoding."

It is a separate question whether the characters are represented as literal
text or using character references. It would be an error for any XML document
(regardless of how it is produced) to say that it is encoded in us-ascii and
then include non-us-ascii characters as text or as encoded characters.

If you want compatibility in XML, as you indicate, you should use one of the
two encodings that all XML parsers are required to support [2], UTF-8 or
UTF-16. Or perhaps you meant compatibility with some other software?

Cheers,
Stuart Celarier, Fern Creek

[1] http://www.w3.org/TR/2004/REC-xml-20040204/#charencoding
[2] http://www.w3.org/TR/2004/REC-xml-20040204/#charsets

Author
22 Nov 2004 11:18 PM
Lionel Fourquaux
"Stuart Celarier" <stuart at ferncrk dot com> wrote in message
news: OH9FyPL0EHA.1***@TK2MSFTNGP14.phx.gbl...
> I am not sure your question has anything to do with XSLT per se, but XML
> generally. When you specify an encoding on an XML document, you are saying
> that the document contents conform to that encoding.

The document must be written in this encoding, but it can reference any
Unicode character using character references.
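
A small check of this point, sketched in Python with the standard library parser (an illustration only, unrelated to the .NET classes discussed here): a document declared as us-ascii parses fine when the non-ASCII character is a character reference, and is rejected when the same character appears as a raw byte.

```python
import xml.etree.ElementTree as ET

decl = b'<?xml version="1.0" encoding="us-ascii"?>'

# Non-ASCII character written as a numeric character reference: well-formed.
root = ET.fromstring(decl + b"<p>caf&#233;</p>")
text = root.text

# The same character as a raw byte (0xE9) is not legal us-ascii: parse error.
try:
    ET.fromstring(decl + b"<p>caf\xe9</p>")
    raw_byte_rejected = False
except ET.ParseError:
    raw_byte_rejected = True

print(text, raw_byte_rejected)
```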

What I'm referring to is this part of the XSLT spec:
"It is possible that the result tree will contain a character that cannot be
represented in the encoding that the XSLT processor is using for output. In
this case, if the character occurs in a context where XML recognizes
character references (i.e. in the value of an attribute node or text node),
then the character should be output as a character reference; otherwise (for
example if the character occurs in the name of an element) the XSLT
processor should signal an error."
(http://www.w3.org/TR/xslt.html#section-XML-Output-Method)
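
The rule quoted above can be sketched roughly like this. It is a simplified Python illustration, not a real XSLT serializer; `serialize_element` is a hypothetical helper, and it handles only one element with text content.

```python
def serialize_element(name, text, encoding="us-ascii"):
    """Serialize <name>text</name> to bytes in `encoding` (hypothetical helper)."""
    # Character references are not recognized inside element names, so the
    # name is encoded strictly: an unrepresentable character is an error.
    tag = name.encode(encoding, errors="strict")
    # Text content: escape markup characters first, then fall back to
    # numeric character references for anything the encoding lacks.
    escaped = text.replace("&", "&amp;").replace("<", "&lt;")
    body = escaped.encode(encoding, errors="xmlcharrefreplace")
    return b"<" + tag + b">" + body + b"</" + tag + b">"


ok = serialize_element("price", "10 €")

try:
    serialize_element("prix_é", "10")
    name_error = False
except UnicodeEncodeError:
    name_error = True

print(ok, name_error)
```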

Oleg Tkachenko describes the problem (XslTransform doesn't do this) and
gives a workaround in his blog
(http://www.tkachenko.com/blog/archives/000266.html). I'd like to know
whether I can hope for a more elegant solution in .NET 2.0 (I assume some
people here are beta-testing it).

>  Or perhaps you meant compatibility with some other software?

That's it.