Character encoding - 1252 vs. ISO-8859-1

Author
17 Mar 2006 3:37 PM
JS
I was wondering why one would specify character encoding of 1252 vs.
ISO-8859-1 when retrieving data via HTTP.  My circumstance is that I am
retrieving XML via HTTP with French characters in it and I have
specified the encoding as follows:

Dim str As New StreamReader([data source], _
    System.Text.Encoding.GetEncoding("ISO-8859-1"))

Doing this works fine and I retrieve the data without the special
French characters being dropped.  When I change the above line of code
to the following:

Dim str As New StreamReader([data source], _
    System.Text.Encoding.GetEncoding(1252))

The end result is the same.

Is there any advantage to one encoding over another?

Author
17 Mar 2006 5:35 PM
Joerg Jooss
Thus wrote js,

> I was wondering why one would specify character encoding of 1252 vs.
> ISO-8859-1 when retrieving data via HTTP.  My circumstance is that I
> am retrieving XML via HTTP with French characters in it and I have
> specified the encoding as follows:
>
> Dim str as New StreamReader([data source],
> system.text.encoding.getencoding("ISO-8859-1"))
> Doing this works fine and I retrieve the data without the special
> French characters being dropped.  When I change the above line of code
> to the following:
>
> Dim str as New StreamReader([data source],
> System.Text.Encoding.GetEncoding(1252))
> The end result is the same.
>
> Is there any advantage to one encoding over another?

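Well, both are dated. Windows-1252 is effectively a superset of ISO-8859-1:
it assigns printable characters to the byte range 0x80-0x9F, which ISO-8859-1
reserves for control codes. The accented letters used in French sit above that
range, at the same positions in both encodings, which is why your two readers
give the same result.
See http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx and http://www.microsoft.com/globaldev/reference/iso/28591.mspx.
ISO-8859-1 does not contain €, nor the uppercase and lowercase "oe" ligature
(Unicode \u0152 and \u0153). Windows-1252 contains both.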
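
To make the difference concrete, here is a minimal sketch that decodes a few single bytes with both encodings and prints the resulting Unicode code points. The module name and the choice of sample bytes are only illustrative. It assumes the .NET Framework, where code page 1252 is available out of the box; on .NET Core and later, Encoding.GetEncoding(1252) is expected to fail unless the code pages provider has been registered first.

```vbnet
Imports System
Imports System.Text

Module EncodingComparison
    Sub Main()
        ' On .NET Core / .NET 5+ this registration is assumed to be needed
        ' before requesting code page 1252 (not needed on .NET Framework):
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)

        ' 0xE9 is "é" in both encodings; 0x80, 0x8C and 0x9C are the
        ' positions of €, Œ and œ in Windows-1252 only.
        Dim samples() As Byte = {&HE9, &H80, &H8C, &H9C}

        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        For Each b As Byte In samples
            Dim oneByte() As Byte = {b}

            ' Decode the same byte with each encoding.
            Dim fromLatin1 As Char = latin1.GetChars(oneByte)(0)
            Dim fromCp1252 As Char = cp1252.GetChars(oneByte)(0)

            ' Print the code points rather than the characters, so the
            ' result does not depend on the console's own code page.
            Console.WriteLine("0x{0:X2}: ISO-8859-1 -> U+{1:X4}, Windows-1252 -> U+{2:X4}", _
                              b, AscW(fromLatin1), AscW(fromCp1252))
        Next
    End Sub
End Module
```

For 0xE9 both columns should show the same code point, while for the three bytes in the 0x80-0x9F range ISO-8859-1 yields control codes and Windows-1252 yields €, Œ and œ. If the remote side might ever send those characters, reading with 1252 is the more forgiving choice.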

Modern applications should instead use one of the Unicode Transformation Formats,
such as UTF-8.

Cheers,
--
Joerg Jooss
news-re***@joergjooss.de

Author
17 Mar 2006 6:15 PM
JS
> Well, both are dated. Windows-1252 is actually an extension of
> ISO-8859-1. See
> http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx and
> http://www.microsoft.com/globaldev/reference/iso/28591.mspx.
> ISO-8859-1 does not contain €, nor the uppercase and
> lowercase "oe" ligature (Unicode \u0152 and \u0153).
> Windows-1252 contains both.
>
> Modern applications should rather use one of the Unicode
> Transformation Formats like UTF-8.

Okay, that matches what I found when researching the difference between
the two, but I figured there must be something else I was missing.
Unfortunately, I cannot get our remote partners to switch to UTF-8 (or
anything else more current), so I am stuck with it, but at least I now
feel comfortable with what I am doing.

Thank you, Joerg; great information and assistance, as always.

J.