XmlNode.InnerXml and Xml Readers / XmlDocument

I am currently working on XML files and I am trying to ensure that my code handles any encoded characters (for example < > " ' stored as &lt; &gt; &quot; &apos;). I am currently using XmlValidatingReader.ReadInnerXml(), but I have noticed the same behavior with XmlDocument too.

Assume that I am trying to read a node which looks like this:

<exlObjectFields>
  <it:DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</it:DSPLY_NAME>
  <it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
  <it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>

The DSPLY_NAME field that is read back looks like this:

<it:DSPLY_NAME>&lt;test"invalid"&gt;</it:DSPLY_NAME>

As you can see, the &lt; and &gt; are not decoded, but &quot; is decoded back to the " character. Is there any way I could force it to read without automatically decoding certain characters? Is it a bug?

TIA

Hi Hermit,
As for the XML character escaping issue, I'm wondering how you load the XML document. Is it originally stored in a file, and do you use XmlDocument to load it into memory?

Based on my understanding, an XML document like the following is invalid because the '<' and '>' haven't been escaped, and loading it through the XmlDocument class will throw an exception (the namespace prefix "it" also needs to be declared):

==============
<exlObjectFields>
  <it:DSPLY_NAME><test"invalid"></it:DSPLY_NAME>
  <it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
  <it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>
=============

I have performed some tests which store the escaped XML in a file (as below):

==============
<exlObjectFields xmlns:it="xxxx">
  <it:DSPLY_NAME>&lt;test"invalid"&gt;</it:DSPLY_NAME>
  <it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
  <it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>
===============

After loading it into an XmlDocument as follows, it still keeps the escaped format (&lt; and &gt;):

>>>>>>>>>>>>>>>>>>
private void btnTest2_Click(object sender, EventArgs e)
{
    XmlDocument doc = new XmlDocument();
    doc.Load("output.xml");
    MessageBox.Show(doc.OuterXml);
}
<<<<<<<<<<<<<<<<<<<

Are you also using similar code logic? Please feel free to let me know if there is anything I missed.

Sincerely,

Steven Cheng
Microsoft MSDN Online Support Lead

==================================================
Get notification to my posts through email? Please refer to http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues where an initial response from the community or a Microsoft Support Engineer within 1 business day is acceptable. Please note that each follow up response may take approximately 2 business days as the support professional working with you may need further investigation to reach the most efficient resolution.
The offering is not appropriate for situations that require urgent, real-time or phone-based interactions or complex project analysis and dump analysis issues. Issues of this nature are best handled working with a dedicated Microsoft Support Engineer by contacting Microsoft Customer Support Services (CSS) at http://msdn.microsoft.com/subscriptions/support/default.aspx. ================================================== This posting is provided "AS IS" with no warranties, and confers no rights. Steven,
The XML I copied was part of a bigger XML document that does contain the declaration for "it". However, that is not the point. What I meant was that the following characters need to be escaped, as they are XML reserved characters:

>  &gt;
<  &lt;
"  &quot;
'  &apos;

Consider the node

<DSPLY_NAME>&lt;test&gt;</DSPLY_NAME>

Using XmlTextReader or XmlValidatingReader.ReadInnerXml(), I correctly receive the value &lt;test&gt;.

However, if I use

<DSPLY_NAME>&lt;test&quot;Invalid&quot;&gt;</DSPLY_NAME>

then the XmlTextReader or XmlValidatingReader's ReadInnerXml() returns &lt;test"invalid"&gt;. The same applies to any use of &apos;.

Is there any way I can stop the XML readers / document objects from decoding the encoded characters?

Regards,
Hermit

Thanks for your reply Hermit,
I'm wondering how you load the XML document. Have you tried saving it to a file and loading it from that file? Also, have you set CheckCharacters on the XmlReader's XmlReaderSettings?

Based on my test, an XML fragment like the one in baddata.xml below will definitely raise an exception when it is parsed through XmlReader (since it is an invalid XML document). Here is my test code to load it:

============================
XmlDocument doc = new XmlDocument();
string filepath = @"baddata.xml";

XmlReaderSettings settings = new XmlReaderSettings();
settings.CheckCharacters = true;

XmlReader xtr = XmlReader.Create(filepath, settings);
doc.Load(xtr);
xtr.Close();

MessageBox.Show(doc.OuterXml);
=============================

=====baddata.xml===========
<?xml version="1.0" encoding="utf-8" ?>
<exlObjectFields xmlns:it="http://schemas.it.org">
  <it:DSPLY_NAME>
    <test"invalid">
  </it:DSPLY_NAME>
  <it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
  <it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>
===========================

If possible, would you provide your test code so that I can also have a look and test it on my side?

Sincerely,

Steven Cheng
Microsoft MSDN Online Support Lead

Steven,
I am copying the relevant code. If you look at the contents of the fieldsXml variable after it is read, you will see the problem: &lt; and &gt; are read as is, while &apos; and &quot; are converted to ' and ".

XML file being opened with
----------------------------------------------------------------
XmlTextReader textReader = new XmlTextReader( _sPrivateFilePath );
textReader.WhitespaceHandling = WhitespaceHandling.None;
_oMyReader = new XmlValidatingReader( textReader );
_oMyReader.ValidationType = ValidationType.None; // XML validation is done separately
// Set the validation event handler
_oMyReader.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
----------------------------------------------------------------
XML being read
----------------------------------------------------------------
if ( _oMyReader.ReadState == ReadState.Interactive )
{
    do
    {
        if (( _oMyReader.Name == Const.XmlElementName_ExlObject ) && ( _oMyReader.IsStartElement() ))
        {
            string fieldsXml = _oMyReader.ReadInnerXml();
            string sTag = Const.XmlStartTag_With_Namespace_ExlObjectFields;
            string eTag = Const.XmlEndTag_ExlObjectFields;
            // use the data read
            break;
        }
    } while ( _oMyReader.Read() );
}
----------------------------------------------------------------
XML file
----------------------------------------------------------------
<?xml version="1.0" encoding="utf-8" ?>
<exl xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ExlSchema" xmlns:it="DbIssueTypeSchema" xsi:schemaLocation="DbIssueTypeSchema TEST_SEC.xsd ExlSchema ExlSchema.xsd">
  <name>AnEXL</name>
  <headends><headend>MLS01-1</headend></headends>
  <version>1.0</version>
  <date>07-MAR-2007</date>
  <description>Raaar</description>
  <exlHeader>
    <it:EXCHANGE>BA</it:EXCHANGE>
    <it:ISSUTYPE>TEST_SEC</it:ISSUTYPE>
    <exlHeaderFields>
      <it:RECORDTYPE>113</it:RECORDTYPE>
      <it:TEMP_VERS>202</it:TEMP_VERS>
    </exlHeaderFields>
  </exlHeader>
  <exlObject>
    <it:SYMBOL>TFMS000000</it:SYMBOL>
    <it:RIC>TFMR000000.WA</it:RIC>
    <exlObjectFields>
      <it:DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</it:DSPLY_NAME>
      <it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
      <it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
    </exlObjectFields>
  </exlObject>
</exl>
----------------------------------------------------------------

Thanks for your reply Hermit,
I'll have a look, test it locally, and let you know the result.

Sincerely,

Steven Cheng
Microsoft MSDN Online Support Lead

Hi Hermit,
The code you provided still has many undefined variables that may affect the test logic. Would you send me a simplified project that demonstrates the problem? You can reach me through the email in my signature (remove "online").

Sincerely,

Steven Cheng
Microsoft MSDN Online Support Lead

Steven,
I will send a test demo to your email address tomorrow.

Hi Hermit,
I've received the test package you sent and tested it. When you put those escaped special-character entities in the input string, some of them are not expanded (such as &lt; and &gt;) and others are expanded (the &quot;).

I have checked the entity reference reading and expansion behavior of the .NET XML components in MSDN, and it seems XmlReader will always expand character entities. That's why the quotes are expanded. As for &lt; and &gt;: since < and > are illegal characters in XML document content, they cannot be expanded. For other general entities, XmlTextReader has the "EntityHandling" property to control whether entity references are preserved or not:

#EntityReference Reading and Expansion
http://msdn2.microsoft.com/en-us/library/a4f0e433(vs.71).aspx

In addition, if the source XML document is originally illegal (contains invalid characters such as < and > in content), you need to replace them manually (through an IO reader) before the XML component parses them:

#How to locate and replace special characters in an XML file with Visual C# .NET
http://support.microsoft.com/kb/316063

Sincerely,

Steven Cheng
Microsoft MSDN Online Support Lead

Thanks for the detailed reply and pointers to MSDN docs :)
The EntityHandling enum should have had a third option for doing nothing :) rather than resolving character and entity references. Well, if that is the only behavior, there is very little I can do.

The XML data files do ensure that node inner XML is correctly encoded. However, only the <, > and & characters need to be encoded. My initial guess, based on an MSDN doc, was to have " and ' encoded too, but I will revert those to plain characters, as they were before.

Thanks for your help, Steven.
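For readers who land on this thread later, here is a minimal sketch of the behavior discussed above. It is an illustration under stated assumptions, not the code from Hermit's test package: the class and method names (EntityDemo, InnerXmlOf, ValueOf, FullyEscaped) are made up for this example, and it uses XmlReader.Create over a string instead of the XmlTextReader / XmlValidatingReader pair from the thread. One way to read the result is that the reader decodes every predefined entity, and ReadInnerXml then writes the text back out as markup, escaping only the characters that must be escaped in element content. If the fully escaped form is needed, escaping the decoded value again (here with SecurityElement.Escape) is one possible workaround.

```csharp
using System;
using System.IO;
using System.Security;
using System.Xml;

// Hypothetical helper class, written for illustration only.
public static class EntityDemo
{
    // Element whose text is <test"invalid">, with all special
    // characters written as entity references in the source.
    public const string Sample =
        "<DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</DSPLY_NAME>";

    // Content as markup. The thread reports that the angle brackets
    // come back escaped while the quotes come back as literal characters.
    public static string InnerXmlOf(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();   // position on the start element
            return reader.ReadInnerXml();
        }
    }

    // Content as a plain string value with every entity decoded.
    public static string ValueOf(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    // Decoded value escaped again, including both quote characters.
    public static string FullyEscaped(string xml)
    {
        return SecurityElement.Escape(ValueOf(xml));
    }
}
```

Calling EntityDemo.InnerXmlOf(EntityDemo.Sample), EntityDemo.ValueOf(EntityDemo.Sample) and EntityDemo.FullyEscaped(EntityDemo.Sample) side by side shows the three forms of the same text. Which one is appropriate depends on whether the consumer expects markup or a decoded value.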