Check for XML errors using JavaScript

Keywords: error, XML, check, JavaScript, using. Last updated: 2023-09-26

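Question: How can I check the syntax of XML in modern browsers (anything but IE)?

I saw a page on W3Schools that includes an XML syntax checker. I don't know how it works, but I would like to know how I can achieve the same behavior.

I have searched for this many times (without success), and I have tried using DOMParser to check whether my XML is "well-formed" (also without success).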

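var xml = "<name>Caleb"; // markup was lost from this string; an unclosed <name> tag is assumed from the description below
var parser = new DOMParser();
var doc = parser.parseFromString(xml, 'text/xml');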

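I expected the parser to tell me there is an XML syntax error (namely, an unclosed name tag). However, it always returns an XML DOM object, as if there were no error at all.

In short, I would like to know how to check the syntax of an XML document automatically using JavaScript.

P.S. Is there any way to validate an XML document against a DTD (using JS, and not IE)?

Edit: Here is a more concise example, from MDN: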

var xmlString = '<a id="a"><b id="b">hey!</b></a>';
var domParser = new DOMParser();
var dom = domParser.parseFromString(xmlString, 'text/xml');
// print the name of the root element or error message
console.log(dom.documentElement.nodeName == 'parsererror' ? 'error while parsing' : dom.documentElement.nodeName);

NoBugs's answer above did not work for me in current Chrome. I suggest:

var sMyString = '<a id="a"><b id="b">hey!</b></a>';
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(sMyString, "text/xml");
// Look for a <parsererror> element anywhere in the document, not only at the root
console.log(oDOM.getElementsByTagName('parsererror').length ?
     (new XMLSerializer()).serializeToString(oDOM) : "all good"
);
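The parsererror check used in the snippets above can be wrapped in a small reusable helper. The following is only a minimal sketch: the name checkXml, its return shape, and the injectable ParserCtor parameter are hypothetical and not part of any answer here. It assumes the parser reports failures by inserting a parsererror element into the returned document, which is what the answers in this thread observe in browsers.

```javascript
// Hypothetical helper: reports whether a string parses as well-formed XML.
// ParserCtor defaults to the browser's DOMParser; any constructor exposing a
// compatible parseFromString() can be passed instead.
function checkXml(xmlString, ParserCtor = globalThis.DOMParser) {
  const doc = new ParserCtor().parseFromString(xmlString, "application/xml");
  // On failure, a <parsererror> element appears somewhere in the document.
  const errors = doc.getElementsByTagName("parsererror");
  if (errors.length > 0) {
    // textContent avoids depending on the browser-specific markup inside the element.
    return { ok: false, message: errors[0].textContent.trim() };
  }
  return { ok: true, message: "" };
}
```

In a browser, calling checkXml with a string that has an unclosed tag should yield ok set to false together with the browser's own error text. Note that a valid document which itself contains an element named parsererror would be misreported by this sketch.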

You can also use the fast-xml-parser package, which performs a validation check on XML data:

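// fast-xml-parser v3 API; from v4 on, use XMLValidator.validate() and new XMLParser(options).parse() instead.
import { validate, parse } from 'fast-xml-parser';

// validate() returns true for well-formed input, otherwise an object describing the error
if (validate(xmlData) === true) {
  var jsonObj = parse(xmlData, options);
}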

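On the W3Schools page mentioned in the question, just press F12 to open the developer tools and inspect the source. Search for validateXML and you will find a very long, complete XML checker that you can use as a reference.

I use React and DOMParser to display the error message, like this: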

  handleXmlCheck = () => {
    const { fileContent } = this.state;
    const parser = new window.DOMParser();
    const theDom = parser.parseFromString(fileContent, 'application/xml');
    if (theDom.getElementsByTagName('parsererror').length > 0) {
      showErrorMessage(theDom.getElementsByTagName('parsererror')[0].getElementsByTagName('div')[0].innerHTML);
    } else {
      showSuccessMessage('Valid Xml');
    }
  }

A basic XML validator in JavaScript. This code may not work for advanced XML, but it handles basic XML.

function xmlValidator(xml){
    // var xml = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
    while(xml.indexOf('<') != -1){
        var sub = xml.substring(xml.indexOf('<'), xml.indexOf('>')+1);
        var value = xml.substring(xml.indexOf('<')+1, xml.indexOf('>'));
        var endTag = '</'+value+'>';
        if(xml.indexOf(endTag) != -1){
            // console.log('xml is valid');
            // break;
        }else{
            console.log('xml is invalid');
            break;
        }
        xml = xml.replace(sub, '');
        xml = xml.replace(endTag, '');
        console.log(xml);
        console.log(sub+' '+value+' '+endTag);
    }
}
var xml = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
xmlValidator(xml);
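The validator above only checks that a matching closing tag exists somewhere in the string, so it cannot detect wrongly nested tags and does not handle attributes. A stack-based variant can check nesting order as well. The following is a rough sketch under stated assumptions, not a full XML parser: the function name checkTagNesting and its return shape are hypothetical, and it ignores comments, CDATA sections, DOCTYPE declarations, and attribute values that contain ">".

```javascript
// Hypothetical sketch: verifies only that element tags are balanced and properly nested.
function checkTagNesting(xml) {
  // Captures: 1 = "/" for a closing tag, 2 = tag name, 3 = attributes, 4 = "/" for a self-closing tag
  const tagPattern = /<(\/?)([A-Za-z_][\w:.-]*)([^<>]*?)(\/?)>/g;
  const stack = []; // names of elements that are currently open
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, , selfClosing] = match;
    if (closing) {
      // A closing tag must match the most recently opened element.
      const open = stack.pop();
      if (open !== name) {
        const expected = open ? `, expected </${open}>` : "";
        return { ok: false, message: `Unexpected </${name}>${expected}` };
      }
    } else if (!selfClosing) {
      stack.push(name);
    }
  }
  if (stack.length > 0) {
    return { ok: false, message: `Unclosed <${stack[stack.length - 1]}>` };
  }
  return { ok: true, message: "" };
}
```

For anything beyond simple input, the DOMParser-based checks elsewhere in this thread are the more reliable choice, since they rely on the browser's real XML parser.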

/**
 * Check if the input is a valid XML file.
 * @param xmlStr The input to be parsed.
 * @returns If the input is invalid, this returns the <parsererror> element explaining the problem.
 * If the input is valid, this returns undefined.
 */
export function xmlIsInvalid(xmlStr : string) : HTMLElement | undefined {
  const parser = new DOMParser();
  const dom = parser.parseFromString(xmlStr, "application/xml");
  // https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString
  // says that parseFromString() will throw an error if the input is invalid.
  //
  // https://developer.mozilla.org/en-US/docs/Web/Guide/Parsing_and_serializing_XML
  // says dom.documentElement.nodeName == "parsererror" will be true if the input
  // is invalid.
  //
  // Neither of those is true when I tested it in Chrome.  Nothing is thrown.
  // If the input is "" I get:
  // dom.documentElement.nodeName returns "html",
  // dom.documentElement.firstElementChild.nodeName returns "body" and
  // dom.documentElement.firstElementChild.firstElementChild.nodeName returns "parsererror".
  //
  // It seems that the parsererror can move around.  It looks like it's trying to
  // create as much of the XML tree as it can, then it inserts parsererror whenever 
  // and wherever it gets stuck.  It sometimes generates additional XML after the
  // parsererror, so .lastElementChild might not find the problem.
  //
  // In case of an error the <parsererror> element will be an instance of
  // HTMLElement.  A valid XML document can include an element with the name
  // "parsererror", however it will NOT be an instance of HTMLElement.
  //
  // getElementsByTagName('parsererror') might be faster than querySelectorAll().
  for (const element of Array.from(dom.querySelectorAll("parsererror"))) {
    if (element instanceof HTMLElement) {
      // Found the error.
      return element;
    }
  }
  // No errors found.
  return;
}

(Technically, this is TypeScript. Remove ": string" and ": HTMLElement | undefined" to make it JavaScript.)