Check for XML errors using JavaScript

Last updated: 2023-09-26

Question: How can I check the syntax of XML in modern browsers (other than IE)?

I came across a page on W3Schools that includes an XML syntax checker. I don't know how it works, but I'd like to know how I can get the same behavior.

I've searched for this many times (without success) and tried using DOMParser to check whether my XML is "well-formed" (also without success).

var xml = "<name>Caleb";
var parser = new DOMParser();
var doc = parser.parseFromString(xml, 'text/xml');

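I expected the parser to tell me there is an XML syntax error (namely, an unclosed name tag). However, it always returns an XML DOM object, as if there were no error at all.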

In short, I'd like to know how to check the syntax of an XML document automatically using JavaScript.

P.S. Is there any way to validate an XML document against a DTD (using JS, not IE)?

Edit: here's a more concise example, from MDN:

var xmlString = '<a id="a"><b id="b">hey!</b></a>';
var domParser = new DOMParser();
var dom = domParser.parseFromString(xmlString, 'text/xml');
// print the name of the root element or an error message
// (in Firefox, a failed parse returns a document whose root element is <parsererror>)
console.log(dom.documentElement.nodeName === 'parsererror' ? 'error while parsing' : dom.documentElement.nodeName);

NoBugs' answer above doesn't work for me in current Chrome. I suggest:

var sMyString = '<a id="a"><b id="b">hey!</b></a>';
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(sMyString, "text/xml");
// Chrome does not make <parsererror> the root element, so search the whole document for it
console.log(oDOM.getElementsByTagName('parsererror').length ?
     (new XMLSerializer()).serializeToString(oDOM) : "all good"
);
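The check above can be wrapped in a small reusable function that returns the error text instead of printing it. This is a sketch, not code from the answers: the name getXmlParseError and the optional parser parameter (it defaults to the browser's DOMParser, and any object with a parseFromString(string, mimeType) method can be passed instead, e.g. outside a browser) are assumptions. Like the answer above, it searches the whole document for <parsererror>, so a well-formed document that contains its own element named parsererror would also be flagged.

```javascript
// Hypothetical helper: returns null if xmlString is well-formed, otherwise the
// text of the <parsererror> element the parser inserted into the result.
// `parser` defaults to the browser's DOMParser (assumption: any object with a
// parseFromString(string, mimeType) method works as a substitute).
function getXmlParseError(xmlString, parser = new DOMParser()) {
  const doc = parser.parseFromString(xmlString, 'application/xml');
  // Engines differ in where they put <parsererror> (the root element in Firefox,
  // nested inside the document in Chrome), so search the whole document.
  const errors = doc.getElementsByTagName('parsererror');
  return errors.length > 0 ? errors[0].textContent : null;
}

// Usage in a browser:
// getXmlParseError('<name>Caleb');        // error text describing the unclosed tag
// getXmlParseError('<name>Caleb</name>'); // null
```

Returning the message rather than a boolean lets the caller show the browser's own explanation of the problem.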

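You can also use the fast-xml-parser package, which can validate XML before parsing it. (This snippet uses the v3 API; from v4 onward the package exposes XMLValidator.validate() and new XMLParser(options).parse() instead.)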

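import { validate, parse } from 'fast-xml-parser';
const options = {}; // parser options, e.g. { ignoreAttributes: false }
const result = validate(xmlData); // true, or { err: { code, msg, line, ... } }
if (result === true) {
  const jsonObj = parse(xmlData, options);
} else {
  console.error(result.err.msg);
}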

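Just press F12 on the W3Schools validator page to open the developer tools and inspect it. Search for validateXML and you'll find a very long, complete XML checker for reference.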

I use DOMParser in React to display the error message, like this:

  handleXmlCheck = () => {
    const { fileContent } = this.state;
    const parser = new window.DOMParser();
    const theDom = parser.parseFromString(fileContent, 'application/xml');
    const errorNode = theDom.getElementsByTagName('parsererror')[0];
    if (errorNode) {
      // Chrome puts the message in a <div> inside <parsererror>; Firefox does not,
      // so fall back to the element's full text
      const detail = errorNode.getElementsByTagName('div')[0];
      showErrorMessage((detail || errorNode).textContent);
    } else {
      showSuccessMessage('Valid Xml');
    }
  }
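Because the message text differs between engines, a hypothetical helper such as parseErrorPosition could pull out a line and column to highlight in the file content shown by a handler like the one above. This sketch assumes the Chrome-style "error on line 1 at column 13" and Firefox-style "Line Number 1, Column 13" wording; neither format is standardized, so a null result should be treated as "position unknown".

```javascript
// Hypothetical helper: extract { line, column } from a <parsererror> message.
// Assumes Chrome-style "error on line 1 at column 13" or Firefox-style
// "Line Number 1, Column 13" text; returns null if neither pattern matches.
function parseErrorPosition(message) {
  const m =
    /line (\d+) at column (\d+)/i.exec(message) ||
    /Line Number (\d+), Column (\d+)/i.exec(message);
  return m ? { line: Number(m[1]), column: Number(m[2]) } : null;
}
```

For example, parseErrorPosition('error on line 1 at column 13: Extra content at the end of the document') returns { line: 1, column: 13 }.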

A basic XML validator in JavaScript. This code may not work for advanced XML, but it does work for basic XML.

// Naive check: every opening tag must have a matching closing tag somewhere later in the string.
// Tags with attributes, self-closing tags and comments are reported as invalid,
// and nesting order is not checked (<a><b></a></b> passes).
function xmlValidator(xml) {
  while (xml.indexOf('<') !== -1) {
    const start = xml.indexOf('<');
    const end = xml.indexOf('>');
    const sub = xml.substring(start, end + 1);   // e.g. "<note>"
    const value = xml.substring(start + 1, end); // e.g. "note"
    const endTag = '</' + value + '>';           // e.g. "</note>"
    if (xml.indexOf(endTag) === -1) {
      console.log('xml is invalid');
      return false;
    }
    // Remove the matched pair and move on to the next tag
    xml = xml.replace(sub, '');
    xml = xml.replace(endTag, '');
  }
  console.log('xml is valid');
  return true;
}

const xml = "<note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
xmlValidator(xml);
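The string-replacement approach above never checks nesting order, so it accepts <a><b></a></b>, and it rejects any tag with attributes. A stack-based tag matcher handles both; this is a swapped-in technique, not the answerer's code, and the name tagsAreBalanced is hypothetical. It is still only a sketch under stated assumptions: it checks tag balance and nesting, not full well-formedness (a DOCTYPE with an internal subset, a '>' inside an attribute value, or a stray '<' in text are not handled), so DOMParser remains the more reliable check in a browser.

```javascript
// Sketch: check that element tags are balanced and properly nested using a stack.
// Assumptions: comments, CDATA sections, <? ... ?> instructions and simple <!...>
// declarations are stripped first; attribute values must not contain '>'.
function tagsAreBalanced(xml) {
  const body = xml
    .replace(/<!--[\s\S]*?-->/g, '')          // comments
    .replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, '') // CDATA sections
    .replace(/<\?[\s\S]*?\?>/g, '')           // XML declaration / processing instructions
    .replace(/<![^>]*>/g, '');                // simple <!DOCTYPE ...> declarations
  // group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closing tag
  const tagRe = /<(\/?)([A-Za-z_][\w.:-]*)[^>]*?(\/?)>/g;
  const stack = [];
  let match;
  while ((match = tagRe.exec(body)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (selfClosing) continue;      // <br/> opens and closes itself
    if (!closing) {
      stack.push(name);             // opening tag: remember it
    } else if (stack.pop() !== name) {
      return false;                 // closing tag does not match the innermost open tag
    }
  }
  return stack.length === 0;        // every opened tag was closed
}
```

For example, tagsAreBalanced('<a><b></a></b>') and tagsAreBalanced('<name>Caleb') both return false, while tagsAreBalanced('<a id="a"><b id="b">hey!</b></a>') returns true.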

/**
 * Check if the input is well-formed XML.
 * @param xmlStr The input to be parsed.
 * @returns If the input is invalid, the <parsererror> element explaining the problem.
 * If the input is valid, undefined.
 */
export function xmlIsInvalid(xmlStr : string) : HTMLElement | undefined {
  const parser = new DOMParser();
  const dom = parser.parseFromString(xmlStr, "application/xml");
  // https://developer.mozilla.org/en-US/docs/Web/API/DOMParser/parseFromString
  // says that parseFromString() will throw an error if the input is invalid.
  //
  // https://developer.mozilla.org/en-US/docs/Web/Guide/Parsing_and_serializing_XML
  // says dom.documentElement.nodeName == "parsererror" will be true if the input
  // is invalid.
  //
  // Neither of those is true when I tested it in Chrome. Nothing is thrown.
  // If the input is "" I get:
  // dom.documentElement.nodeName returns "html",
  // dom.documentElement.firstElementChild.nodeName returns "body" and
  // dom.documentElement.firstElementChild.firstElementChild.nodeName returns "parsererror".
  //
  // It seems that the parsererror can move around. It looks like the parser tries to
  // build as much of the XML tree as it can, then inserts parsererror wherever
  // it gets stuck. It sometimes generates additional XML after the
  // parsererror, so .lastElementChild might not find the problem.
  //
  // In case of an error the <parsererror> element will be an instance of
  // HTMLElement. A valid XML document can include an element with the name
  // "parsererror", however it will NOT be an instance of HTMLElement.
  //
  // getElementsByTagName('parsererror') might be faster than querySelectorAll().
  for (const element of Array.from(dom.querySelectorAll("parsererror"))) {
    if (element instanceof HTMLElement) {
      // Found the error.
      return element;
    }
  }
  // No errors found.
  return;
}

(Technically this is TypeScript. Remove ": string" and ": HTMLElement | undefined" to make it JavaScript.)