How to use Google Caja HTML/CSS sanitizer JS library in Java

Keywords: HTML, CSS, sanitizer, JS, Caja, Google, Java. Last updated: 2023-09-26

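I have a Java API that accepts a custom CSS field. I need to sanitize the CSS before storing it in the database, and I would like to use Google Caja to do so.

First, I tried running the Google Caja HTML/CSS sanitizer JavaScript library with the Rhino JavaScript engine. Unfortunately, this did not work, because the library relies heavily on the presence of a DOM (specifically, the window object).

Next, I imported the Caja project from the Maven repository. I looked through some of the tests but could not find an example of how to use the sanitizer.

I could try running a browser on the server, but that seems like overkill.

Has anyone been able to sanitize a CSS string in Java using Caja?

Thanks in advance!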

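If you plan to sanitize on a Java server, I recommend the OWASP HTML sanitizer, which is apparently based on code from Caja. Among other things, it can sanitize <a> elements so that they include rel="nofollow".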

import org.owasp.html.PolicyFactory;
import static org.owasp.html.Sanitizers.BLOCKS;
import static org.owasp.html.Sanitizers.FORMATTING;
import static org.owasp.html.Sanitizers.IMAGES;
import static org.owasp.html.Sanitizers.LINKS;

// Combine the prebuilt policies; htmlSource is the untrusted input string.
PolicyFactory sanitiser = BLOCKS.and(FORMATTING).and(IMAGES).and(LINKS);
String htmlSanitised = sanitiser.sanitize(htmlSource);
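
Since the question is about CSS, it may help to see the same approach with the prebuilt Sanitizers.STYLES policy added. The following is only a sketch: it assumes the owasp-java-html-sanitizer library is on the classpath, and the class name OwaspStyleExample is made up for illustration. As far as I know, this policy covers CSS inside style="..." attributes rather than standalone stylesheets, so it may not fully cover a free-form CSS field.

```java
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

/** Hypothetical example class: prebuilt OWASP policies plus inline-style support. */
class OwaspStyleExample {
    // FORMATTING, BLOCKS and LINKS allow common elements; STYLES additionally
    // allows a whitelist of CSS properties inside style="..." attributes.
    static final PolicyFactory POLICY = Sanitizers.FORMATTING
            .and(Sanitizers.BLOCKS)
            .and(Sanitizers.LINKS)
            .and(Sanitizers.STYLES);

    /** Returns the input with everything outside the policy removed. */
    static String sanitize(String untrustedHtml) {
        return POLICY.sanitize(untrustedHtml);
    }

    public static void main(String[] args) {
        String input = "<h1 style=\"color:blue\">A heading</h1>"
                + "<script>alert(1)</script>"
                + "<a href=\"https://example.com/\">link</a>";
        // The script element should be dropped and the link should gain rel="nofollow".
        System.out.println(sanitize(input));
    }
}
```

Combining policies with and() keeps the whitelist explicit: anything not allowed by one of the listed policies is removed from the output.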

However, if you do want to call Caja from Java, both Rhino (Java 7) and Nashorn (Java 8) work:

import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class CajaSanitiser {
    private final ScriptEngine engine;
    private final Bindings bindings;
    public CajaSanitiser() throws IOException, ScriptException {
        this.engine = new ScriptEngineManager().getEngineByName("js");
        this.bindings = engine.getBindings(ScriptContext.ENGINE_SCOPE);
        String scriptName = "com/google/caja/plugin/html-css-sanitizer-minified.js";
        try (BufferedReader reader = getReader(scriptName)) {
            engine.eval(reader);
        }
        String identity = "function identity(value) {return value;}";
        engine.eval(identity);
    }
    private BufferedReader getReader(String name) {
        return new BufferedReader(new InputStreamReader(
                getClass().getClassLoader().getResourceAsStream(name)));
    }
    public String sanitise(String htmlSource) throws ScriptException {
        bindings.put("src", htmlSource);
        // You can use other functions beside 'identity' if you
        // want to transform the html.
        // See https://code.google.com/p/google-caja/wiki/JsHtmlSanitizer
        return (String) engine.eval("html_sanitize(src, identity, identity)");
    }
    public static void main(String[] args) throws Exception {
        CajaSanitiser sanitiser = new CajaSanitiser();
        String source = "<html>\n" +
                "<head>\n" +
                "<style>\n" +
                "h1 {color:blue;}\n" +
                "</style>\n" +
                "</head>\n" +
                "<body>\n" +
                "<h1>A heading</h1>\n" +
                "</body>\n" +
                "</html>";
        System.out.println("Original HTML with CSS:");
        System.out.println(source);
        System.out.println();
        System.out.println("Sanitised HTML:");
        System.out.println(sanitiser.sanitise(source));
    }
}
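
One caveat with the class above: getEngineByName("js") returns null when the JVM has no JavaScript engine registered, and Nashorn was removed from the JDK in Java 15, so on newer runtimes the constructor would fail with a NullPointerException unless a separate engine is added to the classpath. A minimal sketch of a pre-flight check follows; the class and method names (EngineCheck, availableEngineNames, requireJavaScriptEngine) are made up for illustration.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: verifies that a JavaScript engine is available. */
class EngineCheck {
    /** Collects the registered names of every script engine the JVM can find. */
    static List<String> availableEngineNames() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            names.addAll(factory.getNames());
        }
        return names;
    }

    /** Returns a JavaScript engine, or fails with a descriptive message. */
    static ScriptEngine requireJavaScriptEngine() {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
        if (engine == null) {
            throw new IllegalStateException(
                    "No JavaScript engine found; available engines: " + availableEngineNames());
        }
        return engine;
    }

    public static void main(String[] args) {
        System.out.println("Available engines: " + availableEngineNames());
    }
}
```

Failing early with the list of available engines makes a missing-engine problem easier to diagnose than a NullPointerException inside the constructor.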

I have this as part of my Maven configuration:

<dependencies>
    <dependency>
        <groupId>caja</groupId>
        <artifactId>caja</artifactId>
        <version>r5127</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>caja</id>
        <name>caja</name>
        <url>http://google-caja.googlecode.com/svn/maven</url>
    </repository>
</repositories>
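
The CajaSanitiser class loads the minified script with getResourceAsStream, which returns null rather than throwing when the resource is not on the classpath (for example, if the dependency above did not resolve). The sketch below shows one way to surface that case with a clearer error; the class name ScriptResources and the method open are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: opens a classpath resource or fails with a clear message. */
class ScriptResources {
    static BufferedReader open(String resourceName) throws FileNotFoundException {
        InputStream in = ScriptResources.class.getClassLoader().getResourceAsStream(resourceName);
        if (in == null) {
            // getResourceAsStream signals a missing resource with null, not an exception.
            throw new FileNotFoundException("Classpath resource not found: " + resourceName);
        }
        // Assumption: the script is UTF-8 encoded.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Resource path taken from the CajaSanitiser example.
        String name = "com/google/caja/plugin/html-css-sanitizer-minified.js";
        try (BufferedReader reader = open(name)) {
            System.out.println("Found " + name);
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```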

Google Caja is also a "Java project", so anything Caja can do, you can do directly in Java. For example, you can look at the Caja unit test cases, which validate CSS directly in Java.