Java 替换 Apache POI XWPF 中的文本

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/22268898/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 14:38:14  来源:igfitidea点击:

Replacing a text in Apache POI XWPF

javams-wordapache-poixwpf

提问by Gagan93

I just found Apache POI library very useful for editing Word files using Java. Specifically, I want to edit a DOCXfile using Apache POI's XWPF classes. I found no proper method / documentation following which I could do this. Can somebody please explain in steps, how to replace some text in a DOCX file.

我刚刚发现 Apache POI 库对于使用 Java 编辑 Word 文件非常有用。具体来说,我想使用 Apache POI 的 XWPF 类编辑DOCX文件。我没有找到可以执行此操作的正确方法/文档。有人可以按步骤解释如何替换 DOCX 文件中的某些文本。

** The text may be in a line / paragraph or in a table row/column

** 文本可以在一行/段落或表格行/列中

Thanks in Advance :)

提前致谢 :)

采纳答案by Gagravarr

The method you need is XWPFRun.setText(String). Simply work your way through the file until you find the XWPFRun of interest, work out what you want the new text to be, and replace it. (A run is a sequence of text with the same formatting)

您需要的方法是XWPFRun.setText(String)。只需按自己的方式浏览文件,直到找到感兴趣的 XWPFRun,找出您想要的新文本,然后替换它。(运行是具有相同格式的文本序列)

You should be able to do something like:

您应该能够执行以下操作:

XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));
for (XWPFParagraph p : doc.getParagraphs()) {
    List<XWPFRun> runs = p.getRuns();
    if (runs != null) {
        for (XWPFRun r : runs) {
            String text = r.getText(0);
            if (text != null && text.contains("needle")) {
                text = text.replace("needle", "haystack");
                r.setText(text, 0);
            }
        }
    }
}
for (XWPFTable tbl : doc.getTables()) {
   for (XWPFTableRow row : tbl.getRows()) {
      for (XWPFTableCell cell : row.getTableCells()) {
         for (XWPFParagraph p : cell.getParagraphs()) {
            for (XWPFRun r : p.getRuns()) {
              String text = r.getText(0);
              if (text != null && text.contains("needle")) {
                text = text.replace("needle", "haystack");
                r.setText(text,0);
              }
            }
         }
      }
   }
}
doc.write(new FileOutputStream("output.docx"));

回答by birya

The first chunk of code is giing me a NullPointerException, anyone know what is wrong?

第一段代码给我一个 NullPointerException,有谁知道出了什么问题?

run.getText(int position) - from documentation: Returns: the text of this text run or null if not set

run.getText(int position) - 来自文档:返回:此文本的文本 run 或 null 如果未设置

Just check if it is not null before calling contains() on it

在调用 contains() 之前检查它是否不为 null

And btw if you want to replace the text you need to set it in position from which you get it, in this case r.setText(text, 0);. Otherwise text will be added not replaced

顺便说一句,如果你想替换文本,你需要将它设置在你从中获取它的位置,在这种情况下是 r.setText(text, 0);。否则文本将被添加而不是被替换

回答by Kyle Willkomm

Here is what we did for text replacement using Apache POI. We found that it was not worth the hassle and simpler to replace the text of an entire XWPFParagraph instead of a run. A run can be randomly split in the middle of a word as Microsoft Word is in charge of where runs are created within the paragraph of a document. Therefore the text you might be searching for could be half in one run and half in another. Using the full text of a paragraph, removing its existing runs, and adding a new run with the adjusted text seems to solve the problem of text replacement.

这是我们使用 Apache POI 为文本替换所做的工作。我们发现替换整个 XWPFParagraph 的文本而不是一个运行的麻烦和简单的做法是不值得的。由于 Microsoft Word 负责在文档段落内创建运行的位置,因此可以在单词中间随机拆分运行。因此,您可能正在搜索的文本可能一次运行一半,另一次运行。使用段落的全文,删除其现有的运行,并使用调整后的文本添加新运行似乎解决了文本替换的问题。

However there is a cost of doing the replacement at the paragraph level; you lose the formatting of the runs in that paragraph. For example if in the middle of your paragraph you had bolded the word "bits", and then when parsing the file you replaced the word "bits" with "bytes", the word "bytes" would no longer be bolded. Because the bolding was stored with a run that was removed when the paragraph's entire body of text was replaced. The attached code has a commented out section that was working for replacement of text at the run level if you need it.

但是,在段落级别进行替换是有成本的;您会丢失该段落中运行的格式。例如,如果您在段落中间将“bits”一词加粗,然后在解析文件时将“bits”一词替换为“bytes”,则“bytes”一词将不再加粗。因为粗体是随着段落的整个正文被替换时被删除的运行而存储的。附加的代码有一个注释掉的部分,如果需要,它可以在运行级别替换文本。

It should also be noted that the below works if the text you are inserting contains \n return characters. We could not find a way to insert returns without creating a run for each section prior to the return and marking the run addCarriageReturn(). Cheers

还应该注意的是,如果您插入的文本包含 \n 返回字符,则以下内容有效。我们无法找到插入返回的方法,而无需在返回之前为每个部分创建运行并标记运行 addCarriageReturn()。干杯

    package com.healthpartners.hcss.client.external.word.replacement;

import java.util.List;

import org.apache.commons.lang.StringUtils;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class TextReplacer {
    private String searchValue;
    private String replacement;

    public TextReplacer(String searchValue, String replacement) {
        this.searchValue = searchValue;
        this.replacement = replacement;
    }

    public void replace(XWPFDocument document) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();

    for (XWPFParagraph xwpfParagraph : paragraphs) {
        replace(xwpfParagraph);
    }
}

private void replace(XWPFParagraph paragraph) {
    if (hasReplaceableItem(paragraph.getText())) {
        String replacedText = StringUtils.replace(paragraph.getText(), searchValue, replacement);

        removeAllRuns(paragraph);

        insertReplacementRuns(paragraph, replacedText);
    }
}

private void insertReplacementRuns(XWPFParagraph paragraph, String replacedText) {
    String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacedText, "\n");

    for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
        String part = replacementTextSplitOnCarriageReturn[j];

        XWPFRun newRun = paragraph.insertNewRun(j);
        newRun.setText(part);

        if (j+1 < replacementTextSplitOnCarriageReturn.length) {
            newRun.addCarriageReturn();
        }
    }       
}

private void removeAllRuns(XWPFParagraph paragraph) {
    int size = paragraph.getRuns().size();
    for (int i = 0; i < size; i++) {
        paragraph.removeRun(0);
    }
}

private boolean hasReplaceableItem(String runText) {
    return StringUtils.contains(runText, searchValue);
}

//REVISIT The below can be removed if Michele tests and approved the above less versatile replacement version

//  private void replace(XWPFParagraph paragraph) {
//      for (int i = 0; i < paragraph.getRuns().size()  ; i++) {
//          i = replace(paragraph, i);
//      }
//  }

//  private int replace(XWPFParagraph paragraph, int i) {
//      XWPFRun run = paragraph.getRuns().get(i);
//      
//      String runText = run.getText(0);
//      
//      if (hasReplaceableItem(runText)) {
//          return replace(paragraph, i, run);
//      }
//      
//      return i;
//  }

//  private int replace(XWPFParagraph paragraph, int i, XWPFRun run) {
//      String runText = run.getCTR().getTArray(0).getStringValue();
//      
//      String beforeSuperLong = StringUtils.substring(runText, 0, runText.indexOf(searchValue));
//      
//      String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacement, "\n");
//      
//      String afterSuperLong = StringUtils.substring(runText, runText.indexOf(searchValue) + searchValue.length());
//      
//      Counter counter = new Counter(i);
//      
//      insertNewRun(paragraph, run, counter, beforeSuperLong);
//      
//      for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
//          String part = replacementTextSplitOnCarriageReturn[j];
//
//          XWPFRun newRun = insertNewRun(paragraph, run, counter, part);
//          
//          if (j+1 < replacementTextSplitOnCarriageReturn.length) {
//              newRun.addCarriageReturn();
//          }
//      }
//      
//      insertNewRun(paragraph, run, counter, afterSuperLong);
//      
//      paragraph.removeRun(counter.getCount());
//      
//      return counter.getCount();
//  }

//  private class Counter {
//      private int i;
//      
//      public Counter(int i) {
//          this.i = i;
//      }
//      
//      public void increment() {
//          i++;
//      }
//      
//      public int getCount() {
//          return i;
//      }
//  }

//  private XWPFRun insertNewRun(XWPFParagraph xwpfParagraph, XWPFRun run, Counter counter, String newText) {
//      XWPFRun newRun = xwpfParagraph.insertNewRun(counter.i);
//      newRun.getCTR().set(run.getCTR());
//      newRun.getCTR().getTArray(0).setStringValue(newText);
//      
//      counter.increment();
//      
//      return newRun;
//  }

回答by Sherin

The answer accepted here needs one more update along with Justin Skiles update. r.setText(text, 0); Reason: If not updating setText with pos variable, the output will be the combination of old string and replace string.

此处接受的答案需要与 Justin Skiles 更新一起再进行一次更新。r.setText(text, 0); 原因:如果不使用 pos 变量更新 setText,则输出将是旧字符串和替换字符串的组合。

回答by Thierry Bodhuin

If somebody needs also to keep the formatting of the text, this code works better.

如果有人还需要保留文本的格式,则此代码效果更好。

private static Map<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
    int pos = 0;
    Map<Integer, XWPFRun> map = new HashMap<Integer, XWPFRun>(10);
    for (XWPFRun run : paragraph.getRuns()) {
        String runText = run.text();
        if (runText != null) {
            for (int i = 0; i < runText.length(); i++) {
                map.put(pos + i, run);
            }
            pos += runText.length();
        }
    }
    return (map);
}

public static <V> void replace(XWPFDocument document, Map<String, V> map) {
    List<XWPFParagraph> paragraphs = document.getParagraphs();
    for (XWPFParagraph paragraph : paragraphs) {
        replace(paragraph, map);
    }
}

public static <V> void replace(XWPFDocument document, String searchText, V replacement) {
    List<XWPFParagraph> paragraphs = document.getParagraphs();
    for (XWPFParagraph paragraph : paragraphs) {
        replace(paragraph, searchText, replacement);
    }
}

private static <V> void replace(XWPFParagraph paragraph, Map<String, V> map) {
    for (Map.Entry<String, V> entry : map.entrySet()) {
        replace(paragraph, entry.getKey(), entry.getValue());
    }
}

public static <V> void replace(XWPFParagraph paragraph, String searchText, V replacement) {
    boolean found = true;
    while (found) {
        found = false;
        int pos = paragraph.getText().indexOf(searchText);
        if (pos >= 0) {
            found = true;
            Map<Integer, XWPFRun> posToRuns = getPosToRuns(paragraph);
            XWPFRun run = posToRuns.get(pos);
            XWPFRun lastRun = posToRuns.get(pos + searchText.length() - 1);
            int runNum = paragraph.getRuns().indexOf(run);
            int lastRunNum = paragraph.getRuns().indexOf(lastRun);
            String texts[] = replacement.toString().split("\n");
            run.setText(texts[0], 0);
            XWPFRun newRun = run;
            for (int i = 1; i < texts.length; i++) {
                newRun.addCarriageReturn();
                newRun = paragraph.insertNewRun(runNum + i);
                /*
                    We should copy all style attributes
                    to the newRun from run
                    also from background color, ...
                    Here we duplicate only the simple attributes...
                 */
                newRun.setText(texts[i]);
                newRun.setBold(run.isBold());
                newRun.setCapitalized(run.isCapitalized());
                // newRun.setCharacterSpacing(run.getCharacterSpacing());
                newRun.setColor(run.getColor());
                newRun.setDoubleStrikethrough(run.isDoubleStrikeThrough());
                newRun.setEmbossed(run.isEmbossed());
                newRun.setFontFamily(run.getFontFamily());
                newRun.setFontSize(run.getFontSize());
                newRun.setImprinted(run.isImprinted());
                newRun.setItalic(run.isItalic());
                newRun.setKerning(run.getKerning());
                newRun.setShadow(run.isShadowed());
                newRun.setSmallCaps(run.isSmallCaps());
                newRun.setStrikeThrough(run.isStrikeThrough());
                newRun.setSubscript(run.getSubscript());
                newRun.setUnderline(run.getUnderline());
            }
            for (int i = lastRunNum + texts.length - 1; i > runNum + texts.length - 1; i--) {
                paragraph.removeRun(i);
            }
        }
    }
}

回答by ron

my task was to replace texts of the format ${key} with values of a map within a word docx document. The above solutions were a good starting point but did not take into account all cases: ${key} can be spread not only across multiple runs but also across multiple texts within a run. I therefore ended up with the following code:

我的任务是用 word docx 文档中的地图值替换 ${key} 格式的文本。上述解决方案是一个很好的起点,但并未考虑所有情况: ${key} 不仅可以跨多个运行分布,还可以跨运行中的多个文本分布。因此,我最终得到了以下代码:

    private void replace(String inFile, Map<String, String> data, OutputStream out) throws Exception, IOException {
    XWPFDocument doc = new XWPFDocument(OPCPackage.open(inFile));
    for (XWPFParagraph p : doc.getParagraphs()) {
        replace2(p, data);
    }
    for (XWPFTable tbl : doc.getTables()) {
        for (XWPFTableRow row : tbl.getRows()) {
            for (XWPFTableCell cell : row.getTableCells()) {
                for (XWPFParagraph p : cell.getParagraphs()) {
                    replace2(p, data);
                }
            }
        }
    }
    doc.write(out);
}

private void replace2(XWPFParagraph p, Map<String, String> data) {
    String pText = p.getText(); // complete paragraph as string
    if (pText.contains("${")) { // if paragraph does not include our pattern, ignore
        TreeMap<Integer, XWPFRun> posRuns = getPosToRuns(p);
        Pattern pat = Pattern.compile("\$\{(.+?)\}");
        Matcher m = pat.matcher(pText);
        while (m.find()) { // for all patterns in the paragraph
            String g = m.group(1);  // extract key start and end pos
            int s = m.start(1);
            int e = m.end(1);
            String key = g;
            String x = data.get(key);
            if (x == null)
                x = "";
            SortedMap<Integer, XWPFRun> range = posRuns.subMap(s - 2, true, e + 1, true); // get runs which contain the pattern
            boolean found1 = false; // found $
            boolean found2 = false; // found {
            boolean found3 = false; // found }
            XWPFRun prevRun = null; // previous run handled in the loop
            XWPFRun found2Run = null; // run in which { was found
            int found2Pos = -1; // pos of { within above run
            for (XWPFRun r : range.values())
            {
                if (r == prevRun)
                    continue; // this run has already been handled
                if (found3)
                    break; // done working on current key pattern
                prevRun = r;
                for (int k = 0;; k++) { // iterate over texts of run r
                    if (found3)
                        break;
                    String txt = null;
                    try {
                        txt = r.getText(k); // note: should return null, but throws exception if the text does not exist
                    } catch (Exception ex) {

                    }
                    if (txt == null)
                        break; // no more texts in the run, exit loop
                    if (txt.contains("$") && !found1) {  // found $, replace it with value from data map
                        txt = txt.replaceFirst("\$", x);
                        found1 = true;
                    }
                    if (txt.contains("{") && !found2 && found1) {
                        found2Run = r; // found { replace it with empty string and remember location
                        found2Pos = txt.indexOf('{');
                        txt = txt.replaceFirst("\{", "");
                        found2 = true;
                    }
                    if (found1 && found2 && !found3) { // find } and set all chars between { and } to blank
                        if (txt.contains("}"))
                        {
                            if (r == found2Run)
                            { // complete pattern was within a single run
                                txt = txt.substring(0, found2Pos)+txt.substring(txt.indexOf('}'));
                            }
                            else // pattern spread across multiple runs
                                txt = txt.substring(txt.indexOf('}'));
                        }
                        else if (r == found2Run) // same run as { but no }, remove all text starting at {
                            txt = txt.substring(0,  found2Pos);
                        else
                            txt = ""; // run between { and }, set text to blank
                    }
                    if (txt.contains("}") && !found3) {
                        txt = txt.replaceFirst("\}", "");
                        found3 = true;
                    }
                    r.setText(txt, k);
                }
            }
        }
        System.out.println(p.getText());

    }

}

private TreeMap<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
    int pos = 0;
    TreeMap<Integer, XWPFRun> map = new TreeMap<Integer, XWPFRun>();
    for (XWPFRun run : paragraph.getRuns()) {
        String runText = run.text();
        if (runText != null && runText.length() > 0) {
            for (int i = 0; i < runText.length(); i++) {
                map.put(pos + i, run);
            }
            pos += runText.length();
        }

    }
    return map;
}

回答by Optio

I suggest my solution for replacing text between #, for example: This #bookmark# should be replaced.It is replace in:

我建议我在# 之间替换文本的解决方案,例如:这个#bookmark# 应该被替换。它被替换为:

  • paragraphs;
  • tables;
  • footers.
  • 段落;
  • 桌子;
  • 页脚。

Also, it takes into account situations, when symbol # and bookmark are in the separated runs (replace variable between different runs).

此外,它还考虑了符号 # 和书签在单独运行中的情况(在不同运行之间替换变量)。

Here link to the code: https://gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda

这里链接到代码:https: //gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda

回答by Deividas Strioga

As of the date of writing, none of the answers replace properly.

截至撰写本文之日,没有一个答案正确替换。

Gagravars answer does not include cases where words to replace are split in runs; Thierry Boduins solution sometimes left words to replace blank when they were after other words to replace, also it does not check tables.

Gagravars 的回答不包括要替换的单词在运行中被拆分的情况;Thierry Boduins 解决方案有时会在替换其他单词之后将单词替换为空白,而且它也不检查表格。

Using Gagtavars answer as base I have also checked the run before current run if the text of both runs contain the word to replace, adding else block. My addition in kotlin:

使用 Gagtavars 答案作为基础,如果两次运行的文本都包含要替换的单词,我还检查了当前运行之前的运行,并添加了 else 块。我在 kotlin 中的补充:

if (text != null) {
        if (text.contains(findText)) {
            text = text.replace(findText, replaceText)
            r.setText(text, 0)
        } else if (i > 0 && p.runs[i - 1].getText(0).plus(text).contains(findText)) {
            val pos = p.runs[i - 1].getText(0).indexOf('$')
            text = textOfNotFullSecondRun(text, findText)
            r.setText(text, 0)
            val findTextLengthInFirstRun = findTextPartInFirstRun(p.runs[i - 1].getText(0), findText)
            val prevRunText = p.runs[i - 1].getText(0).replaceRange(pos, findTextLengthInFirstRun, replaceText)
            p.runs[i - 1].setText(prevRunText, 0)
        }
    }

private fun textOfNotFullSecondRun(text: String, findText: String): String {
    return if (!text.contains(findText)) {
        textOfNotFullSecondRun(text, findText.drop(1))
    } else {
        text.replace(findText, "")
    }
}

private fun findTextPartInFirstRun(text: String, findText: String): Int {
    return if (text.contains(findText)) {
        findText.length
    } else {
        findTextPartInFirstRun(text, findText.dropLast(1))
    }
}

it is the list of runs in a paragraph. Same with the search block in the table. With this solution I did not have any issues yet. All formatting is intact.

它是一个段落中的运行列表。与表中的搜索块相同。有了这个解决方案,我还没有遇到任何问题。所有格式都完好无损。

Edit: I made a java lib for replacing, check it out: https://github.com/deividasstr/docx-word-replacer

编辑:我制作了一个用于替换的 Java 库,请查看:https: //github.com/deividasstr/docx-word-replacer

回答by Dmitry Stolbov

There is the replaceParagraphimplementation that replaces ${key}with value(the fieldsForReportparameter) and saves format by merging runscontents ${key}.

replaceParagraph替换${key}with value(fieldsForReport参数) 并通过合并runs内容保存格式的实现${key}

private void replaceParagraph(XWPFParagraph paragraph, Map<String, String> fieldsForReport) throws POIXMLException {
    String find, text, runsText;
    List<XWPFRun> runs;
    XWPFRun run, nextRun;
    for (String key : fieldsForReport.keySet()) {
        text = paragraph.getText();
        if (!text.contains("${"))
            return;
        find = "${" + key + "}";
        if (!text.contains(find))
            continue;
        runs = paragraph.getRuns();
        for (int i = 0; i < runs.size(); i++) {
            run = runs.get(i);
            runsText = run.getText(0);
            if (runsText.contains("${") || (runsText.contains("$") && runs.get(i + 1).getText(0).substring(0, 1).equals("{"))) {
                //As the next run may has a closed tag and an open tag at 
                //the same time, we have to be sure that our building string 
                //has a fully completed tags 
                while (!openTagCountIsEqualCloseTagCount(runsText))) {
                    nextRun = runs.get(i + 1);
                    runsText = runsText + nextRun.getText(0);
                    paragraph.removeRun(i + 1);
                }
                run.setText(runsText.contains(find) ?
                        runsText.replace(find, fieldsForReport.get(key)) :
                        runsText, 0);
            }
        }
    }
}

private boolean openTagCountIsEqualCloseTagCount(String runText) {
    int openTagCount = runText.split("\$\{", -1).length - 1;
    int closeTagCount = runText.split("}", -1).length - 1;
    return openTagCount == closeTagCount;
}

Implementation replaceParagraph

实现替换段落

Unit test

单元测试