C#: reading a Word document line by line

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/18555064/

Time: 2020-08-10 12:35:06  Source: igfitidea

Read from Word document line by line

c# asp.net .net ms-word office-interop

Asked by Bat_Programmer


I'm trying to read a Word document using C#. I am able to get all the text, but I want to read it line by line, store the lines in a list, and bind the list to a GridView. Currently my code returns a list with only one item containing all the text (not line by line as desired). I'm using the Microsoft.Office.Interop.Word library to read the file. Below is my code so far:



    Application word = new Application();
    Document doc = new Document();

    object fileName = path;
    // Define an object to pass to the API for missing parameters
    object missing = System.Type.Missing;
    doc = word.Documents.Open(ref fileName,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing);

    String read = string.Empty;
    List<string> data = new List<string>();
    foreach (Range tmpRange in doc.StoryRanges)
    {
        //read += tmpRange.Text + "<br>";
        data.Add(tmpRange.Text);
    }
    ((_Document)doc).Close();
    ((_Application)word).Quit();

    GridView1.DataSource = data;
    GridView1.DataBind();

Accepted answer by Bat_Programmer

Ok. I found the solution here.



The final code is as follows:



    // Requires: using System.Collections.Generic; using Microsoft.Office.Interop.Word;
    Application word = new Application();

    object fileName = path;
    // Define an object to pass to the API for missing parameters
    object missing = System.Type.Missing;
    Document doc = word.Documents.Open(ref fileName,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing);

    List<string> data = new List<string>();
    // Word collections are 1-based, so iterate from 1 to Count
    for (int i = 1; i <= doc.Paragraphs.Count; i++)
    {
        // Each paragraph's text ends with a paragraph mark; Trim() removes it
        string temp = doc.Paragraphs[i].Range.Text.Trim();
        if (temp != string.Empty)
            data.Add(temp);
    }
    ((_Document)doc).Close();
    ((_Application)word).Quit();

    GridView1.DataSource = data;
    GridView1.DataBind();

Answer by Pratik Anjania

The above code is correct, but it's too slow. I have improved the code, and it's much faster than the above one.


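    List<string> data = new List<string>();
    Application app = new Application();
    // readFromPath must be an object variable holding the file path, because Open takes it by ref
    Document doc = app.Documents.Open(ref readFromPath);

    // Enumerate the paragraphs instead of indexing into doc.Paragraphs on every iteration
    foreach (Paragraph objParagraph in doc.Paragraphs)
        data.Add(objParagraph.Range.Text.Trim());

    ((_Document)doc).Close();
    ((_Application)app).Quit();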

Answer by Chris

How about this? Get all the text from the document and split it on line breaks, or on whatever delimiter works better for you. Then turn the result into a list.


    // Word ends each paragraph with '\r' (not '\n'); ToList() needs using System.Linq
    List<string> lines = doc.Content.Text.Split('\r').ToList();
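
If you go the split route, it may help to pull the splitting out into a small helper that does not touch Word at all, so it can be tried on plain strings. The following is only a sketch: `WordTextSplitter` and `SplitParagraphs` are made-up names (not part of the Interop library), and it assumes the text uses '\r' for paragraph marks, '\v' for manual line breaks, and '\a' for end-of-table-cell marks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Microsoft.Office.Interop.Word.
public static class WordTextSplitter
{
    // Splits raw text (e.g. the string returned by doc.Content.Text) into non-empty lines.
    public static List<string> SplitParagraphs(string text)
    {
        var result = new List<string>();
        if (string.IsNullOrEmpty(text))
            return result;

        // Assumed separators: '\r' paragraph mark, '\v' manual line break; '\n' is included defensively.
        char[] separators = { '\r', '\n', '\v' };
        foreach (string piece in text.Split(separators))
        {
            // '\a' (char 7) is assumed to mark the end of a table cell; Trim() alone does not remove it.
            string line = piece.Replace("\a", "").Trim();
            if (line.Length > 0)
                result.Add(line);
        }
        return result;
    }
}
```

With a document already open as in the answers above, this would be called as `List<string> lines = WordTextSplitter.SplitParagraphs(doc.Content.Text);`. Unlike a plain `Split`, it skips empty entries, which matches the empty-string check in the accepted answer.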