Source: StackOverflow, http://stackoverflow.com/questions/261374/. This page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, link to the original, and attribute it to the original authors (not me).

Date: 2020-08-03 20:19:18  Source: igfitidea

Efficient method to enumerate cells in an Excel workbook using c#

Tags: c#, excel, automation

Asked by Edward Wilde

What is the most efficient way to enumerate every cell in every sheet in a workbook?

The method below seems to work reasonably well for a workbook with ~130,000 cells. On my machine it took ~26 seconds to open the file and ~5 seconds to enumerate the cells. However, I'm no Excel expert and wanted to validate this code snippet with the wider community.

using System;
using System.Diagnostics;
using System.IO;
using Microsoft.Office.Interop.Excel;

// Type.Missing is the placeholder for optional COM parameters.
object missing = Type.Missing;
Stopwatch timer = Stopwatch.StartNew();
Microsoft.Office.Interop.Excel.Application excelApplication = new Microsoft.Office.Interop.Excel.Application();
try
{
    FileInfo exampleFile = new FileInfo(Path.Combine(Environment.CurrentDirectory, "Large.xlsx"));
    Workbook workbook = excelApplication.Workbooks.Open(exampleFile.FullName, false, false, missing, missing, missing, true, missing, missing, true, missing, missing, missing, missing, missing);
    // TotalSeconds is the whole interval; TimeSpan.Seconds is only its 0-59 component.
    Console.WriteLine("Took {0:F1} seconds to open file", timer.Elapsed.TotalSeconds);

    timer.Restart();
    // Worksheets rather than Sheets: Sheets also contains chart sheets, which are not Worksheet objects.
    foreach (Worksheet sheet in workbook.Worksheets)
    {
        int i = 0;
        string data = String.Empty;

        // One COM call per sheet reads the whole used range into a 1-based object[,].
        // A single-cell range returns a scalar rather than an array, so use "as" instead of a cast.
        object[,] rangeData = sheet.UsedRange.Cells.get_Value(missing) as object[,];

        if (rangeData != null)
        {
            int iRowMax = rangeData.GetUpperBound(0);
            int iColMax = rangeData.GetUpperBound(1);

            // Upper bounds are inclusive, so compare with <= to include the last row and column.
            for (int iRow = 1; iRow <= iRowMax; iRow++)
            {
                for (int iCol = 1; iCol <= iColMax; iCol++)
                {
                    data = rangeData[iRow, iCol] != null ? rangeData[iRow, iCol].ToString() : string.Empty;
                    if (i % 100 == 0)
                    {
                        Console.WriteLine("Processed {0} cells.", i);
                    }

                    i++;
                }
            }
        }
    }

    workbook.Close(false, missing, missing);
    Console.WriteLine("Took {0:F1} seconds to parse file", timer.Elapsed.TotalSeconds);
}
finally
{
    excelApplication.Workbooks.Close();
    excelApplication.Quit();
}

Edit:

It's worth stating that I want to use the PIA and interop in order to access properties of Excel workbooks that are not exposed by APIs that work directly with the Excel file.
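When timing code like the snippet above, note that TimeSpan.Seconds is only the seconds component (0-59) of an interval, so a 75-second open would be reported as 15; TotalSeconds is the full duration, and Stopwatch is the usual tool for measuring elapsed time. A minimal sketch; the class and method names (TimingDemo, TimeIt, Describe) are hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

public static class TimingDemo
{
    // Hypothetical helper: runs an action and returns the elapsed wall-clock time.
    public static TimeSpan TimeIt(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Formats the whole duration in seconds with one decimal, e.g. "75.0".
    // Seconds would give only the 0-59 component; TotalSeconds is the full interval.
    public static string Describe(TimeSpan elapsed)
    {
        return elapsed.TotalSeconds.ToString("F1", CultureInfo.InvariantCulture);
    }
}
```

In the question's code, TimeIt could wrap the Workbooks.Open call and the enumeration loop separately, so the two phases are reported independently.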

Accepted answer by Tamas Czinege

Excel PIA interop is really slow when you do things cell by cell, because every cell access is a separate COM call into the Excel process.

You should select the range you want to extract, like you did with the Worksheet.UsedRange property, and then read the value of the whole range in one step by invoking get_Value() on it (or simply by reading the Value or Value2 property; I can't remember which one).

This will yield an object[,], that is, a two-dimensional array, which can be easily enumerated and is quick to read.
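The array returned for a multi-cell range is 1-based in both dimensions, and GetUpperBound returns the last valid index, so loops over it must use <= rather than <. A minimal sketch that runs without Excel by building an array of the same shape with Array.CreateInstance; the class and method names (RangeEnumeration, CreateOneBased, ReadAll) are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class RangeEnumeration
{
    // Hypothetical test helper: builds a rows x cols object[,] whose indices start at 1,
    // the same shape Excel interop returns for a multi-cell range.
    public static object[,] CreateOneBased(int rows, int cols)
    {
        return (object[,])Array.CreateInstance(
            typeof(object), new[] { rows, cols }, new[] { 1, 1 });
    }

    // Visits every cell; the bounds come from the array itself and are inclusive.
    public static List<string> ReadAll(object[,] rangeData)
    {
        var values = new List<string>();
        for (int row = rangeData.GetLowerBound(0); row <= rangeData.GetUpperBound(0); row++)
        {
            for (int col = rangeData.GetLowerBound(1); col <= rangeData.GetUpperBound(1); col++)
            {
                // Empty cells come back as null; Convert.ToString maps null to "".
                values.Add(Convert.ToString(rangeData[row, col]));
            }
        }
        return values;
    }
}
```

For example, with a 2 x 3 range, loops written as iRow < GetUpperBound(0) and iCol < GetUpperBound(1) visit only 2 of the 6 cells; the <= version visits all 6.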

EDIT: I just read your actual code and realized that it already does what I proposed. Shame on me for not reading the question properly before answering. In that case, you cannot make it much faster: Excel PIA interop is slow. If you need a quicker solution, you will have to either port jExcelApi from Java to C# (not a terribly hard thing to do) or use some commercial component. I suggest avoiding the OLEDB interface at all costs, in order to keep your sanity.

Unrelated, but helpful tip: You should use the ?? operator. It is really handy. Instead of

data = rangeData[iRow, iCol] != null ? rangeData[iRow, iCol].ToString() : string.Empty;

you could just write

data = Convert.ToString(rangeData[iRow, iCol]) ?? string.Empty;

In that case, even the ?? string.Empty part is unnecessary, since Convert.ToString(object) converts null to an empty string anyway.
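A quick check of that claim, as a minimal sketch with hypothetical names (CellText and its three methods): the ternary, the ?? form and the bare Convert.ToString call produce the same string for null and for the kinds of values tested here.

```csharp
using System;

public static class CellText
{
    // The question's form: explicit null check before calling ToString().
    public static string WithTernary(object cell)
    {
        return cell != null ? cell.ToString() : string.Empty;
    }

    // The suggested form; for strings and numbers the ?? fallback never triggers.
    public static string WithCoalesce(object cell)
    {
        return Convert.ToString(cell) ?? string.Empty;
    }

    // Convert.ToString(object) already returns string.Empty for null.
    public static string WithConvert(object cell)
    {
        return Convert.ToString(cell);
    }
}
```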

Answer by TcKs

I think this is the most efficient way to do it with the PIA. Using "foreach" instead of "for" may be a little faster, but it will not change things dramatically.
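For reference, foreach over a multidimensional array visits every element in row-major order (the last index changes fastest) and uses the array's own bounds, so it cannot skip a row or column, but it does not give you the row and column index. A minimal sketch with hypothetical names (ForeachEnumeration, Flatten):

```csharp
using System;
using System.Collections.Generic;

public static class ForeachEnumeration
{
    // foreach walks a 2-D array row by row, last index fastest, using the
    // array's own bounds, so a 1-based Excel array needs no index arithmetic.
    public static List<object> Flatten(object[,] rangeData)
    {
        var cells = new List<object>();
        foreach (object cell in rangeData)
        {
            cells.Add(cell);
        }
        return cells;
    }
}
```

This sketch does not measure whether foreach is faster than the indexed loop; any difference is likely small next to the cost of the interop call that fetched the array.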

If efficiency is your primary goal, you should work with the Excel files directly, without the Excel application.
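To give a sense of what working with the file directly involves: an .xlsx file is a ZIP package of XML parts, with each worksheet stored under xl/worksheets/ and each cell as a <c> element in the SpreadsheetML namespace. A minimal sketch using only System.IO.Compression and LINQ to XML; as a simplifying assumption it returns raw cell references and <v> values and does not resolve shared strings (cells with t="s" hold an index into xl/sharedStrings.xml), styles or formulas. The names XlsxCells and Enumerate are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class XlsxCells
{
    // SpreadsheetML main namespace used by worksheet parts.
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Yields (part name, cell reference, raw value) for every <c> in every worksheet part.
    // Shared-string indices (t="s") are returned as-is, not resolved.
    public static IEnumerable<(string Part, string Reference, string RawValue)> Enumerate(Stream xlsx)
    {
        using (var archive = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries
                .Where(e => e.FullName.StartsWith("xl/worksheets/", StringComparison.Ordinal)
                         && e.FullName.EndsWith(".xml", StringComparison.Ordinal)))
            {
                XDocument sheet;
                using (Stream part = entry.Open())
                {
                    sheet = XDocument.Load(part);
                }

                foreach (XElement cell in sheet.Descendants(Main + "c"))
                {
                    // A cell without a <v> child (e.g. styled but empty) yields "".
                    yield return (entry.FullName,
                                  (string)cell.Attribute("r"),
                                  (string)cell.Element(Main + "v") ?? string.Empty);
                }
            }
        }
    }
}
```

Because the file is read as a stream of XML, no Excel process is started and there are no per-cell COM calls, which is where the speed-up would come from; a full reader would also need to load xl/sharedStrings.xml to turn string indices into text.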

Answer by Rune Grimstad

There is an open-source implementation of an Excel reader and writer called Koogra. It allows you to read an Excel file and modify it using pure managed code. This would probably be much faster than the code you are using now.
