pandas: How do you unit test Python DataFrames?

Notice: This page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/41852686/

Time: 2020-09-14 02:51:46  Source: igfitidea

How do you Unit Test Python DataFrames

unit-testing pandas numpy dataframe

Asked by CodeGeek123

How do I unit test Python DataFrames?

I have functions that take DataFrames as input and return DataFrames as output. Almost every function I have does this. If I want to unit test them, what is the best way to do it? It seems like a lot of effort to create a new DataFrame (populated with values) for every function.

Are there any materials you can refer me to? Should you write unit tests for these functions?

Accepted answer by sechilds

While pandas's own test functions are primarily used for internal testing, NumPy includes a very useful set of testing functions, documented under NumPy Test Support.

These functions compare NumPy arrays, but you can get the array that underlies a pandas DataFrame using the values property. You can define a simple DataFrame and compare what your function returns with what you expect.
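As a minimal sketch of that approach, the example below compares the array behind a function's result with a hand-written expected array. The function double_values and its data are hypothetical stand-ins for your own code. Note that values drops the index and column labels, so only the numbers are checked; newer pandas versions also offer to_numpy() for the same purpose.

```python
import numpy as np
import pandas as pd


def double_values(df):
    """Hypothetical function under test: doubles every value in the frame."""
    return df * 2


def test_double_values():
    input_df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
    expected = np.array([[2.0, 1.0], [4.0, 3.0], [6.0, 5.0]])

    result = double_values(input_df)

    # .values exposes the underlying NumPy array, so NumPy's helpers apply.
    np.testing.assert_allclose(result.values, expected)


# Run the test directly; a test runner such as pytest would discover it by name.
test_double_values()
```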

One technique you can use is to define one set of test data for a number of functions. That way, you can use pytest fixtures to define that DataFrame once and use it in multiple tests.
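The fixture idea might look roughly like the sketch below. The names make_sample_frame, sample_frame, add_total and row_count are all hypothetical, chosen only for illustration; the file is meant to be run with pytest.

```python
import pandas as pd
import pytest


def make_sample_frame():
    """Build the small DataFrame shared by the tests (hypothetical data)."""
    return pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})


@pytest.fixture
def sample_frame():
    # pytest calls this for each test that lists `sample_frame` as an argument,
    # so every test receives its own fresh copy of the data.
    return make_sample_frame()


def add_total(df):
    """Hypothetical function under test: adds a price * qty column."""
    out = df.copy()
    out["total"] = out["price"] * out["qty"]
    return out


def row_count(df):
    """Hypothetical function under test: number of rows."""
    return len(df)


def test_add_total(sample_frame):
    result = add_total(sample_frame)
    assert result["total"].tolist() == [10.0, 40.0, 90.0]


def test_row_count(sample_frame):
    assert row_count(sample_frame) == 3
```

Keeping the data construction in a plain helper and wrapping it in the fixture is one possible design; it lets the same data be reused outside pytest as well.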

In terms of resources, I found this article on Testing with NumPy and Pandas to be very useful. I also gave a short presentation about data analysis testing at PyCon Canada this year: Automate Your Data Analysis Testing.

Answer by Mohamed Thasin ah

You can use the pandas testing functions:

They give you more flexibility to compare your result with the expected result in different ways.

For example:

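import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'a': [6, 7, 8, 9, 10]})

# The element-wise sum of the two 'a' columns should match the expected Series.
# check_names=False ignores the fact that the sum is named 'a' and expected_res is unnamed.
expected_res = pd.Series([7, 9, 11, 13, 15])
pd.testing.assert_series_equal(df1['a'] + df2['a'], expected_res, check_names=False)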

For more details, refer to this link.
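Since the question is about whole DataFrames, the same idea can be sketched with pd.testing.assert_frame_equal, which compares two frames directly. The function add_columns below is a hypothetical stand-in for the code under test.

```python
import pandas as pd


def add_columns(left, right):
    """Hypothetical function under test: element-wise sum of two frames."""
    return left + right


df1 = pd.DataFrame({"a": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"a": [6, 7, 8, 9, 10]})
expected = pd.DataFrame({"a": [7, 9, 11, 13, 15]})

# Compares values, dtypes, index, and column labels; raises AssertionError
# describing the mismatch if the frames differ.
pd.testing.assert_frame_equal(add_columns(df1, df2), expected)

# check_dtype=False relaxes the dtype check, e.g. int64 vs. float64 columns.
expected_float = expected.astype("float64")
pd.testing.assert_frame_equal(add_columns(df1, df2), expected_float, check_dtype=False)
```

Unlike comparing raw arrays, this also catches differences in column names and index labels, which may or may not be what a given test wants.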

Answer by rtkaleta

I don't think it's hard to create small DataFrames for unit testing:

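import pandas as pd
from nose.tools import assert_dict_equal

# `some`, `other`, `values` and `[...]` are placeholders for real test data;
# `my_func` is the function under test.
input_df = pd.DataFrame.from_dict({
    'field_1': [some, values],
    'field_2': [other, values]
})
expected = {
    'result': [...]
}
# orient='list' makes to_dict() return {column: [values]}, matching `expected`;
# the default orient returns {column: {index: value}} instead.
actual = my_func(input_df).to_dict(orient='list')
assert_dict_equal(expected, actual, "oops, there's a bug...")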

Answer by John Zwinck

I would suggest writing the values as CSV in docstrings (or in separate files if they're large) and parsing them using pd.read_csv(). You can parse the expected output from CSV too and compare, or else use df.to_csv() to write a CSV out and diff it.
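A rough sketch of the CSV approach follows. It keeps the CSV text in module-level string constants rather than docstrings, and the function keep_high_scores and its data are hypothetical; both are assumptions made for the sake of a short, runnable example.

```python
import io

import pandas as pd

INPUT_CSV = """\
name,score
alice,10
bob,20
carol,30
"""

EXPECTED_CSV = """\
name,score
bob,20
carol,30
"""


def keep_high_scores(df, threshold=15):
    """Hypothetical function under test: keep rows with score above threshold."""
    return df[df["score"] > threshold].reset_index(drop=True)


def test_keep_high_scores():
    # io.StringIO lets pd.read_csv parse CSV text held in a string.
    input_df = pd.read_csv(io.StringIO(INPUT_CSV))
    expected = pd.read_csv(io.StringIO(EXPECTED_CSV))

    result = keep_high_scores(input_df)

    pd.testing.assert_frame_equal(result, expected)
    # The alternative: write the result back out as CSV and compare the text.
    # splitlines() sidesteps platform differences in line endings.
    assert result.to_csv(index=False).splitlines() == EXPECTED_CSV.splitlines()


test_keep_high_scores()
```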