How to use a pandas data frame in a unit test
Original URL: http://stackoverflow.com/questions/27950891/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by tjb305
I am developing a set of Python scripts to pre-process a dataset and then produce a series of machine learning models using scikit-learn. I would like to develop a set of unit tests to check the data pre-processing functions, and I would like to use a small test pandas DataFrame whose expected results I can work out by hand and use in assert statements.
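A minimal sketch of that idea, assuming a hypothetical preprocessing function `fill_missing_with_mean` that is not from the question: build the test DataFrame in code inside `setUp`, keep it small enough that the expected result can be worked out by hand, and compare with `pandas.testing.assert_frame_equal`.

```python
import unittest

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def fill_missing_with_mean(df):
    """Hypothetical preprocessing step: replace NaNs with each column's mean."""
    return df.fillna(df.mean())


class FillMissingTests(unittest.TestCase):
    def setUp(self):
        # Tiny fixture built in code, so the expected answer is easy to derive by hand.
        self.fixture = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                                     "b": [4.0, 5.0, np.nan]})

    def test_fills_nans_with_column_mean(self):
        # mean(a) = (1 + 3) / 2 = 2.0 and mean(b) = (4 + 5) / 2 = 4.5 (NaNs are skipped)
        expected = pd.DataFrame({"a": [1.0, 2.0, 3.0],
                                 "b": [4.0, 5.0, 4.5]})
        assert_frame_equal(fill_missing_with_mean(self.fixture), expected)
```

Run it with `python -m unittest`. Building the fixture in code keeps the expected values right next to the input they were derived from.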
I cannot seem to get it to load the DataFrame and pass it to the unit tests using `self`. My code looks something like this:
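```python
import unittest

import pandas as pd


class PreprocessingTests(unittest.TestCase):  # class name is illustrative
    def setUp(self):
        TEST_INPUT_DIR = 'data/'
        test_file_name = 'testdata.csv'
        try:
            # was INPUT_DIR, which is undefined and raises NameError
            data = pd.read_csv(TEST_INPUT_DIR + test_file_name,
                               sep=',',
                               header=0)
        except IOError:
            print('cannot open file')
            raise  # otherwise `data` would be unbound on the next line
        self.fixture = data

    def tearDown(self):
        del self.fixture

    def test1(self):
        # somefunction / somevalue are placeholders from the question
        self.assertEqual(somefunction(self.fixture), somevalue)


if __name__ == '__main__':
    unittest.main()
```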
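One way to sidestep the file-loading problem, sketched under the assumption that the test data is small enough to live in the test module: embed it as CSV text and read it with `pd.read_csv(io.StringIO(...))`. Nothing touches the filesystem, so the test no longer depends on the working directory that a relative path such as `'data/'` is resolved against. The column names, values, and class name below are made up for illustration.

```python
import io
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal

# Illustrative (made-up) test data; in a real suite this would mirror data/testdata.csv.
TEST_CSV = """id,age,income
1,34,52000
2,45,61000
3,29,48000
"""


class LoadFixtureTests(unittest.TestCase):
    def setUp(self):
        # read_csv accepts any file-like object, so no path is involved.
        self.fixture = pd.read_csv(io.StringIO(TEST_CSV), sep=",", header=0)

    def test_fixture_loaded(self):
        expected = pd.DataFrame({"id": [1, 2, 3],
                                 "age": [34, 45, 29],
                                 "income": [52000, 61000, 48000]})
        assert_frame_equal(self.fixture, expected)
```

If the CSV should stay on disk, building its path from `os.path.dirname(os.path.abspath(__file__))` instead of a bare `'data/'` has a similar effect: the test then finds the file regardless of the current directory.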
Thanks for the help.
Answer by Adam Slack
pandas has some utilities for testing, such as `assert_frame_equal` for comparing DataFrames:
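```python
import unittest

import pandas as pd
# pandas.util.testing is deprecated (removed in pandas 2.0);
# pandas.testing is the public module for these helpers.
from pandas.testing import assert_frame_equal  # <-- for testing dataframes


class DFTests(unittest.TestCase):
    """ class for running unittests """

    def setUp(self):
        """ Your setUp """
        TEST_INPUT_DIR = 'data/'
        test_file_name = 'testdata.csv'
        try:
            data = pd.read_csv(TEST_INPUT_DIR + test_file_name,
                               sep=',',
                               header=0)
        except IOError:
            print('cannot open file')
            raise  # re-raise so the test errors instead of hitting an unbound `data`
        self.fixture = data

    def test_dataFrame_constructedAsExpected(self):
        """ Test that the dataframe read in equals what you expect"""
        foo = pd.DataFrame()  # placeholder: build the DataFrame you expect here
        assert_frame_equal(self.fixture, foo)


if __name__ == '__main__':
    unittest.main()
```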
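A short sketch of why the pandas helper is needed and of two of its options. `self.assertEqual(df1, df2)` falls back to `not df1 == df2`; for DataFrames `==` is element-wise and returns another DataFrame, so unittest ends up raising a `ValueError` ("truth value ... is ambiguous") instead of a clean pass or failure. `assert_frame_equal` also accepts options such as `check_dtype=False` (compare values while ignoring dtype) and `check_like=True` (ignore the order of index and columns). The test class and data are illustrative.

```python
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal


class FrameComparisonTests(unittest.TestCase):
    def test_assert_equal_is_ambiguous(self):
        df = pd.DataFrame({"x": [1, 2]})
        # assertEqual calls bool() on the element-wise comparison result.
        with self.assertRaises(ValueError):
            self.assertEqual(df, df.copy())

    def test_ignore_dtype(self):
        ints = pd.DataFrame({"x": [1, 2]})
        floats = pd.DataFrame({"x": [1.0, 2.0]})
        # Same values, different dtypes (int64 vs float64).
        assert_frame_equal(ints, floats, check_dtype=False)
        with self.assertRaises(AssertionError):
            assert_frame_equal(ints, floats)  # dtype is checked by default

    def test_ignore_column_order(self):
        left = pd.DataFrame({"a": [1], "b": [2]})
        right = pd.DataFrame({"b": [2], "a": [1]})
        # check_like=True lines up labels before comparing values.
        assert_frame_equal(left, right, check_like=True)
```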
Answer by Steven
If you are using a recent version of pandas, which exposes these helpers in the public `pandas.testing` module, I think the following way is a bit cleaner:
```python
import pandas as pd

# my_df / expected_df etc. stand for your own objects
pd.testing.assert_frame_equal(my_df, expected_df)
pd.testing.assert_series_equal(my_series, expected_series)
pd.testing.assert_index_equal(my_index, expected_index)
```
Each of these functions raises an `AssertionError` if the two objects are not "equal".
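Because a mismatch surfaces as a plain `AssertionError`, unittest (and pytest) report it as an ordinary test failure, and the message describes what differs. A small self-contained sketch; `normalize` is a hypothetical preprocessing function, not part of pandas.

```python
import pandas as pd


def normalize(s):
    """Hypothetical preprocessing step: scale a Series to the range [0, 1]."""
    return (s - s.min()) / (s.max() - s.min())


result = normalize(pd.Series([10.0, 20.0, 30.0], name="score"))
expected = pd.Series([0.0, 0.5, 1.0], name="score")

# Both pass silently: same values, dtype, index and name.
pd.testing.assert_series_equal(result, expected)
pd.testing.assert_index_equal(result.index, pd.RangeIndex(3))

# A mismatch (here only the Series name differs) raises AssertionError.
try:
    pd.testing.assert_series_equal(result, expected.rename("other"))
except AssertionError as err:
    print("mismatch detected:", err)
```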
For more information and options: https://pandas.pydata.org/pandas-docs/stable/reference/general_utility_functions.html#testing-functions

