Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27950891/

Time: 2020-09-13 22:50:54  Source: igfitidea

How to use a pandas data frame in a unit test

Tags: python, pandas, python-unittest

Asked by tjb305

I am developing a set of Python scripts to pre-process a dataset and then produce a series of machine learning models using scikit-learn. I would like to write a set of unit tests for the data pre-processing functions, using a small test pandas dataframe for which I can work out the expected answers and use them in assert statements.

I cannot seem to load the dataframe and pass it to the unit tests using self. My code looks something like this:

def setUp(self):
    TEST_INPUT_DIR = 'data/'
    test_file_name =  'testdata.csv'
    try:
        data = pd.read_csv(INPUT_DIR + test_file_name,
            sep = ',',
            header = 0)
    except IOError:
        print 'cannot open file'
    self.fixture = data

def tearDown(self):
    del self.fixture

def test1(self):    
    self.assertEqual(somefunction(self.fixture), somevalue)

if __name__ == '__main__':
    unittest.main()

Thanks for the help.

Answered by Adam Slack

Pandas has some utilities for testing.

import unittest
import pandas as pd
from pandas.testing import assert_frame_equal  # <-- for testing dataframes

class DFTests(unittest.TestCase):

    """ class for running unittests """

    def setUp(self):
        """ Your setUp """
        TEST_INPUT_DIR = 'data/'
        test_file_name = 'testdata.csv'
        try:
            data = pd.read_csv(TEST_INPUT_DIR + test_file_name,
                               sep=',',
                               header=0)
        except IOError:
            # fail here; otherwise `data` would be undefined on the next line
            self.fail('cannot open file')
        self.fixture = data

    def test_dataFrame_constructedAsExpected(self):
        """ Test that the dataframe read in equals what you expect"""
        foo = pd.DataFrame()  # placeholder: build the dataframe you expect here
        assert_frame_equal(self.fixture, foo)

if __name__ == '__main__':
    unittest.main()
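
If the CSV file itself is not what you want to test, the fixture can also be built in memory, which keeps the test independent of the working directory. The following is only a minimal sketch: `fill_missing_ages` is a hypothetical pre-processing function invented for illustration (it is not from the question), and the column names are assumptions.

```python
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal


def fill_missing_ages(df):
    """Hypothetical pre-processing step: fill missing 'age' values with the mean."""
    out = df.copy()  # work on a copy so the caller's frame is left untouched
    out['age'] = out['age'].fillna(out['age'].mean())
    return out


class PreprocessingTests(unittest.TestCase):

    def setUp(self):
        # Small hand-made fixture whose expected result is easy to verify by eye.
        self.fixture = pd.DataFrame({'name': ['a', 'b', 'c'],
                                     'age': [20.0, None, 40.0]})

    def test_missing_ages_are_filled_with_mean(self):
        expected = pd.DataFrame({'name': ['a', 'b', 'c'],
                                 'age': [20.0, 30.0, 40.0]})
        assert_frame_equal(fill_missing_ages(self.fixture), expected)

    def test_input_is_not_modified(self):
        fill_missing_ages(self.fixture)
        # The original fixture should still contain its missing value.
        self.assertTrue(self.fixture['age'].isna().any())
```

Saved as a module, this can be run with `python -m unittest`. Because `setUp` runs before every test method, each test gets a fresh copy of the fixture.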

Answered by Steven

If you are using the latest pandas, I think the following way is a bit cleaner:

import pandas as pd

pd.testing.assert_frame_equal(my_df, expected_df)
pd.testing.assert_series_equal(my_series, expected_series)
pd.testing.assert_index_equal(my_index, expected_index)

Each of these functions will raise AssertionError if the two objects are not "equal".
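
A short sketch of that behaviour, using two made-up frames (the names `expected_df` and `wrong_df` are only for illustration). Inside a `unittest.TestCase`, the AssertionError is reported as an ordinary test failure, so no extra wrapping is needed there; the try/except below is only to show the exception outside a test runner.

```python
import pandas as pd

expected_df = pd.DataFrame({'x': [1, 2, 3]})
wrong_df = pd.DataFrame({'x': [1, 2, 4]})

# Equal frames: the call returns None and raises nothing.
pd.testing.assert_frame_equal(expected_df, expected_df.copy())

# Differing frames: AssertionError is raised, with a message describing the mismatch.
try:
    pd.testing.assert_frame_equal(wrong_df, expected_df)
    mismatch_detected = False
except AssertionError as err:
    mismatch_detected = True
    print(err)
```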

For more information and options: https://pandas.pydata.org/pandas-docs/stable/reference/general_utility_functions.html#testing-functions
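
As an illustration of two of those options, the sketch below relaxes the comparison with `check_dtype=False` (ignore dtype differences such as int vs. float) and `check_like=True` (ignore the order of the index and columns). The frames are made up for the example; whether relaxing a check is appropriate depends on what your pre-processing function is supposed to guarantee.

```python
import pandas as pd

ints = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
floats = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
reordered = ints[['b', 'a']]  # same data, columns in a different order

# Same values, different dtypes: passes only because dtype checking is off.
pd.testing.assert_frame_equal(ints, floats, check_dtype=False)

# Same data, different column order: passes only because check_like is on.
pd.testing.assert_frame_equal(ints, reordered, check_like=True)
```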
