Python: testing whether a pandas DataFrame exists

Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/39337115/

Time: 2020-08-19 22:09:16  Source: igfitidea

Testing if a pandas DataFrame exists

Tags: python, pandas, dataframe

Asked by J Jones

In my code, I have several variables which can either contain a pandas DataFrame or nothing at all. Let's say I want to test and see if a certain DataFrame has been created yet or not. My first thought would be to test for it like this:

if df1:
    # do something

However, that code fails in this way:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
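The error can be reproduced with a minimal sketch like the one below. It assumes a reasonably recent pandas; the frame contents and the names truth_error and df2 are made up for illustration. pandas refuses to convert a DataFrame to a boolean, while None is simply falsy:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3]})  # illustrative data, not from the question

truth_error = None
try:
    if df1:  # asks for bool(df1), which pandas refuses to compute
        pass
except ValueError as exc:
    truth_error = exc
    print("ValueError:", exc)

# None, by contrast, has a well-defined truth value
df2 = None
print(bool(df2))  # False
```

This is why a bare truth test cannot serve as a presence test that covers both cases.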

Fair enough. Ideally, I would like to have a presence test that works for either a DataFrame or Python None.

Here is one way this can work:

if not isinstance(df1, type(None)):
    # do something

However, testing for type is really slow.

import timeit

t = timeit.Timer('if None: pass')
t.timeit()
# approximately 0.04
t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
t.timeit()
# approximately 0.4

Ouch. Along with being slow, testing for NoneType isn't very flexible, either.

A different solution would be to initialize df1 as an empty DataFrame, so that the type would be the same in both the null and non-null cases. I could then just test using len(), or any(), or something like that. Making an empty DataFrame seems kind of silly and wasteful, though.
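As a rough illustration of the empty-DataFrame approach, the sketch below compares the .empty attribute with len(). The variable names and sample data are assumptions for the example only:

```python
import pandas as pd

empty = pd.DataFrame()
filled = pd.DataFrame({"a": [1, 2, 3]})

print(empty.empty, len(empty))    # True 0
print(filled.empty, len(filled))  # False 3

# A frame that has columns but no rows also counts as empty
no_rows = pd.DataFrame(columns=["a", "b"])
print(no_rows.empty, len(no_rows))  # True 0
```

Note that len() only counts rows, whereas .empty is True whenever either axis has length zero.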

Another solution would be to have an indicator variable, df1_exists, which is set to False until df1 is created. Then, instead of testing df1, I would be testing df1_exists. But this doesn't seem all that elegant, either.

Is there a better, more Pythonic way of handling this issue? Am I missing something, or is this just an awkward side effect of all the awesome things about pandas?

Answered by piRSquared

Option 1 (my preferred option)

This is @Ami Tavory's

Please select his answer if you like this approach.

It is very idiomatic Python to initialize a variable with None, then check for None prior to doing something with that variable.

df1 = None

if df1 is not None:
    print(df1.head())


Option 2

However, setting up an empty dataframe isn't at all a bad idea.

import pandas as pd

df1 = pd.DataFrame()

if not df1.empty:
    print(df1.head())
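If a variable might hold either None or an empty DataFrame, the checks from Options 1 and 2 could be combined. The helper below is hypothetical (has_data is not a pandas function and does not appear in the original answer); it is only a sketch of one way to do it:

```python
import pandas as pd


def has_data(df):
    """Hypothetical helper: True only if df is not None and not empty."""
    # The None check runs first, so .empty is never looked up on None
    return df is not None and not df.empty


print(has_data(None))                         # False
print(has_data(pd.DataFrame()))               # False
print(has_data(pd.DataFrame({"a": [1, 2]})))  # True
```

The order of the two conditions matters: short-circuiting on the None check avoids an AttributeError.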


Option 3

Just try it.

try:
    print(df1.head())
# catch when df1 is None
except AttributeError:
    pass
# catch when it hasn't even been defined
except NameError:
    pass
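To see which exception each situation raises, a small sketch follows. The describe helper and the names df_none, df_real and undefined_frame are invented for this illustration; undefined_frame is deliberately never assigned:

```python
import pandas as pd


def describe(getter):
    """Call getter(), then .head() on the result, and report what happened."""
    try:
        getter().head()
        return "ok"
    except AttributeError:
        # raised when the variable holds None
        return "AttributeError"
    except NameError:
        # raised when the variable was never defined
        return "NameError"


df_none = None
df_real = pd.DataFrame({"a": [1, 2]})

print(describe(lambda: df_none))          # AttributeError
print(describe(lambda: df_real))          # ok
print(describe(lambda: undefined_frame))  # NameError
```

One design consideration: a broad except AttributeError would also hide an AttributeError raised for unrelated reasons inside the try body, so keeping that body small is probably wise.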


Timing

When df1 is in its initialized state or doesn't exist at all

[Image not available]

When df1 is a dataframe with something in it

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(25).reshape(5, 5), list('ABCDE'), list('abcde'))
df1

[Images not available]
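The three options can be re-measured with a sketch along these lines. The function names, the sample frame and the iteration count are assumptions for illustration. Each check is wrapped in a function, so call overhead is included, and the undefined-variable (NameError) case is not covered. Results depend on the machine and the pandas version, so no numbers are quoted:

```python
import timeit

import pandas as pd


def check_is_not_none(df):
    # Option 1: None is the "not created yet" marker
    return df is not None


def check_not_empty(df):
    # Option 2: an empty DataFrame is the "not created yet" marker
    return not df.empty


def check_try(df):
    # Option 3: attempt the attribute access and catch the failure.
    # Only the lookup of .head is timed, not the call itself.
    try:
        df.head
        return True
    except AttributeError:
        return False


filled = pd.DataFrame({"a": range(5)})
cases = [
    ("option 1, df is None", check_is_not_none, None),
    ("option 1, df has data", check_is_not_none, filled),
    ("option 2, df is empty", check_not_empty, pd.DataFrame()),
    ("option 2, df has data", check_not_empty, filled),
    ("option 3, df is None", check_try, None),
    ("option 3, df has data", check_try, filled),
]

results = {}
for label, func, arg in cases:
    # 10000 calls per case keeps the run short; raise it for steadier numbers
    results[label] = timeit.timeit(lambda: func(arg), number=10000)
    print("{:<24} {:.4f} s".format(label, results[label]))
```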

Answered by Ami Tavory

In my code, I have several variables which can either contain a pandas DataFrame or nothing at all

The Pythonic way of indicating "nothing" is via None, and of checking for "not nothing" via:

if df1 is not None:
    ...

I am not sure how critical time is here, but since you measured things:

In [82]: t = timeit.Timer('if x is not None: pass', setup='x=None')

In [83]: t.timeit()
Out[83]: 0.022536039352416992

In [84]: t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')

In [85]: t.timeit()
Out[85]: 0.11571192741394043

So checking that something is not None is also faster than the isinstance alternative.