Python: testing whether a pandas DataFrame exists
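Note: this page is adapted from a popular Stack Overflow question and is licensed under CC BY-SA 4.0. If you reuse it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow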
Original URL: http://stackoverflow.com/questions/39337115/
Testing if a pandas DataFrame exists
Asked by J Jones
In my code, I have several variables which can either contain a pandas DataFrame or nothing at all. Let's say I want to test and see if a certain DataFrame has been created yet or not. My first thought would be to test for it like this:
if df1:
    # do something
    pass
However, that code fails in this way:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Fair enough. Ideally, I would like to have a presence test that works for either a DataFrame or Python None.
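The failure is easy to reproduce. The following is a minimal sketch, assuming pandas is installed; the sample data is made up purely for illustration. It shows that truth-testing a DataFrame raises ValueError, while truth-testing None simply evaluates to False:

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df1 = pd.DataFrame({"a": [1, 2, 3]})

try:
    if df1:  # pandas refuses to pick a single truth value here
        pass
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# None, by contrast, is simply falsy.
none_is_truthy = bool(None)
print(raised, none_is_truthy)
```

So a plain `if df1:` works only while the variable still holds None, which is why it cannot serve as a presence test for both cases.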
Here is one way this can work:
if not isinstance(df1, type(None)):
    # do something
    pass
However, testing for type is really slow.
import timeit

t = timeit.Timer('if None: pass')
t.timeit()
# approximately 0.04
t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
t.timeit()
# approximately 0.4
Ouch. Along with being slow, testing for NoneType isn't very flexible, either.
A different solution would be to initialize df1 as an empty DataFrame, so that the type would be the same in both the null and non-null cases. I could then just test using len(), or any(), or something like that. Making an empty DataFrame seems kind of silly and wasteful, though.
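As a rough sketch of the empty-DataFrame idea (the variable names `placeholder` and `filled` and the sample data are assumptions for illustration), both `len()` and the `.empty` attribute tell the two states apart:

```python
import pandas as pd

placeholder = pd.DataFrame()          # stands in for "no data yet"
filled = pd.DataFrame({"a": [1, 2]})  # hypothetical real data

for name, frame in [("placeholder", placeholder), ("filled", filled)]:
    print(name, "empty:", frame.empty, "rows:", len(frame))
```

One caveat with this design: `.empty` is also True for a frame that has columns but zero rows, so it cannot distinguish "not created yet" from "created, but holding no rows".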
Another solution would be to have an indicator variable, df1_exists, which is set to False until df1 is created. Then, instead of testing df1, I would be testing df1_exists. But this doesn't seem all that elegant, either.
Is there a better, more Pythonic way of handling this issue? Am I missing something, or is this just an awkward side effect of all the awesome things about pandas?
Answered by piRSquared
Option 1 (my preferred option)
This is @Ami Tavory's approach. Please select his answer if you like it.
It is very idiomatic Python to initialize a variable with None, then check for None prior to doing something with that variable.
df1 = None

if df1 is not None:
    print(df1.head())
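To show the sentinel pattern across both states, here is a small sketch. The helper name `describe` and the sample data are hypothetical and not part of the original answer:

```python
import pandas as pd


def describe(df):
    """Hypothetical helper: report whether df has been created yet."""
    if df is not None:
        return "DataFrame with {} rows".format(len(df))
    return "not created yet"


df1 = None
status_before = describe(df1)

df1 = pd.DataFrame({"a": [1, 2, 3]})  # made-up data for illustration
status_after = describe(df1)

print(status_before, "|", status_after)
```

Because `is not` compares object identity, it always yields a plain bool and never triggers pandas' element-wise comparison, unlike `df1 != None`.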
Option 2
However, setting up an empty dataframe isn't at all a bad idea.
import pandas as pd

df1 = pd.DataFrame()
if not df1.empty:
    print(df1.head())
Option 3
Just try it.
try:
    print(df1.head())
# catch when df1 is None
except AttributeError:
    pass
# catch when it hasn't even been defined
except NameError:
    pass
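The same "just try it" idea can be wrapped in a function. This is only a sketch: the helper name `head_or_none` and its `n` parameter are assumptions for illustration, not something the answer defines.

```python
import pandas as pd


def head_or_none(obj, n=5):
    """Hypothetical helper: return obj.head(n), or None if obj has no head()."""
    try:
        return obj.head(n)
    except AttributeError:
        # Raised when obj is None (or any other object without a head method).
        return None


result_none = head_or_none(None)
result_frame = head_or_none(pd.DataFrame({"a": range(10)}), n=3)

print(result_none, len(result_frame))
```

Inside a function the argument is always bound, so the NameError branch is only relevant when the name is looked up directly, as in the snippet above. Note also that catching AttributeError this broadly can mask unrelated bugs raised from within the call.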
Timing
When df1 is in its initialized state or doesn't exist at all
When df1 is a dataframe with something in it
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.arange(25).reshape(5, 5), index=list('ABCDE'), columns=list('abcde'))
df1
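A rough sketch of how the three options could be timed for both cases follows. The function names `option1` to `option3`, the `empty` placeholder, and the repetition count are assumptions for illustration, and no particular result is implied; actual numbers depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def option1(df):
    # Option 1: None sentinel.
    if df is not None:
        df.head()


def option2(df):
    # Option 2: empty-DataFrame placeholder.
    if not df.empty:
        df.head()


def option3(df):
    # Option 3: just try it and swallow the failure.
    try:
        df.head()
    except AttributeError:
        pass


filled = pd.DataFrame(
    np.arange(25).reshape(5, 5), index=list("ABCDE"), columns=list("abcde")
)
empty = pd.DataFrame()

# Each entry pairs a function with its argument for the "nothing yet" case.
cases = [(option1, None), (option2, empty), (option3, None)]

timings = {}
for func, nothing in cases:
    for label, arg in [("nothing", nothing), ("filled", filled)]:
        # number=1000 is an arbitrary repetition count for this sketch.
        seconds = timeit.timeit(lambda: func(arg), number=1000)
        timings[(func.__name__, label)] = seconds
        print(func.__name__, label, round(seconds, 6))
```

In the "filled" case all three functions end up calling `head()`, so any difference there comes only from the check itself.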
Answered by Ami Tavory
In my code, I have several variables which can either contain a pandas DataFrame or nothing at all
The Pythonic way of indicating "nothing" is via None, and of checking for "not nothing" is via:
if df1 is not None:
    ...
I am not sure how critical time is here, but since you measured things:
In [82]: t = timeit.Timer('if x is not None: pass', setup='x=None')
In [83]: t.timeit()
Out[83]: 0.022536039352416992
In [84]: t = timeit.Timer('if isinstance(x, type(None)): pass', setup='x=None')
In [85]: t.timeit()
Out[85]: 0.11571192741394043
So checking that something is not None is also faster than the isinstance alternative.