Source: Stack Overflow, original question at http://stackoverflow.com/questions/24231437/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.

Date: 2020-09-13 22:10:06. Source: igfitidea

Unexpected NameError occurs when trying to use a dataframe name defined within a function

Tags: python, python-2.7, pandas

Asked by Jason

Could somebody explain why the following code produces a NameError?

def nonull(df, col, name):
    name = df[pd.notnull(df[col])]
    print name[col].count(), df[col].count()
    return name

nonull(sve, 'DOC_mg/L', 'sveDOC')
sveDOC.count()

NameError: name 'sveDOC' is not defined

711 711

The dataframe seems to be created, since the print statement works, so I don't understand why trying to use sveDOC (which was name inside the function) produces an error.

Here's an example of what I'd like to do within the function:

import pandas as pd

d = {'one' : pd.Series([1., 1., 1., 1.], index=['a', 'b', 'c', 'd']), 
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
pd.DataFrame(d)
df = pd.DataFrame(d)
df1 = df
df = df * 2
print df.head(), df1.head()

one  two
a    2    2
b    2    4
c    2    6
d    2    8    
one  two
a    1    1
b    1    2
c    1    3
d    1    4

Answered by jonrsharpe

Python names do not work the way you seem to think. Here's what your code actually does:

def nonull(df, col, name):
    name = df # rebind the name 'name' to the object referenced by 'df'
    name = df[pd.notnull(name[col])] # rebind the name 'name' again 
    print name[col].count(), df[col].count()
    return name # return the instance

nonull(sve, 'DOC_mg/L', 'sveDOC') # call the function and ignore the return value

The function never actually uses the 'sveDOC' argument: the parameter name is immediately rebound to a new DataFrame, and the string that was passed in is discarded. Here's what you should do instead:

def nonull(df, col):
    name = df[pd.notnull(df[col])]
    print name[col].count(), df[col].count()
    return name

sveDOC = nonull(sve, 'DOC_mg/L')
sveDOC.count()
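
As a minimal sketch of the point above (the caller, not the function, chooses the name), the following uses a small made-up DataFrame in place of sve, whose real contents are not shown in the question. The results dict is a hypothetical addition for the case where results really do need to be looked up by a string. It uses the print() function, so it assumes Python 3 rather than the Python 2.7 syntax used in the question.

```python
import pandas as pd

# Made-up stand-in for the asker's 'sve' DataFrame (assumption).
sve = pd.DataFrame({'DOC_mg/L': [1.2, None, 3.4, None]})

def nonull(df, col):
    """Return the rows of df where col is not null."""
    return df[pd.notnull(df[col])]

# The caller picks the name by binding the return value.
sveDOC = nonull(sve, 'DOC_mg/L')

# Hypothetical alternative: if results must be found by a string key,
# store them in a dict instead of trying to create a variable by name.
results = {}
results['sveDOC'] = nonull(sve, 'DOC_mg/L')

doc_count = sveDOC['DOC_mg/L'].count()
print(doc_count, sve['DOC_mg/L'].count())
```

Passing 'sveDOC' as a string can never create a variable with that name in the calling scope; the dict makes the string-to-object mapping explicit instead.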


Your conception of Python's use of names and references is completely wrong.

pd.DataFrame(d) # creates a new DataFrame but doesn't do anything with it
                # (what was the point of this line?)
df = pd.DataFrame(d) # assigns a second new DataFrame to the name 'df'
df1 = df # assigns the name 'df1' to the same object that 'df' refers to
         # - note that this does *not* create a copy
df = df * 2 # create a new DataFrame based on the one referenced by 'df'
            # (and 'df1'!) and assign it to the name 'df'

To demonstrate this:

df1 = pd.DataFrame(d)

df2 = df1

df1 is df2
Out[5]: True # still the same object

df2 = df2 * 2

df1 is df2
Out[7]: False # now different
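
To make the difference between changing a shared object and rebinding a name more concrete, here is a small sketch under assumed data (the column names and values are made up for illustration). It is written as a plain script rather than an interactive session.

```python
import pandas as pd

df1 = pd.DataFrame({'one': [1.0, 1.0], 'two': [1.0, 2.0]})
df2 = df1                          # a second name for the same object

df2['three'] = 0.0                 # changes the one shared object in place
shared = 'three' in df1.columns    # the new column is visible through df1 too

df2 = df2 * 2                      # builds a new DataFrame and rebinds df2 only
same_after_rebind = df1 is df2     # df1 still refers to the original object

print(shared, same_after_rebind)
```

Adding the column affects both names because there is only one object, while `df2 = df2 * 2` leaves df1 untouched because only the name df2 is pointed at the new result.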

If you want to create a copy of a DataFrame, do so explicitly:

from copy import copy

df2 = copy(df1) # or, more idiomatically in pandas: df2 = df1.copy()

You can either do this outside nonull and pass the copy, or do it inside nonull and return the modified copy.
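
A rough sketch of the second option, copying inside the function, might look like the following. The name nonull_copy and the sample data are hypothetical, and the explicit DataFrame.copy() call is one way to do it, not necessarily what the answer's author had in mind.

```python
import pandas as pd

def nonull_copy(df, col):
    """Hypothetical variant of nonull that returns an independent copy."""
    # .copy() makes the result independent of the caller's DataFrame.
    return df[pd.notnull(df[col])].copy()

# Made-up stand-in for the asker's 'sve' DataFrame (assumption).
sve = pd.DataFrame({'DOC_mg/L': [1.5, None, 3.5]})

sveDOC = nonull_copy(sve, 'DOC_mg/L')
sveDOC['DOC_mg/L'] = sveDOC['DOC_mg/L'] * 10   # modifies the copy only
```

After this runs, sve still holds its original values, and sveDOC holds the filtered and scaled ones.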