Assignment to containers in Pandas
Original URL: http://stackoverflow.com/questions/23227171/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me) on Stack Overflow.
Asked by Josh
I want to replace None entries in a specific column in Pandas with an empty list.
Note that some entries in this column may already have an empty list in them, and I don't want to touch those.
I have tried:
indices = np.equal(df[col],None)
df[col][indices] = []
and
indices = np.equal(df[col],None)
df[col][indices] = list()
but both solutions fail with:
ValueError: Length of replacements must equal series length
Why? How can I update those specific rows with an empty list?
Answered by Jeff
Assigning a list as the value of a cell is not allowed, and storing lists inside a pandas object is not recommended at all.
You can do it if you create the frame from scratch:
In [50]: DataFrame({ 'A' : [[],[],1]})
Out[50]: 
    A
0  []
1  []
2   1
[3 rows x 1 columns]
The reason this is not allowed is that without indices (e.g. say in numpy), you can do something like this:
In [51]: df = DataFrame({ 'A' : [1,2,3] })
In [52]: df.loc[df['A'] == 2] = [ 5 ]
In [53]: df
Out[53]: 
   A
0  1
1  5
2  3
[3 rows x 1 columns]
You can do an assignment where the number of True values in the mask equals the length of the list/tuple/ndarray on the rhs (i.e. the value you are setting). Pandas allows this, as well as a value whose length exactly equals that of the lhs, and a scalar. Anything else is expressly disallowed because it's ambiguous (e.g. do you mean to align it or not?).
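The rules above can be illustrated with a minimal sketch. The frame, the column name "A", and the values are made up for illustration, and the exact error message may differ between pandas versions, so only the exception type is checked here.

```python
import pandas as pd

# Hypothetical four-row frame; the mask selects the first two rows.
df = pd.DataFrame({"A": [1, 2, 3, 4]})
mask = df["A"] < 3

# A scalar is broadcast to every selected row.
df.loc[mask, "A"] = 0

# A list whose length equals the number of True values is assigned
# position by position to the selected rows.
df.loc[mask, "A"] = [50, 60]

# A list that matches neither the selected rows (2) nor the whole
# column (4) is rejected.
try:
    df.loc[mask, "A"] = [7, 8, 9]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

An empty list on the right-hand side falls into the same "length does not match" category, which is why the assignment in the question is rejected rather than stored as a cell value.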
For example, imagine:
In [54]: df = DataFrame({ 'A' : [1,2,3] })
In [55]: df.loc[df['A']<3] = [5]
ValueError: cannot set using a list-like indexer with a different length than the value
A 0-length list/tuple/ndarray is considered an error not because it can't be done, but because it is usually a user error and it's unclear what to do.
Bottom line, don't use lists inside of a pandas object. It's not efficient, and it just makes interpretation difficult or impossible.
Answered by exp1orer
Edit: Preserved my original answer below, but I put it up without testing it and it actually doesn't work for me.
import pandas as pd
import numpy as np
ser1 = pd.Series(['hi',None,np.nan])
ser2 = pd.Series([5,7,9])
df = pd.DataFrame([ser1,ser2]).T
This is janky, I know. Also, apparently the DataFrame constructor (but not the Series constructor) coerces None to np.nan. No idea why.
df.loc[1,0] = None
So now we have
    0     1
0   'hi'  5
1   None  7
2   NaN   9
df.columns = ['col1','col2']
mask = np.equal(df['col1'], None)
df.loc[mask, 'col1'] = []
But this doesn't assign anything. The dataframe looks the same as before. I'm following the recommended usage from the docs and assigning base types (strings and numbers) works. So for me the problem is assigning objects to dataframe entries. No idea what's up.
(Original answer)
Two things:
- I'm not familiar with np.equal, but pandas.isnull() should also work if you want to capture all null values.
- You are doing what is called "chained assignment". I don't understand the problem fully, but I know it doesn't work. It is covered in the docs.
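As a rough sketch of the second point, the snippet below sets values through a single .loc call instead of indexing twice. The frame, the column names, and the "missing" placeholder are assumptions for illustration. How the chained form df[col][mask] = value behaves depends on the pandas version, so it is deliberately left out, and a scalar is used because list values are the separate problem discussed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; dtype=object keeps None from being converted.
df = pd.DataFrame({
    "col1": pd.Series(["hi", None, np.nan], dtype=object),
    "col2": [5, 7, 9],
})

# pandas.isnull flags both None and NaN.
mask = pd.isnull(df["col1"])

# One indexing call selects rows and column together, so the
# assignment targets the original frame rather than an intermediate.
df.loc[mask, "col1"] = "missing"
```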
Try this:
mask = pandas.isnull(df[col])
df.loc[mask, col] = list()
Or, if you only want to catch None and not np.nan:
mask = np.equal(df[col].values, None) 
df.loc[mask, col] = list()
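To make the difference between the two masks concrete, here is a small sketch on a made-up Series. It swaps np.equal for a plain identity test (x is None), which is an assumption of this example and not the method used in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical Series holding a string, None, and NaN.
ser = pd.Series(["hi", None, np.nan], dtype=object)

# pandas.isnull treats both None and NaN as missing.
null_mask = pd.isnull(ser)

# An identity test flags None only and leaves NaN alone.
none_mask = pd.Series([x is None for x in ser], index=ser.index)
```

Both masks are boolean Series aligned with ser, so either can be passed to .loc as the row selector.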
Note: While pandas.isnull works with None on dataframes, series, and arrays as expected, numpy.equal only works as expected with dataframes and arrays. Applied to a pandas Series consisting entirely of None, it will not return True for any element. This is due to None only selectively behaving as np.nan. See BUG: None is not equal to None #20442.
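Since mask-based assignment does not accept an empty list as a cell value, one possible workaround, not taken from either answer, is to rebuild the column element by element. The sketch below assumes an object-dtype column named "col" (a hypothetical name) and uses Series.apply; it is subject to the efficiency caveat about lists in pandas objects raised earlier.

```python
import pandas as pd

# Hypothetical column mixing None, an existing empty list, and a
# non-empty list; dtype=object keeps the values as-is.
existing = []
df = pd.DataFrame({
    "col": pd.Series([None, existing, [1, 2]], dtype=object),
    "other": [5, 7, 9],
})

# Replace each None with a fresh empty list; every other value,
# including the empty list already present, is returned unchanged.
df["col"] = df["col"].apply(lambda x: [] if x is None else x)
```

The lambda creates a new list per row, so the replaced cells do not share one mutable object.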

