Label based indexing Pandas (.loc)
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/20603925/
Asked by Woody Pride
I have recently been made aware of the dangers of chained assignment, and I am trying to use the proper method of indexing in pandas using loc[rowindex, colindex]. I am working with mixed data types (np.float64, list and string values mixed within the same Series); this is unavoidable. I have an integer index.
I am running the following loop over a data frame:
Count = 0
for row in DF.index:
    print(row)
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) and '/' not in str(DF.buyer[row])\
    and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')
        Count +=1
Count
This raises the error:
TypeError: object of type 'int' has no len()
What am I doing wrong?
Within that loop I can do:
print(DF.loc[row, 'order_no'])
and
print(DF.loc[row, 'order_no'] == str(DF.loc[row, 'order_no']).split('/'))
but not
DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')
Using the print statement, I see that it gets stuck on row 3, yet:
DF.loc[3, 'order_no']
works just fine.
Help appreciated.
EDIT
A workaround is the following:
Count = 0
Vals = []
Ind = []
for row in DF.index:
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) and '/' not in str(DF.buyer[row])\
    and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        Vals.append(DF.order_no[row].split('/'))
        Ind.append(row)
        Count +=1
DF.loc[Ind, 'order_no'] = Vals
In other words, I can create a list of the values to be modified and then change them all at once using .loc. This works fine, which leads me to believe that the issue is not with the values I am trying to assign, but with the assignment process itself.
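The workaround above can also be written without an explicit row loop. The following is only a sketch, not part of the original question: the sample rows are a hypothetical subset shaped like the linked data, and `has_slash` is an illustrative helper name. It builds a boolean mask for the rows to change, computes the new values, and assigns the whole column in one step.

```python
import pandas as pd

# Hypothetical sample shaped like the data in the question (not the real file).
df = pd.DataFrame(
    {
        "buyer": ["H&M", "H&M", "H&M/Mexx"],
        "order_no": ["992754", "731231/339260", "731231/KIEZ-P002"],
        "item": ["Cole tank top", "Palma Price Tee", "Palma Short Sleeve Tee/Shorts"],
        "smv": ["6.17", "5.65", "5.65/12.85"],
    },
    dtype=object,
)


def has_slash(column):
    """Return a boolean Series: True where the column value contains '/'."""
    return df[column].astype(str).str.contains("/", regex=False)


# Rows where only order_no contains a slash, as in the loop's condition.
mask = has_slash("order_no") & ~has_slash("buyer") & ~has_slash("smv") & ~has_slash("item")

# Split the selected values; leave every other value untouched.
new_values = [
    str(value).split("/") if selected else value
    for value, selected in zip(df["order_no"], mask)
]

# Assign the whole column at once as an object-dtype Series.
df["order_no"] = pd.Series(new_values, index=df.index, dtype=object)
count = int(mask.sum())
```

Replacing the whole column sidesteps single-cell assignment of a list, which is the step that fails in the loop.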
Here is an example of the type of data I am working on. The code fails on rows 3 and 9 as far as I can tell. Sorry it's in CSV format, but this is how I am reading it into pandas.
https://www.dropbox.com/s/zuy8pj15nlhmcfb/EG2.csv
Using that data, if I run the following:
EG = pd.read_csv('EG.csv')
EG.loc[3, 'order_no'] = str(EG.loc[3, 'order_no']).split('/')
it fails with the error:
object of type 'int' has no len()
But
EG['order_no'][3] = str(EG.loc[3, 'order_no']).split('/')
works fine, but this is the type of chained assignment I am trying to avoid, as it was giving me issues elsewhere,
which is why I thought this was just a syntax error.
Sorry for this now unwieldy question.
Answered by BrenBarn
You may be running into dtype issues. The following code works for me:
import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data, dtype=object)
And then:
>>> df.loc[3, 'order_no'] = [1, 2]
>>> df
  order_no working_hr
3   [1, 2]          9
Note the dtype=object. This may be why your errors disappeared when you shortened the DataFrame, especially if you're reading from CSV. In many situations (such as reading from CSV), pandas tries to infer the dtype and pick the most specific one. You can assign a list as a value if the dtype is object, but not if it's (for instance) float64. So check whether your mixed-type column really is set to dtype object.
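To check which dtype was inferred, a small sketch along these lines may help. The CSV text here is made up for illustration (it is not the file from the question), and the exact dtype reported for text columns can differ between pandas versions.

```python
import io

import pandas as pd

# Illustrative CSV snippets (hypothetical data, not the file from the question).
csv_all_numeric = "order_no,working_hr\n731231,9.0\n859901,8.5\n"
csv_mixed = "order_no,working_hr\n731231,9.0\n731231/339260,8.5\n"

numeric = pd.read_csv(io.StringIO(csv_all_numeric))
mixed = pd.read_csv(io.StringIO(csv_mixed))

# An all-integer column is inferred as an integer dtype ...
print(numeric.dtypes)
# ... while a column containing '731231/339260' cannot be numeric.
print(mixed.dtypes)

# Either request object dtype while reading ...
forced = pd.read_csv(io.StringIO(csv_all_numeric), dtype={"order_no": object})
# ... or cast an existing column afterwards.
numeric["order_no"] = numeric["order_no"].astype(object)
```

Inspecting `df.dtypes` right after reading shows whether a column that should hold lists was narrowed to a numeric dtype.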
The same works with your provided CSV:
>>> df = pd.read_clipboard(sep='\t', index_col=0)
>>> df
   buyer       order_no          item                                 smv
0  H&M         992754            Cole tank top                        6.17
1  H&M         859901            Thilo Bottom                         8.55
2  H&M         731231            Palma Short Sleeve Tee               5.65
3  H&M         731231/339260     Palma Price Tee                      5.65
4  H&M         859901/304141     Thilo Paijama Set top/Elva Tank Top  5.80/5.58
5  H&M         768380            Folke Tank Top                       6
6  H&M         596701/590691     Paul Rock Tee                        7.65
7  H&M/Mexx    731231/KIEZ-P002  Palma Short Sleeve Tee/Shorts        5.65/12.85
8  NaN         NaN               NaN                                  NaN
9  Ginatricot  512008/512009     J.Tank top                           4.6
>>> df.loc[3, 'order_no'] = str(df.loc[3, 'order_no']).split('/')
>>> df
   buyer       order_no          item                                 smv
0  H&M         992754            Cole tank top                        6.17
1  H&M         859901            Thilo Bottom                         8.55
2  H&M         731231            Palma Short Sleeve Tee               5.65
3  H&M         [731231, 339260]  Palma Price Tee                      5.65
4  H&M         859901/304141     Thilo Paijama Set top/Elva Tank Top  5.80/5.58
5  H&M         768380            Folke Tank Top                       6
6  H&M         596701/590691     Paul Rock Tee                        7.65
7  H&M/Mexx    731231/KIEZ-P002  Palma Short Sleeve Tee/Shorts        5.65/12.85
8  NaN         NaN               NaN                                  NaN
9  Ginatricot  512008/512009     J.Tank top                           4.6
Answered by alko
Shorter error-raising code for reference (until the OP includes it in his question):
import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data)
df.loc[3, 'order_no'] = [1,2] # raises error
Inspecting the code, the list value [1,2] is treated by _setitem_with_indexer as a list, and I can't see how this can be avoided so that the value is treated as a scalar.
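One workaround that is often suggested, and which is not taken from the answers above, is to cast the target column to object dtype and set the single cell with the scalar accessor `.at`. The sketch below assumes a reasonably recent pandas; behaviour may differ in the version used when the question was asked.

```python
import pandas as pd

data = {"working_hr": {3: 9.0}, "order_no": {3: 731231}}
df = pd.DataFrame.from_dict(data)

# Cast only the column that must hold lists; working_hr stays float64.
df["order_no"] = df["order_no"].astype(object)

# .at addresses exactly one cell, so the list is stored as a single value.
df.at[3, "order_no"] = [1, 2]
```

Casting one column rather than the whole frame keeps the numeric columns usable for arithmetic.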

