Label-based indexing in Pandas (.loc)

Question

Asked by Woody Pride

I have recently been made aware of the dangers of chained assignment, and I am trying to use the proper indexing method in pandas, loc[rowindex, colindex]. I am working with mixed data types (np.float64, list, and string within the same Series); this is unavoidable. I have an integer index.

I am running the following loop through a data frame

Count = 0
for row in DF.index:
    print(row)
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) \
    and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')
        Count += 1

Count

Which returns the error:

 TypeError: object of type 'int' has no len()

What am I doing wrong?

Within that loop I can do:

print(DF.loc[row, 'order_no'])

and

print(DF.loc[row, 'order_no'] == str(DF.loc[row, 'order_no']).split('/'))

but not

DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')

Using the print statement I see that it gets stuck on row 3, yet:

DF.loc[3, 'order_no']

works just fine.

Help appreciated.

EDIT

A workaround is the following:

Count = 0
Vals = []
Ind = []
for row in DF.index:
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) \
    and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        Vals.append(str(DF.order_no[row]).split('/'))
        Ind.append(row)
        Count +=1

DF.loc[Ind, 'order_no'] = Vals

In other words, I can create a list of the values to be modified and then change them all at once using .loc. This works fine, which leads me to believe that the issue is not with the values I am trying to assign but with the assignment process itself.

Here is an example of the type of data I am working on. The code fails on rows 3 and 9 as far as I can tell. Sorry it's in CSV format, but this is how I am reading it into pandas.

https://www.dropbox.com/s/zuy8pj15nlhmcfb/EG2.csv

Using that data, the following:

EG = pd.read_csv('EG.csv')
EG.loc[3, 'order_no'] = str(EG.loc[3, 'order_no']).split('/')

fails with the error:

object of type 'int' has no len()

But

EG['order_no'][3] = str(EG.loc[3, 'order_no']).split('/')

works fine, but this is the type of chained assignment I am trying to avoid, as it was giving me issues elsewhere.

This is why I thought it was just a syntax error.

Sorry for this now unwieldy question.

Answer 1

Answered by BrenBarn

You may be running into dtype issues. The following code works for me:

import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data, dtype=object)

And then:

>>> df.loc[3, 'order_no'] = [1, 2]
>>> df
  order_no working_hr
3   [1, 2]          9

Note the dtype=object. This may be why your errors disappeared when you shortened the DataFrame, especially if you're reading from CSV. In many situations (such as reading from CSV), pandas tries to infer the dtype and picks the most specific one. You can assign a list as a value if the dtype is object, but not if it is (for instance) float64. So check whether your mixed-type column really is set to dtype object.
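
To make the dtype check concrete, here is a minimal sketch. The CSV text is made up for illustration, and the final assignment uses .at, a scalar accessor that does not appear in this thread; treat it as an assumption about one way to store a list in a single cell once the column is dtype object.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; every order_no is a plain integer.
csv_text = "order_no,smv\n992754,6.17\n859901,8.55\n"
df = pd.read_csv(io.StringIO(csv_text))

# read_csv infers the most specific dtype it can, here an integer one.
inferred = df["order_no"].dtype
print(inferred)

# Cast to object so a cell can hold an arbitrary Python object.
df["order_no"] = df["order_no"].astype(object)

# Assumption: .at sets a single cell; it is not used in the original thread.
df.at[1, "order_no"] = ["859901", "304141"]
print(df)
```

If the column had contained any value with a slash, the inferred dtype would not have been an integer one in the first place, which may explain why the error depends on which rows are present.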

The same works with your provided CSV:

>>> df = pd.read_clipboard(sep='\t', index_col=0)
>>> df
        buyer          order_no                                 item         smv
0         H&M            992754                        Cole tank top        6.17
1         H&M            859901                         Thilo Bottom        8.55
2         H&M            731231               Palma Short Sleeve Tee        5.65
3         H&M     731231/339260                      Palma Price Tee        5.65
4         H&M     859901/304141  Thilo Paijama Set top/Elva Tank Top   5.80/5.58
5         H&M            768380                       Folke Tank Top           6
6         H&M     596701/590691                        Paul Rock Tee        7.65
7    H&M/Mexx  731231/KIEZ-P002        Palma Short Sleeve Tee/Shorts  5.65/12.85
8         NaN               NaN                                  NaN         NaN
9  Ginatricot     512008/512009                           J.Tank top         4.6
>>> df.loc[3, 'order_no'] = str(df.loc[3, 'order_no']).split('/')
>>> df
        buyer          order_no                                 item         smv
0         H&M            992754                        Cole tank top        6.17
1         H&M            859901                         Thilo Bottom        8.55
2         H&M            731231               Palma Short Sleeve Tee        5.65
3         H&M  [731231, 339260]                      Palma Price Tee        5.65
4         H&M     859901/304141  Thilo Paijama Set top/Elva Tank Top   5.80/5.58
5         H&M            768380                       Folke Tank Top           6
6         H&M     596701/590691                        Paul Rock Tee        7.65
7    H&M/Mexx  731231/KIEZ-P002        Palma Short Sleeve Tee/Shorts  5.65/12.85
8         NaN               NaN                                  NaN         NaN
9  Ginatricot     512008/512009                           J.Tank top         4.6

Answer 2

Answered by alko

Shorter error-raising code for reference (until the OP includes it in the question):

import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data)
df.loc[3, 'order_no'] = [1,2] # raises error

Inspecting the code, the list value [1,2] is treated by _setitem_with_indexer as a list, and I can't see how this can be avoided so that the value is treated as a scalar.
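
One way to sidestep the scalar-versus-list ambiguity, not suggested in this thread, is to build the whole column as a new object Series and assign it in one step, so that no single-cell .loc assignment is involved. A minimal sketch, using made-up rows modeled on the sample data and keeping only the buyer check from the original condition:

```python
import pandas as pd

# Hypothetical rows modeled on the sample data in the question.
df = pd.DataFrame(
    {
        "buyer": ["H&M", "H&M", "H&M/Mexx"],
        "order_no": [731231, "731231/339260", "731231/KIEZ-P002"],
    },
    index=[2, 3, 7],
)

# Split order_no only where it contains '/' and buyer does not;
# every other value is passed through unchanged.
new_values = [
    str(order).split("/") if "/" in str(order) and "/" not in str(buyer) else order
    for order, buyer in zip(df["order_no"], df["buyer"])
]

# dtype=object keeps each list as a single cell value.
df["order_no"] = pd.Series(new_values, index=df.index, dtype=object)
print(df)
```

Replacing the column wholesale means pandas never has to decide whether a list is one value or several, which is the decision that goes wrong in the single-cell case.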
