pandas ValueError: Buffer has wrong number of dimensions (expected 1, got 2) on if in statement

Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/46634131/

Time: 2020-09-14 04:36:06  Source: igfitidea

ValueError: Buffer has wrong number of dimensions (expected 1, got 2) on if in statement

Tags: python, pandas, valueerror

Asked by Luca Butera

I'm trying to use an "if" statement inside a "for" loop to check whether the index of the current item (the index of the pandas Series containing the item) corresponds to one of the indexes of another Series, but doing so raises a ValueError. This is the line of code that causes the problem:

if(ICM_items[ICM_items['track_id'] == i].index[0] in ICM_tgt_items.index.values.flatten().tolist()):

I tried replacing each side of the "in" statement with random integers or lists, and it works. The two operands are also built correctly on their own, but when combined in the statement they raise an error.

I hope someone can give me some hints about where the problem is, or suggest an alternative way to perform the same task.

ICM_items and ICM_tgt_items are both pandas.Series

The console error is below:

Traceback (most recent call last):
  File "/Users/LucaButera/git/rschallenge/similarity_to_recommandable_builder.py", line 27, in <module>
    dot[ICM_tgt_items[ICM_items[ICM_items['track_id'] == i].index[0]]] = 0
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 147, in pandas.index.IndexEngine.get_loc (pandas/index.c:4194)
  File "pandas/index.pyx", line 280, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:6150)
  File "pandas/src/hashtable_class_helper.pxi", line 446, in pandas.hashtable.Int64HashTable.map_locations (pandas/hashtable.c:9261)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
[Finished in 1.26s]

Answered by andrew_reece

I would recommend you simplify your expressions, use .loc, and keep an eye out for edge cases (such as track_id turning up empty for a given i).
With the right test data, these steps should help you narrow down your bug hunt.
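
As a first step in narrowing things down, it may be worth confirming what each object actually is, since the question describes both as pandas.Series while the answer's examples use DataFrames. The sketch below is only a diagnostic aid, not a diagnosis of the error: `describe`, `demo_series`, and `demo_frame` are hypothetical names introduced here, and the demo objects are placeholders for the real `ICM_items` and `ICM_tgt_items`.

```python
import pandas as pd


def describe(obj):
    """Summarize the type and dimensionality of a pandas object and its index."""
    return {
        "type": type(obj).__name__,          # "Series" or "DataFrame"
        "ndim": obj.ndim,                    # 1 for a Series, 2 for a DataFrame
        "index_type": type(obj.index).__name__,
        "index_nlevels": obj.index.nlevels,  # more than 1 means a MultiIndex
    }


# Placeholder objects; substitute the real ICM_items / ICM_tgt_items here.
demo_series = pd.Series([10, 20], index=["A", "B"])
demo_frame = pd.DataFrame({"track_id": [1, 2]}, index=["A", "B"])

print(describe(demo_series))
print(describe(demo_frame))
```

If the reported type or number of dimensions differs from what the code assumes, that mismatch is a reasonable place to start looking.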

Example ICM_items data:

import numpy as np
import pandas as pd

N = 7
max_track_id = 5
idx1 = ['A','B','C']
icm_idx = np.random.choice(idx1, size=N)
icm = {"track_id":np.random.randint(0, max_track_id, size=N)}
ICM_items = pd.DataFrame(icm, index=icm_idx)

ICM_items
   track_id
C         1
A         1
A         2
C         1
B         0
B         0
B         2

Example ICM_tgt_items data:

idx2 = ['A','B']
icm_tgt_idx = np.random.choice(idx2, size=N)
icm = np.random.random(size=N)
ICM_tgt_items = pd.DataFrame(icm, index=icm_tgt_idx)

          0
B  0.785614
A  0.976523
A  0.856821
B  0.098086
B  0.481140
A  0.686156
A  0.851714

Now simplify the comparison and catch potential edge cases:

for i in range(max_track_id):
    mask = ICM_items['track_id'] == i
    try:
        # use .loc for indexing, no need to flatten() or use .values on the right.
        if ICM_items.loc[mask].index[0] in ICM_tgt_items.index:
            print("found")
        else:
            print("not found")
    # catch error if i not found in track_id
    except IndexError as e:           
        print(f"ERROR at i={i}: {e}")

Output:

found
not found
found
ERROR at i=3: index 0 is out of bounds for axis 0 with size 0
ERROR at i=4: index 0 is out of bounds for axis 0 with size 0
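
A possible variation on the loop above, sketched under the assumption that a missing track_id should be reported rather than raised: test `mask.any()` before taking `.index[0]`, so the empty case is handled without a try/except. The helper name `first_label_in_target` and the fixed data (including the `score` column) are hypothetical, chosen so the example is reproducible; they stand in for the random data used earlier.

```python
import pandas as pd

# Small fixed frames standing in for the random example data.
ICM_items = pd.DataFrame({"track_id": [1, 1, 2, 1, 0, 0, 2]},
                         index=list("CAACBBB"))
ICM_tgt_items = pd.DataFrame({"score": [0.1, 0.2, 0.3]}, index=list("BAA"))


def first_label_in_target(items, targets, track_id):
    """Return True/False for membership, or None when track_id is absent."""
    mask = items["track_id"] == track_id
    if not mask.any():
        # No row has this track_id, so there is no first label to test.
        return None
    first_label = items.loc[mask].index[0]
    return first_label in targets.index


results = {i: first_label_in_target(ICM_items, ICM_tgt_items, i)
           for i in range(5)}
print(results)
# {0: True, 1: False, 2: True, 3: None, 4: None}
```

Returning None for an absent track_id keeps the "not found in the target index" case (False) distinct from the "no such track_id" case, which the original loop surfaced as an IndexError.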