pandas ValueError: Buffer has wrong number of dimensions (expected 1, got 2) in an "if ... in" statement

Question

Asked by Luca Butera

I'm trying to use an "if" statement inside a "for" loop to check whether the index of the current item (the index of a pandas Series containing the item) matches one of the indexes of another Series, but doing so raises a ValueError. This is the line of code that causes the problem:

if(ICM_items[ICM_items['track_id'] == i].index[0] in ICM_tgt_items.index.values.flatten().tolist()):

I tried replacing either side of the "in" statement with arbitrary integers or lists, and that works. The two operands are also built correctly on their own, but when combined in the statement they raise an error.

I hope someone can give me some hints about where the problem is, or suggest an alternative way to perform the same task.

ICM_items and ICM_tgt_items are both pandas.Series

The console error is below:

Traceback (most recent call last):
  File "/Users/LucaButera/git/rschallenge/similarity_to_recommandable_builder.py", line 27, in <module>
    dot[ICM_tgt_items[ICM_items[ICM_items['track_id'] == i].index[0]]] = 0
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/LucaButera/anaconda/lib/python3.6/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 147, in pandas.index.IndexEngine.get_loc (pandas/index.c:4194)
  File "pandas/index.pyx", line 280, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:6150)
  File "pandas/src/hashtable_class_helper.pxi", line 446, in pandas.hashtable.Int64HashTable.map_locations (pandas/hashtable.c:9261)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
[Finished in 1.26s]

Answer 1

Answered by andrew_reece

I would recommend that you simplify your expressions, use .loc, and keep an eye out for edge cases (such as track_id turning up empty for a given i).
With the right test data, these steps should help you narrow down your bug hunt.
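
Since the error message complains about the number of dimensions, one possible first step is to confirm what each object actually is before comparing anything. The snippet below is only a sketch: the describe helper and the two sample objects are hypothetical stand-ins, not the asker's real data.

```python
import numpy as np
import pandas as pd


def describe(name, obj):
    """Summarize the type and dimensionality of a pandas object and its index."""
    # np.asarray(obj.index) exposes the shape of the values backing the index.
    index_ndim = np.asarray(obj.index).ndim
    return f"{name}: type={type(obj).__name__}, ndim={obj.ndim}, index ndim={index_ndim}"


# Hypothetical stand-ins for the real objects.
frame = pd.DataFrame({"track_id": [1, 1, 2]}, index=["C", "A", "A"])
series = pd.Series([0.5, 0.7], index=["A", "B"])

print(describe("frame", frame))
print(describe("series", series))
```

If an object you expect to be a Series reports ndim=2, or its index is not one-dimensional, that is a reasonable place to look next.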

Example ICM_items data:

import numpy as np
import pandas as pd

N = 7
max_track_id = 5
idx1 = ['A','B','C']
icm_idx = np.random.choice(idx1, size=N)
icm = {"track_id":np.random.randint(0, max_track_id, size=N)}
ICM_items = pd.DataFrame(icm, index=icm_idx)

ICM_items
   track_id
C         1
A         1
A         2
C         1
B         0
B         0
B         2

Example ICM_tgt_items data:

idx2 = ['A','B']
icm_tgt_idx = np.random.choice(idx2, size=N)
icm = np.random.random(size=N)
ICM_tgt_items = pd.DataFrame(icm, index=icm_tgt_idx)

          0
B  0.785614
A  0.976523
A  0.856821
B  0.098086
B  0.481140
A  0.686156
A  0.851714

Now simplify the comparison and catch potential edge cases:

for i in range(max_track_id):
    mask = ICM_items['track_id'] == i
    try:
        # use .loc for indexing, no need to flatten() or use .values on the right.
        if ICM_items.loc[mask].index[0] in ICM_tgt_items.index:
            print("found")
        else:
            print("not found")
    # catch error if i not found in track_id
    except IndexError as e:
        print(f"ERROR at i={i}: {e}")

Output:

found
not found
found
ERROR at i=3: index 0 is out of bounds for axis 0 with size 0
ERROR at i=4: index 0 is out of bounds for axis 0 with size 0
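
If you would rather not rely on catching IndexError, a variant of the same loop can test for an empty selection first. This is a minimal sketch, not a drop-in fix: the check_track helper, the "score" column name, and the fixed data (chosen to mirror the index labels in the random example above) are all assumptions made for illustration.

```python
import pandas as pd

# Fixed, hypothetical data that mirrors the index labels of the example above.
ICM_items = pd.DataFrame(
    {"track_id": [1, 1, 2, 1, 0, 0, 2]}, index=list("CAACBBB")
)
ICM_tgt_items = pd.DataFrame(
    {"score": [0.79, 0.98, 0.86, 0.10, 0.48, 0.69, 0.85]}, index=list("BAABBAA")
)


def check_track(items, target, track_id):
    """Report whether the first index label for track_id appears in target's index."""
    mask = items["track_id"] == track_id
    labels = items.loc[mask].index
    if labels.empty:
        # No row has this track_id, so there is no first label to look up.
        return "missing"
    return "found" if labels[0] in target.index else "not found"


results = [check_track(ICM_items, ICM_tgt_items, i) for i in range(5)]
print(results)
# ['found', 'not found', 'found', 'missing', 'missing']
```

Checking labels.empty up front makes the "no match" case explicit instead of surfacing it as an exception.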
