pandas TypeError: object of type 'numpy.int64' has no len()

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/53916594/

Date: 2020-09-14 06:13:53  Source: igfitidea

TypeError: object of type 'numpy.int64' has no len()

python, pandas, numpy, dataset, pytorch

Asked by Sarit

I am making a DataLoader from a Dataset in PyTorch.

Start by loading the DataFrame with every dtype as np.float64:

result = pd.read_csv('dummy.csv', header=0, dtype=DTYPE_CLEANED_DF)

Here is my dataset class.

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, result):
        # Split the DataFrame into feature columns and the 'classes' target
        headers = list(result)
        headers.remove('classes')

        self.x_data = result[headers]
        self.y_data = result['classes']
        self.len = self.x_data.shape[0]

    def __getitem__(self, index):
        x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
        y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
        return (x, y)

    def __len__(self):
        return self.len

Prepare the train_loader and test_loader:

train_size = int(0.5 * len(full_dataset))
test_size = len(full_dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])

train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True, num_workers=1)
test_loader = DataLoader(dataset=test_dataset)

Here is my csv file.

When I try to iterate over the train_loader, it raises this error:

for i , (data, target) in enumerate(train_loader):
    print(i)

TypeError                                 Traceback (most recent call last)
<ipython-input-32-0b4921c3fe8c> in <module>
----> 1 for i , (data, target) in enumerate(train_loader):
      2     print(i)

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    635                 self.reorder_dict[idx] = batch
    636                 continue
--> 637             return self._process_next_batch(batch)
    638 
    639     next = __next__  # Python 2 compatibility

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    656         self._put_indices()
    657         if isinstance(batch, ExceptionWrapper):
--> 658             raise batch.exc_type(batch.exc_msg)
    659         return batch
    660 

TypeError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in __getitem__
    return self.dataset[self.indices[idx]]
  File "<ipython-input-27-107e03bc3c6a>", line 12, in __getitem__
    x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2091, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2070, in _get_list_axis
    return self.obj._take(key, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 2789, in _take
    verify=True)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 4537, in take
    new_labels = self.axes[axis].take(indexer)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2195, in take
    return self._shallow_copy(taken)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/range.py", line 267, in _shallow_copy
    return self._int64index._shallow_copy(values, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/numeric.py", line 68, in _shallow_copy
    return self._shallow_copy_with_infer(values=values, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 538, in _shallow_copy_with_infer
    if not len(values) and 'dtype' not in kwargs:
TypeError: object of type 'numpy.int64' has no len()
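
The failing call at the bottom of the traceback can be reproduced in isolation: pandas ends up calling len() on a single numpy.int64 scalar instead of on a sequence of indices. A minimal sketch of that root failure, using only NumPy:

```python
import numpy as np

# A NumPy integer scalar supports arithmetic but is not a sequence,
# so calling len() on it raises exactly the TypeError from the traceback.
index = np.int64(5)
try:
    len(index)
except TypeError as exc:
    print(exc)  # object of type 'numpy.int64' has no len()
```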

Related issues:
https://github.com/pytorch/pytorch/issues/10165
https://github.com/pytorch/pytorch/pull/9237
https://github.com/pandas-dev/pandas/issues/21946

Questions:
How can I work around this pandas issue?

Accepted answer by Sarit

Reference:
https://github.com/pytorch/pytorch/issues/9211

Just add .tolist() to the indices line:

from torch import randperm
from torch._utils import _accumulate
from torch.utils.data import Subset

def random_split(dataset, lengths):
    """
    Randomly split a dataset into non-overlapping new datasets of given lengths.
    Arguments:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths of splits to be produced
    """
    if sum(lengths) != len(dataset):
        raise ValueError("Sum of input lengths does not equal the length of the input dataset!")

    # The fix: .tolist() converts the tensor of shuffled indices
    # into plain Python ints, which pandas can index safely
    indices = randperm(sum(lengths)).tolist()
    return [Subset(dataset, indices[offset - length:offset])
            for offset, length in zip(_accumulate(lengths), lengths)]
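
The reason the one-line patch works: .tolist() converts the container of shuffled indices into plain Python ints, which pandas indexing handles. The same conversion can be sketched with NumPy (used here only so the sketch stays self-contained; PyTorch tensors expose the same .tolist() method):

```python
import numpy as np

indices = np.random.permutation(5)  # array of NumPy integer scalars
plain = indices.tolist()            # list of plain Python ints

print(isinstance(indices[0], int))  # False: a NumPy scalar, not a Python int
print(isinstance(plain[0], int))    # True: safe to pass into pandas .iloc
```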

Answer by Anjum Sayed

I think the issue is that after using random_split, index is now a torch.Tensor rather than an int. I found that adding a quick type check to __getitem__ and then using .item() on the tensor works for me:

def __getitem__(self, index):
    # random_split can hand back tensor indices; convert them to plain ints
    if isinstance(index, torch.Tensor):
        index = index.item()

    x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
    y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
    return (x, y)

Source: https://discuss.pytorch.org/t/issues-with-torch-utils-data-random-split/22298/8
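
The .item() call does the same scalar-to-int conversion, one element at a time. It can be sketched with NumPy scalars, which share the .item() API with PyTorch tensors (NumPy is used here only to keep the sketch dependency-free):

```python
import numpy as np

index = np.int64(2)
print(type(index).__name__)         # int64 -- not accepted everywhere by pandas
print(type(index.item()).__name__)  # int   -- a native Python int
```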

Answer by trsvchn

Why not simply try:

self.len = len(self.x_data)

len works fine with a pandas DataFrame without conversion to an array or tensor.
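
A quick check of that claim, with a small hypothetical frame shaped like the question's data:

```python
import pandas as pd

# len() on a DataFrame returns the number of rows directly --
# no conversion to a NumPy array or tensor needed.
df = pd.DataFrame({"feature": [0.1, 0.2, 0.3], "classes": [0, 1, 0]})
print(len(df))  # 3
```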

Answer by Andrew Schreiber

I solved the issue by upgrading PyTorch to version 1.3.

https://pytorch.org/get-started/locally/