pandas TypeError: object of type 'numpy.int64' has no len()

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/53916594/

Date: 2020-09-14 06:13:53  Source: igfitidea

TypeError: object of type 'numpy.int64' has no len()

python, pandas, numpy, dataset, pytorch

Asked by Sarit

I am making a DataLoader from a Dataset in PyTorch.

Start by loading the DataFrame with every dtype as np.float64:

result = pd.read_csv('dummy.csv', header=0, dtype=DTYPE_CLEANED_DF)

Here is my dataset class.

import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, result):
        # Split the DataFrame into feature columns and the 'classes' target
        headers = list(result)
        headers.remove('classes')

        self.x_data = result[headers]
        self.y_data = result['classes']
        self.len = self.x_data.shape[0]

    def __getitem__(self, index):
        x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
        y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
        return (x, y)

    def __len__(self):
        return self.len

Prepare the train_loader and test_loader:

train_size = int(0.5 * len(full_dataset))
test_size = len(full_dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])

train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True, num_workers=1)
test_loader = DataLoader(dataset=test_dataset)

Here is my csv file.

When I try to iterate over the train_loader, it raises this error:

for i , (data, target) in enumerate(train_loader):
    print(i)

TypeError                                 Traceback (most recent call last)
<ipython-input-32-0b4921c3fe8c> in <module>
----> 1 for i , (data, target) in enumerate(train_loader):
      2     print(i)

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    635                 self.reorder_dict[idx] = batch
    636                 continue
--> 637             return self._process_next_batch(batch)
    638 
    639     next = __next__  # Python 2 compatibility

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    656         self._put_indices()
    657         if isinstance(batch, ExceptionWrapper):
--> 658             raise batch.exc_type(batch.exc_msg)
    659         return batch
    660 

TypeError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in __getitem__
    return self.dataset[self.indices[idx]]
  File "<ipython-input-27-107e03bc3c6a>", line 12, in __getitem__
    x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2091, in _getitem_axis
    return self._get_list_axis(key, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2070, in _get_list_axis
    return self.obj._take(key, axis=axis)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 2789, in _take
    verify=True)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 4537, in take
    new_labels = self.axes[axis].take(indexer)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2195, in take
    return self._shallow_copy(taken)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/range.py", line 267, in _shallow_copy
    return self._int64index._shallow_copy(values, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/numeric.py", line 68, in _shallow_copy
    return self._shallow_copy_with_infer(values=values, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 538, in _shallow_copy_with_infer
    if not len(values) and 'dtype' not in kwargs:
TypeError: object of type 'numpy.int64' has no len()
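
The failing call at the bottom of the traceback can be reproduced in isolation: pandas ends up calling len() on a single numpy.int64 scalar instead of on a sequence of indices. A minimal sketch of that root failure, using only NumPy:

```python
import numpy as np

# A NumPy integer scalar supports arithmetic but is not a sequence,
# so calling len() on it raises exactly the TypeError from the traceback.
index = np.int64(5)
try:
    len(index)
except TypeError as exc:
    print(exc)  # object of type 'numpy.int64' has no len()
```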

Related issues:
https://github.com/pytorch/pytorch/issues/10165
https://github.com/pytorch/pytorch/pull/9237
https://github.com/pandas-dev/pandas/issues/21946

Questions:
How can I work around this pandas issue?

Accepted answer by Sarit

Reference:
https://github.com/pytorch/pytorch/issues/9211

Just add .tolist() to the indices line:

from torch import randperm
from torch._utils import _accumulate
from torch.utils.data import Subset

def random_split(dataset, lengths):
    """
    Randomly split a dataset into non-overlapping new datasets of given lengths.
    Arguments:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths of splits to be produced
    """
    if sum(lengths) != len(dataset):
        raise ValueError("Sum of input lengths does not equal the length of the input dataset!")

    # The fix: .tolist() converts the tensor of shuffled indices
    # into plain Python ints, which pandas can index safely
    indices = randperm(sum(lengths)).tolist()
    return [Subset(dataset, indices[offset - length:offset])
            for offset, length in zip(_accumulate(lengths), lengths)]
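
The reason the one-line patch works: .tolist() converts the container of shuffled indices into plain Python ints, which pandas indexing handles. The same conversion can be sketched with NumPy (used here only so the sketch stays self-contained; PyTorch tensors expose the same .tolist() method):

```python
import numpy as np

indices = np.random.permutation(5)  # array of NumPy integer scalars
plain = indices.tolist()            # list of plain Python ints

print(isinstance(indices[0], int))  # False: a NumPy scalar, not a Python int
print(isinstance(plain[0], int))    # True: safe to pass into pandas .iloc
```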

Answer by Anjum Sayed

I think the issue is that after using random_split, index is now a torch.Tensor rather than an int. I found that adding a quick type check to __getitem__ and then using .item() on the tensor works for me:

def __getitem__(self, index):
    # random_split can hand back tensor indices; convert them to plain ints
    if isinstance(index, torch.Tensor):
        index = index.item()

    x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
    y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
    return (x, y)

Source: https://discuss.pytorch.org/t/issues-with-torch-utils-data-random-split/22298/8
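
The .item() call does the same scalar-to-int conversion, one element at a time. It can be sketched with NumPy scalars, which share the .item() API with PyTorch tensors (NumPy is used here only to keep the sketch dependency-free):

```python
import numpy as np

index = np.int64(2)
print(type(index).__name__)         # int64 -- not accepted everywhere by pandas
print(type(index.item()).__name__)  # int   -- a native Python int
```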

Answer by trsvchn

Why not simply try:

self.len = len(self.x_data)

len works fine with a pandas DataFrame without conversion to an array or tensor.
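
A quick check of that claim, with a small hypothetical frame shaped like the question's data:

```python
import pandas as pd

# len() on a DataFrame returns the number of rows directly --
# no conversion to a NumPy array or tensor needed.
df = pd.DataFrame({"feature": [0.1, 0.2, 0.3], "classes": [0, 1, 0]})
print(len(df))  # 3
```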

Answer by Andrew Schreiber

I solved the issue by upgrading PyTorch to version 1.3.

https://pytorch.org/get-started/locally/