Python: Convert a Pandas dataframe to a PyTorch tensor?

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/50307707/

Time: 2020-08-19 19:27:21  Source: igfitidea

Convert Pandas dataframe to PyTorch tensor?

Tags: python, pandas, dataframe, pytorch

Asked by M. Fabio

I want to train a simple neural network on PyTorch using a personal database. This database is imported from an Excel file and stored in df.

One of the columns is named "Target", and it is the target variable of the network. How can I use this DataFrame as an input for the PyTorch neural network?

I tried this, but it doesn't work:

target = pd.DataFrame(data = df['Target'])
train = data_utils.TensorDataset(df, target)
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)

Accepted answer by MBT

I'm answering the question in the title, since the text doesn't really specify anything else: simply converting the DataFrame into a PyTorch tensor.

Without information about your data, I'm just taking float values as example targets here.

Convert Pandas dataframe to PyTorch tensor?

import pandas as pd
import torch
import random

# creating dummy targets (float values)
targets_data = [random.random() for i in range(10)]

# creating DataFrame from targets_data
targets_df = pd.DataFrame(data=targets_data)
targets_df.columns = ['targets']

# creating tensor from targets_df 
torch_tensor = torch.tensor(targets_df['targets'].values)

# printing out result
print(torch_tensor)

Output:

tensor([ 0.5827,  0.5881,  0.1543,  0.6815,  0.9400,  0.8683,  0.4289,
         0.5940,  0.6438,  0.7514], dtype=torch.float64)
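
Note the dtype=torch.float64 in the output above: pandas stores Python floats as 64-bit values, and torch.tensor keeps that dtype. PyTorch layers use float32 parameters by default, so you will often want to cast. A minimal sketch of two ways to do that, reusing the dummy targets idea from the example (the variable names tensor64, tensor32 and tensor32_alt are my own, not part of the answer):

```python
import random

import numpy as np
import pandas as pd
import torch

# dummy float targets, mirroring the example above
targets_df = pd.DataFrame({"targets": [random.random() for _ in range(10)]})

# pandas holds these as float64, so the resulting tensor is float64 too
tensor64 = torch.tensor(targets_df["targets"].values)

# option 1: cast on the NumPy side before building the tensor
tensor32 = torch.tensor(targets_df["targets"].values.astype(np.float32))

# option 2: cast the tensor afterwards
tensor32_alt = tensor64.float()

print(tensor64.dtype, tensor32.dtype, tensor32_alt.dtype)
```

Either option should work; casting before conversion avoids holding an intermediate float64 tensor.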

Tested with PyTorch 0.4.0.

I hope this helps. If you have any further questions, just ask. :)

Answer by Allen

Maybe try this to see if it fixes your problem (based on your sample code)?

import numpy as np
import torch
import torch.utils.data as data_utils
# `train` is the source DataFrame; `batch_size` is whatever you choose (e.g. 10)
train_target = torch.tensor(train['Target'].values.astype(np.float32))
train = torch.tensor(train.drop('Target', axis=1).values.astype(np.float32))
train_tensor = data_utils.TensorDataset(train, train_target)
train_loader = data_utils.DataLoader(dataset=train_tensor, batch_size=batch_size, shuffle=True)
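
TensorDataset expects torch tensors, which is presumably why passing the DataFrame directly in the question did not work. Here is a self-contained sketch of the same idea that you can run without an Excel file. The random DataFrame and the column names "a", "b", "c" are stand-ins I made up; only "Target" comes from the question.

```python
import numpy as np
import pandas as pd
import torch
import torch.utils.data as data_utils

# hypothetical stand-in for the DataFrame imported from Excel
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((25, 4)), columns=["a", "b", "c", "Target"])

# split into features and target, converting each to a float32 tensor
features = torch.tensor(df.drop("Target", axis=1).values.astype(np.float32))
targets = torch.tensor(df["Target"].values.astype(np.float32))

# wrap the tensors so the DataLoader can batch and shuffle them together
dataset = data_utils.TensorDataset(features, targets)
loader = data_utils.DataLoader(dataset, batch_size=10, shuffle=True)

# each iteration yields one (features, targets) batch
batch_sizes = []
for x_batch, y_batch in loader:
    batch_sizes.append(len(y_batch))
    print(x_batch.shape, y_batch.shape)
```

With 25 rows and batch_size=10 the last batch is smaller than the others, because DataLoader keeps incomplete batches unless you pass drop_last=True.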

Answer by Gaurav Shrivastava

Simply convert the pandas dataframe -> numpy array -> pytorch tensor. An example of this is described below:

import pandas as pd
import numpy as np
import torch
import torch.utils.data as data_utils

df = pd.read_csv('train.csv')
target = pd.DataFrame(df['target'])
del df['target']
train = data_utils.TensorDataset(torch.Tensor(np.array(df)), torch.Tensor(np.array(target)))
train_loader = data_utils.DataLoader(train, batch_size = 10, shuffle = True)

Hopefully, this will help you create your own datasets using PyTorch (compatible with the latest version of PyTorch at the time of writing).
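
One detail worth knowing about the snippet above: wrapping the column in pd.DataFrame gives a target tensor of shape (N, 1) rather than (N,), and torch.Tensor(...) produces float32 regardless of the float64 input. A small sketch with made-up data (the columns x1 and x2 are hypothetical) to make both points visible:

```python
import numpy as np
import pandas as pd
import torch

# hypothetical data standing in for train.csv
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0],
                   "x2": [4.0, 5.0, 6.0],
                   "target": [0.0, 1.0, 0.0]})

# one-column DataFrame -> 2-D array -> 2-D tensor
target_frame = pd.DataFrame(df["target"])
target_2d = torch.Tensor(np.array(target_frame))

# plain Series -> 1-D array -> 1-D tensor
target_1d = torch.Tensor(np.array(df["target"]))

print(target_2d.shape, target_2d.dtype)
print(target_1d.shape, target_1d.dtype)
```

Which shape you want depends on what your model outputs; it is worth checking that predictions and targets have matching shapes before computing a loss.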

Answer by Anh-Thi DINH

You can use the functions below to convert any numeric DataFrame or pandas Series to a PyTorch tensor:

import pandas as pd
import torch

# determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu') # don't have GPU 
    return device

# convert a df to tensor to be used in pytorch
def df_to_tensor(df):
    device = get_device()
    return torch.from_numpy(df.values).float().to(device)

df_tensor = df_to_tensor(df)
series_tensor = df_to_tensor(series)
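
The df and series variables in the last two lines are not defined in the snippet, so here is a runnable usage sketch with made-up data. It is a condensed variant of the helper above, not the author's exact code: the device is passed in explicitly, and the column names "a" and "b" are hypothetical.

```python
import pandas as pd
import torch


def df_to_tensor(data, device):
    # same conversion as above: NumPy array -> float32 tensor -> target device
    return torch.from_numpy(data.values).float().to(device)


# fall back to the CPU when no GPU is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})  # hypothetical data
series = df["a"]

df_tensor = df_to_tensor(df, device)          # 2-D: rows x columns
series_tensor = df_to_tensor(series, device)  # 1-D: one value per row

print(df_tensor.shape, series_tensor.shape, df_tensor.device)
```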

Answer by Rahul Moozhikkal

@MBT has already given the correct answer here. I will add some extra information for future readers.

1. torch.tensor(train['Target'].values.astype(np.float32)) will work in this example because your data can be cast to the float32 type. If your data were of "object" dtype, torch.tensor would throw an error and this approach would not work.

2. You are not converting a DataFrame when you call df['target']; that expression returns a Series.
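
Both points can be checked with a few lines. This is only a sketch under my own assumptions: the column of numeric strings is made up, and pd.to_numeric is just one way to turn an object column into numbers before conversion.

```python
import numpy as np
import pandas as pd
import torch

# hypothetical data: "feature" holds numeric strings, stored with object dtype
df = pd.DataFrame({
    "feature": pd.Series(["1.5", "2.5", "3.5"], dtype=object),
    "target": [0.0, 1.0, 0.0],
})

# point 1: an object array cannot be turned into a tensor directly
try:
    torch.tensor(df["feature"].values)
    conversion_failed = False
except (TypeError, ValueError) as err:  # a TypeError in the versions I know of
    conversion_failed = True
    print("conversion failed:", err)

# parse the strings into numbers first, then convert as usual
numeric = pd.to_numeric(df["feature"])
feature_tensor = torch.tensor(numeric.values.astype(np.float32))

# point 2: single brackets give a Series, double brackets give a DataFrame
print(type(df["target"]).__name__, type(df[["target"]]).__name__)
```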