Read a csv with numpy array using pandas

Disclaimer: This page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/30930541/

Date: 2020-09-13 23:30:02  Source: igfitidea


python, csv, numpy, pandas

Asked by VeilEclipse

I have a csv file with 3 columns, emotion, pixels and Usage, consisting of 35000 rows, e.g. 0,70 23 45 178 455,Training.


I used pandas.read_csv to read the csv file as pd.read_csv(filename, dtype={'emotion':np.int32, 'pixels':np.int32, 'Usage':str}).


When I try the above, it raises ValueError: invalid literal for long() with base 10: '70 23 45 178 455'. How do I read the pixels column as a numpy array?

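To see where the error comes from, here is a minimal reproduction, assuming a small in-memory sample in place of the real 35000-row file (the second row is invented for illustration). Each pixels field holds several space-separated numbers, so it cannot be cast to a single np.int32, while reading the column as str succeeds:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample with the same three columns as the question's file
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 0 255 34 90,Training"""

# Asking for an integer dtype on 'pixels' fails: '70 23 45 178 455' is not one integer
try:
    pd.read_csv(io.StringIO(csv_text),
                dtype={'emotion': np.int32, 'pixels': np.int32, 'Usage': str})
    int_parse_failed = False
except ValueError as exc:
    int_parse_failed = True
    print('ValueError:', exc)

# Reading 'pixels' as a string works; it can be converted to numbers afterwards
df = pd.read_csv(io.StringIO(csv_text),
                 dtype={'emotion': np.int32, 'pixels': str, 'Usage': str})
print(df.dtypes)
```

On Python 3 the message should mention int() rather than long(), but it is the same ValueError.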

Answer by Anand S Kumar

Please try the code below instead:


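import numpy as np
import pandas as pd

# filename is the path to your csv file; read pixels as a plain string first
df = pd.read_csv(filename, dtype={'emotion': np.int32, 'pixels': str, 'Usage': str})

def makeArray(text):
    # parse a space-separated string such as '70 23 45 178 455' into an int32 array
    return np.fromstring(text, dtype=np.int32, sep=' ')

df['pixels'] = df['pixels'].apply(makeArray)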
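A self-contained sketch of the same idea, assuming a hypothetical two-row sample instead of filename. It passes dtype=np.int32 to np.fromstring (which otherwise returns float64) and, assuming every row has the same number of pixels, stacks the per-row arrays into one 2-D matrix with np.stack:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real csv file
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 0 255 34 90,PublicTest"""

df = pd.read_csv(io.StringIO(csv_text),
                 dtype={'emotion': np.int32, 'pixels': str, 'Usage': str})

def make_array(text):
    # Parse a space-separated string such as '70 23 45 178 455' into an int32 array
    return np.fromstring(text, dtype=np.int32, sep=' ')

# Each cell of 'pixels' now holds its own 1-D numpy array
df['pixels'] = df['pixels'].apply(make_array)

# Stack the per-row arrays into a single (n_rows, n_pixels) matrix
pixels = np.stack(list(df['pixels']))
print(pixels.shape)  # (2, 5)
```

Keeping one array per cell is handy for row-wise work; np.stack is the step that turns the column into a single array for NumPy-based code.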

Answer by EdChum

I believe it will be faster to use the vectorised str methods to split the string into the new pixel columns and then concat those columns to the original df:


In [175]:
# load the data
import pandas as pd
import io
t="""emotion,pixels,Usage
0,70 23 45 178 455,Training"""
df = pd.read_csv(io.StringIO(t))
df

Out[175]:
   emotion            pixels     Usage
0        0  70 23 45 178 455  Training

In [177]:
# now split the string and concat column-wise with the orig df
df = pd.concat([df, df['pixels'].str.split(expand=True).astype(int)], axis=1)
df
Out[177]:
   emotion            pixels     Usage   0   1   2    3    4
0        0  70 23 45 178 455  Training  70  23  45  178  455
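The same vectorised split can also replace the string column instead of sitting next to it. This sketch assumes a two-row sample and the hypothetical column prefix pixel_; add_prefix renames the integer column labels 0 to 4, and drop removes the original pixels column:

```python
import io

import pandas as pd

# Hypothetical two-row sample with the question's columns
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 0 255 34 90,Training"""
df = pd.read_csv(io.StringIO(csv_text))

# Vectorised split on whitespace: one new column per pixel, converted to int
pixel_cols = df['pixels'].str.split(expand=True).astype(int).add_prefix('pixel_')

# Replace the original string column with the numeric pixel columns
wide = pd.concat([df.drop(columns='pixels'), pixel_cols], axis=1)
print(wide.columns.tolist())
```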

If you specifically want a plain NumPy array, you can access the .values attribute; the result is 2-D, with one row per csv line:


In [181]:
df['pixels'].str.split(expand=True).astype(int).values

Out[181]:
array([[ 70,  23,  45, 178, 455]])
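Since pandas 0.24, .to_numpy() is the recommended alternative to .values. A sketch assuming the same kind of two-row sample: the result has one row per csv line and one column per pixel, and ravel() flattens it if a 1-D array is really what you need. It also assumes all rows have the same pixel count, because str.split(expand=True) pads shorter rows with missing values, which astype(int) cannot convert.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical two-row sample
csv_text = """emotion,pixels,Usage
0,70 23 45 178 455,Training
1,12 0 255 34 90,Training"""
df = pd.read_csv(io.StringIO(csv_text))

# One row per csv line, one column per pixel
matrix = df['pixels'].str.split(expand=True).astype(np.int32).to_numpy()
print(matrix.shape)  # (2, 5)

# ravel() gives a genuinely flat 1-D array if that is what you need
flat = matrix.ravel()
print(flat.shape)  # (10,)
```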

Answer by Sanchari Dan

I encountered the same problem and figured out a hack: save your dataframe as a .npy file. When you load it, it will be loaded as an ndarray. You can then use pandas.DataFrame to convert the ndarray back to a dataframe. I found this solution easier than converting from string fields. Sample code below:


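import numpy as np
import pandas as pd

# dataframe_to_be_saved is your DataFrame; np.save converts it to an ndarray (index and column labels are not saved)
np.save('file_name.npy', dataframe_to_be_saved)
# the dataframe is saved in 'file_name.npy' in your current working directory

# loading the saved file into an ndarray
# mixed-type data is stored as a pickled object array, so NumPy >= 1.16.3 needs allow_pickle=True (trusted files only)
arr = np.load('file_name.npy', allow_pickle=True)
# the first saved column becomes the index; column_names must label the remaining columns
df = pd.DataFrame(data=arr[:, 1:], index=arr[:, 0], columns=column_names)

# df now stores your dataframe; cell values keep their saved Python/NumPy types, though columns may have object dtype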
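A self-contained round trip of this approach, assuming a hypothetical two-row DataFrame whose pixels cells already hold NumPy arrays, and a temporary directory instead of your working directory. Because the columns mix integers, strings and arrays, np.save writes a pickled object array, so np.load needs allow_pickle=True (use it only for files you trust). The column labels are not stored in the .npy file and are passed back in by hand:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical DataFrame; 'pixels' is turned into one numpy array per row
df = pd.DataFrame({
    'emotion': [0, 1],
    'pixels': ['70 23 45 178 455', '12 0 255 34 90'],
    'Usage': ['Training', 'Training'],
})
df['pixels'] = df['pixels'].apply(lambda s: np.array(s.split(), dtype=np.int64))
columns = df.columns.tolist()  # labels are not stored in the .npy file

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file_name.npy')
    # The mixed-type DataFrame becomes a 2-D object array, which np.save pickles
    np.save(path, df)

    # With NumPy's default allow_pickle=False, loading an object array raises ValueError
    try:
        np.load(path)
        needs_pickle = False
    except ValueError:
        needs_pickle = True

    # Object arrays can only be loaded with allow_pickle=True
    arr = np.load(path, allow_pickle=True)

# Rebuild the DataFrame; infer_objects() restores a numeric dtype where possible
restored = pd.DataFrame(arr, columns=columns).infer_objects()
print(restored.dtypes)
```

If you only need to round-trip the DataFrame itself, pandas' own DataFrame.to_pickle and pd.read_pickle keep the index, labels and dtypes; that is a different technique from the answer above.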