Python: How do I read CSV data into a record array in NumPy?

Question

Asked by hatmatrix

I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data into R's data frame?

Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?
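The csv.reader() route from the question can be sketched as follows. This is a minimal sketch, assuming a header row and exactly three columns whose types are known in advance; the sample data, the column names (label, height, qty) and the helper read_records are hypothetical. np.rec.fromrecords is the public alias of numpy.core.records.fromrecords.

```python
import csv
import io

import numpy as np

# Hypothetical sample data standing in for a CSV file with a header row.
SAMPLE = "label,height,qty\nalpha,1.5,3\nbeta,2.25,7\n"


def read_records(f):
    """Hypothetical helper: read rows with csv.reader and build a record array."""
    reader = csv.reader(f)
    header = next(reader)  # the first row holds the field names
    # csv.reader yields strings only, so each column is converted explicitly.
    rows = [(label, float(height), int(qty)) for label, height, qty in reader]
    return np.rec.fromrecords(rows, names=header)


recs = read_records(io.StringIO(SAMPLE))
print(recs.height)  # columns are available as attributes
```

The explicit per-column conversion is the main cost of this approach: csv.reader does no type inference, so the caller has to know the schema.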

Answer 1

Accepted answer by Andrew

You can use NumPy's genfromtxt() method to do so by setting the delimiter kwarg to a comma.

from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')

More information can be found in the function's documentation.
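With only delimiter set, genfromtxt returns a plain 2-D float array rather than a record array. A small sketch of that default behaviour, assuming a hypothetical file with one header line and an empty field: skip_header drops the header, and empty fields come back as nan.

```python
import io

import numpy as np

# Hypothetical CSV content: a header line and one empty field in row 2.
SAMPLE = "x,y,z\n1,2,3\n4,,6\n"

# skip_header=1 drops the header line; with the default dtype=float every
# field is parsed as a float, and the empty field becomes nan.
data = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",", skip_header=1)
print(data.shape)
print(np.isnan(data[1, 1]))
```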

Answer 2

Answer by btel

You can also try recfromcsv(), which can guess data types and return a properly formatted record array.
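As far as I know, recfromcsv() is no longer exposed in the main namespace in NumPy 2.0 and later, so here is a rough genfromtxt equivalent. It is a sketch of recfromcsv's usual defaults (comma delimiter, names taken from the header and lower-cased, per-column type guessing via dtype=None, then a recarray view); encoding="utf-8" is an added assumption so that text columns come back as str rather than bytes. The sample data is hypothetical.

```python
import io

import numpy as np

# Hypothetical CSV content with a header row and mixed column types.
SAMPLE = "Label,Height,Qty\nalpha,1.5,3\nbeta,2.25,7\n"

arr = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=",",
    names=True,              # take field names from the first line
    dtype=None,              # guess each column's type
    encoding="utf-8",        # decode text columns to str
    case_sensitive="lower",  # lower-case the field names
)
rec = arr.view(np.recarray)  # enables attribute access such as rec.height
print(rec.dtype.names)
print(rec.height)
```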

Answer 3

Answer by atomh33ls

I would recommend the read_csv function from the pandas library:

import pandas as pd
df=pd.read_csv('myfile.csv', sep=',',header=None)
df.values
array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])

This gives a pandas DataFrame, which offers many useful data-manipulation functions that are not directly available with NumPy record arrays.

DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table...
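If a NumPy record array is still needed after reading with pandas, DataFrame.to_records() can produce one. A minimal sketch using the same values as myfile.csv in this answer; the column names a, b, c are hypothetical, and skipinitialspace=True is an added assumption to strip the blanks after each comma.

```python
import io

import numpy as np
import pandas as pd

# Same values as myfile.csv in this answer, held in memory for the example.
SAMPLE = "1.0, 2, 3\n4, 5.5, 6\n"

df = pd.read_csv(io.StringIO(SAMPLE), header=None,
                 names=["a", "b", "c"], skipinitialspace=True)
rec = df.to_records(index=False)  # drop the DataFrame index from the fields
print(isinstance(rec, np.recarray), rec.dtype.names)
```

Passing index=False keeps the record array's fields limited to the CSV columns; by default to_records also adds the index as the first field.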

I would also recommend genfromtxt. However, since the question asks for a record array, as opposed to a normal array, the dtype=None parameter needs to be added to the genfromtxt call:

Given an input file, myfile.csv:

1.0, 2, 3
4, 5.5, 6

import numpy as np
np.genfromtxt('myfile.csv',delimiter=',')

gives an array:

array([[ 1. ,  2. ,  3. ],
       [ 4. ,  5.5,  6. ]])

and

np.genfromtxt('myfile.csv',delimiter=',',dtype=None)

gives a record array:

array([(1.0, 2.0, 3), (4.0, 5.5, 6)], 
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])

This has the advantage that files with multiple data types (including strings) can be easily imported.
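When the column types are known, genfromtxt also accepts an explicit structured dtype instead of letting dtype=None guess. A sketch under assumed data: the column names, the 10-character string width ("U10") and the sample rows are all hypothetical.

```python
import io

import numpy as np

# Hypothetical CSV with an integer, a text and a float column.
SAMPLE = "id,city,temp\n1,Oslo,-3.5\n2,Lima,19.0\n"

# Spell out each field's name and type; "U10" holds Unicode strings of up to
# 10 characters. skip_header=1 ignores the header line.
row_type = np.dtype([("id", np.int64), ("city", "U10"), ("temp", np.float64)])

arr = np.genfromtxt(io.StringIO(SAMPLE), delimiter=",", skip_header=1,
                    dtype=row_type, encoding="utf-8")
print(arr["city"], arr["temp"].mean())
```

Fixing the dtype avoids surprises from type guessing (for example an ID column read as float) at the cost of having to update it whenever the file layout changes.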

Answer 4

Answer by William komp

I timed the

from numpy import genfromtxt
genfromtxt(fname = dest_file, dtype = (<whatever options>))

versus

import csv
import numpy as np
with open(dest_file,'r') as dest_f:
    data_iter = csv.reader(dest_f,
                           delimiter = delimiter,
                           quotechar = '"')
    # Collect every parsed row (a list of strings) into one list.
    data = [row for row in data_iter]
data_array = np.asarray(data, dtype = <whatever options>)

on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.

I would recommend the csv-list comprehension method, as it most likely relies more on pre-compiled libraries and less on the interpreter than NumPy does. I suspect the pandas method would have similar interpreter overhead.
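To check such timings on your own machine, a small harness along these lines may help. It is a hypothetical benchmark on synthetic data (5,000 rows by 10 float columns written with np.savetxt), not the 4.6-million-row file from this answer; results will vary with data shape, NumPy version and hardware, so no numbers are claimed here.

```python
import csv
import os
import tempfile
import time

import numpy as np

# Synthetic numeric data, written to a temporary CSV file.
rng = np.random.default_rng(0)
values = rng.random((5_000, 10))

with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    np.savetxt(tmp, values, delimiter=",")
    path = tmp.name

try:
    t0 = time.perf_counter()
    via_genfromtxt = np.genfromtxt(path, delimiter=",")
    t1 = time.perf_counter()
    with open(path, newline="") as f:
        # csv.reader yields strings; np.asarray converts them to float64.
        via_csv = np.asarray(list(csv.reader(f)), dtype=float)
    t2 = time.perf_counter()
finally:
    os.remove(path)

print(f"genfromtxt: {t1 - t0:.3f}s  csv.reader: {t2 - t1:.3f}s")
```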

Answer 5

Answer by chamzz.dot

You can use this code to load CSV file data into an array:

import numpy as np
csv = np.genfromtxt('test.csv', delimiter=",")
print(csv)

Answer 6

Answer by muTheTechie

I tried this:

import pandas as pd

# Read only the fifth column (index 4), parsed as floats.
closingValue = pd.read_csv("<FILENAME>", usecols=[4], dtype=float)
print(closingValue)
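read_csv returns a DataFrame even when only one column is selected. To get a plain 1-D NumPy array from it, the column can be converted with to_numpy(). A sketch with hypothetical price data whose fifth column is named "close":

```python
import io

import pandas as pd

# Hypothetical price data; the fifth column (index 4) is "close".
SAMPLE = "day,open,high,low,close\n1,1,2,0.5,1.5\n2,1.5,2.5,1,2\n"

closing = pd.read_csv(io.StringIO(SAMPLE), usecols=[4], dtype=float)
closing_values = closing["close"].to_numpy()  # 1-D float64 NumPy array
print(closing_values)
```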

Answer 7

Answer by HVNSweeting

Having tried both NumPy and pandas, I found that pandas has several advantages:

当我尝试使用 NumPy 和 Pandas 两种方式时，使用 Pandas 有很多优点：

Faster
Less CPU usage
1/3 RAM usage compared to NumPy genfromtxt

This is my test code:

$ for f in test_pandas.py test_numpy_csv.py ; do  /usr/bin/time python $f; done
2.94user 0.41system 0:03.05elapsed 109%CPU (0avgtext+0avgdata 502068maxresident)k
0inputs+24outputs (0major+107147minor)pagefaults 0swaps

23.29user 0.72system 0:23.72elapsed 101%CPU (0avgtext+0avgdata 1680888maxresident)k
0inputs+0outputs (0major+416145minor)pagefaults 0swaps

test_numpy_csv.py

from numpy import genfromtxt
train = genfromtxt('/home/hvn/me/notebook/train.csv', delimiter=',')

test_pandas.py

from pandas import read_csv
df = read_csv('/home/hvn/me/notebook/train.csv')

Data file:

du -h ~/me/notebook/train.csv
 59M    /home/hvn/me/notebook/train.csv

NumPy and pandas versions used:

$ pip freeze | egrep -i 'pandas|numpy'
numpy==1.13.3
pandas==0.20.2

Answer 8

Answer by Xiaojian Chen

Using numpy.loadtxt

A very simple method, but it requires all elements to be numeric (float, int, and so on).

import numpy as np
# Raw string, so that "\1" is not treated as an escape sequence.
data = np.loadtxt(r'c:\1.csv', delimiter=',', skiprows=0)
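If the file has a header line or a text column, loadtxt can still be used by skipping the header and reading only the numeric columns. A sketch with hypothetical data: skiprows=1 drops the header and usecols selects columns 1 and 2, so the text column is never converted.

```python
import io

import numpy as np

# Hypothetical CSV: a header line plus a text column followed by two numbers.
SAMPLE = "label,x,y\na,1,2\nb,3,4.5\n"

# skiprows=1 skips the header; usecols keeps only the numeric columns, so the
# text "label" column never reaches loadtxt's float conversion.
xy = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1, usecols=(1, 2))
print(xy)
```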

Answer 9

Answer by Matthew Park

This is the easiest way:

import csv

with open('testfile.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))

Now each entry in data is a record, represented as a list of strings, so you have a two-dimensional list (one inner list per row). It saved me a lot of time.
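Since csv.reader returns strings, the rows are not yet a numeric NumPy array. A sketch of the extra conversion step, assuming a hypothetical file whose first row is a header and whose remaining cells are all numeric:

```python
import csv
import io

import numpy as np

# Hypothetical CSV content: a header row followed by numeric rows.
SAMPLE = "x,y\n1,2\n3,4.5\n"

data = list(csv.reader(io.StringIO(SAMPLE)))
# csv.reader yields strings; drop the header and convert the rest to floats.
numbers = np.array(data[1:], dtype=float)
print(numbers.shape)
```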

Answer 10

Answer by Jatin Mandav

I would suggest using tables (pip3 install tables). You can save your .csv file to .h5 using pandas (pip3 install pandas):

import pandas as pd
data = pd.read_csv("dataset.csv")
store = pd.HDFStore('dataset.h5')
store['mydata'] = data
store.close()

You can then load your data into a NumPy array easily and quickly, even for huge amounts of data.

import pandas as pd
store = pd.HDFStore('dataset.h5')
data = store['mydata']
store.close()

# Data in NumPy format
data = data.values
