Python: Using numpy.loadtxt to load a text file containing floats and strings

Question

Asked by VeilEclipse

I have a text file, data.txt, which contains:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica

How do I load this data using numpy.loadtxt() so that I get a NumPy array after loading, such as [['5.1' '3.5' '1.4' '0.2' 'Iris-setosa'] ['4.9' '3.0' '1.4' '0.2' 'Iris-setosa'] ...]?

I tried

np.loadtxt(open("data.txt"), 'r',
           dtype={
               'names': (
                   'sepal length', 'sepal width', 'petal length',
                   'petal width', 'label'),
               'formats': (
                   np.float, np.float, np.float, np.float, np.str)},
           delimiter= ',', skiprows=0)

Answer 1

Accepted answer by unutbu

If you use np.genfromtxt, you could specify dtype=None, which tells genfromtxt to guess the dtype of each column from its contents. Most conveniently, it relieves you of the burden of specifying the number of bytes required for the string column. (Omitting the number of bytes, by specifying e.g. np.str, does not work.)

In [58]: np.genfromtxt('data.txt', delimiter=',', dtype=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
Out[58]: 
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')], 
      dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('label', 'S15')])
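
As a minimal sketch of working with the resulting structured array: the example below reads the same rows from an in-memory string instead of data.txt and selects columns by field name. The names `sample` and `iris`, the underscore-style field names, and the explicit encoding='utf-8' argument (so the label column comes back as str rather than bytes on Python 3) are assumptions made for illustration, not part of the original answer.

```python
import io
import numpy as np

# In-memory stand-in for data.txt (same six rows as in the question).
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# dtype=None lets genfromtxt infer a dtype per column;
# encoding='utf-8' makes the string column unicode instead of bytes.
iris = np.genfromtxt(
    io.StringIO(sample),
    delimiter=',',
    dtype=None,
    names=('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'label'),
    encoding='utf-8',
)

# Each field of the structured array is addressed by name.
sepal_lengths = iris['sepal_length']   # 1-D float array
labels = iris['label']                 # 1-D string array
print(sepal_lengths.mean(), labels[0])
```

The result is a one-dimensional array of records, so iris[0] is a whole row and iris['label'] is a whole column.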

If you do want to use np.loadtxt, then to fix your code with minimal changes, you could use:

np.loadtxt("data.txt",
   dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'),
          # np.float was removed in NumPy 1.24; the builtin float is equivalent
          'formats': (float, float, float, float, '|S15')},
   delimiter=',', skiprows=0)

The main difference is simply changing np.str to |S15 (a 15-byte string).
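
On Python 3, an 'S15' field holds bytes objects (for example b'Iris-setosa'). If str values are preferred, a fixed-width unicode field such as 'U15' is one option. The sketch below assumes that substitution and reads from an in-memory copy of the data; `sample` and `records` are illustrative names, not part of the original answer.

```python
import io
import numpy as np

# In-memory stand-in for data.txt (same six rows as in the question).
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

records = np.loadtxt(
    io.StringIO(sample),
    dtype={
        'names': ('sepal length', 'sepal width', 'petal length',
                  'petal width', 'label'),
        # 'U15' = unicode string of at most 15 characters;
        # the longest label, 'Iris-versicolor', is exactly 15.
        'formats': (float, float, float, float, 'U15'),
    },
    delimiter=',',
)

print(records['label'])
```

The width still has to be large enough for the longest label; anything longer would be truncated to 15 characters.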

Also note that open("data.txt"), 'r' should be open("data.txt", 'r'). But since np.loadtxt can accept a filename, you don't really need to use open at all.
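
The question's example output is a plain 2-D array in which every cell is a string. One way to get that shape, sketched here under the assumption that passing dtype=str to np.loadtxt is acceptable, is to read every column as text and convert the numeric columns afterwards. The names `sample`, `table` and `measurements` are illustrative only.

```python
import io
import numpy as np

# In-memory stand-in for data.txt (same six rows as in the question).
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# Read every column as a string: the result is a regular 2-D array.
table = np.loadtxt(io.StringIO(sample), dtype=str, delimiter=',')

# The first four columns can be converted to floats when needed.
measurements = table[:, :4].astype(float)
print(table[0], measurements.shape)
```

Unlike the structured-array approach, this keeps a single dtype for the whole array, so the numbers stay as text until they are converted explicitly.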

Answer 2

Answer by mauve

Keeping the numbers and text together seems to be causing you a lot of trouble. If you end up deciding to separate them, my workaround is:

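# Numeric columns: loadtxt's default dtype is float
values = np.loadtxt('data.txt', delimiter=',', usecols=[0,1,2,3])
# Label column: dtype=str is needed, otherwise loadtxt tries to parse the labels as floats
labels = np.loadtxt('data.txt', delimiter=',', usecols=[4], dtype=str)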
