Python: loading a text file containing both floats and strings with numpy.loadtxt

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/23546349/

Time: 2020-08-19 03:07:20  Source: igfitidea

Loading text file containing both float and string using numpy.loadtxt

Tags: python, python-2.7, python-3.x, numpy

Asked by VeilEclipse

I have a text file, data.txt, which contains:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica

How do I load this data using numpy.loadtxt() so that I get a NumPy array such as [['5.1' '3.5' '1.4' '0.2' 'Iris-setosa'] ['4.9' '3.0' '1.4' '0.2' 'Iris-setosa'] ...]?

I tried:

np.loadtxt(open("data.txt"), 'r',
           dtype={
               'names': (
                   'sepal length', 'sepal width', 'petal length',
                   'petal width', 'label'),
               'formats': (
                   np.float, np.float, np.float, np.float, np.str)},
           delimiter= ',', skiprows=0)

Accepted answer by unutbu

If you use np.genfromtxt, you can specify dtype=None, which tells genfromtxt to guess the dtype of each column. Most conveniently, it relieves you of the burden of specifying the number of bytes required for the string column. (Omitting the number of bytes, by specifying e.g. np.str, does not work.)

In [58]: np.genfromtxt('data.txt', delimiter=',', dtype=None, names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'))
Out[58]: 
array([(5.1, 3.5, 1.4, 0.2, 'Iris-setosa'),
       (4.9, 3.0, 1.4, 0.2, 'Iris-setosa'),
       (5.8, 2.7, 4.1, 1.0, 'Iris-versicolor'),
       (6.2, 2.2, 4.5, 1.5, 'Iris-versicolor'),
       (6.4, 3.1, 5.5, 1.8, 'Iris-virginica'),
       (6.0, 3.0, 4.8, 1.8, 'Iris-virginica')], 
      dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('label', 'S15')])
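
As a minimal sketch of the same approach on Python 3, the example below reads the six sample rows from an in-memory buffer instead of data.txt (the `SAMPLE` string and the `io.StringIO` wrapper are stand-ins added for illustration). It assumes NumPy 1.14 or later, where genfromtxt accepts an `encoding` argument; passing it should make the label column come back as text rather than bytes.

```python
import io

import numpy as np

# Sample rows copied from data.txt in the question (stand-in for the real file).
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# dtype=None lets genfromtxt infer one dtype per column.
iris = np.genfromtxt(
    io.StringIO(SAMPLE),
    delimiter=',',
    dtype=None,
    encoding='utf-8',
    names=('sepal length', 'sepal width', 'petal length', 'petal width', 'label'),
)

# genfromtxt replaces spaces in field names with underscores,
# so columns are addressed as 'sepal_length', not 'sepal length'.
sepal_length = iris['sepal_length']
labels = iris['label']

print(iris.dtype.names)
print(sepal_length.mean())
print(np.unique(labels))
```

The result is a one-dimensional structured array with one record per row, so columns are selected by field name rather than by integer index.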


If you do want to use np.loadtxt, then to fix your code with minimal changes, you could use:

np.loadtxt("data.txt",
   # The builtin float stands in for np.float, an alias removed in NumPy 1.24.
   dtype={'names': ('sepal length', 'sepal width', 'petal length', 'petal width', 'label'),
          'formats': (float, float, float, float, '|S15')},
   delimiter=',', skiprows=0)

The main difference is simply changing np.str to |S15 (a 15-byte string).

Also note that open("data.txt"), 'r' should be open("data.txt", 'r'). But since np.loadtxt can accept a filename, you don't really need to use open at all.
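
For readers on Python 3 with a recent NumPy, here is a hedged sketch of two loadtxt variants. The first mirrors the structured dtype above, but uses the builtin `float` and a `'U15'` text format in place of `'|S15'`. The second passes `dtype=str`, which should yield the all-string 2-D array shown in the question. The `SAMPLE` string and `io.StringIO` wrapper are stand-ins for data.txt added for illustration; the sketch assumes a NumPy version where loadtxt decodes text input to str (1.14 or later).

```python
import io

import numpy as np

# Sample rows copied from data.txt in the question (stand-in for the real file).
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

names = ('sepal length', 'sepal width', 'petal length', 'petal width', 'label')

# Variant 1: structured array with four float fields and one 15-character text field.
structured = np.loadtxt(
    io.StringIO(SAMPLE),
    dtype={'names': names, 'formats': (float, float, float, float, 'U15')},
    delimiter=',',
)

# Variant 2: every cell kept as a string, giving a plain 2-D array.
as_strings = np.loadtxt(io.StringIO(SAMPLE), dtype=str, delimiter=',')

print(structured['label'])
print(as_strings.shape)
print(as_strings[0])
```

Unlike genfromtxt, loadtxt keeps the field names exactly as given, so the first variant is indexed with `structured['sepal length']`, space included.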

Answer by mauve

It seems that keeping the numbers and text together has been causing you so much trouble - if you end up deciding to separate them, my workaround is:

values = np.loadtxt('data.txt', delimiter=',', usecols=[0,1,2,3])
# dtype=str is required here: the default float dtype cannot parse the labels.
labels = np.loadtxt('data.txt', delimiter=',', usecols=[4], dtype=str)
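
To illustrate why splitting the columns can be convenient, the sketch below loads the numeric and label columns separately and then uses the labels as a boolean mask over the numeric array. The `SAMPLE` string and `io.StringIO` wrapper are stand-ins for data.txt added for illustration, and the per-class mean is just one possible use of the two arrays.

```python
import io

import numpy as np

# Sample rows copied from data.txt in the question (stand-in for the real file).
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
"""

# Numeric columns load as an ordinary 2-D float array.
values = np.loadtxt(io.StringIO(SAMPLE), delimiter=',', usecols=[0, 1, 2, 3])

# The label column needs an explicit string dtype.
labels = np.loadtxt(io.StringIO(SAMPLE), delimiter=',', usecols=[4], dtype=str)

# Rows of `values` and entries of `labels` line up by position,
# so a comparison on labels selects the matching measurement rows.
setosa_mean = values[labels == 'Iris-setosa'].mean(axis=0)

print(values.shape, labels.shape)
print(setosa_mean)
```

Because `values` is a plain float array rather than a structured one, it can be passed directly to code that expects a numeric matrix.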