Importing data and variable names from a text file in Python

Question

Asked by Michal

I have a text file containing simulation data (60 columns, 100k rows):

... where in the first row are variable names, and beneath (in columns) is the corresponding data (float type).

I need to use all these variables with their data in Python for further calculations. For example, when I write:

print(b)

I need to receive the values from the second column.

I know how to import data:

data = np.genfromtxt("1.txt", unpack=True, skip_header=1)

I can also assign the variables "manually":

a, b, c = np.genfromtxt("1.txt", unpack=True, skip_header=1)

But I'm having trouble with getting variable names:

reader = csv.reader(open("1.txt", "rt"))
for row in reader: 
   list.append(row)
variables=(list[0])

How can I change this code to get all variable names from the first row and assign them to the imported arrays?

Answer 1

Accepted answer by andyg0808

Instead of trying to assign names, you might think about using an associative array, known in Python as a dict, to store your variables and their values. The code could then look something like this (borrowing liberally from the csv docs):

import csv
with open('1.txt', 'rt') as f:
  reader = csv.reader(f, delimiter=' ', skipinitialspace=True)

  lineData = list()

  cols = next(reader)
  print(cols)

  for col in cols:
    # Create a list in lineData for each column of data.
    lineData.append(list())

  for line in reader:
    for i in range(len(lineData)):
      # Copy the data from the line into the correct columns.
      lineData[i].append(line[i])

  data = dict()

  for i in range(len(cols)):
    # Create each key in the dict with the data in its column.
    data[cols[i]] = lineData[i]

print(data)

data then contains each of your variables, which can be accessed via data['varname'].

So, for example, you could use data['a'] to get the list ['1', '2', '3', '4'], given the input provided in your question. Note that the values are stored as strings, because the csv module does not convert them.
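Since the question asks for float data, the dict-based approach can be adapted to convert each value while reading. The following is only a sketch under assumptions: the in-memory `sample` text is a hypothetical stand-in for "1.txt", with values mirroring the lists quoted in the answers rather than the asker's real file.

```python
import csv
import io

# Hypothetical sample standing in for "1.txt" (an assumption for illustration).
sample = io.StringIO("a b c\n1 11 111\n2 22 222\n3 33 333\n4 44 444\n")

reader = csv.reader(sample, delimiter=" ", skipinitialspace=True)
cols = next(reader)                  # header row: the variable names
data = {name: [] for name in cols}   # one empty list per column

for line in reader:
    if not line:                     # skip blank lines
        continue
    for name, value in zip(cols, line):
        data[name].append(float(value))  # convert each field to float

print(data["b"])  # prints [11.0, 22.0, 33.0, 44.0]
```

Building the dict directly from the header row avoids the intermediate list of lists, and every column stays reachable by name through data['name'].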

I think trying to create names based on data in your document might be a rather awkward way to do this, compared to the dict-based method shown above. If you really want to do that, though, you might look into reflection in Python (a subject I don't really know anything about).

Answer 2

Answered by Zero Piraeus

The answer is: you don't want to do that.

Dictionaries are designed for exactly this purpose: the data structure you actually want is going to be something like:

data = {
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
    "c": [111, 222, 333, 444],
}

... which you can then easily access using e.g. data["a"].

It's possible to do what you want, but the usual way is a hack which relies on the fact that Python uses (drumroll) a dict internally to store variables. And since your code won't know the names of those variables, you'll be stuck using dictionary access to get at them as well ... so you might as well just use a dictionary in the first place.
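To make the point concrete, here is a small sketch of that hack at module level. It is shown to illustrate the argument, not as a recommendation; the `columns` dict and its values are hypothetical stand-ins for data loaded from the file.

```python
# Hypothetical column data; in practice it would come from the file.
columns = {"a": [1, 2, 3, 4], "b": [11, 22, 33, 44]}

namespace = globals()        # the dict behind module-level variable names
namespace.update(columns)    # "creates" module-level variables a and b

# The code still cannot mention a name it does not know in advance,
# so reaching the values means going through a dict again.
for name in columns:
    print(name, namespace[name])
```

The loop ends up doing dictionary access anyway, which is exactly what using `columns` directly would have given without touching the module namespace.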

It's worth pointing out that this is deliberately made difficult in Python, because if your code doesn't know the names of your variables, they are by definition data rather than logic, and should be treated as such.

In case you aren't convinced yet, here's a good article on this subject:

Stupid Python Ideas: Why you don't want to dynamically create variables

Answer 3

Answered by Michal

Thanks to @andyg0808 and @Zero Piraeus I have found another solution. For me, the most appropriate one is to use the pandas data analysis library.

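import pandas as pd

# sep=r"\s+" splits on any run of whitespace; it replaces the older
# delim_whitespace=True option, which recent pandas versions deprecate.
data = pd.read_csv("1.txt",
                   sep=r"\s+",
                   skipinitialspace=True)

result = data["a"] * data["b"] * 3
print(result)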

  0     33
  1    132
  2    297
  3    528

...where 0,1,2,3 are the row index.

Answer 4

Answered by Austin Downey

Here is a simple way to convert a .txt file of variable names and data to NumPy arrays.

import numpy as np

D = np.genfromtxt('1.txt', dtype='str')     # load the data in as strings
D_data = np.asarray(D[1:, :], dtype=float)  # convert the data to floats
D_names = D[0, :]                           # save a list of the variable names

for i in range(len(D_names)):
    key = D_names[i]                        # define the key for this variable
    val = D_data[:, i]                      # set the value for this variable
    exec(key + '=val')                      # create a variable named after the key

I like this method because it is easy to follow and simple to maintain. We can compact this code as follows:

D = np.genfromtxt('1.txt',dtype='str')     # load the data in as strings
for i in range(D.shape[1]):
    val = np.asarray(D[1::,i],dtype=float) # set the value for this variable 
    exec(D[0,i] + '=val')                  # build the variable

Both versions do the same thing: they create NumPy arrays named a, b, and c holding the associated data.
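For comparison, NumPy can also read the header row itself when genfromtxt is called with names=True, which gives named access without exec. This is a sketch under assumptions, not part of the answer above: the in-memory `sample` is a hypothetical stand-in for "1.txt".

```python
import io

import numpy as np

# Hypothetical sample standing in for "1.txt" (an assumption for illustration).
sample = io.StringIO("a b c\n1 11 111\n2 22 222\n3 33 333\n4 44 444\n")

# names=True takes the field names from the first line and returns a
# structured array whose columns can be selected by name.
data = np.genfromtxt(sample, names=True)

print(data.dtype.names)  # the column names read from the header
print(data["b"])         # the second column as a float array
```

Each column is then available as data["name"], much like the dict and pandas approaches in the other answers.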
