pandas: Create a pandas DataFrame from a list of strings

Question

Asked by user308827

I have the following list:

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

How can I convert it into a pandas dataframe?

I can start like this:

df = pd.DataFrame(columns=list_vals[0].split())

Is there a way to populate the rest of the DataFrame?

Answer 1

Answered by DSM

You could use io.StringIO to feed a string into read_csv:

In [23]: pd.read_csv(io.StringIO('\n'.join(list_vals)), sep=r'\s+')
Out[23]: 
   col_a  col_B  col_C
0   12.0   34.0   10.0
1   15.0  111.0   23.0

This has the advantage that it automatically does the type interpretation pandas would do if you were reading an ordinary CSV file, so the columns are floats:

In [24]: _.dtypes
Out[24]: 
col_a    float64
col_B    float64
col_C    float64
dtype: object
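
A self-contained sketch of the same read_csv approach, with the imports the transcript leaves out. It assumes a recent pandas and uses sep=r"\s+", which splits on runs of whitespace. pandas 2.2 deprecated delim_whitespace=True in favour of this spelling.

```python
import io

import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Join the strings into one newline-separated "file" held in memory,
# then let read_csv parse it exactly as it would a file on disk.
df = pd.read_csv(io.StringIO("\n".join(list_vals)), sep=r"\s+")

print(df)
print(df.dtypes)  # read_csv infers a numeric dtype per column
```

The first string becomes the header row because read_csv's default is header=0.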

While you could just feed your list into the DataFrame constructor directly, everything would stay strings:

In [21]: pd.DataFrame(columns=list_vals[0].split(), 
                      data=[row.split() for row in list_vals[1:]])
Out[21]: 
  col_a  col_B col_C
0  12.0   34.0  10.0
1  15.0  111.0    23

In [22]: _.dtypes
Out[22]: 
col_a    object
col_B    object
col_C    object
dtype: object

We could add dtype=float to fix this, of course, but we might have mixed types. The read_csv approach would handle those in the usual way, whereas here we'd have to handle them manually.
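
One way to do that manual step is sketched below. It is an illustration, not part of the original answer, and to_numeric_if_possible is a hypothetical helper name. Each column is converted with pd.to_numeric, and any column that cannot be parsed as numbers is left as strings. It uses try/except rather than errors='ignore', which newer pandas versions deprecate.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Split every string; the first row holds the column names.
header, *rows = [row.split() for row in list_vals]

# Building the frame directly leaves every column as object (strings).
df = pd.DataFrame(rows, columns=header)


def to_numeric_if_possible(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a column to numbers, or return it unchanged."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        # Genuinely non-numeric text column: keep the strings as they are.
        return col


# apply() calls the helper once per column, so each column is converted
# (or left alone) independently, as a mixed-type table requires.
df = df.apply(to_numeric_if_possible)
print(df.dtypes)
```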

Answer 2

Answered by AChampion

You can do it by converting your data to a dict, e.g.:

>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))})
   col_B col_C col_a
0   34.0  10.0  12.0
1  111.0    23  15.0

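Or, to keep your original column order (older pandas versions sorted dict keys; pandas 0.23+ on Python 3.6+ already keeps dict insertion order):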

>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))},
...              columns=list_vals[0].split())
  col_a  col_B col_C
0  12.0   34.0  10.0
1  15.0  111.0    23
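
As the output shows, this dict approach also leaves the values as strings. The sketch below is a variant of the same idea. It assumes every value is numeric and converts each one with float() while building the dict. It also relies on Python 3.7+ dict ordering (and pandas 0.23+) to keep the columns in their original order.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# zip(*...) transposes the split rows into columns:
# ('col_a', '12.0', '15.0'), ('col_B', '34.0', '111.0'), ...
columns = zip(*map(str.split, list_vals))

# The first item of each column is its name; the rest are converted to float
# (assumption: every data value is numeric).
data = {name: [float(v) for v in values] for name, *values in columns}

df = pd.DataFrame(data)
print(df)
```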

Answer 3

Answered by Mike T

You can read the data into a numpy structured array and then pass it to pandas. This is useful if you also need to work with numpy and know the data types before reading (otherwise, working with numpy is a step backward compared to pandas).

import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Gather names from first line, assume all column types are 'd' (i.e. float)
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])

# Create a numpy structured array
ar = np.fromiter((tuple(x.split()) for x in list_vals[1:]), dtype=list_dtype)

# Now convert it to a pandas DataFrame
dat = pd.DataFrame(ar)
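
A swapped-in alternative for the parsing step is np.loadtxt, which accepts a list of strings and treats each item as one line. With a structured dtype it returns one field per column. This sketch makes the same all-float assumption as the answer. ndmin=1 keeps the result one-dimensional even when there is only a single data row.

```python
import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Column names from the first line; every column assumed to be float64 ('f8').
names = list_vals[0].split()
list_dtype = np.dtype([(name, 'f8') for name in names])

# loadtxt splits each line on whitespace (its default delimiter) and fills
# one record of the structured array per line.
ar = np.loadtxt(list_vals[1:], dtype=list_dtype, ndmin=1)

# The structured array's field names become the DataFrame's columns.
dat = pd.DataFrame(ar)
print(dat.dtypes)
```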
