pandas: creating a pandas DataFrame from a list of strings
Source: http://stackoverflow.com/questions/42171709/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Creating pandas dataframe from a list of strings
Asked by user308827
I have the following list:
list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']
How can I convert it into a pandas dataframe?
I can start like this:
df = pd.DataFrame(columns=list_vals[0].split())
Is there a way to populate the rest of the dataframe?
Answered by DSM
You could use io.StringIO to feed a string into read_csv:
In [23]: pd.read_csv(io.StringIO('\n'.join(list_vals)), delim_whitespace=True)
Out[23]:
   col_a  col_B  col_C
0   12.0   34.0   10.0
1   15.0  111.0   23.0
This has the advantage that pandas automatically infers the types, just as it would when reading an ordinary CSV file, so the columns are floats:
In [24]: _.dtypes
Out[24]:
col_a    float64
col_B    float64
col_C    float64
dtype: object
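As a self-contained sketch of the same approach: as far as I know, recent pandas releases (2.2 and later) deprecate delim_whitespace in favor of sep=r'\s+', so this version uses the latter. The variable names buffer and df are only illustrative.

```python
import io

import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Join the list into one newline-separated string and wrap it in a
# file-like object so that read_csv can parse it like a file.
buffer = io.StringIO('\n'.join(list_vals))

# sep=r'\s+' splits fields on runs of whitespace, which is what
# delim_whitespace=True does in the session above.
df = pd.read_csv(buffer, sep=r'\s+')

print(df)
print(df.dtypes)
```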
While you could just feed your list into the DataFrame constructor directly, every value would remain a string:
In [21]: pd.DataFrame(columns=list_vals[0].split(),
                      data=[row.split() for row in list_vals[1:]])
Out[21]:
   col_a  col_B  col_C
0   12.0   34.0   10.0
1   15.0  111.0     23
In [22]: _.dtypes
Out[22]:
col_a    object
col_B    object
col_C    object
dtype: object
We could add dtype=float to fix this, of course, but we might have mixed types, which the read_csv approach would handle in the usual way and which we would have to handle manually here.
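To illustrate the mixed-type case, here is a minimal sketch. The list mixed_vals and its column names are hypothetical data made up for this example, not part of the question. read_csv infers a type per column, while the constructor route needs an explicit per-column conversion.

```python
import io

import pandas as pd

# Hypothetical data: one text column, one integer column, one float column.
mixed_vals = ['name count score', 'a 1 2.5', 'b 2 4.0']

# read_csv infers each column's type on its own.
inferred = pd.read_csv(io.StringIO('\n'.join(mixed_vals)), sep=r'\s+')

# The constructor keeps every value as a string, so the numeric
# columns have to be converted by hand, one column at a time.
manual = pd.DataFrame(columns=mixed_vals[0].split(),
                      data=[row.split() for row in mixed_vals[1:]])
manual = manual.astype({'count': 'int64', 'score': 'float64'})

print(inferred.dtypes)
print(manual.dtypes)
```

A single dtype=float would not work for data like this, because the name column cannot be converted to a number.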
Answered by AChampion
You can do it by converting your data to a dict, e.g.:
>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))})
   col_B  col_C  col_a
0   34.0   10.0   12.0
1  111.0     23   15.0
Or, to guarantee your original column order (older Python and pandas versions do not preserve dict order, as the output above shows):
>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))},
...              columns=list_vals[0].split())
   col_a  col_B  col_C
0   12.0   34.0   10.0
1   15.0  111.0     23
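The one-liner packs several steps together. A step-by-step sketch of the same idea, using the illustrative intermediate names rows, columns and data, might look like this:

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Split every line into tokens: one list per row, header row first.
rows = [line.split() for line in list_vals]

# zip(*rows) transposes rows into columns; each tuple starts with the
# column name, followed by that column's values.
columns = list(zip(*rows))

# Star-unpacking peels off the name and collects the rest as a list.
data = {name: values for name, *values in columns}

# Passing columns= pins the column order explicitly.
df = pd.DataFrame(data, columns=list_vals[0].split())
print(df)
```

As in the answer above, the values are still strings at this point; nothing here converts them to numbers.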
Answered by Mike T
You can read this as a numpy structured array, then pass it to pandas. This is useful if you also need to work with numpy and know the data types before reading (otherwise numpy is more cumbersome to work with than pandas).
import numpy as np
import pandas as pd
list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']
# Gather names from first line, assume all column types are 'd' (i.e. float)
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])
# Create a numpy structured array
ar = np.fromiter((tuple(x.split()) for x in list_vals[1:]), dtype=list_dtype)
# Now convert it to a pandas DataFrame
dat = pd.DataFrame(ar)
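A variant of the same idea, as a sketch: it converts each token to float explicitly and builds the structured array with np.array instead of np.fromiter, in case implicit string-to-float conversion behaves differently across numpy versions. This swap is my assumption, not part of the answer above.

```python
import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# One float64 ('d') field per column name taken from the header line.
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])

# Convert each token to float up front; a structured array expects
# one tuple per record.
records = [tuple(float(tok) for tok in line.split())
           for line in list_vals[1:]]
ar = np.array(records, dtype=list_dtype)

# The field names of the structured array become the column names.
dat = pd.DataFrame(ar)
print(dat.dtypes)
```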