Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/23999225/
Pandas DataFrame cast multiple types to columns
Asked by Mike Vella
I'd like to declare different types for the columns of a pandas DataFrame at instantiation:
frame = pandas.DataFrame({..some data..},dtype=[str,int,int])
This works if dtype is a single type (e.g. dtype=float), but not with multiple types as above. Is there a way to do this?
The common solution seems to be to cast later:
frame['some column'] = frame['some column'].astype(float)
but this has a couple of issues:
- It's messy.
- It looks like it involves an unnecessary copy operation, which could be expensive on large data sets.
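For reference, the cast-later approach described above can at least be written as one call rather than one line per column, by passing a column-to-dtype mapping to DataFrame.astype. This is a minimal sketch, assuming a pandas version whose astype accepts a dict; the column names and values are made up for illustration. It still returns a new object, so it does not address the copy concern.

```python
import pandas as pd

# Hypothetical data: every column starts out holding strings.
frame = pd.DataFrame(
    {"name": ["a", "b"], "count": ["1", "2"], "score": ["1.5", "2.5"]}
)

# One astype call with a column -> dtype mapping.
# Columns that are not listed ("name" here) are left unchanged.
frame = frame.astype({"count": "int64", "score": "float64"})

print(frame.dtypes)
```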
Accepted answer by R. Max
As an alternative, you can specify the dtype for each column by creating the Series objects first.
In [2]: df = pd.DataFrame({'x': pd.Series(['1.0', '2.0', '3.0'], dtype=float), 'y': pd.Series(['1', '2', '3'], dtype=int)})
In [3]: df
Out[3]:
   x  y
0  1  1
1  2  2
2  3  3
[3 rows x 2 columns]
In [4]: df.dtypes
Out[4]:
x float64
y int64
dtype: object
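The same idea can be wrapped in a small helper so the dtypes are declared in one place. This is only a sketch: frame_with_dtypes is a hypothetical name, not a pandas function, and the column names and values are invented for illustration. Columns without an entry in the mapping get dtype=None, which lets pandas infer the type.

```python
import pandas as pd

def frame_with_dtypes(columns, dtypes):
    """Build a DataFrame, giving each column the dtype listed in `dtypes`.

    Hypothetical helper: `columns` maps column name -> values and
    `dtypes` maps column name -> dtype. Unlisted columns are inferred.
    """
    return pd.DataFrame(
        {
            name: pd.Series(values, dtype=dtypes.get(name))
            for name, values in columns.items()
        }
    )

df = frame_with_dtypes(
    {"name": ["a", "b", "c"], "x": [1, 2, 3], "y": [1, 2, 3]},
    {"x": "float64", "y": "int32"},
)

print(df.dtypes)
```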
Answered by drastega
You can also create a NumPy structured array with a specific dtype for each field and then convert it to a DataFrame.
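import numpy as np
from pandas import DataFrame
# 'S10' is a 10-byte string field; the older 'a10' spelling is deprecated in recent NumPy
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
DataFrame(data)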
See "From structured or record array" in the pandas documentation.
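To check that the per-field dtypes survive the conversion, a short script along these lines can be used. It is a sketch with made-up values; it uses 'U10' (a unicode string field) rather than a byte string so that column C holds ordinary Python strings, which is a choice made here and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Structured array: one dtype per named field.
records = np.array(
    [(1, 2.0, "Hello"), (2, 3.0, "World")],
    dtype=[("A", "i4"), ("B", "f4"), ("C", "U10")],
)

# Field names become column names; numeric field dtypes are kept.
df = pd.DataFrame(records)

print(df.dtypes)
```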

