Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23999225/

Time: 2020-09-13 22:06:44  Source: igfitidea

Pandas DataFrame cast multiple types to columns

Tags: python, pandas

Asked by Mike Vella

I'd like to declare different types for the columns of a pandas DataFrame at instantiation:

frame = pandas.DataFrame({..some data..},dtype=[str,int,int])

This works if dtype is a single type (e.g. dtype=float), but not with multiple types as above. Is there a way to do this?

The common solution seems to be to cast later:

frame['some column'] = frame['some column'].astype(float)

but this has a couple of issues:

  1. It's messy
  2. Looks like it involves an unnecessary copy operation - this could be expensive on large data sets.
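
For reference, the cast-later approach can at least be condensed into a single call by passing a column-to-dtype mapping to astype. This is only a minimal sketch: the column names and values are made up for illustration, and astype still returns a new object, so it addresses the messiness rather than the copy.

```python
import pandas as pd

# Hypothetical sample data; every value starts out as a string.
frame = pd.DataFrame({
    "name": ["a", "b"],
    "count": ["1", "2"],
    "score": ["0.5", "1.5"],
})

# One astype call with a {column: dtype} mapping casts several columns at once.
frame = frame.astype({"name": str, "count": int, "score": float})
print(frame.dtypes)
```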

Accepted answer by R. Max

As an alternative, you can specify the dtype for each column by creating the Series objects first.

In [2]: df = pd.DataFrame({'x': pd.Series(['1.0', '2.0', '3.0'], dtype=float), 'y': pd.Series(['1', '2', '3'], dtype=int)})

In [3]: df
Out[3]: 
   x  y
0  1  1
1  2  2
2  3  3

[3 rows x 2 columns]

In [4]: df.dtypes
Out[4]: 
x    float64
y      int64
dtype: object
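
The same idea can be wrapped in a small function when there are many columns. The sketch below is only an illustration: frame_with_dtypes is a hypothetical helper, not part of pandas, and the column names and values are invented.

```python
import pandas as pd

def frame_with_dtypes(data, dtypes):
    """Build a DataFrame from {column: values}, giving each column its own dtype.

    Hypothetical helper (not a pandas API): each column is created as a
    Series with the requested dtype before the DataFrame is assembled.
    """
    return pd.DataFrame(
        {name: pd.Series(values, dtype=dtypes[name]) for name, values in data.items()}
    )

df = frame_with_dtypes(
    {"name": ["a", "b", "c"], "x": [1, 2, 3], "y": [1, 2, 3]},
    {"name": str, "x": float, "y": int},
)
print(df.dtypes)
```

Here x ends up as a float column and y as an integer column, even though both start from the same integer values.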

Answer by drastega

You can also create a NumPy array with specific dtypes and then convert it to DataFrame.

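import numpy as np
import pandas as pd
# 'U10' (fixed-width Unicode string) replaces the legacy 'a10' byte-string code
data = np.zeros((2,), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'U10')])
data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
pd.DataFrame(data)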

See the "From structured or record array" section of the pandas documentation.
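
To check that the per-field dtypes survive the conversion, the structured array can also be built directly from a list of row tuples and the resulting column dtypes inspected. This is a rough sketch: the rows list and the field names are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; each tuple supplies one value per field.
rows = [(1, 2.0, "Hello"), (2, 3.0, "World")]

# One dtype per field: 32-bit int, 32-bit float, fixed-width Unicode string.
data = np.array(rows, dtype=[("A", "i4"), ("B", "f4"), ("C", "U10")])

df = pd.DataFrame(data)
print(df.dtypes)
```

Column A keeps the 32-bit integer dtype and column B the 32-bit float dtype; how the string column is represented depends on the pandas version.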
