initialize pandas DataFrame with defined dtypes

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/38235992/

Time: 2020-09-14 01:31:58  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Dima Lituiev

The pd.DataFrame docstring specifies a single scalar dtype argument for the whole DataFrame:

dtype : dtype, default None
    Data type to force, otherwise infer

It does seem to be intended as a scalar, since passing a list of dtypes, as in either of the following, raises an error:

dfbinseq = pd.DataFrame([],
                        columns = ["chr", "centre", "seq_binary"],
                        dtype = ["O", pd.np.int64, "O"])

dfbinseq = pd.DataFrame([],
                        columns = ["chr", "centre", "seq_binary"],
                        dtype = [pd.np.object, pd.np.int64, pd.np.object])

The only workaround I found for creating an empty DataFrame (which I need to put in an HDF5 store for later appends) was:

dfbinseq.centre.dtype = np.int64

Is there a way to set the dtypes of all columns at once?

Answered by jezrael

You can set the dtype on each Series and build the DataFrame from those:

import pandas as pd

df = pd.DataFrame({'A':pd.Series([], dtype='str'),
                   'B':pd.Series([], dtype='int'),
                   'C':pd.Series([], dtype='float')})

print (df)
Empty DataFrame
Columns: [A, B, C]
Index: []

print (df.dtypes)
A     object
B      int32
C    float64
dtype: object
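
The same pattern can be driven from a single mapping, so all dtypes are declared in one place. This is only a sketch: the `schema` dict is a hypothetical helper, not part of the original answer, and the column names are borrowed from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical schema: column name -> dtype (column names taken from the question).
schema = {"chr": object, "centre": np.int64, "seq_binary": object}

# Build one empty, typed Series per column; the DataFrame keeps each dtype.
dfbinseq = pd.DataFrame({col: pd.Series(dtype=dt) for col, dt in schema.items()})

print(dfbinseq.dtypes)
```

Keeping the schema in a dict means the column order and dtypes can be reused later, for example when validating frames before appending them.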

With data:

df = pd.DataFrame({'A':pd.Series([1,2,3], dtype='str'),
                   'B':pd.Series([4,5,6], dtype='int'),
                   'C':pd.Series([7,8,9], dtype='float')})

print (df)
   A  B    C
0  1  4  7.0
1  2  5  8.0
2  3  6  9.0

print (df.dtypes)
A     object
B      int32
C    float64
dtype: object
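
An alternative that is not from the original answer, but that should give the same result, is to create the empty frame first and then cast all columns in one `astype` call with a column-to-dtype mapping. A minimal sketch, assuming the same three illustrative columns:

```python
import numpy as np
import pandas as pd

# Start with an empty frame (all columns default to object dtype),
# then cast every column in a single call using a column -> dtype mapping.
df = pd.DataFrame(columns=["A", "B", "C"]).astype(
    {"A": object, "B": np.int64, "C": np.float64}
)

print(df.dtypes)
```

Using explicit `np.int64` rather than `'int'` avoids platform-dependent integer widths such as the `int32` shown in the output above.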