
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/27467730/

Time: 2020-08-19 01:49:45  Source: igfitidea

Is there a way to copy only the structure (not the data) of a Pandas DataFrame?

python pandas dataframe

Question by bmello

I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the original data frame was created as

import pandas as pd
df1 = pd.DataFrame([[11,12],[21,22]], columns=['c1','c2'], index=['i1','i2'])

I copied the structure by explicitly passing the columns and index:

df2 = pd.DataFrame(columns=df1.columns, index=df1.index)    

I don't want to copy the data, otherwise I could just write df2 = df1.copy(). In other words, after df2 is created, it must contain only NaN elements:

In [1]: df1
Out[1]: 
    c1  c2
i1  11  12
i2  21  22

In [2]: df2
Out[2]: 
     c1   c2
i1  NaN  NaN
i2  NaN  NaN

Is there a more idiomatic way of doing it?

Accepted answer by ayhan

That's a job for reindex_like. Start with the original:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

Construct an empty DataFrame and reindex it like df1:

pd.DataFrame().reindex_like(df1)
Out: 
    c1  c2
i1 NaN NaN
i2 NaN NaN   
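
A minimal, self-contained sketch of the reindex_like approach, using the same df1 as above. The checks on the labels and on the missing values are an illustration added here, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Start from a frame with no rows or columns and conform it to df1's labels;
# every cell that has no source value is filled with NaN.
df2 = pd.DataFrame().reindex_like(df1)

print(df2.index.equals(df1.index))      # same row labels
print(df2.columns.equals(df1.columns))  # same column labels
print(df2.isna().all().all())           # no data was copied
```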

Answer by firelynx

In version 0.18 of pandas, the DataFrame constructor has no option for creating a DataFrame shaped like another DataFrame but with NaN instead of the values.

The code you use, df2 = pd.DataFrame(columns=df1.columns, index=df1.index), is the most logical way. The only way to improve on it is to spell out even more clearly what you are doing by adding data=None, so that other coders see directly that you are intentionally leaving the data out of the new DataFrame you are creating.

TL;DR: my suggestion is:

Explicit is better than implicit

df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

Very much like yours, but more spelled out.
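
A short sketch of the explicit constructor call. One detail worth knowing, which the next answer also shows: because no data is given, pandas has nothing to infer numeric types from, so the NaN-filled columns come out with object dtype. The astype(float) step is an optional extra, assuming you want numeric NaN columns.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Same labels as df1, no data: every cell is NaN and every column is object dtype
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
print(df2.dtypes)

# Optional: convert to float64 if the frame will be filled with numbers later
df2_float = df2.astype(float)
print(df2_float.dtypes)
```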

Answer by Pedro M Duarte

Let's start with some sample data:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
   ...:                   columns=['num', 'char'])

In [3]: df
Out[3]: 
   num char
0    1    a
1    2    b
2    3    c

In [4]: df.dtypes
Out[4]: 
num      int64
char    object
dtype: object

Now let's use a simple DataFrame initialization using the columns of the original DataFrame but providing no data:

In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)

In [6]: empty_copy_1
Out[6]: 
Empty DataFrame
Columns: [num, char]
Index: []

In [7]: empty_copy_1.dtypes
Out[7]: 
num     object
char    object
dtype: object

As you can see, the column data types are not the same as in our original DataFrame.

So, if you want to preserve the column dtype...

If you want to preserve the column data types, you need to construct the DataFrame one Series at a time:

In [8]: # from_items/iteritems were removed in pandas 1.0/2.0; a dict keeps column order
   ...: empty_copy_2 = pd.DataFrame({
   ...:     name: pd.Series(data=None, dtype=series.dtype)
   ...:     for name, series in df.items()})

In [9]: empty_copy_2
Out[9]: 
Empty DataFrame
Columns: [num, char]
Index: []

In [10]: empty_copy_2.dtypes
Out[10]: 
num      int64
char    object
dtype: object

Answer by davmarc

A simple alternative: first copy the basic structure (the columns with their data types, and the index type) from the original DataFrame (df1) into df2 by taking an empty slice:

df2 = df1.iloc[0:0].copy()  # no rows; same columns and dtypes as df1

Then fill your DataFrame with empty rows, one per row of df1, adapting this to better match your actual structure. DataFrame.append was removed in pandas 2.0, so this version adds the rows with .loc:

import numpy as np

# An all-NaN row with the same column labels as df1
s = pd.Series(np.nan, index=df1.columns)

# Loop through the rows in df1, adding one empty row per index label
for label in df1.index:
    df2.loc[label] = s
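
Instead of adding rows one at a time, the same result can be sketched with a single reindex call on the empty slice. This swaps the loop for DataFrame.reindex (a substitution, not the answer's own code); as with adding NaN rows one by one, the integer columns become float64 so they can hold NaN.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Empty frame with df1's columns, then one NaN row per index label of df1
df2 = df1.iloc[0:0].reindex(df1.index)
print(df2)
print(df2.dtypes)
```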

Answer by Bharath

You can simply mask with notna(), i.e.:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

df2 = df1.mask(df1.notna())

    c1  c2
i1 NaN NaN
i2 NaN NaN
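
A self-contained version of the mask one-liner. mask() replaces values wherever the condition is True, and notna() is True for every populated cell, so every value becomes NaN; as a side effect, the integer columns are upcast to float64 to hold NaN. The original df1 is left untouched because mask() returns a new frame.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# notna() is True for every populated cell, and mask() puts NaN wherever
# the condition is True, so no original value survives in df2.
df2 = df1.mask(df1.notna())
print(df2)
print(df2.dtypes)
```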

Answer by felocru

This has worked for me in pandas 0.22: df2 = pd.DataFrame(index=df.index.delete(slice(None)), columns=df.columns)

Convert types: df2 = df2.astype(df.dtypes)

Use delete(slice(None)) in case you do not want to keep the index values; it removes every label, leaving an empty index.
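
Putting the two steps of this answer together: a sketch on a hypothetical frame df with one integer and one string column (assumed for illustration), checking that astype(df.dtypes) restores the original column types on the empty copy.

```python
import pandas as pd

# Hypothetical input with mixed column types
df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['num', 'char'], index=['r1', 'r2'])

# delete(slice(None)) drops every label, leaving an empty index
empty_index = df.index.delete(slice(None))

# The constructor yields object columns; astype maps each column back to df's dtype
df2 = pd.DataFrame(index=empty_index, columns=df.columns).astype(df.dtypes)
print(df2.dtypes)
```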

Answer by Phil

I know this is an old question, but I thought I would add my two cents.

import pandas as pd

def df_cols_like(df):
    """
    Returns an empty data frame with the same column names and types as df
    """
    # df.dtypes maps column name -> dtype (.iteritems() was removed in pandas 2.0)
    df2 = pd.DataFrame({name: pd.Series(dtype=dtype)
                        for name, dtype in df.dtypes.items()},
                       columns=df.dtypes.index)
    return df2

This approach centers around the df.dtypes attribute of the input data frame, df, which is a pd.Series. A pd.DataFrame is constructed from a dictionary of empty pd.Series objects named using the input column names, with the column order taken from the input df.

Answer by Martijn Lentink

This does not answer exactly this question, but a similar one, for people who arrive here via a search engine.

My case was creating a copy of the data frame without data and without index. One can achieve this by doing the following. This will maintain the dtypes of the columns.

empty_copy = df.drop(df.index)
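
A quick check of this one-liner on a small mixed-type frame (assumed for illustration). df.drop(df.index) removes every row label, so the result has no rows but keeps the columns and their dtypes; the df.iloc[0:0] slice from an earlier answer gives an equally empty frame with the same dtypes.

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['num', 'char'])

# Drop every row label: no rows are left, but columns and dtypes remain
empty_copy = df.drop(df.index)
print(empty_copy.dtypes)

# Positional slicing yields the same empty structure
print(df.iloc[0:0].dtypes.equals(empty_copy.dtypes))
```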