Is there a way in Python to copy only the structure (not the data) of a Pandas DataFrame?

Question

Asked by bmello

I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the original data frame was created as

import pandas as pd
df1 = pd.DataFrame([[11,12],[21,22]], columns=['c1','c2'], index=['i1','i2'])

I copied the structure by explicitly passing the columns and the index:

df2 = pd.DataFrame(columns=df1.columns, index=df1.index)

I don't want to copy the data; otherwise I could just write df2 = df1.copy(). In other words, after df2 is created it must contain only NaN elements:

In [1]: df1
Out[1]: 
    c1  c2
i1  11  12
i2  21  22

In [2]: df2
Out[2]: 
     c1   c2
i1  NaN  NaN
i2  NaN  NaN
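
A quick check of what the constructor approach from the question produces (a sketch, assuming a recent pandas version): every cell is missing, but because no data is supplied, the columns get `object` dtype rather than the original `int64`.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Same labels on both axes, no data supplied
df2 = pd.DataFrame(columns=df1.columns, index=df1.index)

print(df2.isna().all().all())  # True: every cell is NaN
print(df2.dtypes)              # both columns are object, not int64
```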

Is there a more idiomatic way of doing it?

Answer 1

Accepted answer by ayhan

That's a job for reindex_like. Start with the original:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

Construct an empty DataFrame and reindex it like df1:

pd.DataFrame().reindex_like(df1)
Out: 
    c1  c2
i1 NaN NaN
i2 NaN NaN
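
If you need this in several places, it can be wrapped in a small function. The name `empty_like` below is hypothetical, not part of pandas; the sketch only checks that both axes match `df1` and that every cell is missing.

```python
import pandas as pd

def empty_like(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: same index and columns as df, every cell NaN."""
    return pd.DataFrame().reindex_like(df)

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])
df2 = empty_like(df1)

print(df2.index.equals(df1.index), df2.columns.equals(df1.columns))  # True True
print(df2.isna().all().all())  # True
```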

Answer 2

Answer by firelynx

In version 0.18 of pandas, the DataFrame constructor has no option for creating a DataFrame like another DataFrame, with NaN instead of the values.

The code you use, df2 = pd.DataFrame(columns=df1.columns, index=df1.index), is the most logical way. The only way to improve on it is to spell out even more explicitly what you are doing by adding data=None, so that other coders see directly that you are intentionally leaving the data out of the new DataFrame you are creating.

TLDR: So my suggestion is:

Explicit is better than implicit

df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

Very much like yours, but more spelled out.
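
The constructor also accepts a `dtype`. A small variation, assuming you would rather have `float64` NaN columns than the `object` columns the plain constructor produces:

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# data=None keeps the intent explicit; dtype=float makes the NaN columns numeric
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index, dtype=float)

print(df2.dtypes)  # c1 and c2 are float64
```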

Answer 3

Answer by Pedro M Duarte

Let's start with some sample data

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
   ...:                   columns=['num', 'char'])

In [3]: df
Out[3]: 
   num char
0    1    a
1    2    b
2    3    c

In [4]: df.dtypes
Out[4]: 
num      int64
char    object
dtype: object

Now let's use a simple `DataFrame` initialization using the columns of the original `DataFrame` but providing no data:

In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)

In [6]: empty_copy_1
Out[6]: 
Empty DataFrame
Columns: [num, char]
Index: []

In [7]: empty_copy_1.dtypes
Out[7]: 
num     object
char    object
dtype: object

As you can see, the column data types are not the same as in our original DataFrame.

So, if you want to preserve the column `dtype`...

If you want to preserve the column data types, you need to construct the DataFrame one Series at a time:

In [8]: empty_copy_2 = pd.DataFrame({
   ...:     name: pd.Series(dtype=series.dtype)
   ...:     for name, series in df.items()})

In [9]: empty_copy_2
Out[9]: 
Empty DataFrame
Columns: [num, char]
Index: []

In [10]: empty_copy_2.dtypes
Out[10]: 
num      int64
char    object
dtype: object
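
A shorter route to the same dtypes, sketched under the assumption that a zero-row frame is what you want: build the empty frame from the column labels and cast it with `astype`, which accepts the `df.dtypes` Series as a column-to-dtype mapping.

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['num', 'char'])

# Zero rows, original column labels, then cast each column to its original dtype
empty_copy_3 = pd.DataFrame(columns=df.columns).astype(df.dtypes)

print(empty_copy_3.dtypes)  # num int64, char object
```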

Answer 4

Answer by davmarc

A simple alternative: first copy the basic structure (the columns with their dtypes, plus an empty index of the same type) from the original dataframe (df1) into df2:

df2 = df1.iloc[0:0].copy()

Then fill your dataframe with empty rows by looping through the row labels of df1 and adding an all-NaN row for each (adapt this to match your actual structure; `.loc` enlargement is used because `DataFrame.append` was removed in pandas 2.0):

import numpy as np

for idx in df1.index:
    # Assigning to a new label with .loc appends a row; NaN fills every column
    df2.loc[idx] = np.nan
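
The loop can also be replaced by a single `reindex` call. A sketch, assuming you want `df1`'s row labels back: start from the zero-row copy and reindex it to `df1.index`, which inserts all-NaN rows. Integer columns are upcast to `float64`, because `int64` cannot hold NaN.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Zero rows but the same columns and dtypes, then one all-NaN row per label of df1
df2 = df1.iloc[0:0].reindex(df1.index)

print(df2)
print(df2.dtypes)  # c1 and c2 become float64
```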

Answer 5

Answer by Bharath

You can simply mask by notna(), i.e.:

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

df2 = df1.mask(df1.notna())

    c1  c2
i1 NaN NaN
i2 NaN NaN
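
`mask` replaces values where the condition is `True`; its mirror image `where` keeps values where the condition is `True`, so the same result can be written with `isna()`. In this sketch the integer columns come back as `float64` in both cases, since NaN forces an upcast:

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

via_mask = df1.mask(df1.notna())   # replace every non-missing value with NaN
via_where = df1.where(df1.isna())  # keep only missing values, i.e. none of them

print(via_mask.equals(via_where))  # True
print(via_mask.dtypes)             # c1 and c2 are float64
```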

Answer 6

Answer by felocru

This has worked for me in pandas 0.22:

df2 = pd.DataFrame(index=df.index.delete(slice(None)), columns=df.columns)

Convert types:

df2 = df2.astype(df.dtypes)

Use delete(slice(None)) in case you do not want to keep the values of the index.
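
Slicing the index with `[:0]` is another way to get an empty index of the same type, which may read more clearly than `delete(slice(None))`. A sketch of both steps together, using a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['num', 'char'], index=['r1', 'r2'])

# Empty index of the same type as df.index, original columns
df2 = pd.DataFrame(index=df.index[:0], columns=df.columns)
# Restore the original column dtypes (object -> int64 for 'num')
df2 = df2.astype(df.dtypes)

print(df2.dtypes)  # num int64, char object
```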

Answer 7

Answer by Phil

I know this is an old question, but I thought I would add my two cents.

def df_cols_like(df):
    """
    Returns an empty data frame with the same column names and types as df
    """
    # One empty Series per column, each created with that column's dtype
    df2 = pd.DataFrame({name: pd.Series(dtype=dtype)
                        for name, dtype in df.dtypes.items()},
                       columns=df.dtypes.index)
    return df2

This approach centers around the df.dtypes attribute of the input data frame, df, which is a pd.Series. A pd.DataFrame is constructed from a dictionary of empty pd.Series objects named using the input column names, with the column order taken from the input df.
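
Because each empty `Series` is built from the original dtype object, this also carries over extension dtypes such as `category`, including its categories. A usage sketch with a made-up mixed-dtype frame, assuming pandas 2.x (where `Series.iteritems` no longer exists, so `items` is used):

```python
import pandas as pd

def df_cols_like(df):
    """Empty frame with the same column names and dtypes as df (same logic as above)."""
    return pd.DataFrame({name: pd.Series(dtype=dtype)
                         for name, dtype in df.dtypes.items()},
                        columns=df.dtypes.index)

df = pd.DataFrame({
    'num': [1, 2],
    'ratio': [0.5, 1.5],
    'kind': pd.Categorical(['x', 'y']),
})

out = df_cols_like(df)
print(out.dtypes)                        # int64, float64, category
print(list(out['kind'].cat.categories))  # ['x', 'y']
```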

Answer 8

Answer by Martijn Lentink

This doesn't exactly answer the question, but it answers a similar one for people coming here via a search engine.

My case was creating a copy of the data frame without data and without index. One can achieve this by doing the following. This will maintain the dtypes of the columns.

empty_copy = df.drop(df.index)
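
Positional slicing with `df.iloc[:0]` should give the same shape and dtypes as `df.drop(df.index)`. A quick comparison sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['num', 'char'])

empty_copy = df.drop(df.index)  # drop every row label
empty_slice = df.iloc[:0]       # positional slice with no rows

print(empty_copy.shape)                                     # (0, 2)
print(list(empty_copy.dtypes) == list(empty_slice.dtypes))  # True
```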