Original source: http://stackoverflow.com/questions/27467730/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Is there a way to copy only the structure (not the data) of a Pandas DataFrame?
Asked by bmello
I received a DataFrame from somewhere and want to create another DataFrame with the same number and names of columns and rows (indexes). For example, suppose that the original data frame was created as
import pandas as pd
df1 = pd.DataFrame([[11,12],[21,22]], columns=['c1','c2'], index=['i1','i2'])
I copied the structure by explicitly defining the columns and names:
df2 = pd.DataFrame(columns=df1.columns, index=df1.index)
I don't want to copy the data, otherwise I could just write df2 = df1.copy(). In other words, after df2 is created it must contain only NaN elements:
In [1]: df1
Out[1]:
    c1  c2
i1  11  12
i2  21  22
In [2]: df2
Out[2]:
     c1   c2
i1  NaN  NaN
i2  NaN  NaN
Is there a more idiomatic way of doing it?
Accepted answer by ayhan
That's a job for reindex_like. Start with the original:
df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])
Construct an empty DataFrame and reindex it like df1:
pd.DataFrame().reindex_like(df1)
Out:
     c1   c2
i1  NaN  NaN
i2  NaN  NaN
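To check that the result really mirrors the original's labels without carrying any data over, a minimal sketch along these lines may help. The name df2 is introduced here only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Reindex an empty frame so that it takes on df1's row and column labels.
df2 = pd.DataFrame().reindex_like(df1)

# The labels match the original, and every cell is missing.
print(df2.index.equals(df1.index))
print(df2.columns.equals(df1.columns))
print(df2.isna().all().all())
```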
Answer by firelynx
In version 0.18 of pandas, the DataFrame constructor has no option for creating a DataFrame shaped like another DataFrame but filled with NaN instead of the values.
The code you use, df2 = pd.DataFrame(columns=df1.columns, index=df1.index), is the most logical way. The only way to improve on it is to spell out even more clearly what you are doing by adding data=None, so that other coders directly see that you intentionally leave out the data from the new DataFrame you are creating.
TLDR: So my suggestion is:
Explicit is better than implicit
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
Very much like yours, but more spelled out.
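As a quick sanity check of the explicit constructor call, the sketch below builds df2 and inspects it. Printing the dtypes is an addition for illustration; the new frame's column dtypes are not necessarily those of df1, since no data was supplied.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# data=None states explicitly that only the labels are reused.
df2 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

print(df2)         # same labels as df1, every cell missing
print(df2.dtypes)  # compare against df1.dtypes if dtypes matter to you
```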
Answer by Pedro M Duarte
Let's start with some sample data:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
   ...:                   columns=['num', 'char'])
In [3]: df
Out[3]:
   num char
0    1    a
1    2    b
2    3    c
In [4]: df.dtypes
Out[4]:
num      int64
char    object
dtype: object
Now let's use a simple DataFrame initialization using the columns of the original DataFrame but providing no data:
In [5]: empty_copy_1 = pd.DataFrame(data=None, columns=df.columns)
In [6]: empty_copy_1
Out[6]:
Empty DataFrame
Columns: [num, char]
Index: []
In [7]: empty_copy_1.dtypes
Out[7]:
num     object
char    object
dtype: object
As you can see, the column data types are not the same as in our original DataFrame.
So, if you want to preserve the column dtype...
If you want to preserve the column data types, you need to construct the DataFrame one Series at a time:
In [8]: empty_copy_2 = pd.DataFrame.from_items([
   ...:     (name, pd.Series(data=None, dtype=series.dtype))
   ...:     for name, series in df.iteritems()])
In [9]: empty_copy_2
Out[9]:
Empty DataFrame
Columns: [num, char]
Index: []
In [10]: empty_copy_2.dtypes
Out[10]:
num      int64
char    object
dtype: object
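Note that DataFrame.from_items was removed in pandas 1.0 and DataFrame.iteritems in pandas 2.0, so the session above will not run on recent releases. A minimal sketch of the same one-Series-at-a-time idea, assuming a recent pandas and using a dict comprehension with df.items() instead, might look like this (the name empty_copy_3 is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['num', 'char'])

# Build one empty Series per column, reusing that column's dtype.
# df.items() yields (column name, Series) pairs in column order.
empty_copy_3 = pd.DataFrame({
    name: pd.Series(dtype=series.dtype)
    for name, series in df.items()
})

print(empty_copy_3.dtypes)
```

Because dicts preserve insertion order, the columns should come out in the same order as in df.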
Answer by davmarc
A simple alternative: first copy the basic structure (the columns and their data types, with no rows) from the original DataFrame (df1) into df2:
df2 = df1.iloc[0:0].copy()
Then fill your DataFrame with empty rows. The following will need to be adapted to better match your actual structure:
import numpy as np
# One NaN per column of df1
s = pd.Series(np.nan, index=df1.columns)
# Loop through the rows in df1, adding one all-NaN row per index label
# (DataFrame.append, used originally, was removed in pandas 2.0)
for label in df1.index:
    df2.loc[label] = s
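The row-by-row fill described above can also be sketched in a single step with reindex, which adds an all-NaN row for every label missing from the frame. This swaps in a different technique from the one in the answer, and the name empty is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# Zero-row slice: keeps the columns, drops every row.
empty = df1.iloc[0:0]

# Reindexing onto df1's index adds one all-NaN row per label.
df2 = empty.reindex(df1.index)

print(df2)
```

Integer columns cannot hold NaN, so expect pandas to upcast them (typically to float) once the missing rows are added.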
Answer by Bharath
You can simply mask by notna(), i.e.:
df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])
df2 = df1.mask(df1.notna())
     c1   c2
i1  NaN  NaN
i2  NaN  NaN
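A small sketch to confirm two properties of this approach: mask returns a new object, so df1 itself is left untouched, and any cell that was already NaN in df1 stays NaN, so the result is entirely missing either way.

```python
import pandas as pd

df1 = pd.DataFrame([[11, 12], [21, 22]], columns=['c1', 'c2'], index=['i1', 'i2'])

# mask() replaces values wherever the condition is True;
# notna() is True for every cell that holds data.
df2 = df1.mask(df1.notna())

print(df2.isna().all().all())  # every cell of the copy is missing
print(df1)                     # the original still holds its values
```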
Answer by felocru
This has worked for me in pandas 0.22:
df2 = pd.DataFrame(index=df.index.delete(slice(None)), columns=df.columns)
Convert types:
df2 = df2.astype(df.dtypes)
Use delete(slice(None)) in case you do not want to keep the values of the index.
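Put together, the two steps can be sketched as follows. As an assumption for illustration, the empty index is taken with df.index[:0] rather than delete(slice(None)), and the sample frame is made up.

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['num', 'char'])

# df.index[:0] is an empty index of the same kind as df.index
# (swapped in here for df.index.delete(slice(None))).
df2 = pd.DataFrame(index=df.index[:0], columns=df.columns)

# Cast the (still empty) columns back to the original dtypes.
df2 = df2.astype(df.dtypes)

print(df2.dtypes)
```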
Answer by Phil
I know this is an old question, but I thought I would add my two cents.
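import pandas as pd

def df_cols_like(df):
    """
    Returns an empty data frame with the same column names and types as df
    """
    # df.dtypes is a Series mapping column name -> dtype;
    # .items() replaces .iteritems(), which was removed in pandas 2.0
    df2 = pd.DataFrame({name: pd.Series(dtype=dtype)
                        for name, dtype in df.dtypes.items()},
                       columns=df.dtypes.index)
    return df2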
This approach centers around the df.dtypes attribute of the input data frame, df, which is a pd.Series. A pd.DataFrame is constructed from a dictionary of empty pd.Series objects named using the input column names, with the column order taken from the input df.
Answer by Martijn Lentink
This does not exactly answer the question, but it answers a similar one for people coming here via a search engine.
My case was creating a copy of the data frame without data and without index. One can achieve this by doing the following. This will maintain the dtypes of the columns.
empty_copy = df.drop(df.index)
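A minimal sketch to confirm that dropping every row leaves the columns and their dtypes in place (the sample frame is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']], columns=['num', 'char'])

# Dropping every index label removes all rows but keeps the columns.
empty_copy = df.drop(df.index)

print(len(empty_copy))
print(empty_copy.dtypes)
```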