Concatenate (join) a NumPy array with a pandas DataFrame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/39698363/
Asked by Jamgreen
I have a pandas DataFrame with 10 rows and 5 columns and a NumPy matrix of zeros, np.zeros((10, 3)).
I want to concat the numpy matrix to the pandas dataframe but I want to delete the last column from the pandas dataframe before concatenating the numpy array to it.
So I will end up with a matrix of 10 rows and 5 - 1 + 3 = 7 columns.
I guess I could use:
new_dataframe = pd.concat([
    original_dataframe,
    pd.DataFrame(np.zeros((10, 3), dtype=int))
], axis=1, ignore_index=True)
where original_dataframe has 10 rows and 5 columns.
How do I delete the last column from original_dataframe before concatenating the NumPy array? And how do I make sure I preserve all the data types?
Answered by cs95
Setup
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 3)), columns=list('ABC'))
df
   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
np.column_stack / stack(axis=1) / hstack
pd.DataFrame(np.column_stack([df, np.zeros((df.shape[0], 3), dtype=int)]))
   0  1  2  3  4  5
0  5  0  3  0  0  0
1  3  7  9  0  0  0
2  3  5  2  0  0  0
Useful (and performant), but does not retain the column names from df. If you really want to drop the last column, slice it out with iloc:
pd.DataFrame(np.column_stack([
    df.iloc[:, :-1], np.zeros((df.shape[0], 3), dtype=int)]))
   0  1  2  3  4
0  5  0  0  0  0
1  3  7  0  0  0
2  3  5  0  0  0
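If the column names matter, they can be reattached by hand after stacking. The sketch below is not from the original answer: the mixed-dtype frame and the new names D, E, F are assumptions made for illustration. It also shows a caveat relevant to the question about data types: a NumPy array holds a single dtype, so stacking upcasts every column to a common one.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed dtypes (integer and float columns).
df = pd.DataFrame({"A": [5, 3, 3], "B": [0.5, 7.0, 5.0], "C": [3, 9, 2]})
zeros = np.zeros((df.shape[0], 3), dtype=int)

# Drop the last column, stack as plain arrays, then restore names and index.
kept = df.iloc[:, :-1]
stacked = np.column_stack([kept.to_numpy(), zeros])
result = pd.DataFrame(
    stacked,
    columns=list(kept.columns) + ["D", "E", "F"],  # illustrative names
    index=df.index,
)

# Integer column A and the integer zeros are upcast to float64,
# because the stacked array can only have one dtype.
print(result.dtypes)
```

If per-column dtypes must survive, stacking through NumPy is probably the wrong tool and a DataFrame-level operation is the safer choice.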
pd.concat
You will need to convert the array to a DataFrame.
df2 = pd.DataFrame(np.zeros((df.shape[0], 3), dtype=int), columns=list('DEF'))
pd.concat([df, df2], axis=1)
   A  B  C  D  E  F
0  5  0  3  0  0  0
1  3  7  9  0  0  0
2  3  5  2  0  0  0
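Applied to the shape described in the question, the same approach might look like the sketch below. The contents of original_dataframe and the names z0, z1, z2 are invented for illustration; only the 10 x 5 shape and the 10 x 3 zero matrix come from the question. Because pd.concat works column by column, the dtypes of the retained columns should carry over unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical 10x5 frame with mixed dtypes, standing in for original_dataframe.
original_dataframe = pd.DataFrame({
    "a": np.arange(10),               # integers
    "b": np.linspace(0.0, 1.0, 10),   # floats
    "c": list("abcdefghij"),          # strings
    "d": np.arange(10) % 2 == 0,      # booleans
    "e": np.arange(10) * 2,           # last column, to be dropped
})

# Wrap the zero matrix in a DataFrame that shares the original index,
# so concat aligns row by row instead of introducing NaN rows.
zeros = pd.DataFrame(
    np.zeros((len(original_dataframe), 3), dtype=int),
    columns=["z0", "z1", "z2"],       # illustrative names
    index=original_dataframe.index,
)

# iloc[:, :-1] selects every column except the last one.
new_dataframe = pd.concat([original_dataframe.iloc[:, :-1], zeros], axis=1)

print(new_dataframe.shape)   # 10 rows, 5 - 1 + 3 = 7 columns
print(new_dataframe.dtypes)
```

Passing the original index to the zeros frame matters when original_dataframe does not use the default 0..n-1 index; with mismatched indexes, concat along axis=1 would produce a union of rows padded with NaN.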
DataFrame.assign
If you are only adding constant values, you can use assign:
df.assign(**dict.fromkeys(list('DEF'), 0))
   A  B  C  D  E  F
0  5  0  3  0  0  0
1  3  7  9  0  0  0
2  3  5  2  0  0  0
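To also drop the last column, as the question asks, assign can be chained after an iloc slice. This is a minimal sketch reusing the setup frame from above; the chaining itself is an addition here, not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 3)), columns=list('ABC'))

# Slice off the last column, then add three constant-zero columns.
# assign returns a new DataFrame, so df itself is left unchanged.
result = df.iloc[:, :-1].assign(**dict.fromkeys(list('DEF'), 0))

print(result)
```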