Copying a column from one DataFrame to another gives NaN values?
Original question: http://stackoverflow.com/questions/45747589/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Asked by i.n.n.m
This question has been asked many times, and the suggested fix seems to work for others. However, I am getting NaN values when I copy a column from a different DataFrame (df1 and df2 are the same length).
df1
         date      hour    var1
a  2017-05-01  00:00:00  456585
b  2017-05-01  01:00:00  899875
c  2017-05-01  02:00:00  569566
d  2017-05-01  03:00:00  458756
e  2017-05-01  04:00:00  231458
f  2017-05-01  05:00:00  986545
df2
        MyVar1       MyVar2
0  6169.719338  3688.045368
1  5861.148007  3152.238704
2  5797.053347  2700.469871
3  5779.102340  2730.471948
4  6708.219647  3181.298291
5  8550.380343  3793.580394
I need df2 to look like this:
        MyVar1       MyVar2        date      hour
0  6169.719338  3688.045368  2017-05-01  00:00:00
1  5861.148007  3152.238704  2017-05-01  01:00:00
2  5797.053347  2700.469871  2017-05-01  02:00:00
3  5779.102340  2730.471948  2017-05-01  03:00:00
4  6708.219647  3181.298291  2017-05-01  04:00:00
5  8550.380343  3793.580394  2017-05-01  05:00:00
I tried the following:
df2['date'] = df1['date']
df2['hour'] = df1['hour']
type(df1)
>> pandas.core.frame.DataFrame
type(df2)
>> pandas.core.frame.DataFrame
This is what I get:
        MyVar1       MyVar2 date hour
0  6169.719338  3688.045368  NaN  NaN
1  5861.148007  3152.238704  NaN  NaN
2  5797.053347  2700.469871  NaN  NaN
Why is this happening? There is another post that discusses merge, but I just need to copy the columns. Any help would be appreciated.
Answered by cs95
The culprit is unalignable indexes
Your DataFrames' indexes are different (and correspondingly, so are the indexes of each column). When you assign a column of one DataFrame to another, pandas tries to align the two indexes, and wherever it finds no matching label it inserts NaN.
Consider the following examples to understand what this means:
# Setup
import pandas as pd
A = pd.DataFrame(index=['a', 'b', 'c'])
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])
C = pd.DataFrame(index=[1, 2, 3])
# Example of alignable indexes - A & B (complete or partial overlap of indexes)
A.index    B.index
      a
      b          b    (overlap)
      c          c    (overlap)
                 d
                 f
# Example of unalignable indexes - A & C (no overlap at all)
A.index    C.index
      a
      b
      c
                 1
                 2
                 3
When there are no overlaps, pandas cannot match even a single value between the two DataFrames to put in the result of the assignment, so the output is a column full of NaNs.
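The alignment behaviour described above can be sketched as follows. This is a minimal illustration that reuses the index labels of A, B, and C from the setup; the column name `x` and the values in `src` are made up for the example.

```python
import pandas as pd

# Source Series labelled like A in the setup; the values are illustrative only.
src = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Partial overlap (B's index): only 'b' and 'c' find a match.
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])
B['x'] = src

# No overlap (C's index): no label matches, so every row is NaN.
C = pd.DataFrame(index=[1, 2, 3])
C['x'] = src

print(B)
print(C)
```

Rows 'd' and 'f' of B come out as NaN because `src` has no such labels, while C ends up with NaN in every row.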
If you're working in an IPython notebook, you can check that this is indeed the root cause with:
df1.index.equals(df2.index)
# False
df1.index.intersection(df2.index).empty
# True
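For a self-contained reproduction, the two checks can be run against small stand-ins for the question's frames. The frames below are an assumption for illustration: they keep only the first three rows and store the dates and hours as plain strings.

```python
import pandas as pd

# Small stand-ins for the question's frames (first three rows only).
df1 = pd.DataFrame(
    {'date': ['2017-05-01'] * 3,
     'hour': ['00:00:00', '01:00:00', '02:00:00']},
    index=['a', 'b', 'c'],
)
df2 = pd.DataFrame({'MyVar1': [6169.719338, 5861.148007, 5797.053347]})

same_index = df1.index.equals(df2.index)              # False
no_overlap = df1.index.intersection(df2.index).empty  # True

# Assigning with mismatched indexes reproduces the all-NaN column.
broken = df2.copy()
broken['date'] = df1['date']
print(same_index, no_overlap, broken['date'].isna().all())
```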
You can use any of the following solutions to fix this issue.
Solution 1: Reset both DataFrames' indexes
You may prefer this option if you didn't mean to have different indices in the first place, or if you don't particularly care about preserving the index.
# Optional, if you want a RangeIndex => [0, 1, 2, ...]
# df1.index = pd.RangeIndex(len(df1))
# Homogenize the index values,
df2.index = df1.index
# Assign the columns.
df2[['date', 'hour']] = df1[['date', 'hour']]
If you want to keep the existing index, but as a column, you may use reset_index() instead.
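A rough sketch of the reset_index() variant, assuming two-row stand-ins for the question's frames (the data and the name `df1_reset` are illustrative, not from the original answer):

```python
import pandas as pd

df1 = pd.DataFrame(
    {'date': ['2017-05-01', '2017-05-01'], 'hour': ['00:00:00', '01:00:00']},
    index=['a', 'b'],
)
df2 = pd.DataFrame({'MyVar1': [6169.719338, 5861.148007]})

# reset_index() moves the old labels into a column named 'index'
# and gives the result a fresh RangeIndex that lines up with df2.
df1_reset = df1.reset_index()

# Both frames now share the labels 0 and 1, so alignment succeeds.
df2[['date', 'hour']] = df1_reset[['date', 'hour']]
print(df2)
```

The old labels 'a' and 'b' stay available in `df1_reset['index']` if they are needed later.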
Solution 2: Assign NumPy arrays (bypass index alignment)
This solution will only work if the lengths of the two DataFrames match.
# pandas >= 0.24
df2['date'] = df1['date'].to_numpy()
# pandas < 0.24
df2['date'] = df1['date'].values
To assign multiple columns easily, use:
df2 = df2.assign(**{c: df1[c].to_numpy() for c in ('date', 'hour')})
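Put together, Solution 2 might look like the sketch below. It assumes pandas >= 0.24 (for to_numpy()) and uses two-row stand-in frames that are not part of the original answer; the last part shows what happens when the lengths do not match.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'date': ['2017-05-01', '2017-05-01'], 'hour': ['00:00:00', '01:00:00']},
    index=['a', 'b'],
)
df2 = pd.DataFrame({'MyVar1': [6169.719338, 5861.148007]})

# to_numpy() drops df1's labels, so the values are placed by position.
df2 = df2.assign(**{c: df1[c].to_numpy() for c in ('date', 'hour')})
print(df2)

# A length mismatch is not silently padded; pandas raises ValueError.
try:
    df2['date'] = df1['date'].to_numpy()[:1]
except ValueError as exc:
    print('length mismatch:', exc)
```

Because the assignment is positional, it only gives the intended result when the rows of both frames are already in the same order.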
Answered by YOBEN_S
Try this:
df2['date'] = df1['date'].values
df2['hour'] = df1['hour'].values