Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/45747589/

Date: 2020-08-19 17:16:02  Source: igfitidea

Copying a column from one DataFrame to another gives NaN values?

Tags: python, pandas, types, copy

Asked by i.n.n.m

This question has been asked many times, and the suggested fixes seemed to work for others. However, I am getting NaN values when I copy a column from a different DataFrame (df1 and df2 are the same length).

df1

        date     hour      var1
a   2017-05-01  00:00:00   456585
b   2017-05-01  01:00:00   899875
c   2017-05-01  02:00:00   569566
d   2017-05-01  03:00:00   458756
e   2017-05-01  04:00:00   231458
f   2017-05-01  05:00:00   986545

df2

      MyVar1     MyVar2 
 0  6169.719338 3688.045368
 1  5861.148007 3152.238704
 2  5797.053347 2700.469871
 3  5779.102340 2730.471948
 4  6708.219647 3181.298291
 5  8550.380343 3793.580394

I need df2 to look like this:

       MyVar1    MyVar2        date        hour
 0  6169.719338 3688.045368  2017-05-01  00:00:00
 1  5861.148007 3152.238704  2017-05-01  01:00:00
 2  5797.053347 2700.469871  2017-05-01  02:00:00
 3  5779.102340 2730.471948  2017-05-01  03:00:00
 4  6708.219647 3181.298291  2017-05-01  04:00:00
 5  8550.380343 3793.580394  2017-05-01  05:00:00

I tried the following:

df2['date'] = df1['date']
df2['hour'] = df1['hour']

type(df1)
>> pandas.core.frame.DataFrame

type(df2)
>> pandas.core.frame.DataFrame

This is what I get:

       MyVar1    MyVar2      date       hour
 0  6169.719338 3688.045368  NaN        NaN
 1  5861.148007 3152.238704  NaN        NaN
 2  5797.053347 2700.469871  NaN        NaN

Why is this happening? There is another post that discusses merge, but I just need to copy the columns. Any help would be appreciated.

Answered by cs95

The culprit is unalignable indexes

Your DataFrames' indexes are different (and correspondingly, so are the indexes of each column). When you assign a column of one DataFrame to another, pandas tries to align the two indexes, and inserts NaN for every label it cannot match.

Consider the following examples to understand what this means:

# Setup
import pandas as pd

A = pd.DataFrame(index=['a', 'b', 'c'])
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])
C = pd.DataFrame(index=[1, 2, 3])

# Example of alignable indexes - A & B (complete or partial overlap of indexes)
A.index B.index
      a        
      b       b   (overlap)
      c       c   (overlap)
              d
              f

# Example of unalignable indexes - A & C (no overlap at all)
A.index C.index
      a        
      b        
      c        
              1
              2
              3

When there are no overlaps, pandas cannot match even a single value between the two DataFrames to put in the result of the assignment, so the output is a column full of NaNs.
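
The two cases above can be sketched in runnable form. This is a minimal illustration, not part of the original answer: the column names x, y, z and their values are made up, and only the index labels mirror A, B and C from the setup.

```python
import pandas as pd

# Hypothetical data: only the index labels mirror A, B and C from the setup.
A = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
B = pd.DataFrame({'y': [10, 20, 30, 40]}, index=['b', 'c', 'd', 'f'])
C = pd.DataFrame({'z': [100, 200, 300]}, index=[1, 2, 3])

# Partial overlap: only the labels 'b' and 'c' exist in both frames,
# so row 'a' gets NaN and rows 'b' and 'c' get B's values.
A['y'] = B['y']
print(A)

# No overlap: none of A's labels exist in C, so the whole column is NaN.
A['z'] = C['z']
print(A)
```

Note that the labels 'd' and 'f' from B are silently dropped: the assignment keeps the index of the frame being assigned to.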

If you're working in an IPython notebook, you can check that this is indeed the root cause using:

df1.index.equals(df2.index)                                                                                               
# False
df1.index.intersection(df2.index).empty                                                                                     
# True
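
For a self-contained reproduction of that check, here is a rough sketch. The frames are stand-ins shaped like the ones in the question (three rows, abbreviated values), and the names same_index, no_shared_labels and broken are introduced only for illustration.

```python
import pandas as pd

# Stand-in data shaped like the question's frames (values abbreviated).
df1 = pd.DataFrame(
    {'date': ['2017-05-01'] * 3, 'hour': ['00:00:00', '01:00:00', '02:00:00']},
    index=['a', 'b', 'c'],
)
df2 = pd.DataFrame({'MyVar1': [6169.7, 5861.1, 5797.1]})  # default index 0, 1, 2

# The two diagnostics from the answer.
same_index = df1.index.equals(df2.index)
no_shared_labels = df1.index.intersection(df2.index).empty
print(same_index, no_shared_labels)

# Assigning on a copy reproduces the all-NaN column without touching df2.
broken = df2.copy()
broken['date'] = df1['date']
print(broken['date'].isna().all())
```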


You can use any of the following solutions to fix this issue.

Solution 1: Reset both DataFrames' indexes

You may prefer this option if you didn't mean to have different indices in the first place, or if you don't particularly care about preserving the index.

# Optional, if you want a RangeIndex => [0, 1, 2, ...]
# df1.index = pd.RangeIndex(len(df1))
# Homogenize the index values.
df2.index = df1.index
# Assign the columns.
df2[['date', 'hour']] = df1[['date', 'hour']]

If you want to keep the existing index, but as a column, you may use reset_index() instead.
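
A minimal sketch of the reset_index() variant, assuming small made-up frames with the same kind of index mismatch as in the question. The name df1_reset is introduced here for illustration only.

```python
import pandas as pd

# Hypothetical frames with non-matching indexes, as in the question.
df1 = pd.DataFrame({'date': ['2017-05-01', '2017-05-01']}, index=['a', 'b'])
df2 = pd.DataFrame({'MyVar1': [6169.7, 5861.1]})

# reset_index() moves the old labels into a column named 'index'
# and replaces the index with the default 0, 1, 2, ...
df1_reset = df1.reset_index()
print(df1_reset.columns.tolist())

# Both indexes are now 0, 1, so the assignment aligns row by row.
df2['date'] = df1_reset['date']
print(df2)
```

If the old labels are not needed, reset_index(drop=True) discards them instead of keeping them as a column.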



Solution 2: Assign NumPy arrays (bypass index alignment)

This solution will only work if the lengths of the two DataFrames match.

# pandas >= 0.24
df2['date'] = df1['date'].to_numpy()
# pandas < 0.24
df2['date'] = df1['date'].values

To assign multiple columns at once, use:

df2 = df2.assign(**{c: df1[c].to_numpy() for c in ('date', 'hour')})
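
Here is a rough end-to-end sketch of this solution, including the length restriction. The frames are made-up stand-ins for the question's data, and the frame named shorter exists only to show what happens when the lengths differ.

```python
import pandas as pd

# Hypothetical frames with non-matching indexes, as in the question.
df1 = pd.DataFrame(
    {'date': ['2017-05-01', '2017-05-01'], 'hour': ['00:00:00', '01:00:00']},
    index=['a', 'b'],
)
df2 = pd.DataFrame({'MyVar1': [6169.7, 5861.1]})

# A NumPy array carries no index, so values are copied purely by position.
df2 = df2.assign(**{c: df1[c].to_numpy() for c in ('date', 'hour')})
print(df2)

# With mismatched lengths the assignment raises instead of inserting NaN.
shorter = pd.DataFrame({'MyVar1': [1.0]})
try:
    shorter['date'] = df1['date'].to_numpy()
except ValueError as exc:
    print('ValueError:', exc)
```

Because alignment is skipped, the rows are paired by position only, so this is appropriate when both frames are already in the intended order.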

Answered by YOBEN_S

Try this:

df2['date'] = df1['date'].values
df2['hour'] = df1['hour'].values