Python: How to merge a Series and DataFrame

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/26265819/

Time: 2020-08-19 00:17:34  Source: igfitidea

How to merge a Series and DataFrame

python pandas dataframe

Asked by Nathan Lloyd

If you came here looking for information on how to merge a DataFrame and Series on the index, please look at this answer.

The OP's original intention was to ask how to assign series elements as columns to another DataFrame. If you are interested in knowing the answer to this, look at the accepted answer by EdChum.



The best I can come up with is:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1, 2], 'b':[3, 4]})  # see EDIT below
s = pd.Series({'s1':5, 's2':6})

for name in s.index:
    df[name] = s[name]

   a  b  s1  s2
0  1  3   5   6
1  2  4   5   6

Can anybody suggest better syntax / faster method?

My attempts:

df.merge(s)
AttributeError: 'Series' object has no attribute 'columns'

and

df.join(s)
ValueError: Other Series must have a name

EDIT: The first two answers posted highlighted a problem with my question, so please use the following to construct df:

df = pd.DataFrame({'a':[np.nan, 2, 3], 'b':[4, 5, 6]}, index=[3, 5, 6])

with the final result

    a  b  s1  s2
3 NaN  4   5   6
5   2  5   5   6
6   3  6   5   6

Accepted answer by EdChum

You could construct a dataframe from the series and then merge with the dataframe. So you specify the data as the values but multiply them by the length, set the columns to the index and set params for left_index and right_index to True:

In [27]:

df.merge(pd.DataFrame(data = [s.values] * len(s), columns = s.index), left_index=True, right_index=True)
Out[27]:
   a  b  s1  s2
0  1  3   5   6
1  2  4   5   6

EDIT: for the situation where you want the DataFrame constructed from the series to use the index of df, you can do the following:

df.merge(pd.DataFrame(data = [s.values] * len(df), columns = s.index, index=df.index), left_index=True, right_index=True)

This assumes that the indices match the length.
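
A runnable version of the edited approach, using the df from the question's EDIT, might look like the sketch below. The names repeated and result are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, 3], 'b': [4, 5, 6]}, index=[3, 5, 6])
s = pd.Series({'s1': 5, 's2': 6})

# One copy of s.values per row of df, labelled with df's own index
# so that the index-on-index merge lines up row for row.
repeated = pd.DataFrame([s.values] * len(df), columns=s.index, index=df.index)

result = df.merge(repeated, left_index=True, right_index=True)
print(result)
```

Because the repeat count is len(df) rather than len(s), this form does not depend on the Series and the DataFrame happening to have the same length.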

Answer by Alex Riley

Here's one way:

df.join(pd.DataFrame(s).T).fillna(method='ffill')

To break down what happens here...

pd.DataFrame(s).T creates a one-row DataFrame from s which looks like this:

   s1  s2
0   5   6

Next, join concatenates this new frame with df:

   a  b  s1  s2
0  1  3   5   6
1  2  4 NaN NaN

Lastly, the NaN values at index 1 are filled with the previous values in the column using fillna with the forward-fill (ffill) argument:

   a  b  s1  s2
0  1  3   5   6
1  2  4   5   6
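
The three steps above can be reproduced with the sketch below. It swaps in DataFrame.ffill() for fillna(method='ffill'), since recent pandas versions warn that the method argument is deprecated; the intermediate names one_row, joined and filled are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
s = pd.Series({'s1': 5, 's2': 6})

one_row = pd.DataFrame(s).T   # single row labelled 0, with columns s1 and s2
joined = df.join(one_row)     # row 1 has no match, so s1 and s2 are NaN there
filled = joined.ffill()       # forward-fill, standing in for fillna(method='ffill')
print(filled)
```

Two things are worth keeping in mind with this approach: the forward fill also fills any missing values in df's own columns, and the join only finds a match if df has a row labelled 0.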


To avoid using fillna, it's possible to use pd.concat to repeat the rows of the DataFrame constructed from s. In this case, the general solution is:

df.join(pd.concat([pd.DataFrame(s).T] * len(df), ignore_index=True))


Here's another solution to address the indexing challenge posed in the edited question:

df.join(pd.DataFrame(s.repeat(len(df)).values.reshape((len(df), -1), order='F'), 
        columns=s.index, 
        index=df.index))

s is transformed into a DataFrame by repeating the values and reshaping (specifying 'Fortran' order), and also passing in the appropriate column names and index. This new DataFrame is then joined to df.
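
To see why the 'Fortran' order matters, the sketch below splits the one-liner into steps and compares it with the default row-major reshape. The names flat, block_f and block_c are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, 3], 'b': [4, 5, 6]}, index=[3, 5, 6])
s = pd.Series({'s1': 5, 's2': 6})

flat = s.repeat(len(df)).values                   # 5, 5, 5, 6, 6, 6
block_f = flat.reshape((len(df), -1), order='F')  # columns filled first: every row is 5, 6
block_c = flat.reshape((len(df), -1))             # default row-major order mixes the values

result = df.join(pd.DataFrame(block_f, columns=s.index, index=df.index))
print(result)
```

With the default order the rows would come out as 5, 5 then 5, 6 then 6, 6, which is why the answer asks for column-major filling.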

Answer by Nicholas Morley

Update
From v0.24.0 onwards, you can merge on DataFrame and Series as long as the Series is named.

df.merge(s.rename('new'), left_index=True, right_index=True)
# If series is already named,
# df.merge(s, left_index=True, right_index=True)


Nowadays, you can simply convert the Series to a DataFrame with to_frame(). So (if joining on index):

df.merge(s.to_frame(), left_index=True, right_index=True)
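
As a small illustration of merging on the index, the sketch below uses a hypothetical Series whose index labels match those of df (this is not the s from the question, whose labels are s1 and s2). Both forms align rows by index label rather than broadcasting values across rows.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[3, 5, 6])
# Hypothetical Series that shares df's index labels.
s = pd.Series([10, 20, 30], index=[3, 5, 6], name='new')

# Merging a named Series directly (pandas 0.24.0 or later, per the answer).
merged_series = df.merge(s, left_index=True, right_index=True)

# Converting to a one-column DataFrame first gives the same result.
merged_frame = df.merge(s.to_frame(), left_index=True, right_index=True)
print(merged_series)
```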

Answer by James

If I could suggest setting up your dataframes like this (auto-indexing):

df = pd.DataFrame({'a':[np.nan, 1, 2], 'b':[4, 5, 6]})

then you can set up your s1 and s2 values thus (using df.shape[0] to get the number of rows of df):

s = pd.DataFrame({'s1':[5]*df.shape[0], 's2':[6]*df.shape[0]})

then the result you want is easy:

display(df.merge(s, left_index=True, right_index=True))

Alternatively, just add the new values to your dataframe df:

df = pd.DataFrame({'a':[np.nan, 1, 2], 'b':[4, 5, 6]})
df['s1']=5
df['s2']=6
display(df)

Both return:

     a  b  s1  s2
0  NaN  4   5   6
1  1.0  5   5   6
2  2.0  6   5   6

If you have another list of data (instead of just a single value to apply), and you know it is in the same order as the rows of df, e.g.:

s1=['a','b','c']

then you can attach this in the same way:

df['s1']=s1

returns:

     a  b s1
0  NaN  4  a
1  1.0  5  b
2  2.0  6  c

Answer by Alex

You can easily set a pandas.DataFrame column to a constant. This constant can be an int such as in your example. If the column you specify isn't in the df, then pandas will create a new column with the name you specify. So after your dataframe is constructed, (from your question):

df = pd.DataFrame({'a':[np.nan, 2, 3], 'b':[4, 5, 6]}, index=[3, 5, 6])

You can just run:

df['s1'], df['s2'] = 5, 6

You could write a loop or comprehension to make it do this for all the elements in a list of tuples, or keys and values in a dictionary depending on how you have your real data stored.
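
A minimal sketch of that idea, assuming the real data lives in a dictionary (the name constants is hypothetical), might look like this. It shows both a loop and the DataFrame.assign method, which returns a new DataFrame instead of modifying df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, 3], 'b': [4, 5, 6]}, index=[3, 5, 6])
constants = {'s1': 5, 's2': 6}   # hypothetical container for the real data

# assign returns a new DataFrame with one constant column per key.
assigned = df.assign(**constants)

# The loop form adds the same columns to df in place.
for name, value in constants.items():
    df[name] = value

print(df)
```

The keyword-unpacking form only works when the keys are strings that are valid as keyword names, so the loop is the more general of the two.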

Answer by aishik roy chaudhury

If df is a pandas.DataFrame, then df['new_col'] = obj, where obj is a list or Series of length len(df), will add that list or Series as a column named 'new_col'. df['new_col'] = scalar (such as 5 or 6 in your case) also works and is equivalent to df['new_col'] = [scalar] * len(df).

So a two-line code serves the purpose:

df = pd.DataFrame({'a':[1, 2], 'b':[3, 4]})
s = pd.Series({'s1':5, 's2':6})
for x in s.index:    
    df[x] = s[x]

Output: 
   a  b  s1  s2
0  1  3   5   6
1  2  4   5   6
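
One caveat about the assignment rules in this answer, shown as a small sketch with made-up column names: a list is placed by position, whereas a Series is aligned on its index labels, so a Series with the right length but different labels produces NaN.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]}, index=[3, 5, 6])

df['from_list'] = [10, 20, 30]               # placed by position
df['from_series'] = pd.Series([10, 20, 30])  # labels 0, 1, 2 match nothing in df
df['scalar'] = 5                             # broadcast to every row
print(df)
```

Passing the Series' values (for example via .to_numpy()) or giving it the same index as df avoids the mismatch.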