Combining Series in Pandas

Question

Asked by orange

I need to combine multiple Pandas Series that contain string values. The series are messages that result from multiple validation steps. I am trying to combine these messages into one Series so that I can attach it to the DataFrame. The problem is that the result is empty.

This is an example:

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index

series = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series += df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)

print(series)
# >>> series
# 0    NaN
# 1    NaN
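
The NaN values come from index alignment: `+` matches the two Series by label, and here the labels (1 and 0) do not overlap, so every position is missing on one side. A minimal sketch of that behaviour, assuming hand-built Series in place of the `apply` results; the names `aligned_sum`, `idx` and `combined` are illustrative only:

```python
import pandas as pd

# Hand-built stand-ins for the two message Series from the example above.
s1 = pd.Series(['bb-bbb'], index=[1])
s2 = pd.Series(['a-aaa'], index=[0])

# "+" aligns on index labels; no label exists in both Series,
# so every entry of the result is missing.
aligned_sum = s1 + s2
print(aligned_sum.isna().all())

# One possible workaround: reindex both Series to the union of their labels
# and replace the gaps with empty strings before adding.
idx = s1.index.union(s2.index)
combined = s1.reindex(idx).fillna('') + s2.reindex(idx).fillna('')
print(combined)
```

With this approach, a label that appears in both Series ends up with the two messages joined directly, without a separator.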

Update

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index

series1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
series3 = df.iloc[index2].apply(lambda x: x['a'] + '-ccc', axis=1)

# series3 causes a ValueError: cannot reindex from a duplicate axis
series = pd.concat([series1, series2, series3])
df['series'] = series
print(df)
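
The error mentioned in the comment is tied to duplicate index labels: `series2` and `series3` are both built from `index2`, so the concatenated Series carries label 0 twice, and pandas cannot tell which value belongs in row 0 when the column is assigned. A small check, sketched with hand-built Series standing in for the ones above (`duplicated_labels` is an illustrative name):

```python
import pandas as pd

# Stand-ins for series1, series2 and series3 from the example above.
series1 = pd.Series(['bb-bbb'], index=[1])
series2 = pd.Series(['a-aaa'], index=[0])
series3 = pd.Series(['a-ccc'], index=[0])

# concat keeps the original labels, so label 0 now appears twice.
series = pd.concat([series1, series2, series3])

# Inspect the index before assigning the Series to a DataFrame column.
print(series.index.is_unique)
duplicated_labels = series.index[series.index.duplicated()].tolist()
print(duplicated_labels)
```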

Update2

In this example the indices seem to get mixed up.

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'a'].index
index2 = df[df['a'] == 'b'].index
index3 = df[df['a'] == 'c'].index

series1 = df.iloc[index1].apply(lambda x: x['a'] + '-aaa', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-bbb', axis=1)
series3 = df.iloc[index3].apply(lambda x: x['a'] + '-ccc', axis=1)

print(series1)
print()
print(series2)
print()
print(series3)
print()

df['series'] = pd.concat([series1, series2, series3], ignore_index=True)
print(df)
print()

df['series'] = pd.concat([series2, series1, series3], ignore_index=True)
print(df)
print()

df['series'] = pd.concat([series3, series2, series1], ignore_index=True)
print(df)
print()

This results in this output:

0    a-aaa
dtype: object

1    b-bbb
dtype: object

2    c-ccc
dtype: object

   a   b series
0  a  aa  a-aaa
1  b  bb  b-bbb
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  b-bbb
1  b  bb  a-aaa
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  c-ccc
1  b  bb  b-bbb
2  c  cc  a-aaa
3  d  dd    NaN

I would expect only a's in row0, only b's in row1 and only c's in row2, but that's not the case...
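
The output above follows from what `ignore_index=True` does: it discards the original labels and renumbers the result 0, 1, 2 in concatenation order, so the column assignment lines values up by their position in the list and not by the row they came from. A sketch comparing both variants, assuming hand-built Series to keep it self-contained (`renumbered` and `labelled` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

# Stand-ins for series1, series2 and series3 from the example above.
series1 = pd.Series(['a-aaa'], index=[0])
series2 = pd.Series(['b-bbb'], index=[1])
series3 = pd.Series(['c-ccc'], index=[2])

# ignore_index=True throws the labels away and numbers the values 0..n-1
# in the order they were passed in.
renumbered = pd.concat([series3, series2, series1], ignore_index=True)
print(renumbered.index.tolist())

# Without ignore_index the original labels survive the concatenation.
labelled = pd.concat([series3, series2, series1])
print(labelled.index.tolist())

# Column assignment aligns on labels, so each message lands in its own row.
df['series'] = labelled
print(df)
```

Keeping the labels only works while they are unique; the duplicate-label case from the earlier update still needs separate handling.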

Update 3

Here's a better example which should demonstrate the expected behaviour. As I said, the use case is that for a given DataFrame, a function evaluates each row and possibly returns an error message for some of the rows as a Series (some indexes are contained, some are not; if no error is returned, the error series is empty).

In [12]:

s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[], dtype=object)  # explicit dtype for the empty Series
pd.concat([s1, s2, s3, s4]).sort_index()

# I'd like to get:
#
# 0    a
# 1    b b
# 2    c
# 3    d
# 4    e
Out[12]:
0    a
1    b
1    b
2    c
3    d
4    e
dtype: object

Answer 1

Accepted answer by orange

I might have found a solution. I hope someone can comment on it...

s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[], dtype=object)  # explicit dtype for the empty Series
pd.concat([s1, s2, s3, s4]).sort_index()


df1 = pd.DataFrame(s1)
df2 = pd.DataFrame(s2)
df3 = pd.DataFrame(s3)
df4 = pd.DataFrame(s4)

d = pd.DataFrame({0:[]})
d = pd.merge(df1, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])

d = pd.merge(df2, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])

d = pd.merge(df3, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])

d = pd.merge(df4, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])
print(d)

which returns

Answer 2

Answered by EdChum

When concatenating, the default is to keep the existing indices. If those indices collide, assigning the result back to the DataFrame raises the ValueError you've found, so you need to set ignore_index=True:

In [33]:

series = pd.concat([series1, series2, series3], ignore_index=True)
df['series'] = series
print (df)
   a   b  series
0  a  aa  bb-bbb
1  b  bb   a-aaa
2  c  cc   a-ccc
3  d  dd     NaN

EDIT

I think I know what you want now. You can achieve it by converting the series into a DataFrame and then merging on the indices:

In [96]:

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index

series1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
series3 = df.iloc[index2].apply(lambda x: x['a'] + '-ccc', axis=1)
# we now don't ignore the index in order to preserve the identity of the row we want to merge back to later
series = pd.concat([series1, series2, series3])
# construct a dataframe from the series and give the column a name
df1 = pd.DataFrame({'series':series})
# perform an outer merge on both df's indices
df.merge(df1, left_index=True, right_index=True, how='outer')

Out[96]:
   a   b  series
0  a  aa   a-aaa
0  a  aa   a-ccc
1  b  bb  bb-bbb
2  c  cc     NaN
3  d  dd     NaN
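
If one row per original index is wanted instead of the duplicated row 0, one option is to group the concatenated Series by its index and join the messages. This is a sketch and not part of the answer above; the space separator, the hand-built Series and the names `parts` and `combined` are assumptions:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

# Stand-ins for series1, series2 and series3, plus an empty Series
# to mimic a validation step that produced no messages.
series1 = pd.Series(['bb-bbb'], index=[1])
series2 = pd.Series(['a-aaa'], index=[0])
series3 = pd.Series(['a-ccc'], index=[0])
empty = pd.Series([], index=[], dtype=object)

# Drop empty results, keep the original labels, then collapse every
# label to a single string by joining its messages with a space.
parts = [s for s in [series1, series2, series3, empty] if not s.empty]
combined = pd.concat(parts).groupby(level=0).agg(' '.join)

# The index is unique again, so plain column assignment aligns by label.
df['series'] = combined
print(df)
```

Rows without any message stay NaN; `df['series'].fillna('')` would turn them into empty strings if that is preferred.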

Answer 3

Answered by Ankush Shah

How about concat?

s1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
s2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)


s = pd.concat([s1,s2])
print(s)

1    bb-bbb
0    a-aaa
dtype: object
