Original question: http://stackoverflow.com/questions/25973514/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on Stack Overflow.
Combining Series in Pandas
Asked by orange
I need to combine multiple Pandas Series that contain string values. The series are messages that result from multiple validation steps. I am trying to combine these messages into one Series so I can attach it to the DataFrame. The problem is that the result is empty.
This is an example:
import pandas as pd
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})
index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index
series = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series += df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
print(series)
# >>> series
# 0 NaN
# 1 NaN
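A minimal sketch of why the result above is all NaN, using stand-in Series (the names `left` and `right` are illustrative, not from the question): adding two Series aligns them on their index labels first, and a label that exists on only one side yields NaN.

```python
import pandas as pd

# Stand-ins for the two apply() results in the question:
# one message at label 1, another at label 0.
left = pd.Series(['bb-bbb'], index=[1])
right = pd.Series(['a-aaa'], index=[0])

# Addition aligns on index labels. Labels 0 and 1 each exist on
# only one side, so neither position has two values to add.
aligned = left + right
print(aligned)
```

Because no label is shared, every row of the sum is missing, which matches the empty-looking result in the question.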
Update
import pandas as pd
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})
index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index
series1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
series3 = df.iloc[index2].apply(lambda x: x['a'] + '-ccc', axis=1)
# series3 causes a ValueError: cannot reindex from a duplicate axis
series = pd.concat([series1, series2, series3])
df['series'] = series
print(df)
Update2
In this example the indices seem to get mixed up.
import pandas as pd
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})
index1 = df[df['a'] == 'a'].index
index2 = df[df['a'] == 'b'].index
index3 = df[df['a'] == 'c'].index
series1 = df.iloc[index1].apply(lambda x: x['a'] + '-aaa', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-bbb', axis=1)
series3 = df.iloc[index3].apply(lambda x: x['a'] + '-ccc', axis=1)
print(series1)
print()
print(series2)
print()
print(series3)
print()
df['series'] = pd.concat([series1, series2, series3], ignore_index=True)
print(df)
print()
df['series'] = pd.concat([series2, series1, series3], ignore_index=True)
print(df)
print()
df['series'] = pd.concat([series3, series2, series1], ignore_index=True)
print(df)
print()
This results in this output:
0    a-aaa
dtype: object

1    b-bbb
dtype: object

2    c-ccc
dtype: object

   a   b series
0  a  aa  a-aaa
1  b  bb  b-bbb
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  b-bbb
1  b  bb  a-aaa
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  c-ccc
1  b  bb  b-bbb
2  c  cc  a-aaa
3  d  dd    NaN
I would expect only a's in row0, only b's in row1 and only c's in row2, but that's not the case...
Update 3
Here's a better example which should demonstrate the expected behaviour. As I said, the use case is that for a given DataFrame, a function evaluates each row and possibly returns an error message for some of the rows as a Series (some indexes are included, some are not; if no errors are returned, the error Series is empty).
In [12]:
s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[])
pd.concat([s1, s2, s3, s4]).sort_index()
# I'd like to get:
#
# 0 a
# 1 b b
# 2 c
# 3 d
# 4 e
Out[12]:
0 a
1 b
1 b
2 c
3 d
4 e
dtype: object
Accepted answer by orange
I might have found a solution. I hope someone can comment on it...
s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[])
pd.concat([s1, s2, s3, s4]).sort_index()
df1 = pd.DataFrame(s1)
df2 = pd.DataFrame(s2)
df3 = pd.DataFrame(s3)
df4 = pd.DataFrame(s4)
d = pd.DataFrame({0:[]})
d = pd.merge(df1, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])
d = pd.merge(df2, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])
d = pd.merge(df3, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])
d = pd.merge(df4, d, how='outer', left_index=True, right_index=True)
d = d.fillna('')
d = pd.DataFrame(d['0_x'] + d['0_y'])
print(d)
which returns
0
0 a
1 bb
2 c
3 d
4 e
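A shorter way to get the same result might be to concatenate the Series and then join the strings that share an index label. This is only a sketch, not part of the answer above: `combine_messages` is a hypothetical helper name, and the empty Series is given `dtype=object` here as an assumption to keep the dtypes consistent.

```python
import pandas as pd

def combine_messages(parts):
    """Hypothetical helper: join the strings that share an index label."""
    # Drop empty Series so they do not influence the concatenated dtype.
    non_empty = [s for s in parts if not s.empty]
    if not non_empty:
        return pd.Series(dtype=object)
    stacked = pd.concat(non_empty)
    # Group rows by index label and join each group's strings.
    return stacked.groupby(level=0).agg(''.join)

s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[], dtype=object)

combined = combine_messages([s1, s2, s3, s4])
print(combined)
```

Since the combined Series has a unique index, assigning it to a DataFrame column should align on the row labels and leave NaN for rows without a message.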
Answer by EdChum
When concatenating, the default is to keep the existing indices. If they collide, assigning the result to a DataFrame column raises the ValueError you've found, so you need to set ignore_index=True:
In [33]:
series = pd.concat([series1, series2, series3], ignore_index=True)
df['series'] = series
print (df)
a b series
0 a aa bb-bbb
1 b bb a-aaa
2 c cc a-ccc
3 d dd NaN
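To see what ignore_index=True changes, here is a small sketch with stand-in Series that carry the same labels and values as series1, series2 and series3 above (the names `kept` and `relabeled` are illustrative). The original labels are discarded and replaced by 0, 1, 2 in concatenation order, so the later column assignment lines up by position in the list rather than by the row each message came from.

```python
import pandas as pd

# Stand-ins with the same index labels as in the example above.
series1 = pd.Series(['bb-bbb'], index=[1])
series2 = pd.Series(['a-aaa'], index=[0])
series3 = pd.Series(['a-ccc'], index=[0])

# Default: the original labels are kept, including the duplicate 0.
kept = pd.concat([series1, series2, series3])
print(list(kept.index))

# ignore_index=True: labels are replaced by a fresh 0..n-1 range.
relabeled = pd.concat([series1, series2, series3], ignore_index=True)
print(list(relabeled.index))
```

This is why 'bb-bbb' ends up in row 0 of the output above even though it was computed from row 1.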
EDIT
I think I know what you want now. You can achieve it by converting the series into a dataframe and then merging on the indices:
In [96]:
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})
index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index
series1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
series3 = df.iloc[index2].apply(lambda x: x['a'] + '-ccc', axis=1)
# we now don't ignore the index in order to preserve the identity of the row we want to merge back to later
series = pd.concat([series1, series2, series3])
# construct a dataframe from the series and give the column a name
df1 = pd.DataFrame({'series':series})
# perform an outer merge on both df's indices
df.merge(df1, left_index=True, right_index=True, how='outer')
Out[96]:
a b series
0 a aa a-aaa
0 a aa a-ccc
1 b bb bb-bbb
2 c cc NaN
3 d dd NaN
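As a possible variation on the merge above, a named Series can be joined onto the DataFrame by index without building an intermediate DataFrame. This is a sketch under the assumption that the concatenated Series looks like the one produced above; it is rebuilt by hand here so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

# Hand-built stand-in for pd.concat([series1, series2, series3]);
# the name becomes the column label after the join.
series = pd.Series(['bb-bbb', 'a-aaa', 'a-ccc'], index=[1, 0, 0], name='series')

# DataFrame.join matches on the index; row 0 appears twice because
# two messages carry label 0.
joined = df.join(series)
print(joined)
```

As with the merge, rows that have several messages are duplicated rather than combined into one string.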
Answer by Ankush Shah
How about concat?
s1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
s2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
s = pd.concat([s1,s2])
print(s)
1 bb-bbb
0 a-aaa
dtype: object

