Original source: http://stackoverflow.com/questions/30885005/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas Series of lists to one series
Asked by Max
I have a Pandas Series of lists of strings:
0 [slim, waist, man]
1 [slim, waistline]
2 [santa]
As you can see, the lists vary in length. I want an efficient way to collapse this into one series:
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
I know I can break up the lists using
series_name.split(' ')
But I am having a hard time putting those strings back into one list.
Thanks!
Accepted answer by tegancp
You are basically just trying to flatten a nested list here.
You should just be able to iterate over the elements of the series:
slist = []
for x in series:
    slist.extend(x)
or a slicker (but harder to understand) list comprehension:
slist = [st for row in s for st in row]
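To end up with a Series again, as the question asks, the flattened list can be passed back to the pd.Series constructor. This is a minimal sketch that assumes the sample data from the question; the names s and flat are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumed here for illustration).
s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# Nested comprehension: the outer loop walks the rows, the inner loop walks each list.
flat = pd.Series([item for row in s for item in row])

print(flat.tolist())
```

The new Series gets a fresh default index running from 0 to 5, which matches the output the question asks for.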
Answered by mcwitt
Here's a simple method using only pandas functions:
import pandas as pd
s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])
Then
s.apply(pd.Series).stack().reset_index(drop=True)
gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.
0 0 slim
1 waist
2 man
1 0 slim
1 waistline
2 0 santa
If this is what you want, just omit .reset_index(drop=True) from the chain.
Answered by Tadej Magajna
series_name.sum()
does exactly what you need. Make sure it really is a series of lists; otherwise the values will be concatenated (if they are strings) or added (if they are ints).
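The difference between the three cases can be seen side by side in a small sketch. The sample series below are made up for illustration, and dtype=object is set explicitly on the string series as an assumption, so that plain Python strings are summed with +.

```python
import pandas as pd

# Illustrative data, not taken from the question.
lists = pd.Series([['slim', 'waist'], ['santa']])
strings = pd.Series(['slim', 'santa'], dtype=object)
ints = pd.Series([1, 2])

list_total = lists.sum()      # lists are joined with +
string_total = strings.sum()  # strings are joined with +
int_total = ints.sum()        # numbers are added

print(list_total, string_total, int_total)
```

Because sum() builds the result by repeated list concatenation, it probably scales poorly on long series; the loop with extend or itertools.chain may be a better fit there.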
Answered by Roman Kotov
pandas 0.25.0 introduced a new method, explode, for Series and DataFrames. Older versions do not have this method.
It helps build the result you need.
For example, suppose you have this series:
import pandas as pd
s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])
Then you can use
s.explode()
to get this result:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
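Note that explode repeats the original index labels (0, 0, 0, 1, 1, 2), whereas the question asks for a fresh 0 to 5 index. A minimal sketch of one way to get that, assuming pandas 0.25.0 or later and the same sample data, is to chain reset_index(drop=True); the names exploded and flat are illustrative.

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# explode keeps the label of the row each element came from.
exploded = s.explode()

# Dropping the old labels gives a plain running index.
flat = exploded.reset_index(drop=True)

print(list(exploded.index))
print(list(flat.index))
```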
For a DataFrame:
df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']
    ]),
    'a': 1
})
This gives the following DataFrame:
                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1
Applying explode to the s column:
df.explode('s')
gives this result:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
If your series contains empty lists:
import pandas as pd
s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])
then running explode will introduce NaN values for the empty lists, like this:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
3 NaN
If this is not desired, you can chain a dropna method call:
s.explode().dropna()
to get this result:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
DataFrames also have a dropna method:
df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
        []
    ]),
    'a': 1
})
Running explode without dropna:
df.explode('s')
results in:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1
With dropna:
df.explode('s').dropna(subset=['s'])
Result:
           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
Answered by peterfields
You can try using itertools.chain to simply flatten the lists:
In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]:
0 [slim, waist, man]
1 [slim, waistline]
2 [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]:
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
dtype: object
Answered by Anand S Kumar
You can use the list concatenation operator, as shown below:
lst1 = ['hello','world']
lst2 = ['bye','world']
newlst = lst1 + lst2
print(newlst)
>> ['hello', 'world', 'bye', 'world']
Or you can use the list.extend() function, as shown below:
lst1 = ['hello','world']
lst2 = ['bye','world']
lst1.extend(lst2)
print(lst1)
>> ['hello', 'world', 'bye', 'world']
The benefit of extend is that it accepts any iterable, whereas the concatenation operator only works if both the left-hand and right-hand sides are lists.
Another example of extend:
lst1 = ['hello', 'world']
lst1.extend(('Bye', 'Bye'))
print(lst1)
>> ['hello', 'world', 'Bye', 'Bye']
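The contrast between extend and + can be checked directly. The following is a small sketch with made-up values; the names words and concat_error are illustrative only.

```python
words = ['hello', 'world']

# extend accepts any iterable: a tuple, a generator, even a string (one character at a time).
words.extend(('bye',))
words.extend(ch for ch in 'ab')

# The + operator refuses to join a list with a non-list.
concat_error = None
try:
    ['hello'] + ('bye',)
except TypeError as exc:
    concat_error = exc

print(words)
print(type(concat_error).__name__)
```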
Answered by Adarsh Namdev
You may also try:
combined = []
for i in range(len(s)):
    # iloc is position-based, so loop over positions rather than index labels
    combined = combined + s.iloc[i]
print(combined)
s = pd.Series(combined)
print(s)
Output:
['slim', 'waist', 'man', 'slim', 'waistline', 'santa']
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
dtype: object
Answered by EliadL
If your pandas version is too old to use series_name.explode(), this should work too:
from itertools import chain
pd.Series(
    chain.from_iterable(
        value
        for i, value
        in series_name.items()  # iteritems() was removed in pandas 2.0
    )
)
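One difference from explode is worth noting: chaining the lists simply skips empty ones, so no NaN placeholders appear. A minimal sketch, assuming sample data with one empty list; iterating a Series yields its values, so chain.from_iterable can take it directly, and list() is used here only to hand the constructor a concrete sequence.

```python
from itertools import chain

import pandas as pd

# Illustrative data with an empty list at the end.
series_name = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa'], []])

# Iterating the Series yields each list in turn; the empty list contributes nothing.
flat = pd.Series(list(chain.from_iterable(series_name)))

print(flat.tolist())
```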
Answered by vozman
Flattening and unflattening can be done using these functions:
def flatten(df, col):
    # One row per list element, keyed by the original index label
    col_flat = pd.DataFrame([[i, x] for i, y in df[col].apply(list).items() for x in y], columns=['I', col])
    col_flat = col_flat.set_index('I')
    df = df.drop(columns=col)
    df = df.merge(col_flat, left_index=True, right_index=True)
    return df
Unflattening:
def unflatten(flat_df, col):
    # Collect col back into lists per original row; keep the first value of every other column
    return flat_df.groupby(level=0).agg({**{c: 'first' for c in flat_df.columns}, col: list})
After unflattening we get the same DataFrame, except for the column order:
# 's' is assumed to be the list column; assumes no row holds an empty list
(df.sort_index(axis=1) == unflatten(flatten(df, 's'), 's').sort_index(axis=1)).all().all()
>> True
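On pandas 0.25.0 or later, a similar round trip can probably be written with the built-in explode instead of a hand-rolled flatten. This is only a sketch under that assumption, not the answer's own method; the sample frame and the names flat and restored are illustrative.

```python
import pandas as pd

# Illustrative frame: a list column 's' and a scalar column 'a'.
df = pd.DataFrame({
    's': [['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']],
    'a': 1,
})

# Flatten: one row per list element, original index labels repeated.
flat = df.explode('s')

# Unflatten: group by the repeated labels and gather 's' back into lists.
restored = flat.groupby(level=0).agg({'s': list, 'a': 'first'})

print(restored['s'].tolist())
```

As with the functions above, rows whose list is empty would not survive the round trip unchanged, since explode turns them into NaN.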

