pandas: Series of lists to one Series

Question

Asked by Max

I have a Pandas Series of lists of strings:

0    [slim, waist, man]
1     [slim, waistline]
2               [santa]

As you can see, the lists vary in length. I want an efficient way to collapse this into one Series:

0 slim
1 waist
2 man
3 slim
4 waistline
5 santa

I know I can break up the lists using

series_name.split(' ')

But I am having a hard time putting those strings back into one list.

Thanks!

Answer 1

Accepted answer by tegancp

You are basically just trying to flatten a nested list here.

You should just be able to iterate over the elements of the series:

slist = []
for x in series:  # each x is one of the lists stored in the series
    slist.extend(x)

Or use a slicker (but harder to understand) list comprehension:

slist = [st for row in series for st in row]
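
Both versions above produce a plain Python list, while the question asks for a Series. A minimal sketch of the last step, assuming the sample data from the question and the variable name `series` used above (the name `flat` is only for illustration):

```python
import pandas as pd

# Sample data taken from the question
series = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
])

# Flatten the nested lists with the list comprehension from the answer
slist = [st for row in series for st in row]

# Wrap the flat list in a new Series; it gets a fresh 0..n-1 index
flat = pd.Series(slist)
print(flat)
```

The original index labels are not kept here, which matches the output the question asks for.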

Answer 2

Answered by mcwitt

Here's a simple method using only pandas functions:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then

s.apply(pd.Series).stack().reset_index(drop=True)

gives the desired output. In some cases you might want to save the original index and add a second level to index the nested elements, e.g.

0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa

If this is what you want, just omit .reset_index(drop=True) from the chain.

Answer 3

Answered by Tadej Magajna

series_name.sum()

does exactly what you need. Make sure it is a Series of lists; otherwise your values will be concatenated (if they are strings) or added (if they are ints).
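
The difference between the three cases can be seen in a small sketch. The variable names below are hypothetical, and the string Series is created with dtype=object on the assumption that this mirrors the default behaviour the answer describes:

```python
import pandas as pd

lists = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
strings = pd.Series(['slim', 'waist'], dtype=object)
numbers = pd.Series([1, 2, 3])

list_total = lists.sum()      # lists are concatenated with + into one flat list
string_total = strings.sum()  # strings are concatenated into one string
number_total = numbers.sum()  # numbers are added arithmetically

print(list_total)
print(string_total)
print(number_total)

# Wrap the flat list in a Series to get the shape the question asks for
flat = pd.Series(list_total)
```

Because sum() builds the result by repeated list concatenation, each step copies the list built so far, so this approach may become slow on a long Series.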

Answer 4

Answered by Roman Kotov

pandas 0.25.0 introduced a new method, explode, for Series and DataFrames. Older versions do not have it.

It helps to build the result you need.

For example, suppose you have this Series:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then you can use

s.explode()

To get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa

For a DataFrame:

df = pd.DataFrame({
  's': pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']
   ]),
   'a': 1
})

This gives the following DataFrame:

                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1

Applying explode to the s column:

df.explode('s')

This gives the following result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1

If your Series contains empty lists:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])

Then running explode will introduce NaN values for the empty lists, like this:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
3          NaN

If this is not desired, you can chain a dropna call:

s.explode().dropna()

To get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa

DataFrames also have a dropna method:

df = pd.DataFrame({
  's': pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
   ]),
   'a': 1
})

Running explode without dropna:

df.explode('s')

This results in:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1

With dropna:

df.explode('s').dropna(subset=['s'])

Result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
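
Note that explode repeats the original index label for every element, whereas the question asks for a running 0..5 index. A minimal sketch that combines explode, dropna, and reset_index, assuming pandas 0.25.0 or later (the name `flat` is only for illustration):

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    [],
])

flat = (
    s.explode()               # one row per list element, original index labels repeated
     .dropna()                # discard the NaN produced by the empty list
     .reset_index(drop=True)  # replace the repeated labels with 0..n-1
)
print(flat)
```

Keep in mind that dropna would also remove genuine missing values stored inside the lists, so it may not fit every dataset.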

Answer 5

Answered by peterfields

You can try using itertools.chain to simply flatten the lists:

In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]: 
0    [slim, waist, man]
1     [slim, waistline]
2               [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]: 
0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
dtype: object

Answer 6

Answered by Anand S Kumar

You can use the list concatenation operator, as shown below:

lst1 = ['hello','world']
lst2 = ['bye','world']
newlst = lst1 + lst2
print(newlst)
>> ['hello', 'world', 'bye', 'world']

Or you can use the list.extend() function, as shown below:

lst1 = ['hello','world']
lst2 = ['bye','world']
lst1.extend(lst2)
print(lst1)
>> ['hello', 'world', 'bye', 'world']

The benefit of the extend function is that it accepts any iterable, whereas the concatenation operator works only if both the left-hand and right-hand sides are lists.

Another example of the extend function:

lst1 = ['hello','world']
lst1.extend(('Bye','Bye'))
print(lst1)
>> ['hello', 'world', 'Bye', 'Bye']
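
The contrast between extend and the + operator can be sketched as follows. The variable names are hypothetical and the values are only for illustration:

```python
lst = ['hello', 'world']

lst.extend(('bye', 'world'))            # a tuple works
lst.extend({'again': 1})                # a dict works too: its keys are added
lst.extend(word for word in ['last'])   # so does a generator

print(lst)

# The + operator, by contrast, rejects a non-list right-hand side
error = None
try:
    combined = ['hello'] + ('world',)
except TypeError as exc:
    error = exc
    print('concatenation failed:', exc)
```

One thing to watch for: a string is also an iterable, so lst.extend('abc') would add the three characters separately rather than the whole string.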

Answer 7

Answered by Adarsh Namdev

You may also try:

combined = []
for i in s.index:
    # .loc looks up by index label, so this also works for a non-default index
    combined = combined + s.loc[i]

print(combined)

s = pd.Series(combined)
print(s)

Output:

['slim', 'waist', 'man', 'slim', 'waistline', 'santa']
0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
dtype: object

Answer 8

Answered by EliadL

If your pandas version is too old to use series_name.explode(), this should work too:

from itertools import chain

import pandas as pd

# Series.items() is used here because iteritems() was removed in pandas 2.0
pd.Series(
    chain.from_iterable(
        value
        for i, value
        in series_name.items()
    )
)

Answer 9

Answered by vozman

Flattening and unflattening can be done using these functions:

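def flatten(df, col):
    # Build one (index label, element) row per list element
    col_flat = pd.DataFrame([[i, x] for i, y in df[col].apply(list).items() for x in y], columns=['I', col])
    col_flat = col_flat.set_index('I')
    # Replace the list column with the flattened one, matching rows on the index
    df = df.drop(columns=col)
    df = df.merge(col_flat, left_index=True, right_index=True)

    return df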

Unflattening:

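def unflatten(flat_df, col):
    # Group rows that share an index label; collect col into a list, keep the first value elsewhere
    return flat_df.groupby(level=0).agg({**{c:'first' for c in flat_df.columns}, col: list})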

After unflattening we get the same DataFrame, except for column order:

(df.sort_index(axis=1) == unflatten(flatten(df, col), col).sort_index(axis=1)).all().all()
>> True
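
On pandas 0.25.0 or later, the same round trip can also be sketched with the built-in explode instead of the helper functions above. This is an alternative illustration, not the answer's own method; the column names `s` and `a` and the variable names are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    's': [['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']],
    'a': 1,
})

# Flatten: one row per list element, original index labels repeated
flat_df = df.explode('s')

# Unflatten: group rows by the repeated index label and rebuild the lists
restored = flat_df.groupby(level=0).agg({'s': list, 'a': 'first'})

print(restored)
```

A row whose list is empty does not survive this round trip unchanged: explode turns it into NaN, so it comes back as a list containing NaN.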
