pandas: Removing empty lists from a pandas Series

Question

Asked by The Unfun Cat

I have a long series like the following:

import pandas as pd
series = pd.Series([[(1,2)],[(3,5)],[],[(3,5)]])

In [151]: series
Out[151]:
0    [(1, 2)]
1    [(3, 5)]
2          []
3    [(3, 5)]
dtype: object

I want to remove all entries with an empty list. For some reason, boolean indexing does not work.

The following tests both give the same error:

series == [[(1,2)]]
series == [(1,2)]

ValueError: Arrays were different lengths: 4 vs 1

This is very strange, because in the simple example below, the same kind of comparison works:

In [146]: pd.Series([1,2,3]) == [3]
Out[146]:
0    False
1    False
2     True
dtype: bool

P.S. Ideally, I'd also like to split the tuples in the series into a two-column DataFrame.

Answer 1

Answered by Alex Riley

You could check to see if the lists are empty using str.len():

series.str.len() == 0

and then use this boolean series to remove the rows containing empty lists.
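
A minimal, self-contained sketch of that filtering step, assuming the example series from the question (the names `is_empty` and `non_empty` are just illustrative):

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# .str.len() also works element-wise on lists, returning each list's length
is_empty = series.str.len() == 0

# Keep only the rows whose list is non-empty; the original index labels survive
non_empty = series[~is_empty]
print(non_empty)
```

Note that the surviving rows keep their original labels (0, 1 and 3), so the gap left by the removed row is visible in the index.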

If each of your entries is a list containing a two-tuple (or else empty), you could create a two-column DataFrame by using the str accessor twice (once to select the first element of the list, then again to access the elements of the tuple):

pd.DataFrame({'a': series.str[0].str[0], 'b': series.str[0].str[1]})

With this method, missing entries (the empty lists) become NaN in both columns.
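
Putting the two steps together, a sketch (assuming the question's example series; the column names `a` and `b` follow the line above) that builds the DataFrame and then drops the all-NaN row left by the empty list:

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# First .str[0] pulls the tuple out of each list (NaN for the empty list);
# the second .str[0] / .str[1] pull the tuple's elements
first = series.str[0]
df = pd.DataFrame({'a': first.str[0], 'b': first.str[1]})

# The row that came from the empty list is all NaN, so dropna() removes it
df = df.dropna()
print(df)
```

Because the NaN values were present before `dropna()`, the resulting columns may not come back as plain integers; cast them afterwards if that matters.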

Answer 2

Answered by unutbu

Your series is in a bad state: having a Series of lists of tuples of ints buries the useful data, the ints, inside too many layers of containers.

However, to form the desired DataFrame, you could use

df = series.apply(lambda x: pd.Series(x[0]) if x else pd.Series(dtype=float)).dropna()

which yields a two-column DataFrame (columns 0 and 1) in which the row for the empty list has been dropped.

A better way would be to avoid building the malformed series altogether and form df directly from the data:

import pandas as pd
data = [[(1,2)],[(3,5)],[],[(3,5)]]
data = [pair for row in data for pair in row]
df = pd.DataFrame(data)
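
If the series already exists, the same flattening idea can be applied to it directly, since iterating a Series yields its values. A sketch, assuming the question's example series; the column names `a` and `b` are an arbitrary choice:

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# Flatten: every tuple in every list becomes one row; empty lists add nothing
pairs = [pair for row in series for pair in row]

# Build the DataFrame straight from the list of 2-tuples
df = pd.DataFrame(pairs, columns=['a', 'b'])
print(df)
```

Unlike the str-accessor approach, this renumbers the rows from 0 (the original labels are lost), and a list holding several tuples produces several rows instead of only its first tuple.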

Answer 3

Answered by Meow

Using the built-in apply, you can filter by the length of each list:

series = pd.Series([[(1,2)],[(3,5)],[],[(3,5)]])
series = series[series.apply(len) > 0]
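
The same filter also gets you most of the way to the two-column DataFrame from the P.S. A sketch, assuming every remaining list holds exactly one 2-tuple (only the first tuple is taken); the column names `a` and `b` are hypothetical:

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# Keep entries whose list has at least one element
series = series[series.apply(len) > 0]

# Assumption: each remaining list holds one 2-tuple; pull that tuple out
pairs = series.apply(lambda lst: lst[0])

# One row per tuple, keeping the surviving index labels
df = pd.DataFrame(pairs.tolist(), index=series.index, columns=['a', 'b'])
print(df)
```

Since the empty rows are removed before the DataFrame is built, no NaN is introduced and the columns stay integer-typed.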