pandas: Remove empty lists in a pandas Series
Original URL: http://stackoverflow.com/questions/29100380/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Remove empty lists in pandas series
Asked by The Unfun Cat
I have a long series like the following:
series = pd.Series([[(1,2)],[(3,5)],[],[(3,5)]])
In [151]: series
Out[151]:
0    [(1, 2)]
1    [(3, 5)]
2          []
3    [(3, 5)]
dtype: object
I want to remove all entries with an empty list. For some reason, boolean indexing does not work.
The following tests both give the same error:
series == [[(1,2)]]
series == [(1,2)]
ValueError: Arrays were different lengths: 4 vs 1
This is very strange, because in the simple example below, the same kind of comparison works as expected:
In [146]: pd.Series([1,2,3]) == [3]
Out[146]:
0    False
1    False
2     True
dtype: bool
P.S. ideally, I'd like to split the tuples in the series into a DataFrame of two columns also.
Answered by Alex Riley
You could check to see if the lists are empty using str.len():
series.str.len() == 0
and then use this boolean series to remove the rows containing empty lists.
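A minimal sketch of that filtering step, assuming the series from the question. The names `mask` and `non_empty` are illustrative only and do not come from the original answer:

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# str.len() also works on list elements and returns the length of each list.
mask = series.str.len() == 0

# Keep only the rows whose list is not empty; the original index labels are kept.
non_empty = series[~mask]
print(non_empty)
```

Note that the surviving rows keep their original labels (0, 1 and 3 here), so the index is no longer consecutive.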
If each of your entries is a list containing a two-tuple (or else empty), you could create a two-column DataFrame by using the str accessor twice (once to select the first element of the list, then to access the elements of the tuple):
pd.DataFrame({'a': series.str[0].str[0], 'b': series.str[0].str[1]})
Missing entries default to NaN with this method.
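As a rough sketch of how the two steps could be combined, assuming the same series as in the question: the NaN row produced by the empty list can be dropped afterwards. The names `first` and `df_clean`, and the final cast back to integers, are additions for illustration and not part of the original answer:

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# First tuple of each list; the empty list yields NaN instead of raising.
first = series.str[0]
df = pd.DataFrame({'a': first.str[0], 'b': first.str[1]})

# Drop the row that came from the empty list, then cast back to integers.
df_clean = df.dropna().astype(int)
print(df_clean)
```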
Answered by unutbu
Your series is in a bad state: having a Series of lists of tuples of ints
buries the useful data, the ints, inside too many layers of containers.
However, to form the desired DataFrame, you could use
df = series.apply(lambda x: pd.Series(x[0]) if x else pd.Series()).dropna()
which yields
   0  1
0  1  2
1  3  5
3  3  5
A better way would be to avoid building the malformed series altogether and
form df directly from the data:
data = [[(1,2)],[(3,5)],[],[(3,5)]]
data = [pair for row in data for pair in row]
df = pd.DataFrame(data)
Answered by Meow
Using the built-in apply method, you can filter by the length of each list:
series = pd.Series([[(1,2)],[(3,5)],[],[(3,5)]])
series = series[series.apply(len) > 0]
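Filtering this way keeps the original index labels, so the result has a gap where the empty list was. A small sketch, assuming consecutive labels are wanted afterwards; the `reset_index(drop=True)` step and the names `filtered` and `renumbered` are additions for illustration, not part of the original answer:

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# apply(len) gives the length of each list; keep rows with at least one item.
filtered = series[series.apply(len) > 0]

# Optional: renumber the remaining rows from 0 and discard the old labels.
renumbered = filtered.reset_index(drop=True)
print(renumbered)
```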

