Python: Pandas filter string data based on its string length

Question

Asked by notilas

I'd like to filter out data whose string length is not equal to 10.

To filter out any row whose column A or column B string length is not equal to 10, I tried this:

import numpy as np
import pandas as pd

df = pd.read_csv('filex.csv')
df.A = df.A.apply(lambda x: x if len(x) == 10 else np.nan)
df.B = df.B.apply(lambda x: x if len(x) == 10 else np.nan)
df = df.dropna(subset=['A', 'B'], how='any')

This works, but slowly.

However, it sometimes produces an error when the data in A is not a string but a number (read_csv interprets it as a number when reading the input file):

  File "<stdin>", line 1, in <lambda>
TypeError: object of type 'float' has no len()
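A minimal sketch reproducing the error, using a hypothetical two-row CSV in place of filex.csv: read_csv infers a numeric dtype for a column made only of digits, so apply passes numbers rather than strings to len().

```python
import io

import pandas as pd

# Hypothetical CSV content: column A holds only digits, column B holds letters.
csv_text = "A,B\n123,abc\n1234567890,abcdefghij\n"
df = pd.read_csv(io.StringIO(csv_text))

# read_csv infers a numeric dtype for A, so its elements are numbers, not str.
print(df["A"].dtype)

raised = False
try:
    df["A"].apply(len)
except TypeError as exc:
    # len() on a number raises, just like the traceback in the question.
    raised = True
    print("TypeError:", exc)
```

With a float column (for example one containing a missing value) the message names 'float' instead, as in the traceback above.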

I believe there should be a more efficient and elegant way to do this.

Based on the answers and comments below, the simplest solutions I found are:

df = df[df.A.apply(lambda x: len(str(x)) == 10)]
df = df[df.B.apply(lambda x: len(str(x)) == 10)]

or

df = df[(df.A.apply(lambda x: len(str(x)) == 10)) & (df.B.apply(lambda x: len(str(x)) == 10))]

or

df=df[(df.A.astype(str).str.len()==10) & (df.B.astype(str).str.len()==10)]
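One caveat with the str() / astype(str) variants, sketched with hypothetical data: if a column was parsed as float (for example because it contains an empty cell), a ten-digit number is rendered with a trailing ".0", so it no longer has 10 characters.

```python
import io

import pandas as pd

# Hypothetical CSV: the empty cell in column A forces read_csv to parse A as float64.
csv_text = "A,B\n1234567890,abcdefghij\n,abcdefghij\n"
df = pd.read_csv(io.StringIO(csv_text))

as_str = df["A"].astype(str)
print(df["A"].dtype)   # float64
print(as_str.iloc[0])  # 1234567890.0  (12 characters, not 10)

# The ten-digit value no longer passes the length test.
mask = as_str.str.len() == 10
print(mask.tolist())
```

Depending on the pandas version, the missing value may also turn into the literal string 'nan' rather than staying missing.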

Answer 1

Accepted answer by unutbu

import pandas as pd

df = pd.read_csv('filex.csv')
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype('str')
mask = (df['A'].str.len() == 10) & (df['B'].str.len() == 10)
df = df.loc[mask]
print(df)

Applied to filex.csv:

A,B
123,abc
1234,abcd
1234567890,abcdefghij

the code above prints

            A           B
2  1234567890  abcdefghij
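A related caveat with converting to str after parsing, sketched with a hypothetical variant of filex.csv: a ten-digit code that starts with 0 is read as an integer, loses its leading zero, and then fails the length test. Passing a column-to-type mapping through read_csv's dtype parameter keeps the text as written.

```python
import io

import pandas as pd

# Hypothetical variant of filex.csv where one ten-digit code starts with 0.
csv_text = "A,B\n0123456789,abcdefghij\n1234567890,abcdefghij\n"

# Default parsing turns A into integers, dropping the leading zero.
parsed = pd.read_csv(io.StringIO(csv_text))
print(parsed["A"].astype(str).tolist())  # ['123456789', '1234567890']

# Declaring the columns as str keeps the characters exactly as in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"A": str, "B": str})
mask = (as_text["A"].str.len() == 10) & (as_text["B"].str.len() == 10)
print(as_text.loc[mask])
```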

Answer 2

Answered by przemo_li

If you have numbers in the rows, read_csv will convert them to numbers (floats, for example) rather than strings.

Convert all the rows to strings after importing from CSV. For better performance, split those lambdas across multiple threads.
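A minimal sketch of the convert-everything-to-strings idea on a hypothetical frame. Instead of threads, it swaps in a different technique: pandas' vectorized .str.len() applied once per column, which avoids calling a Python lambda for every cell.

```python
import pandas as pd

# Hypothetical frame mixing numbers and strings, as read_csv might produce.
df = pd.DataFrame({"A": [123, 1234567890], "B": ["abc", "abcdefghij"]})

# Convert every column to str once, then measure lengths column by column.
lengths = df.astype(str).apply(lambda col: col.str.len())

# Keep only rows where every column has exactly 10 characters.
result = df[lengths.eq(10).all(axis=1)]
print(result)
```

To restrict the check to some columns, select them first, e.g. df[["A", "B"]].astype(str).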

Answer 3

Answered by Mahdi Ghelichi

A more Pythonic way of filtering rows based on conditions on the values of several columns:

Assuming a df of:

import pandas as pd

data = {"names": ["Alice", "Zac", "Anna", "O"], "cars": ["Civic", "BMW", "Mitsubishi", "Benz"],
        "age": ["1", "4", "2", "0"]}

df = pd.DataFrame(data)
df:
  age        cars  names
0   1       Civic  Alice
1   4         BMW    Zac
2   2  Mitsubishi   Anna
3   0        Benz      O

Then:

df[
df['names'].apply(lambda x: len(x)>1) &
df['cars'].apply(lambda x: "i" in x) &
df['age'].apply(lambda x: int(x)<2)
  ]

We will have:

  age   cars  names
0   1  Civic  Alice

In the conditions above, we first check the length of the strings in names, then check whether the letter "i" appears in cars, and finally check the integer value of age (stored as strings, hence int(x)).
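The same three conditions can also be written with pandas' vectorized string methods instead of apply. This is a sketch of that alternative, not the answerer's code; it reuses the example data from above.

```python
import pandas as pd

data = {"names": ["Alice", "Zac", "Anna", "O"],
        "cars": ["Civic", "BMW", "Mitsubishi", "Benz"],
        "age": ["1", "4", "2", "0"]}
df = pd.DataFrame(data)

mask = (
    (df["names"].str.len() > 1)                    # name longer than one character
    & df["cars"].str.contains("i", regex=False)   # car name contains the letter "i"
    & (df["age"].astype(int) < 2)                 # age, stored as text, below 2
)
print(df[mask])
```

Each comparison is wrapped in parentheses because & binds more tightly than > and <.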

Answer 4

Answered by Vishal Suryavanshi

You can use apply(len) on a column, for example df['A'].apply(len); it gives you the length of each string.
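A small sketch on a hypothetical frame showing where apply(len) gives per-string lengths and where it does not: on a single column it measures each value, but on a whole DataFrame it runs once per column and returns the row count.

```python
import pandas as pd

df = pd.DataFrame({"A": ["abc", "abcdefghij"], "B": ["abcd", "abcdefghij"]})

# On a single column, apply(len) returns each string's length.
print(df["A"].apply(len).tolist())  # [3, 10]

# On the whole DataFrame, len is called on each column Series,
# so the result is the number of rows per column.
print(df.apply(len).tolist())  # [2, 2]

# For per-cell lengths across all columns, map len over each column.
per_cell = df.apply(lambda col: col.map(len))
print(per_cell)
```

Like the lambdas in the question, len raises TypeError on numeric values, so convert with astype(str) first if a column may contain numbers.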

Answer 5

Answered by spongebob

I personally found this way to be the easiest:

df = df[df['column_name'].str.len() == 10]
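A quick check with hypothetical data of which rows == 10 and != 10 keep when filtering on .str.len(). The question wants to keep only the ten-character values, and the filtered frame is assigned back to df rather than to a single column.

```python
import pandas as pd

df = pd.DataFrame({"column_name": ["abc", "abcdefghij", "1234567890"]})

lengths = df["column_name"].str.len()

# Keep rows whose value is exactly 10 characters (what the question asks for).
kept = df[lengths == 10]
print(kept["column_name"].tolist())  # ['abcdefghij', '1234567890']

# The complementary filter, != 10, keeps only the rows the question wants removed.
removed = df[lengths != 10]
print(removed["column_name"].tolist())  # ['abc']
```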
