pandas: modify a DataFrame column using a list comprehension

Question

Asked by Luis Ramon Ramirez Rodriguez

I have a list of about 90k strings and a DataFrame with several columns. I want to check whether each string in the list appears in column_1 and, if it does, assign the same value to column_2.

I can do this:

for i in range(len(my_list)):
    item = my_list[i]
    for j in range(len(df)):
        if item == df['column_1'][j]:
            df.loc[j, 'column_2'] = item

But I would prefer to avoid the nested loops.

I tried this:

for item in my_list:
    if item in list(df['column_1']):
        position = df[df['column_1'] == item].index.values[0]
        df.loc[position, 'column_2'] = item

but I think this solution is even slower and harder to read. Can this operation be done with a simple list comprehension?

Edit.

The second solution is considerably faster, by about an order of magnitude. Why is that? It seems that in that case it has to search twice for the match:

here:

if item in list(df['column_1'])

and here:

position = df[df['column_1'] == item].index.values[0]

Still, I would prefer a simpler solution.

Answer 1

Accepted answer by res_edit

You can do this by splitting the filtering and assignment actions you described into two distinct steps.

pandas Series objects include an isin method that lets you identify the rows whose column_1 values are in my_list and stores the result in a boolean-valued Series. That Series can in turn be used with the .loc indexer to copy the values in the matching rows from column_1 to column_2:

# Identify the matching rows
matches = df['column_1'].isin(my_list)
# Set the column_2 entries to column_1 in the matching rows
df.loc[matches,'column_2'] = df.loc[matches,'column_1']

If column_2 doesn't already exist, this approach creates it and sets the non-matching values to NaN. The .loc indexer is used to avoid operating on a copy of the data when performing the indexing operations.
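
A minimal, self-contained sketch of the two steps above. The toy frame and the contents of my_list are made up for illustration and are not from the original question.

```python
import pandas as pd

# Toy data, assumed for illustration only
my_list = ['foo', 'bar']
df = pd.DataFrame({'column_1': ['some', 'foo', 'string', 'bar']})

# Step 1: boolean Series marking rows whose column_1 value is in my_list
matches = df['column_1'].isin(my_list)

# Step 2: copy column_1 into column_2 for the matching rows only.
# column_2 does not exist yet, so the remaining rows are filled with NaN.
df.loc[matches, 'column_2'] = df.loc[matches, 'column_1']

print(df)
```

Because the membership test runs once over the whole column, there is no Python-level loop over either the list or the rows.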

Answer 2

Answer by Kevin

Let's say you have a list:

import pandas as pd
l = ['foo', 'bar']

and a DataFrame:

df = pd.DataFrame(['some', 'short', 'string', 'has', 'foo'], columns=['col1'])

You can use df.apply:

df['col2'] = df.apply(lambda x: x['col1'] if x['col1'] in l else None, axis=1)

df
    col1    col2
0   some    None
1   short   None
2   string  None
3   has     None
4   foo     foo
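
As a hedged alternative to the row-wise apply, the same column can probably be built with vectorized calls, Series.isin plus Series.where. This variant is not part of the original answer, and note one difference: non-matching rows end up as missing values (NaN) rather than None.

```python
import pandas as pd

l = ['foo', 'bar']
df = pd.DataFrame(['some', 'short', 'string', 'has', 'foo'], columns=['col1'])

# Keep the col1 value where it is in l; where() fills the other rows
# with a missing value instead of None
df['col2'] = df['col1'].where(df['col1'].isin(l))

print(df)
```

On a large frame this avoids calling a Python lambda once per row, which is what apply with axis=1 does.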

Answer 3

Answer by MaxU

Try this one-liner:

df.loc[(df['column_1'].isin(my_list)), 'column_2'] = df['column_1']

The difference from @res_edit's solution is the lack of the second df.loc[], which should make it a bit faster...
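
A small check, on made-up data, that the one-liner and the two-step version give the same result. Assignment through .loc aligns the right-hand Series on the index, so only the rows selected by the mask receive values. The frame contents and the names base, one_liner and two_step are assumptions for illustration.

```python
import pandas as pd

# Toy data, assumed for illustration only
my_list = ['foo', 'bar']
base = pd.DataFrame({'column_1': ['some', 'foo', 'string', 'bar'],
                     'column_2': ['a', 'b', 'c', 'd']})

# One-liner: the full column_1 Series on the right is aligned on the index
one_liner = base.copy()
one_liner.loc[one_liner['column_1'].isin(my_list), 'column_2'] = one_liner['column_1']

# Two-step version: both sides are restricted to the matching rows
two_step = base.copy()
matches = two_step['column_1'].isin(my_list)
two_step.loc[matches, 'column_2'] = two_step.loc[matches, 'column_1']

# Compare the two resulting frames
print(one_liner.equals(two_step))
```

Whether the one-liner is actually faster depends on the data and the pandas version, so it is worth timing both on the real frame.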

Answer 4

Answer by Stop harming Monica

According to conventional wisdom you shouldn't be using a list comprehension for side effects. You'd be creating a (maybe huge) list that you don't need, wasting resources and hurting readability.

https://codereview.stackexchange.com/questions/58050/is-it-pythonic-to-create-side-effect-inside-list-comprehension
Is it Pythonic to use list comprehensions for just side effects?
Python loops vs comprehension lists vs map for side effects (i.e. not using return values)
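
To make the point about side effects concrete, here is a small sketch in plain Python. The names items, seen, seen_loop and throwaway are hypothetical and chosen only for illustration.

```python
items = ['foo', 'bar', 'baz']

# Comprehension used only for its side effect: besides filling `seen`,
# it builds a throwaway list holding each append call's return value
seen = []
throwaway = [seen.append(item) for item in items]
print(throwaway)

# Plain loop: same side effect, no extra list is created
seen_loop = []
for item in items:
    seen_loop.append(item)
print(seen_loop)
```

With 90k items the throwaway list would hold 90k unused entries, which is the waste the answer describes.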
