Assigning values to a Pandas DataFrame column based on a string condition

Question

Asked by haimen

Suppose I have a dataframe:

data
id  URL
1   www.pandora.com
2   m.jcpenney.com
3   www.youtube.com
4   www.facebook.com

I want to create a new column whose value depends on whether the URL contains a particular word. For example, if the URL contains 'youtube', I want the column value to be 'Youtube'. So I tried the following:

data['test'] = 'other'

Once we do that, we have:

data['test']
other
other
other
other

Then I tried this:

data[data['URL'].str.contains("youtub") == True]['test'] = 'Youtube'
data[data['URL'].str.contains("face") == True]['test'] = 'Facebook'

Though this runs without any error, the values in the test column do not change: every row still has 'other'. Ideally, only the 3rd row should change to 'Youtube' and the 4th to 'Facebook'. Can anybody tell me what mistake I am making here?

Answer 1

Answered by jezrael

I think you can use loc with a boolean mask created by str.contains:

print(data['URL'].str.contains("youtub"))
0    False
1    False
2     True
3    False
Name: URL, dtype: bool

data.loc[data['URL'].str.contains("youtub"),'test'] = 'Youtube'
data.loc[data['URL'].str.contains("face"),'test'] = 'Facebook'
print(data)
   id               URL      test
0   1   www.pandora.com       NaN
1   2    m.jcpenney.com       NaN
2   3   www.youtube.com   Youtube
3   4  www.facebook.com  Facebook
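
The reason the original attempt changes nothing is that data[mask]['test'] = ... is chained indexing: data[mask] with a boolean mask returns a new object, so the assignment lands on that temporary copy instead of on data (depending on the pandas version, this may also emit a SettingWithCopyWarning). A minimal, self-contained sketch of the loc approach follows. It assumes the sample data from the question and sets the 'other' default first, which is why it shows 'other' where the output above shows NaN.

```python
import pandas as pd

# Sample data reconstructed from the question.
data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "URL": ["www.pandora.com", "m.jcpenney.com",
            "www.youtube.com", "www.facebook.com"],
})

# Start every row at the default label.
data["test"] = "other"

# A single .loc call selects the rows and the column together, so the
# assignment writes into `data` itself rather than into a temporary copy.
data.loc[data["URL"].str.contains("youtub"), "test"] = "Youtube"
data.loc[data["URL"].str.contains("face"), "test"] = "Facebook"

print(data)
```

Rows that match neither pattern keep the default, so only the third and fourth rows change.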

Answer 2

Answered by MaxU

I would do it in one shot:

replacements = {
  r'.*youtube.*': 'Youtube',
  r'.*face.*': 'Facebook',
  r'.*pandora.*': 'Pandora'
}

df['text'] = df.URL.replace(replacements, regex=True)
df.loc[df.text.str.contains(r'\.'), 'text'] = 'other'
print(df)

Output:

                 URL      text
id
1    www.pandora.com   Pandora
2     m.jcpenney.com     other
3    www.youtube.com   Youtube
4   www.facebook.com  Facebook
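
The snippet above does not show how df was built. The sketch below is one way to reproduce it, assuming id is the index (as the printed output suggests). The regex dictionary rewrites every URL that matches a pattern, and any value that still contains a dot afterwards is treated as unmatched and labelled 'other'.

```python
import pandas as pd

# Assumed setup: same URLs as the question, with `id` as the index.
df = pd.DataFrame(
    {"URL": ["www.pandora.com", "m.jcpenney.com",
             "www.youtube.com", "www.facebook.com"]},
    index=pd.Index([1, 2, 3, 4], name="id"),
)

# Each key is a regex that matches the whole URL; the value replaces it.
replacements = {
    r".*youtube.*": "Youtube",
    r".*face.*": "Facebook",
    r".*pandora.*": "Pandora",
}

df["text"] = df["URL"].replace(replacements, regex=True)

# URLs that matched no pattern are unchanged and still contain a dot.
df.loc[df["text"].str.contains(r"\."), "text"] = "other"

print(df)
```

Note that this relies on none of the replacement labels containing a dot; a label such as 'Youtube.com' would be overwritten by the final step.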

Answer 3

Answered by Alexander

Given that you probably want to check whether the host name matches (rather than any word in the URL), you could split the string on the dot and check whether the second item (the host name) is in your list.

targets = ['pandora', 'youtube', 'facebook']
data['target_url'] = [url[1] if url[1] in targets else None 
                      for url in data.URL.str.split('.')]

data
   id               URL target_url
0   1   www.pandora.com    pandora
1   2    m.jcpenney.com       None
2   3   www.youtube.com    youtube
3   4  www.facebook.com   facebook
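
Taking the second item assumes every URL has a subdomain prefix such as www. A URL like youtube.com would yield 'com' instead. As a hedged variation on the same idea, the sketch below checks every dot-separated label against the target set. The helper name match_host and the fifth row are hypothetical additions for illustration, not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "URL": ["www.pandora.com", "m.jcpenney.com", "www.youtube.com",
            "www.facebook.com",
            "youtube.com"],  # hypothetical row without a subdomain
})

targets = {"pandora", "youtube", "facebook"}

def match_host(url):
    """Return the first dot-separated label found in targets, else None."""
    for label in url.split("."):
        if label in targets:
            return label
    return None

# Build the column with a list comprehension, as in the answer above.
data["target_url"] = [match_host(url) for url in data["URL"]]

print(data)
```

Because the comparison is on whole labels, 'm.jcpenney.com' still has no match, while 'youtube.com' is recognised even though the host name is its first label.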
