pandas: splitting one DataFrame column into multiple columns

Question

Asked by Emdadul

I have a pandas DataFrame that looks like this:

date     | location              | occurance
------------------------------------------------
somedate | united_kingdom_london | 5
somedate | united_state_newyork  | 5

I want to transform it into:

date     | country        | city    | occurance
---------------------------------------------------
somedate | united kingdom | london  | 5
somedate | united state   | newyork | 5

I am new to Python. After some research I wrote the following code, but it fails to extract the country and city:

df.location= df.location.replace({'-': ' '}, regex=True)
df.location= df.location.replace({'_': ' '}, regex=True)

temp_location = df['location'].str.split(' ').tolist() 

location_data = pd.DataFrame(temp_location, columns=['country', 'city'])

I appreciate your response.

Answer 1

Answered by Merlin

Starting with this:

df = pd.DataFrame({'Date': ['somedate', 'somedate'],
                   'location': ['united_kingdom_london', 'united_state_newyork'],
                   'occurence': [5, 5]})

Try this:

df['Country'] = df['location'].str.rpartition('_')[0].str.replace("_", " ")
df['City']    = df['location'].str.rpartition('_')[2]
df[['Date','Country', 'City', 'occurence']]

      Date        Country      City  occurence
0  somedate  united kingdom   london          5
1  somedate    united state  newyork          5

Borrowing an idea from @MaxU:

# rpartition returns three columns (before, separator, after); keep only 0 and 2
df[['Country', 'City']] = df['location'].str.replace('_', ' ').str.rpartition(' ')[[0, 2]]
df[['Date','Country', 'City','occurence' ]]

      Date        Country      City  occurence
0  somedate  united kingdom   london          5
1  somedate    united state  newyork          5

Answer 2

Answered by Kartik

Try this:

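temp_location = {}
# Assumes underscores in 'location' were already replaced with spaces, as in the question
splits = df['location'].str.split(' ')
# .str[...] indexes into each row's list; plain splits[0:-1] would slice rows instead
temp_location['country'] = splits.str[:-1].str.join(' ').tolist()
temp_location['city'] = splits.str[-1].tolist()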

location_data = pd.DataFrame(temp_location)

If you want it back in the original df:

df['country'] = splits.str[:-1].str.join(' ')
df['city'] = splits.str[-1]

Answer 3

Answered by Parfait

Consider splitting the column's string value using rfind():

import pandas as pd

df = pd.DataFrame({'Date': ['somedate', 'somedate'],
                   'location': ['united_kingdom_london', 'united_state_newyork'],
                   'occurence': [5, 5]})

df['country'] = df['location'].apply(lambda x: x[0:x.rfind('_')])
df['city'] = df['location'].apply(lambda x: x[x.rfind('_')+1:])

df = df[['Date', 'country', 'city', 'occurence']]
print(df)

#        Date         country     city  occurence
# 0  somedate  united_kingdom   london          5
# 1  somedate    united_state  newyork          5

Answer 4

Answered by mgilbert

Something like this works:

import pandas as pd

df = pd.DataFrame({'Date': ['somedate', 'somedate'],
                   'location': ['united_kingdom_london', 'united_state_newyork'],
                   'occurence': [5, 5]})

# Reverse each string, replace the first "_" (the last one in the original), reverse back
df.location = df.location.str[::-1].str.replace("_", " ", n=1).str[::-1]
newcols = pd.DataFrame(df.location.str.split(" ").tolist(),
                       columns=["country", "city"])
newcols.country = newcols.country.str.replace("_", " ")
df = pd.concat([df, newcols], axis=1)
df.drop("location", axis=1, inplace=True)
print(df)

         Date  occurence         country     city
  0  somedate          5  united kingdom   london
  1  somedate          5    united state  newyork

You could use a regex in the replace for a more complicated pattern, but if you only need the word after the last `_`, I find it easier to reverse the string twice as a hack than to fiddle with regular expressions.
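For comparison, here is a minimal sketch of the regex alternative mentioned above. It is one possible pattern, not part of the original answer: a lookahead matches only the last underscore, so no reversing is needed. The pattern and the names `last_underscore`, `marked`, and `parts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['somedate', 'somedate'],
                   'location': ['united_kingdom_london', 'united_state_newyork'],
                   'occurence': [5, 5]})

# An underscore followed only by non-underscore characters up to the end of
# the string, i.e. the last underscore in each value.
last_underscore = r'_(?=[^_]*$)'

# Turn only the last underscore into a space, then split once on that space.
marked = df['location'].str.replace(last_underscore, ' ', regex=True)
parts = marked.str.split(' ', n=1, expand=True)

# The remaining underscores all belong to the country name.
df['country'] = parts[0].str.replace('_', ' ', regex=False)
df['city'] = parts[1]

print(df[['Date', 'country', 'city', 'occurence']])
```

Whether this is easier to read than the double reversal is a matter of taste; both rely on the city being the single word after the last underscore.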

Answer 5

Answered by MaxU

I would use the .str.extract() method:

In [107]: df
Out[107]:
       Date               location  occurence
0  somedate  united_kingdom_london          5
1  somedate   united_state_newyork          5
2  somedate         germany_munich          5

In [108]: df[['country','city']] = (df.location.str.replace('_',' ')
   .....:                             .str.extract(r'(.*)\s+([^\s]*)', expand=True))

In [109]: df
Out[109]:
       Date               location  occurence         country     city
0  somedate  united_kingdom_london          5  united kingdom   london
1  somedate   united_state_newyork          5    united state  newyork
2  somedate         germany_munich          5         germany   munich

In [110]: df = df.drop('location', axis=1)

In [111]: df
Out[111]:
       Date  occurence         country     city
0  somedate          5  united kingdom   london
1  somedate          5    united state  newyork
2  somedate          5         germany   munich

PS: please be aware that it is not possible to distinguish rows containing a two-word country plus a one-word city from rows containing a one-word country plus a two-word city, unless you have a full list of countries to check each row against.
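The list-based check mentioned in the PS could be sketched roughly as follows. This is an illustration only: `KNOWN_COUNTRIES` and `split_location` are hypothetical names, the list holds just the countries from the sample data, and the `new_york` row is an invented example of a two-word city.

```python
import pandas as pd

# Hypothetical lookup list; a real one would need to cover every country in the data.
KNOWN_COUNTRIES = ['united kingdom', 'united state', 'germany']

def split_location(location, countries=KNOWN_COUNTRIES):
    """Return (country, city) using the longest known country that prefixes the text."""
    text = location.replace('_', ' ')
    # Try longer names first so a longer country name wins over a shorter prefix.
    for country in sorted(countries, key=len, reverse=True):
        if text.startswith(country + ' '):
            return country, text[len(country) + 1:]
    return None, text  # no known country matched; leave the text unsplit

df = pd.DataFrame({'location': ['united_kingdom_london',
                                'united_state_new_york',  # invented two-word city
                                'germany_munich']})

pairs = pd.DataFrame([split_location(loc) for loc in df['location']],
                     columns=['country', 'city'], index=df.index)
df = df.join(pairs)
print(df)
```

Rows whose country is not in the list come back with `None` as the country, which makes them easy to find and review by hand.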
