python pandas replacing strings in dataframe with numbers
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/17114904/
Asked by jonas
Is there any way to use the mapping function, or something better, to replace values in an entire dataframe?
I only know how to perform the mapping on series.
I would like to replace the strings in the 'tesst' and 'set' columns with numbers, for example set = 1, test = 2.
Here is an example of my dataset (the original dataset is very large):
ds_r
respondent brand engine country aware aware_2 aware_3 age tesst set
0 a volvo p swe 1 0 1 23 set set
1 b volvo None swe 0 0 1 45 set set
2 c bmw p us 0 0 1 56 test test
3 d bmw p us 0 1 1 43 test test
4 e bmw d germany 1 0 1 34 set set
5 f audi d germany 1 0 1 59 set set
6 g volvo d swe 1 0 0 65 test set
7 h audi d swe 1 0 0 78 test set
8 i volvo d us 1 1 1 32 set set
Final result should be
ds_r
respondent brand engine country aware aware_2 aware_3 age tesst set
0 a volvo p swe 1 0 1 23 1 1
1 b volvo None swe 0 0 1 45 1 1
2 c bmw p us 0 0 1 56 2 2
3 d bmw p us 0 1 1 43 2 2
4 e bmw d germany 1 0 1 34 1 1
5 f audi d germany 1 0 1 59 1 1
6 g volvo d swe 1 0 0 65 2 1
7 h audi d swe 1 0 0 78 2 1
8 i volvo d us 1 1 1 32 1 1
Grateful for any advice.
Accepted answer by Dan Allan
What about DataFrame.replace?
In [9]: mapping = {'set': 1, 'test': 2}
In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]:
Unnamed: 0 respondent brand engine country aware aware_2 aware_3 age \
0 0 a volvo p swe 1 0 1 23
1 1 b volvo None swe 0 0 1 45
2 2 c bmw p us 0 0 1 56
3 3 d bmw p us 0 1 1 43
4 4 e bmw d germany 1 0 1 34
5 5 f audi d germany 1 0 1 59
6 6 g volvo d swe 1 0 0 65
7 7 h audi d swe 1 0 0 78
8 8 i volvo d us 1 1 1 32
tesst set
0 2 1
1 1 2
2 2 1
3 1 2
4 2 1
5 1 2
6 2 1
7 1 2
8 2 1
As @Jeff pointed out in the comments, in pandas versions < 0.11.1 you need to manually tack .convert_objects() onto the end to properly convert tesst and set to int64 columns, in case that matters in subsequent operations.
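For readers who want to try the nested-dict form of DataFrame.replace outside an IPython session, here is a minimal self-contained sketch. The small frame is a stand-in for the question's ds_r with only three of its columns, and the explicit astype call at the end is an addition of this sketch, not part of the answer.

```python
import pandas as pd

# Small stand-in for the question's ds_r (only three of its columns).
df = pd.DataFrame({
    "brand": ["volvo", "bmw", "audi", "volvo"],
    "tesst": ["set", "test", "set", "test"],
    "set": ["set", "test", "set", "set"],
})

mapping = {"set": 1, "test": 2}

# Nested dict: outer keys are column names, inner dict maps old -> new values.
# Columns that are not listed (here "brand") are left untouched.
result = df.replace({"tesst": mapping, "set": mapping})

# Make the integer dtype explicit instead of relying on type inference.
result = result.astype({"tesst": "int64", "set": "int64"})
print(result)
```

Because the outer keys name columns, a value such as "set" appearing in any other column would not be replaced.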
Answer by bdiamante
You can use the DataFrame applymap function to do this:
In [26]: df = DataFrame({"A": [1,2,3,4,5], "B": ['a','b','c','d','e'],
"C": ['b','a','c','c','d'], "D": ['a','c',7,9,2]})
In [27]: df
Out[27]:
A B C D
0 1 a b a
1 2 b a c
2 3 c c 7
3 4 d c 9
4 5 e d 2
In [28]: mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
In [29]: df.applymap(lambda s: mymap.get(s) if s in mymap else s)
Out[29]:
A B C D
0 1 1 2 1
1 2 2 1 3
2 3 3 3 7
3 4 4 3 9
4 5 5 4 2
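The same element-wise lookup can be sketched as a standalone script. Recent pandas releases (2.1 and later) deprecate applymap in favor of DataFrame.map, so this sketch instead applies Series.map to each column, which should behave the same way on old and new versions. It is a variation on the answer above, not the answer's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": ["a", "b", "c", "d", "e"],
    "C": ["b", "a", "c", "c", "d"],
    "D": ["a", "c", 7, 9, 2],
})
mymap = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

# dict.get(s, s) returns the mapped value, or s itself when s is not a key,
# so numbers that are already in the frame pass through unchanged.
mapped = df.apply(lambda col: col.map(lambda s: mymap.get(s, s)))
print(mapped)
```

Calling a Python function for every cell is slow on large frames, so for a dataset as large as the one in the question DataFrame.replace is likely the better fit.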
Answer by Brandon
I know this is old, but I'm adding it for those searching as I was. Assume a pandas dataframe named df, as in this code:
ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))
That gives you a dictionary mapping each IP address to an integer, without having to write the mapping out by hand.
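A minimal sketch of how such a generated dictionary might then be applied. The sample addresses and the new column name source_ip_id are hypothetical; pd.factorize is shown as an alternative that returns the same kind of integer codes in one call.

```python
import pandas as pd

# Hypothetical data; only the column name source_ip comes from the answer.
df = pd.DataFrame({"source_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]})

# unique() keeps values in order of first appearance.
ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

# Apply the generated dictionary; source_ip_id is a made-up column name.
df["source_ip_id"] = df.source_ip.map(ip_dict)

# pd.factorize returns integer codes plus the unique values directly.
codes, uniques = pd.factorize(df.source_ip)
print(df)
```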
Answer by Samer Ayoub
To convert strings like 'volvo' and 'bmw' into numeric indicator (one-hot) columns, first load the data into a dataframe, then pass it to pandas.get_dummies():
import pandas as pd
# DataFrame.from_csv was removed in pandas 1.0; read_csv is its replacement
df = pd.read_csv("myFile.csv")
df_transform = pd.get_dummies(df)
print(df_transform)
A better alternative: pass a dictionary to map() on a pandas Series (df.myCol), for example on the brand column:
df.brand = df.brand.map( {'volvo':0 , 'bmw':1, 'audi':2} )
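The two approaches give differently shaped results, which a small sketch can make concrete. The four-row frame here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"brand": ["volvo", "bmw", "audi", "volvo"]})

# get_dummies: one new indicator column per distinct string (one-hot encoding).
dummies = pd.get_dummies(df)
print(list(dummies.columns))  # ['brand_audi', 'brand_bmw', 'brand_volvo']

# map: a single integer column; a value missing from the dict would become NaN.
codes = df.brand.map({"volvo": 0, "bmw": 1, "audi": 2})
print(codes.tolist())  # [0, 1, 2, 0]
```

For the question as asked (one number per string in the same column), map is the closer match; get_dummies suits cases where the numbers should not imply an ordering.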
Answer by tsando
You can also do this with pandas rename_categories. You first need to define the column with dtype="category", e.g.:
In [66]: s = pd.Series(["a","b","c","a"], dtype="category")
In [67]: s
Out[67]:
0 a
1 b
2 c
3 a
dtype: category
Categories (3, object): [a, b, c]
and then rename them:
In [70]: s.cat.rename_categories([1,2,3])
Out[70]:
0 1
1 2
2 3
3 1
dtype: category
Categories (3, int64): [1, 2, 3]
You can also pass a dict-like object to map the renaming, e.g.:
In [72]: s.cat.rename_categories({1: 'x', 2: 'y', 3: 'z'})
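Note that the dict keys must be the current category labels; labels not found in the dict are left unchanged. As a sketch, assuming the series s still holds the original categories a, b and c, the string-to-number direction asked about in the question could look like this (the variable names renamed, as_int and codes are illustrative):

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Keys are the current category labels; values are the new labels.
renamed = s.cat.rename_categories({"a": 1, "b": 2, "c": 3})

# Convert from category dtype to plain integers if later code needs arithmetic.
as_int = renamed.astype("int64")

# The categorical's built-in zero-based codes are another ready-made numbering.
codes = s.cat.codes
print(as_int.tolist(), codes.tolist())
```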
Answer by Akash Kandpal
When there are only a few distinct values, write the mapping by hand:
mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
df.applymap(lambda s: mymap.get(s) if s in mymap else s)
When there are too many values to map by hand:
# create a temporary dataframe that numbers each distinct string
temp_df2 = pd.DataFrame({'data': data.data.unique(), 'data_new': range(len(data.data.unique()))})
# merge it back so that different strings are assigned different values
data = data.merge(temp_df2, on='data', how='left')
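The merge-based approach can be sketched end to end on a tiny frame. The sample values and the name lookup are made up for illustration; only the column names data and data_new follow the answer.

```python
import pandas as pd

# Hypothetical frame; only the column name data comes from the answer.
data = pd.DataFrame({"data": ["set", "test", "set", "other", "test"]})

# Lookup table: one row per distinct string, numbered in order of first appearance.
uniques = data["data"].unique()
lookup = pd.DataFrame({"data": uniques, "data_new": range(len(uniques))})

# A left merge keeps every original row and attaches its number.
data = data.merge(lookup, on="data", how="left")
print(data)
```

The original strings stay in the data column, with the numbers alongside in data_new, so the mapping remains inspectable after the merge.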
Answer by Chapo
df.replace(to_replace=['set', 'test'], value=[1, 2]), from @Ishnark's comment on the accepted answer.
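A short sketch of the list form, using a made-up three-row frame. The lists are matched by position, and the replacement is applied to every column rather than to named ones.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["volvo", "bmw", "audi"],
    "tesst": ["set", "test", "set"],
    "set": ["set", "set", "test"],
})

# to_replace[i] is replaced by value[i] wherever it occurs, in any column.
replaced = df.replace(to_replace=["set", "test"], value=[1, 2])
print(replaced)
```

If another column could legitimately contain the string "set" or "test", the per-column dictionary form is the safer choice.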
Answer by Kapilfreeman
The simplest way to replace any value in the dataframe:
# use the integers 1 and 2 (not the strings "1" and "2") so the result is numeric
df = df.replace(to_replace="set", value=1)
df = df.replace(to_replace="test", value=2)
Hope this will help.

