python pandas: replace strings in a dataframe with numbers

Question

Question by jonas

Is there any way to use the map function, or something better, to replace values in an entire dataframe?

I only know how to perform the mapping on a Series.

I would like to replace the strings in the 'tesst' and 'set' columns with numbers, for example set = 1, test = 2.

Here is an example of my dataset (the original dataset is very large):

ds_r
  respondent  brand engine  country  aware  aware_2  aware_3  age tesst   set
0          a  volvo      p      swe      1        0        1   23   set   set
1          b  volvo   None      swe      0        0        1   45   set   set
2          c    bmw      p       us      0        0        1   56  test  test
3          d    bmw      p       us      0        1        1   43  test  test
4          e    bmw      d  germany      1        0        1   34   set   set
5          f   audi      d  germany      1        0        1   59   set   set
6          g  volvo      d      swe      1        0        0   65  test   set
7          h   audi      d      swe      1        0        0   78  test   set
8          i  volvo      d       us      1        1        1   32   set   set

The final result should be:

 ds_r
  respondent  brand engine  country  aware  aware_2  aware_3  age  tesst  set
0          a  volvo      p      swe      1        0        1   23      1    1
1          b  volvo   None      swe      0        0        1   45      1    1
2          c    bmw      p       us      0        0        1   56      2    2
3          d    bmw      p       us      0        1        1   43      2    2
4          e    bmw      d  germany      1        0        1   34      1    1
5          f   audi      d  germany      1        0        1   59      1    1
6          g  volvo      d      swe      1        0        0   65      2    1
7          h   audi      d      swe      1        0        0   78      2    1
8          i  volvo      d       us      1        1        1   32      1    1

Grateful for any advice,

Answer 1

Accepted answer by Dan Allan

What about DataFrame.replace?

In [9]: mapping = {'set': 1, 'test': 2}

In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]: 
  respondent  brand engine  country  aware  aware_2  aware_3  age  tesst  set
0          a  volvo      p      swe      1        0        1   23      1    1
1          b  volvo   None      swe      0        0        1   45      1    1
2          c    bmw      p       us      0        0        1   56      2    2
3          d    bmw      p       us      0        1        1   43      2    2
4          e    bmw      d  germany      1        0        1   34      1    1
5          f   audi      d  germany      1        0        1   59      1    1
6          g  volvo      d      swe      1        0        0   65      2    1
7          h   audi      d      swe      1        0        0   78      2    1
8          i  volvo      d       us      1        1        1   32      1    1

As @Jeff pointed out in the comments, in pandas versions < 0.11.1 you need to tack .convert_objects() onto the end manually so that tesst and set are properly converted to int64 columns, in case that matters in subsequent operations.
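
A minimal, self-contained sketch of the accepted answer, assuming only the respondent, tesst and set columns of the question's sample matter (the rest are omitted). .convert_objects() has since been removed from pandas, and whether replace returns int64 or object columns depends on the pandas version, so this sketch casts explicitly with astype instead.

```python
import pandas as pd

# Subset of the question's sample data (assumption: other columns omitted).
ds_r = pd.DataFrame({
    "respondent": list("abcdefghi"),
    "tesst": ["set", "set", "test", "test", "set", "set", "test", "test", "set"],
    "set":   ["set", "set", "test", "test", "set", "set", "set", "set", "set"],
})

mapping = {"set": 1, "test": 2}

# Nested dict {column: {old_value: new_value}} limits the replacement to the
# listed columns; "respondent" is left untouched.
result = ds_r.replace({"tesst": mapping, "set": mapping})

# Cast explicitly so the result is int64 regardless of pandas version
# (a stand-in for the old .convert_objects()).
result = result.astype({"tesst": "int64", "set": "int64"})
print(result)
```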

Answer 2

Answer by bdiamante

You can use the DataFrame applymap function to do this:

In [26]: df = DataFrame({"A": [1,2,3,4,5], "B": ['a','b','c','d','e'],
                         "C": ['b','a','c','c','d'], "D": ['a','c',7,9,2]})
In [27]: df
Out[27]:
   A  B  C  D
0  1  a  b  a
1  2  b  a  c
2  3  c  c  7
3  4  d  c  9
4  5  e  d  2

In [28]: mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

In [29]: df.applymap(lambda s: mymap.get(s) if s in mymap else s)
Out[29]:
   A  B  C  D
0  1  1  2  1
1  2  2  1  3
2  3  3  3  7
3  4  4  3  9
4  5  5  4  2
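
In pandas 2.1, DataFrame.applymap was renamed DataFrame.map and the old name is deprecated. The sketch below is one way to keep the same approach working across versions. It also shortens the lambda with dict.get(key, default), which returns the value unchanged when it has no mapping.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["a", "b", "c", "d", "e"],
                   "C": ["b", "a", "c", "c", "d"], "D": ["a", "c", 7, 9, 2]})
mymap = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

# Use DataFrame.map when the installed pandas has it (2.1+), else applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Equivalent to `mymap.get(s) if s in mymap else s`: unmapped values pass through.
out = elementwise(lambda s: mymap.get(s, s))
print(out)
```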

Answer 3

Answer by Brandon

I know this is old, but I'm adding this for anyone searching as I was. Given a pandas DataFrame, df in this code, with a source_ip column:

ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

That gives you a dictionary mapping each unique IP address to an integer, without having to write it out by hand.
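
A sketch of applying that dictionary to the column with Series.map, using hypothetical IP addresses. It also shows pd.factorize, a swapped-in alternative that builds the same codes in one call (both number values in order of first appearance).

```python
import pandas as pd

# Hypothetical data with a "source_ip" column, as in the answer above.
df = pd.DataFrame({"source_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]})

# unique() keeps the order of first appearance.
ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

# Replace every address with its integer code.
df["ip_code"] = df["source_ip"].map(ip_dict)

# pd.factorize returns the same codes plus the distinct values.
codes, uniques = pd.factorize(df["source_ip"])
print(df)
```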

Answer 4

Answer by Samer Ayoub

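To convert strings like 'volvo' and 'bmw' into numbers, load the data into a DataFrame and pass it to pandas.get_dummies(). Note that this one-hot encodes each string column into several 0/1 indicator columns rather than replacing each string with a single number: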

import pandas as pd

df = pd.read_csv("myFile.csv", index_col=0)  # DataFrame.from_csv has been removed; it used index_col=0 by default
df_transform = pd.get_dummies(df)
print(df_transform)

A better alternative when you want one integer per string: pass a dictionary to the map() method of a single Series (column), for example the brand column:

df['brand'] = df['brand'].map({'volvo': 0, 'bmw': 1, 'audi': 2})
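
To make the difference between the two approaches concrete, here is a small sketch on a hypothetical brand column. 'saab' is an added value that is deliberately missing from the dictionary, to show that map() turns unmapped strings into NaN.

```python
import pandas as pd

# Hypothetical data; "saab" is not in the mapping below.
df = pd.DataFrame({"brand": ["volvo", "volvo", "bmw", "audi", "saab"]})

# get_dummies: one 0/1 indicator column per distinct brand.
dummies = pd.get_dummies(df["brand"], prefix="brand", dtype=int)

# map: a single column of codes; values missing from the dict become NaN.
codes = df["brand"].map({"volvo": 0, "bmw": 1, "audi": 2})
print(dummies)
print(codes)
```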

Answer 5

Answer by tsando

You can also do this with the pandas rename_categories method. You first need to give the column dtype="category", e.g.:

In [66]: s = pd.Series(["a","b","c","a"], dtype="category")

In [67]: s
Out[67]: 
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]

and then rename them:

In [70]: s.cat.rename_categories([1,2,3])
Out[70]: 
0    1
1    2
2    3
3    1
dtype: category
Categories (3, int64): [1, 2, 3]

You can also pass a dict-like object to map the renaming, e.g.:

In [72]: s.cat.rename_categories({'a': 1, 'b': 2, 'c': 3})
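
Applied to a DataFrame column like the question's tesst, the same idea might look like the sketch below (hypothetical five-row sample). The final astype turns the renamed categorical into a plain int64 column. .cat.codes is also shown as an alternative that yields 0-based codes in category order.

```python
import pandas as pd

# Hypothetical sample of the question's "tesst" column.
df = pd.DataFrame({"tesst": ["set", "set", "test", "test", "set"]})

# Categorical column -> rename categories via a dict -> plain integers.
renamed = df["tesst"].astype("category").cat.rename_categories({"set": 1, "test": 2})
df["tesst_num"] = renamed.astype("int64")

# Alternative: 0-based integer codes ("set" -> 0, "test" -> 1).
df["tesst_code"] = df["tesst"].astype("category").cat.codes
print(df)
```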

Answer 6

Answer by Akash Kandpal

When there are only a few distinct values to map:

mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
df.applymap(lambda s: mymap.get(s) if s in mymap else s)

When there are too many values to map by hand:

# create a temporary dataframe mapping each unique string to an integer
temp_df2 = pd.DataFrame({'data': data.data.unique(), 'data_new': range(len(data.data.unique()))})
# merge it back so that every row gets the integer assigned to its string
data = data.merge(temp_df2, on='data', how='left')
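
A runnable version of the merge approach, assuming a hypothetical DataFrame named data with a string column also named data. A left merge keeps the original row order, so the codes line up with the rows.

```python
import pandas as pd

# Hypothetical frame with a string column named "data", as in the answer.
data = pd.DataFrame({"data": ["x", "y", "x", "z", "y"]})

uniques = data["data"].unique()
# Lookup table: one row per distinct string with an integer code.
temp_df2 = pd.DataFrame({"data": uniques, "data_new": range(len(uniques))})

# Attach the code to every row; how="left" preserves the original row order.
data = data.merge(temp_df2, on="data", how="left")
print(data)
```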

Answer 7

Answer by Chapo

df.replace(to_replace=['set', 'test'], value=[1, 2]), from @Ishnark's comment on the accepted answer.
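
Unlike the nested-dict form in the accepted answer, this list form replaces matching values in every column. The sketch below uses a hypothetical extra column, note, to show the effect. It then shows one way to restrict the change to chosen columns.

```python
import pandas as pd

df = pd.DataFrame({
    "tesst": ["set", "test", "set"],
    "set": ["set", "set", "test"],
    # Hypothetical extra column, to show that the list form touches every column.
    "note": ["set", "other", "test"],
})

# Parallel lists: each item of to_replace is swapped for the item at the same
# position in value, in every column of the frame.
everywhere = df.replace(to_replace=["set", "test"], value=[1, 2])

# To limit the change to chosen columns, replace column by column and
# reassemble with assign; "note" keeps its original strings.
cols = ["tesst", "set"]
only_cols = df.assign(
    **{c: df[c].replace(to_replace=["set", "test"], value=[1, 2]) for c in cols}
)
print(everywhere)
print(only_cols)
```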

Answer 8

Answer by Kapilfreeman

The simplest way to replace any value in the dataframe:

df = df.replace(to_replace="set", value=1)   # value=1, not "1", so the result is a number, not a string
df = df.replace(to_replace="test", value=2)

Hope this will help.