Pandas: map strings to int based on the values in a column

Question

Asked by Vijay P R

I have a data frame with two columns:

state  total_sales
AL      16714
AR      6498
AZ      107296
CA      33717

Now I want to map the strings in the state column to integers from 1 to N (where N is the number of rows, here 4) based on the order of the values in total_sales (in the example below, the largest value gets label 1). The result should be stored in another column (say, label). That is, I want a result like this:

state  total_sales label
AL      16714         3
AR      6498          4
AZ      107296        1
CA      33717         2

Please suggest a vectorised implementation.

Answer 1

Accepted answer by jezrael

You can use rank with a cast to int:

# method='dense' gives consecutive ranks; ascending=False assigns 1 to the largest value
df['label'] = df['total_sales'].rank(method='dense', ascending=False).astype(int)
print(df)
  state  total_sales  label
0    AL        16714      3
1    AR         6498      4
2    AZ       107296      1
3    CA        33717      2
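
The choice of method only matters when total_sales contains ties. The following is a minimal sketch of that case, assuming a hypothetical extra row (CO) whose sales equal those of AL; the column names label_dense and label_first are made up for illustration.

```python
import pandas as pd

# The original four rows plus a hypothetical CO row that ties with AL
df = pd.DataFrame({
    "state": ["AL", "AR", "AZ", "CA", "CO"],
    "total_sales": [16714, 6498, 107296, 33717, 16714],
})

# 'dense': tied values share a label and no label is skipped afterwards
df["label_dense"] = df["total_sales"].rank(method="dense", ascending=False).astype(int)

# 'first': every row gets its own label, so the labels cover 1..N even with ties
df["label_first"] = df["total_sales"].rank(method="first", ascending=False).astype(int)

print(df)
```

If every row must receive a distinct label from 1 to N, method='first' is probably the safer choice; with method='dense', tied rows share a label and the largest label can be smaller than N.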

Answer 2

Answer by Magnus Persson

After running into the same issue while working with Fitbit sleep stages, I worked out another solution (where I can control the mapping to ints). Here I use the pandas way of representing categorical variables. The following is a simple example showing the solution for your MWE.

import pandas as pd

df = pd.DataFrame(data={'state': ['AL', 'AR', 'AZ', 'CA'],
                        'total_sales': [16714, 6498, 107296, 33717]})

Then we simply request the "state" column as a categorical variable and take its integer codes:

df['label'] = df.state.astype("category").cat.codes
print(df)
  state  total_sales  label
0    AL        16714      0
1    AR         6498      1
2    AZ       107296      2
3    CA        33717      3
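
To see which integer stands for which string, the categories can be enumerated. This is only a sketch: the names states, code_to_state and roundtrip are made up for illustration, and a repeated value is included to show that equal strings share a code.

```python
import pandas as pd

# Example values; the repeated "AL" shows that equal strings get the same code
states = pd.Series(["AL", "AR", "AZ", "CA", "AL"])
cat = states.astype("category")

# Position in .cat.categories is the integer code assigned to each string
code_to_state = dict(enumerate(cat.cat.categories))

# Map the codes back to the original strings as a sanity check
roundtrip = cat.cat.codes.astype(int).map(code_to_state)

print(code_to_state)
print(roundtrip.tolist())
```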

If you need to control the sequence (e.g. if the order you want differs from the default sorted order of the values), you can supply a list of allowed categories in the order you want:

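df_cats = ['CA', 'AZ', 'AL', 'AR']
# Recent pandas versions no longer accept astype("category", categories=...);
# pass a CategoricalDtype instead
df['label'] = df.state.astype(pd.CategoricalDtype(categories=df_cats)).cat.codes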
print(df)
  state  total_sales  label
0    AL        16714      2
1    AR         6498      3
2    AZ       107296      1
3    CA        33717      0

Any label not in the category list will yield -1. There is also a keyword, ordered=True, that you can use, but I don't think it matters here. For more information about the pandas categorical dtype, see: https://pandas.pydata.org/pandas-docs/stable/categorical.html
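
To connect the categorical approach back to the original question, the category list can be derived from total_sales instead of being typed by hand. The following is a rough sketch, assuming each state appears only once; the names cats, dtype and unknown_codes are made up for illustration, and "TX" is a hypothetical state that is not in the data.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["AL", "AR", "AZ", "CA"],
    "total_sales": [16714, 6498, 107296, 33717],
})

# Order the states from highest to lowest total_sales (assumes unique states)
cats = df.sort_values("total_sales", ascending=False)["state"].tolist()
dtype = pd.CategoricalDtype(categories=cats, ordered=True)

# Codes start at 0, so add 1 to get labels running from 1 to N
df["label"] = df["state"].astype(dtype).cat.codes + 1
print(df)

# A value missing from the category list ("TX" here) gets the code -1
unknown_codes = pd.Series(["AL", "TX"]).astype(dtype).cat.codes
print(unknown_codes.tolist())
```

For this particular task the rank approach from the accepted answer is shorter; building the categories explicitly is mainly useful when the same mapping has to be reused on other data.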
