pandas: Splitting strings into numbers and text

Question

Asked by Paul T.

The Setup

I have a pandas DataFrame with a column 'iso' containing chemical isotope symbols, such as '4He', '16O', '197Au'. I want to label many (but not all) isotopes on a plot using the annotate() function in matplotlib. The label format should have the atomic mass in superscript. I can do this with LaTeX-style formatting:

axis.annotate('$^{4}$He', xy=(x, y), xycoords='data')

I could write dozens of annotate() statements like the one above, one for each isotope I want to label, but I'd rather automate this.

The Question

How can I extract the isotope number and name from my iso column?

With those pieces extracted I can make the labels. Let's say we put them into the variables Num and Sym. Now I can loop over my isotopes and do something like this:

for i in list_of_isotopes:
  (Num, Sym) = df[df.iso==i].iso.str.MISSING_STRING_METHOD(???)
  axis.annotate('$^{%s}$%s' %(Num, Sym), xy=(x[Num], y[Num]), xycoords='data')

Presumably, there is a pandas string method that I can drop into the above, but I'm having trouble coming up with a solution. I've been trying split() and extract() with a few different patterns, but can't get the desired effect.

Answer 1

Answered by Romain

This is my answer using split. The regexp can probably be improved; I'm very bad at that sort of thing :-)

(\d+) matches the integer part, and ([A-Za-z]+) matches the letters.

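import pandas as pd

df = pd.DataFrame({'iso': ['4He', '16O', '197Au']})
# The capture groups are kept in the split result (columns 1 and 2);
# columns 0 and 3 are the empty strings before and after the match.
result = df['iso'].str.split(r'(\d+)([A-Za-z]+)', expand=True)
result = result.loc[:, [1, 2]]
result.rename(columns={1: 'x', 2: 'y'}, inplace=True)
print(result)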

This produces a DataFrame with the numbers in column x and the element symbols in column y.
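
Since the question mentions extract(), the same two capture groups can also be passed to str.extract, which returns one column per group and so avoids the empty edge columns that split produces. This is a minimal sketch, assuming the same sample data; the group names num and sym and the variable names parts and labels are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'iso': ['4He', '16O', '197Au']})

# Named groups become the column names; expand=True returns a DataFrame.
parts = df['iso'].str.extract(r'^(?P<num>\d+)(?P<sym>[A-Za-z]+)$', expand=True)

# Build mathtext labels such as '$^{4}$He' from the two columns.
labels = '$^{' + parts['num'] + '}$' + parts['sym']
print(labels.tolist())
```

Each entry of labels could then be passed to axis.annotate() in place of a hand-written label string.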

Answer 2

Answered by taesu

I'd use simple string manipulation, without the hassle of regex.

isotopes = ['4He', '16O', '197Au']

def get_num(isotope):
    # filter() returns an iterator in Python 3, so join the digits back into a string
    return ''.join(filter(str.isdigit, isotope))

def get_sym(isotope):
    return isotope.replace(get_num(isotope), '')

def get_num_sym(isotope):
    return (get_num(isotope), get_sym(isotope))

for isotope in isotopes:
    num, sym = get_num_sym(isotope)
    print(num, sym)

Answer 3

Answered by albert

To extract the number and the element of an isotope symbol, you can use a regular expression (regex for short) with Python's re module. The regex looks for digits followed by letters; each part is captured in a named group and can be accessed by the group's name. If the regex matches, you can extract the data and .format() the desired annotation string:

#!/usr/bin/env python3
# coding: utf-8

import re

iso_num = '16O'

preg = re.compile('^(?P<num>[0-9]*)(?P<element>[A-Za-z]*)$')
m = preg.match(iso_num)

if m:
    num = m.group('num')
    element = m.group('element')

    # Doubled braces produce literal { } so the whole number is superscripted.
    note = '$^{{{}}}${}'.format(num, element)

    # axis.annotate(note, xy=(x, y), xycoords='data')
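
To label many isotopes at once, the match-and-format step can be wrapped in a small helper. This is only a sketch: the names ISO_RE and isotope_label, the stricter + quantifiers, and the choice to raise ValueError on a non-matching string are assumptions made here, not part of the original answer.

```python
import re

# One or more digits followed by one or more letters, e.g. '197Au'.
ISO_RE = re.compile(r'^(?P<num>[0-9]+)(?P<element>[A-Za-z]+)$')

def isotope_label(iso):
    """Return a mathtext label like '$^{16}$O' for an isotope string."""
    m = ISO_RE.match(iso)
    if m is None:
        raise ValueError('not an isotope symbol: {!r}'.format(iso))
    # Doubled braces emit literal { } around the number.
    return '$^{{{num}}}${element}'.format(**m.groupdict())

for iso in ['4He', '16O', '197Au']:
    print(iso, '->', isotope_label(iso))
```

The returned string can be passed directly as the first argument of axis.annotate().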

Answer 4

Answered by Fei Yuan

Have you tried strip()? Maybe you can consider this:

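import string

for i in list_of_isotopes:
  iso = df.loc[df.iso == i, 'iso']
  # str.strip returns a Series, so take its first element to get a plain string
  Num = iso.str.strip(string.ascii_letters).iloc[0]
  Sym = iso.str.strip(string.digits).iloc[0]
  # Braces around %s superscript the whole number, not just its first digit
  axis.annotate('$^{%s}$%s' % (Num, Sym), xy=(x[Num], y[Num]), xycoords='data')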
