How to findall() a sequence of regular expressions to a pandas dataframe?

Disclaimer: this page is a translation of a popular StackOverflow question and answer, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original source: http://stackoverflow.com/questions/42290076/

Date: 2020-09-14 03:00:32  Source: igfitidea

How to findall() a sequence of regular expressions to a pandas dataframe?

python, python-3.x, pandas

Asked by J.Do

I am extracting some patterns with the pandas findall function. However, I have several regular expressions. Thus, how can I findall N regular expressions with pandas?


For example, let's say that I would like to extract all the numbers and all the dates inside a specific column:


In:


import pandas as pd

dfs = pd.DataFrame(data={'c1': ['This dataset 11/12/98 contains 5,000 rows, which were sampled from a 500,000 11/12/12 row dataset spanning the same time period. Throughout these analyses',
                                'the number of events you count will be about 100 times smaller than they 11/12/78 actually were, but the 01/12/11 proportions of events will still generally be reflective that larger dataset. In this case, a sample is fine because our purpose is to learn methods of data analysis with Python, not to create 100% accurate recommendations to Watsi.']})
dfs

Out:


    c1
0   This dataset 11/12/98 contains 5,000 rows, whi...
1   the number of events you count will be about 1...

I tried to, but I am getting the following error:


In:


dfs['patterns'] = dfs['c1'].str.findall([r'\d+',r'(\d+/\d+/\d+)']).apply(', '.join)

dfs

Out:


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-af2969e06a61> in <module>()
----> 1 dfs['patterns'] = dfs['c1'].str.findall([r'\d+',r'(\d+/\d+/\d+)']).apply(', '.join)
      2 dfs

/usr/local/lib/python3.5/site-packages/pandas/core/strings.py in wrapper2(self, pat, flags, **kwargs)
   1268 
   1269     def wrapper2(self, pat, flags=0, **kwargs):
-> 1270         result = f(self._data, pat, flags=flags, **kwargs)
   1271         return self._wrap_result(result)
   1272 

/usr/local/lib/python3.5/site-packages/pandas/core/strings.py in str_findall(arr, pat, flags)
    827     extractall : returns DataFrame with one column per capture group
    828     """
--> 829     regex = re.compile(pat, flags=flags)
    830     return _na_map(regex.findall, arr)
    831 

/usr/local/Cellar/python3/3.5.2_2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/re.py in compile(pattern, flags)
    222 def compile(pattern, flags=0):
    223     "Compile a regular expression pattern, returning a pattern object."
--> 224     return _compile(pattern, flags)
    225 
    226 def purge():

/usr/local/Cellar/python3/3.5.2_2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/re.py in _compile(pattern, flags)
    279     # internal: compile pattern
    280     try:
--> 281         p, loc = _cache[type(pattern), pattern, flags]
    282         if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
    283             return p

TypeError: unhashable type: 'list'
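The TypeError happens because str.findall hands its argument straight to re.compile, which expects a single pattern string (or an already compiled pattern), not a Python list; the list is also unhashable, so the lookup in the regex cache shown in the traceback fails. A minimal sketch of the difference, joining the two patterns from the question into one alternation:

import re

patterns = [r'\d+/\d+/\d+', r'\d+']

# re.compile(patterns) raises TypeError: unhashable type: 'list',
# exactly as in the traceback above.

# Joining the patterns into a single alternation works; the date pattern
# comes first so whole dates are matched before their digit fragments.
combined = '|'.join(patterns)
print(re.findall(combined, 'This dataset 11/12/98 contains 5,000 rows'))
# ['11/12/98', '5', '000']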

Therefore, how can I "stack", "nest", or apply several regexes with the findall function? What I expect as output is the matches of each regular expression, separated by "," in a single column:


   col
0  '11/12/98', '5', '000', '500', '000', '11/12/12'
1  '100', '11/12/78', '01/12/11', '100'
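As an aside, if the matches of each regex should stay in separate columns instead of one joined string, str.extractall is another option; the docstring visible in the traceback notes that it "returns DataFrame with one column per capture group". A sketch, assuming each pattern is wrapped in its own named group (the shortened strings are just for illustration):

import pandas as pd

dfs = pd.DataFrame(data={'c1': ['This dataset 11/12/98 contains 5,000 rows',
                                'about 100 times smaller 11/12/78 but the 01/12/11 proportions']})

# One named capture group per regex; every match fills exactly one of the
# two columns and leaves the other as NaN.
matches = dfs['c1'].str.extractall(r'(?P<date>\d+/\d+/\d+)|(?P<number>\d+)')
print(matches)
# DataFrame indexed by (row, match) with columns 'date' and 'number'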

UPDATE


I tried to:


dfs['patterns'] = dfs['c1'].str.map(findall(),[r'\d+',r'(\d+/\d+/\d+)']).apply(', '.join)
dfs
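This attempt cannot work either: findall is not a standalone function there, and the .str accessor methods take a single pattern. One workaround (a sketch, not from the original thread) is to run plain re.findall once per pattern inside apply and concatenate the per-row results. Note that with these two overlapping patterns the \d+ regex also re-matches the digit fragments inside every date, which is why a single alternation pattern, as in the accepted answer below, is usually the better choice:

import re
import pandas as pd

dfs = pd.DataFrame(data={'c1': ['This dataset 11/12/98 contains 5,000 rows',
                                'about 100 times smaller 11/12/78 but the 01/12/11 proportions']})

patterns = [r'\d+/\d+/\d+', r'\d+']

def findall_many(text):
    # Concatenate the matches of every pattern for one string.
    matches = []
    for pat in patterns:
        matches.extend(re.findall(pat, text))
    return matches

dfs['patterns'] = dfs['c1'].apply(findall_many).apply(', '.join)
print(dfs['patterns'][0])
# 11/12/98, 11, 12, 98, 5, 000  <- the date's fragments are counted twice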

Accepted answer by su79eu7k

Your desired output is still not clear, but please check the code below.


dfs['patterns'] = dfs['c1'].str.findall(r'\d+\/\d+\/\d+|\d+')
print(dfs['patterns'].sum())

['11/12/98', '5', '000', '500', '000', '11/12/12', '100', '11/12/78', '01/12/11', '100']
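The answer leaves a column of lists; to get the single comma-separated string column shown in the question's expected output, the same join the question attempted can be applied afterwards (a small follow-up sketch, continuing from the dfs defined in the question):

# Join each row's list of matches into one comma-separated string.
dfs['patterns'] = dfs['c1'].str.findall(r'\d+\/\d+\/\d+|\d+').apply(', '.join)
print(dfs['patterns'][0])
# 11/12/98, 5, 000, 500, 000, 11/12/12

Calling .str.join(', ') on the list column would work equally well.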