pandas - Convert a string to a list of strings

Question

Asked by Fabio Lamanna

I have this 'file.csv' file to read with pandas:

Title|Tags
T1|"[Tag1,Tag2]"
T1|"[Tag1,Tag2,Tag3]"
T2|"[Tag3,Tag1]"

using

df = pd.read_csv('file.csv', sep='|')

the output is:

  Title              Tags
0    T1       [Tag1,Tag2]
1    T1  [Tag1,Tag2,Tag3]
2    T2       [Tag3,Tag1]

I know that the Tags column holds plain strings, since:

In [64]: df['Tags'][0][0]
Out[64]: '['

I need to read it as a list of strings like ["Tag1","Tag2"]. I tried the solution provided in this question, but had no luck, since the [ and ] characters get in the way.

The expected output should be:

In [64]: df['Tags'][0][0]
Out[64]: 'Tag1'

Answer 1

Answered by Mike Müller

You can split the string manually:

>>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
>>> df.Tags[0]
['Tag1', 'Tag2']
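
For anyone who wants to try this without creating file.csv, here is a minimal self-contained sketch. It assumes the file content shown in the question and inlines it through io.StringIO; the CSV_TEXT name is only for illustration.

```python
import io
import pandas as pd

# Same content as file.csv in the question, inlined so the sketch runs on its own.
CSV_TEXT = 'Title|Tags\nT1|"[Tag1,Tag2]"\nT1|"[Tag1,Tag2,Tag3]"\nT2|"[Tag3,Tag1]"\n'

df = pd.read_csv(io.StringIO(CSV_TEXT), sep='|')

# Drop the surrounding brackets, then split the remainder on commas.
df['Tags'] = df['Tags'].apply(lambda x: x[1:-1].split(','))

print(df['Tags'][0])     # ['Tag1', 'Tag2']
print(df['Tags'][0][0])  # Tag1
```

One thing to keep in mind: an empty cell such as "[]" would become [''] rather than an empty list, because ''.split(',') returns [''].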

Answer 2

Answered by YOBEN_S

Or, using the .str accessor:

df.Tags = df.Tags.str[1:-1].str.split(',').tolist()

Answer 3

Answered by Scott Boston

You can convert the string to a list using strip and split.

df_out = df.assign(Tags=df.Tags.str.strip('[]').str.split(','))

df_out.Tags[0][0]

Output:

'Tag1'
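
A possible advantage of the .str approach, sketched below, is that missing values pass through instead of raising an error. The Series here is hypothetical (the file in the question has no missing tags) and only illustrates that behaviour.

```python
import pandas as pd

# Hypothetical column with one missing entry (not present in the question's file).
tags = pd.Series(['[Tag1,Tag2]', None, '[Tag3,Tag1]'])

# strip('[]') removes leading/trailing bracket characters; split(',') builds the lists.
parsed = tags.str.strip('[]').str.split(',')

print(parsed[0])           # ['Tag1', 'Tag2']
print(pd.isna(parsed[1]))  # True
```

A plain apply with a slicing lambda would instead fail on the missing entry, since a missing value cannot be sliced.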

Answer 4

Answered by RHSmith159

I think you could use the json module.

import json
import pandas as pd

df = pd.read_csv('file.csv', sep='|')
df['Tags'] = df['Tags'].apply(lambda x: json.loads(x))

This loads your dataframe as before, then applies a lambda function to each item in the Tags column. The lambda function calls json.loads(), which converts the string representation of a list into an actual list. Note that json.loads() only accepts valid JSON, so the elements must be double-quoted, as in ["Tag1","Tag2"]; the unquoted [Tag1,Tag2] in the file above raises json.JSONDecodeError.
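
To make the quoting requirement concrete, here is a small sketch comparing the two forms. The variable names quoted, unquoted and parsed_ok are illustrative only.

```python
import json

quoted = '["Tag1","Tag2"]'  # valid JSON: every element is a double-quoted string
unquoted = '[Tag1,Tag2]'    # the form stored in file.csv in the question

print(json.loads(quoted))   # ['Tag1', 'Tag2']

# Bare words such as Tag1 are not valid JSON values, so parsing fails.
try:
    json.loads(unquoted)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

print(parsed_ok)            # False
```

So the json approach fits files whose tags are already quoted; for the file as posted, one of the split-based answers is probably the simpler route.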

Answer 5

Answered by Veggiet

Your df['Tags'] is effectively a list of strings. If you print that list you should get ["[Tag1,Tag2]","[Tag1,Tag2,Tag3]","[Tag3,Tag1]"]. This is why, when you ask for the first element of the first element, you actually get the first character of the string rather than what you want.

You need to parse that string afterward, with something like:

df['Tags'][0] = df['Tags'][0].split(',')

But, as you saw in the example you cited, this gives you a list whose first element still carries the opening bracket:

in: df['Tags'][0][0]
out: '[Tag1'

What you need is a way to parse the string while stripping out several characters at once. A simple regular expression can do this. Something like:

import re
df['Tags'][0] = re.findall(r"[\w']+", df['Tags'][0])
print(df['Tags'][0][0])

will print:

Tag1

Following the other answer that uses pandas converters, you might write a converter like this:

def clean(seq_string):
    return re.findall(r"[\w']+", seq_string)

Regular expressions are powerful, but they can behave unexpectedly if you are not sure what your input strings contain. The expression used here, r"[\w']+", matches runs of word characters (letters, digits, and underscores) and apostrophes; everything else acts as a separator, so re.findall returns the pieces found between those separators.
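
As a rough sketch of how such a converter could be wired in, the example below passes clean to the converters argument of pd.read_csv. It assumes the file content from the question, inlined through io.StringIO; CSV_TEXT is an illustrative name, and the hyphenated input at the end is a made-up case to show the caveat about unpredictable input.

```python
import io
import re
import pandas as pd

def clean(seq_string):
    # Collect each run of word characters (or apostrophes) as one tag.
    return re.findall(r"[\w']+", seq_string)

# Same content as file.csv in the question, inlined so the sketch runs on its own.
CSV_TEXT = 'Title|Tags\nT1|"[Tag1,Tag2]"\nT1|"[Tag1,Tag2,Tag3]"\nT2|"[Tag3,Tag1]"\n'

# converters maps a column name to a function applied to each raw cell string.
df = pd.read_csv(io.StringIO(CSV_TEXT), sep='|', converters={'Tags': clean})

print(df['Tags'][0][0])  # Tag1

# Caveat: hyphens and spaces are not matched by [\w'], so they also split tags.
print(clean('[big-data,machine learning]'))  # ['big', 'data', 'machine', 'learning']
```

If tags may contain spaces or punctuation, splitting on commas after removing the brackets is likely the safer choice.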
