Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/32896387/

Time: 2020-09-13 23:57:29  Source: igfitidea

Extract string from cell in Pandas dataframe

python numpy pandas

Asked by robahall

I have a data frame, df:

         Filename         Weight
0  '\file path\file.txt'    NaN
1  '\file path\file.txt'    NaN
2  '\file path\file.txt'    NaN

and I have a function that takes a file name and extracts a float value from that file. What I want is to pass the file path from Filename in each row of df into my function and then write the result into the Weight column. My current code is:

df['Weight'] = df['Weight'].apply(x_wgt_pct(df['filename'].to_string()), axis = 1)

My error is:

pandas\parser.pyx in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3173)()

pandas\parser.pyx in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5912)()

IOError: File 0      file0.txt
1      file1.txt
2      file2.txt
3      file3.txt does not exist

I'm not sure whether this error occurs because it is passing all the file paths at once as a single string, or because I did not input the file path correctly.

Accepted answer by Andy Hayden

to_string creates a single string from the whole column, which isn't what you want:

In [11]: df['Filename'].to_string()
Out[11]: "0  '\file    path\file.txt'\n1  '\file    path\file.txt'\n2  '\file    path\file.txt'"
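
To make the difference concrete, here is a minimal sketch comparing the output of to_string with the value held in a single cell. The file names are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical file names, used only for illustration.
df = pd.DataFrame({"Filename": ["a.txt", "b.txt", "c.txt"]})

whole_column = df["Filename"].to_string()  # one str covering every row
first_cell = df["Filename"].iloc[0]        # the string stored in the first row

print(repr(whole_column))
print(repr(first_cell))
```

Passing whole_column to a file reader would ask it to open one file whose name spans all rows, which matches the "does not exist" message in the question.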

Assuming that x_wgt_pct is the function that takes a filepath and returns a float... you can loop through the entries:

# .ix was removed in pandas 1.0, so assign by index label with .loc instead
for i, f in df["Filename"].items():
    weight = x_wgt_pct(f)  # Note: you may have to slice off the 's i.e. f[1:-1]
    df.loc[i, "Weight"] = weight

Note: some further care has to be taken if you have duplicate row indices.
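
One way to sidestep the duplicate-index concern is to apply the function to the Filename column itself, so the result lines up row by row without any label lookup. The sketch below is an alternative to the loop, not part of the original answer; the x_wgt_pct defined here is a hypothetical stand-in (the real one reads a file), and the file names are invented.

```python
import pandas as pd

def x_wgt_pct(path):
    # Hypothetical stand-in for the asker's function: the real one reads the
    # file at `path`; this stub only returns a float derived from the name.
    return float(len(path))

# Duplicate index labels on purpose; file names are made up.
df = pd.DataFrame(
    {"Filename": ["'a.txt'", "'bb.txt'", "'ccc.txt'"], "Weight": float("nan")},
    index=[0, 0, 1],
)

# Strip the surrounding quotes, then call the function once per cell.
df["Weight"] = df["Filename"].str.strip("'").apply(x_wgt_pct)
print(df)
```

Because Series.apply returns a Series with the same index as its input, the assignment stays aligned even when labels repeat.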
