pandas: ValueError while matching column names

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/25144315/

Time: 2020-09-13 22:19:51  Source: igfitidea

value error while matching column names

pandas

Asked by shantanuo

The following code raises an error, but it works if I remove the usecols parameter.

from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')

df = pd.read_csv(audit_trail, sep="|", usecols = u_cols  )

ValueError: Passed header names mismatches usecols


I need to use u_cols list because the column headings are being generated dynamically.


Answered by shantanuo

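"names" should be used instead of "usecols". Because the data already contains a header line, also pass header=0 so that this line is replaced by the given names rather than read as a data row.

from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')

# header=0 discards the file's own header line in favor of u_cols
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0)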
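
The effect of "names" on a file that already has a header line can be checked with a small sketch. The shortened sample data and the variable names below are illustrative assumptions, not part of the original answer.

```python
from io import StringIO
import pandas as pd

# Shortened, hypothetical sample in the same shape as the question's data
raw = "page_id | web_id\n3|0\n7|3\n"
u_cols = ["page_id", "web_id"]

# names alone: the file's own header line is read as the first data row
with_names_only = pd.read_csv(StringIO(raw), sep="|", names=u_cols)

# names plus header=0: the header line is consumed and replaced by u_cols
with_header_replaced = pd.read_csv(StringIO(raw), sep="|", names=u_cols, header=0)

print(with_names_only)
print(with_header_replaced)
```

In the first frame the text of the old header shows up as row 0 and both columns are read as strings; in the second, only the numeric rows remain.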

Answered by ZJS

This is because of the whitespace next to the | separator. When you run pd.read_csv(audit_trail, sep="|"), you actually get the columns ['page_id(whitespace)', '(whitespace)web_id'] instead of ['page_id', 'web_id'].
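
A quick way to confirm this is to print the parsed column names. This is a minimal sketch on a shortened, hypothetical copy of the data; the names raw and stripped are assumptions for illustration.

```python
from io import StringIO
import pandas as pd

# Shortened, hypothetical sample with spaces around the | in the header
raw = "page_id | web_id\n3|0\n7|3\n"

df = pd.read_csv(StringIO(raw), sep="|")

# repr-style output makes the stray spaces visible
print(list(df.columns))

# The names only match u_cols once the whitespace is stripped
stripped = [c.strip() for c in df.columns]
print(stripped)
```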

I would suggest passing the regex pattern \s*\|\s* as your separator, which strips any whitespace around the | separator. Note that a regex separator is handled by pandas' Python parsing engine rather than the default C engine. Here is the full solution...

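from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']

s = """page_id | web_id
3|0
7|3
11|4
15|5
19|6"""

# A regex separator needs the Python engine; the raw string keeps the backslashes intact
df = pd.read_csv(StringIO(s), sep=r"\s*\|\s*", usecols=u_cols, engine="python")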

Output:

   page_id  web_id
0        3       0
1        7       3
2       11       4
3       15       5
4       19       6
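
As an alternative that avoids a regex separator, the column names can be stripped after reading and the wanted columns selected afterwards. This is only a sketch under assumptions: the extra column and the shortened data are hypothetical, and the whole file is parsed before the selection, unlike with usecols.

```python
from io import StringIO
import pandas as pd

u_cols = ["page_id", "web_id"]

# Hypothetical sample; the "extra" column is added only to show the selection step
raw = "page_id | web_id | extra\n3|0|a\n7|3|b\n"

df = pd.read_csv(StringIO(raw), sep="|")

# Remove the whitespace left around each header name
df.columns = df.columns.str.strip()

# Keep only the dynamically generated column list
df = df[u_cols]
print(df)
```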