pandas: ValueError when matching column names
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/25144315/
value error while matching column names
Asked by shantanuo
The following code raises an error, but it works if I remove the usecols parameter.
from io import StringIO  # Python 3; the original post used "from StringIO import StringIO" (Python 2)
import pandas as pd
u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')
df = pd.read_csv(audit_trail, sep="|", usecols = u_cols )
ValueError: Passed header names mismatches usecols
I need to use the u_cols list because the column headings are generated dynamically.
Answered by shantanuo
"names" should be used instead of "usecols".
from io import StringIO  # Python 3; the original post used "from StringIO import StringIO" (Python 2)
import pandas as pd
u_cols = ['page_id','web_id']
audit_trail = StringIO('''
page_id | web_id
3|0
7|3
11|4
15|5
19|6
''')
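# header=0 makes names replace the file's own header row instead of reading that row as data
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0)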
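One thing worth checking with the names approach is what happens to the header line that is already in the data. The sketch below compares names on its own with names plus header=0. The shortened sample string and the variable names are only for illustration and are not from the original post.

```python
from io import StringIO
import pandas as pd

u_cols = ['page_id', 'web_id']
data = "page_id | web_id\n3|0\n7|3\n11|4\n"  # shortened sample, for illustration

# names alone: the file's header line is parsed as the first data row
with_names_only = pd.read_csv(StringIO(data), sep="|", names=u_cols)

# names plus header=0: the file's header line is dropped and replaced by names
with_header_replaced = pd.read_csv(StringIO(data), sep="|", names=u_cols, header=0)

print(with_names_only)
print(with_header_replaced)
```

With names alone, the frame carries one extra row holding the header text, so both columns end up as strings rather than integers.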
Answered by ZJS
This is because of the whitespace next to the | separator. When you run pd.read_csv(audit_trail, sep="|") you actually get the columns ['page_id(whitespace)', '(whitespace)web_id'] instead of ['page_id', 'web_id'].
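A quick way to confirm this is to read the data without usecols and inspect the parsed column names. This is a minimal sketch; the shortened sample string and the name raw_columns are introduced here for illustration.

```python
from io import StringIO
import pandas as pd

data = "page_id | web_id\n3|0\n7|3\n"  # shortened sample, for illustration

df = pd.read_csv(StringIO(data), sep="|")

# The spaces around "|" in the header line are kept as part of the names
raw_columns = list(df.columns)
print(raw_columns)  # ['page_id ', ' web_id']
```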
I would suggest passing the regex pattern \s*\|\s* as your separator, which will strip any whitespace around the | separator. Here is the full solution...
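from io import StringIO
import pandas as pd

u_cols = ['page_id','web_id']
s = """page_id | web_id
3|0
7|3
11|4
15|5
19|6"""
# A regex separator is handled by the Python parsing engine; the raw string keeps the backslashes intact
df = pd.read_csv(StringIO(s), sep=r"\s*\|\s*", usecols=u_cols, engine="python")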
Output:
   page_id  web_id
0        3       0
1        7       3
2       11       4
3       15       5
4       19       6
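If a regex separator is not desirable, another option (not from the original answers, just a sketch) is to read with the plain "|" separator, strip the whitespace from the column names afterwards, and then select the dynamically generated list. The variable name data is an assumption made for this example.

```python
from io import StringIO
import pandas as pd

u_cols = ['page_id', 'web_id']
data = """page_id | web_id
3|0
7|3
11|4
15|5
19|6"""

# Read every column with the plain separator, then clean up the header names
df = pd.read_csv(StringIO(data), sep="|")
df.columns = df.columns.str.strip()

# Select only the wanted columns, now that the names match exactly
df = df[u_cols]
print(df)
```

This reads all columns before selecting, so for very wide files the usecols approach may still be preferable.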

