pandas: How to copy/paste a DataFrame from Stack Overflow into Python
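Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/31610889/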
Question by LondonRob
In questions and answers, users very often post an example DataFrame that their question or answer works with:
In []: x
Out[]:
bar foo
0 4 1
1 5 2
2 6 3
It'd be really useful to be able to get this DataFrame into my Python interpreter so I can start debugging the question, or testing the answer.
How can I do this?
Answer by LondonRob
Pandas is written by people that really know what people want to do.
Since version 0.13 there's a function pd.read_clipboard which is absurdly effective at making this "just work".
Copy the part of the question that starts with bar foo (i.e. the DataFrame), then do this in a Python interpreter:
In [53]: import pandas as pd
In [54]: df = pd.read_clipboard()
In [55]: df
Out[55]:
bar foo
0 4 1
1 5 2
2 6 3
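Under the hood, pd.read_clipboard reads the clipboard text and passes it to pd.read_csv, using the regex \s+ as its default separator. The sketch below does the same thing on a plain string, so it runs without a clipboard. It also drops the IPython In/Out prompt lines first, because the caveats below note that they break the parse. The helper strip_ipython_prompts and its prompt regex are illustrative assumptions and are not part of pandas.

```python
import re
from io import StringIO

import pandas as pd

# Assumed prompt shape: "In [53]:", "In []:", "Out[55]:" at the start of a line.
PROMPT = re.compile(r"^\s*(?:In|Out)\s*\[\d*\]:")


def strip_ipython_prompts(text):
    """Hypothetical helper: drop every line that starts with an IPython prompt."""
    return "\n".join(line for line in text.splitlines() if not PROMPT.match(line))


# The example from the question, pasted together with its IPython prompts.
pasted = """In []: x
Out[]:
bar foo
0 4 1
1 5 2
2 6 3"""

# sep=r"\s+" mirrors read_clipboard's default separator. The header has one
# fewer field than the data rows, so read_csv uses the first column as the index.
df = pd.read_csv(StringIO(strip_ipython_prompts(pasted)), sep=r"\s+")
print(df)
```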
Caveats
- Don't include the IPython In or Out stuff, or it won't work.
- If you have a named index, you currently need to add engine='python' (see this issue on GitHub). The 'c' engine is currently broken when the index is named.
- It's not brilliant at MultiIndexes:
Try this:
0 1 2
level1 level2
foo a 0.518444 0.239354 0.364764
b 0.377863 0.912586 0.760612
bar a 0.086825 0.118280 0.592211
which doesn't work at all, or this:
0 1 2
foo a 0.859630 0.399901 0.052504
b 0.231838 0.863228 0.017451
bar a 0.422231 0.307960 0.801993
Which works, but returns something totally incorrect!
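One possible workaround for the MultiIndex case is to parse the lines yourself and carry the last outer label forward. This sketch assumes three things: the pasted text is whitespace-separated, the index has two levels, and the printed output blanks out a repeated outer label (as in both examples above, where the b row has no foo). It also assumes all values are numeric. read_sparse_multiindex is a hypothetical helper and not a pandas function.

```python
import pandas as pd


def read_sparse_multiindex(text, nlevels=2):
    """Hypothetical helper: parse a printed DataFrame whose MultiIndex has
    repeated outer labels blanked out. Assumes every value is numeric."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    columns, body = rows[0], rows[1:]
    ncols = len(columns)

    # Optional index-names row such as "level1 level2" (exactly nlevels tokens).
    names = [None] * nlevels
    if body and len(body[0]) == nlevels:
        names, body = body[0], body[1:]

    labels, values = [], []
    previous = [None] * nlevels
    for tokens in body:
        row_labels, row_values = tokens[:-ncols], tokens[-ncols:]
        # Fewer labels than levels: the missing leading labels repeat the previous row's.
        full = previous[: nlevels - len(row_labels)] + row_labels
        labels.append(tuple(full))
        values.append([float(v) for v in row_values])
        previous = full

    index = pd.MultiIndex.from_tuples(labels, names=names)
    return pd.DataFrame(values, index=index, columns=columns)
```

Applied to the first example above, this should give index names level1/level2 and rows (foo, a), (foo, b), (bar, a) instead of misaligned data.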
Answer by tel
pd.read_clipboard() is nifty. However, if you're writing code in a script or a notebook (and you want your code to work in the future) it's not a great fit. Here's an alternative way to copy/paste the output of a dataframe into a new dataframe object that ensures that df will outlive the contents of your clipboard:
# py3 only, see below for py2
import pandas as pd
from io import StringIO
d = '''0 1 2 3 4
A Y N N Y
B N Y N N
C N N N N
D Y Y N Y
E N Y Y Y
F Y Y N Y
G Y N N Y'''
df = pd.read_csv(StringIO(d), sep=r'\s+')  # raw string avoids an invalid-escape warning on newer Pythons
A few notes:
- The triple-quoted string preserves the newlines in the output.
- StringIO wraps the output in a file-like object, which read_csv requires.
- Setting sep to \s+ makes it so that each contiguous block of whitespace is treated as a single delimiter.
update
The above answer is Python 3 only. If you're stuck in Python 2, replace the import line:
from io import StringIO
with this instead:
from StringIO import StringIO
If you have an old version of pandas (v0.24 or older) there's an easy way to write a Py2/Py3 compatible version of the above code:
import pandas as pd
d = ...
df = pd.read_csv(pd.compat.StringIO(d), sep=r'\s+')
Newer versions of pandas have removed pd.compat.StringIO along with Python 2 support.
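Newer pandas no longer provides pd.compat.StringIO, so code that still has to run on both Python 2 and 3 can use a plain try/except import instead. This is a common Python idiom and does not come from the original answer. The data is a shortened copy of the example above.

```python
import pandas as pd

try:
    from StringIO import StringIO  # Python 2 location
except ImportError:
    from io import StringIO  # Python 3 location (the Py2 module no longer exists)

d = '''0 1 2 3 4
A Y N N Y
B N Y N N'''

df = pd.read_csv(StringIO(d), sep=r'\s+')
print(df)
```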
Answer by Harvey
If you are copy-pasting from a CSV file with standard entries like this:
2016,10,M,0600,0610,13,1020,24
2016,3,F,0300,0330,21,6312,1
2015,4,M,0800,0830,8,7112,30
2015,10,M,0800,0810,19,0125,1
2016,8,M,1500,1510,21,0910,2
2015,10,F,0800,0810,3,8413,5
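import pandas as pd
df = pd.read_clipboard(sep=",", header=None)
# rename returns a new DataFrame, so assign the result back (the data has 8 columns, 0-7)
df = df.rename(columns={0: "Name0", 1: "Name1", 2: "Name2", 3: "Name3", 4: "Name4", 5: "Name5", 6: "Name6", 7: "Name7"})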
will give you a properly defined pandas DataFrame.
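The same CSV can also be read without the clipboard, and you can supply the column names up front through read_csv's names parameter instead of a separate rename. Judging only from the look of the data (the answer does not say), columns 3, 4 and 6 appear to be zero-padded codes or HHMM times. This sketch therefore reads them as strings, because otherwise 0600 would be parsed as the integer 600.

```python
from io import StringIO

import pandas as pd

csv_text = """2016,10,M,0600,0610,13,1020,24
2016,3,F,0300,0330,21,6312,1
2015,4,M,0800,0830,8,7112,30
2015,10,M,0800,0810,19,0125,1
2016,8,M,1500,1510,21,0910,2
2015,10,F,0800,0810,3,8413,5"""

names = [f"Name{i}" for i in range(8)]  # one name per comma-separated field
df = pd.read_csv(
    StringIO(csv_text),
    header=None,          # the pasted rows have no header line
    names=names,          # name the columns directly instead of renaming later
    dtype={"Name3": str, "Name4": str, "Name6": str},  # assumed codes: keep leading zeros
)
print(df)
```

Because read_clipboard passes extra keyword arguments through to read_csv, the same names= and dtype= arguments should also work with pd.read_clipboard(sep=",", header=None, ...).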


