pandas: How to copy/paste a DataFrame from Stack Overflow into Python

Question

Asked by LondonRob

In questions and answers, users very often post an example DataFrame which their question/answer works with:

In []: x
Out[]: 
   bar  foo
0    4    1
1    5    2
2    6    3

It'd be really useful to be able to get this DataFrame into my Python interpreter so I can start debugging the question, or testing the answer.

How can I do this?

Answer 1

Answered by LondonRob

Pandas is written by people that really know what people want to do.

Since version 0.13 there's a function pd.read_clipboard which is absurdly effective at making this "just work".

Copy the part of the code in the question that starts with bar foo (i.e. the DataFrame) to your clipboard and do this in a Python interpreter:

In [53]: import pandas as pd
In [54]: df = pd.read_clipboard()

In [55]: df
Out[55]: 
   bar  foo
0    4    1
1    5    2
2    6    3

Caveats

Don't include the IPython In or Out stuff or it won't work.
If you have a named index, you currently need to add engine='python' (see this issue on GitHub). The 'c' engine is currently broken when the index is named.
It's not brilliant at MultiIndexes:

Try this:

                      0         1         2
level1 level2                              
foo    a       0.518444  0.239354  0.364764
       b       0.377863  0.912586  0.760612
bar    a       0.086825  0.118280  0.592211

which doesn't work at all, or this:

              0         1         2
foo a  0.859630  0.399901  0.052504
    b  0.231838  0.863228  0.017451
bar a  0.422231  0.307960  0.801993

Which works, but returns something totally incorrect!
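
One possible workaround for the sparse MultiIndex display above, which is not part of the original answer: treat the pasted text as fixed-width columns and fill the blanked-out index labels downwards. This is only a sketch. It assumes the column boundaries that pd.read_fwf infers from the whitespace line up with the printed columns, and that the first two columns are the index levels.

```python
import pandas as pd
from io import StringIO

# The second MultiIndex example, pasted as-is (leading spaces matter here)
text = """\
              0         1         2
foo a  0.859630  0.399901  0.052504
    b  0.231838  0.863228  0.017451
bar a  0.422231  0.307960  0.801993
"""

# read_fwf infers column boundaries from the whitespace layout
raw = pd.read_fwf(StringIO(text))

# Assumption: the first two columns hold the two index levels
index_cols = list(raw.columns[:2])

# Repeated outer labels are printed as blanks; fill them downwards
raw[index_cols] = raw[index_cols].ffill()

df = raw.set_index(index_cols)
df.index.names = [None, None]  # the pasted index levels had no names
print(df)
```

The column labels come back as the strings '0', '1' and '2', so convert them if you need integers.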

Answer 2

Answered by tel

pd.read_clipboard() is nifty. However, if you're writing code in a script or a notebook (and you want your code to work in the future) it's not a great fit. Here's an alternative way to copy/paste the output of a dataframe into a new dataframe object that ensures that df will outlive the contents of your clipboard:

# py3 only, see below for py2
import pandas as pd
from io import StringIO

d = '''0   1   2   3   4
A   Y   N   N   Y
B   N   Y   N   N
C   N   N   N   N
D   Y   Y   N   Y
E   N   Y   Y   Y
F   Y   Y   N   Y
G   Y   N   N   Y'''

df = pd.read_csv(StringIO(d), sep=r'\s+')  # raw string avoids an invalid-escape warning

A few notes:

The triple-quoted string preserves the newlines in the output.
StringIO wraps the output in a file-like object, which read_csv requires.
Setting sep to \s+ makes it so that each contiguous block of whitespace is treated as a single delimiter.
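
As a rough illustration of these notes, the same approach can be applied to the DataFrame posted in the question. This is a sketch, not part of the original answer; the variable name text is arbitrary.

```python
import pandas as pd
from io import StringIO

# The example DataFrame from the question, pasted without the In/Out prompts
text = '''   bar  foo
0    4    1
1    5    2
2    6    3'''

# The header has one fewer field than the data rows, so pandas
# uses the first column as the index
df = pd.read_csv(StringIO(text), sep=r'\s+')
print(df)
```

Because the index is picked up from the field count, no index_col argument should be needed for output in this shape.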

update

The above answer is Python 3 only. If you're stuck in Python 2, replace the import line:

from io import StringIO

with this instead:

from StringIO import StringIO

If you have an old version of pandas (v0.24 or older) there's an easy way to write a Py2/Py3 compatible version of the above code:

import pandas as pd

d = ...
df = pd.read_csv(pd.compat.StringIO(d), sep=r'\s+')

The newest versions of pandas have dropped compat.StringIO along with Python 2 support.
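
If the same script has to run on both interpreters without relying on pandas.compat, one common idiom (a sketch, not from the original answer) is to try the Python 2 import first and fall back to the Python 3 one:

```python
import pandas as pd

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3

# Shortened copy of the table used above
d = '''0   1   2   3   4
A   Y   N   N   Y
B   N   Y   N   N'''

df = pd.read_csv(StringIO(d), sep=r'\s+')
print(df)
```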

Answer 3

Answered by Harvey

If you are copy-pasting from a CSV file which has standard entries like this:

2016,10,M,0600,0610,13,1020,24
2016,3,F,0300,0330,21,6312,1
2015,4,M,0800,0830,8,7112,30
2015,10,M,0800,0810,19,0125,1
2016,8,M,1500,1510,21,0910,2
2015,10,F,0800,0810,3,8413,5

df = pd.read_clipboard(sep=",", header=None)
# rename returns a new DataFrame, so assign the result back
df = df.rename(columns={0: "Name0", 1: "Name1", 2: "Name2", 3: "Name3", 4: "Name4", 5: "Name5", 6: "Name6", 7: "Name7"})

This will give you a properly defined pandas DataFrame.
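
A clipboard-free variant of the same idea might look like the sketch below. It is an illustration rather than part of the original answer: the Name0 to Name7 labels follow the rename call above, and reading columns 3, 4 and 6 as strings is an assumption made here so that values such as 0600 keep their leading zeros (by default they would be parsed as integers).

```python
import pandas as pd
from io import StringIO

csv_text = """2016,10,M,0600,0610,13,1020,24
2016,3,F,0300,0330,21,6312,1
2015,4,M,0800,0830,8,7112,30
2015,10,M,0800,0810,19,0125,1
2016,8,M,1500,1510,21,0910,2
2015,10,F,0800,0810,3,8413,5"""

# Placeholder column names, one per field in each row
names = ["Name{}".format(i) for i in range(8)]

df = pd.read_csv(
    StringIO(csv_text),
    header=None,   # the data has no header row
    names=names,   # assign the column names while reading
    # Assumption: these fields are codes, so keep them as text
    dtype={"Name3": str, "Name4": str, "Name6": str},
)
print(df)
```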
