Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29320527/

Date: 2020-09-13 23:07:31  Source: igfitidea

I need help with read_fwf in Python pandas

python pandas

Asked by seminj

An example of the text file is shown in this picture: [image]

In the file, the layout of the data changes after the word 'Chapter'. In other words, the reading direction changes from horizontal to vertical.

To solve this problem, I found read_fwf in the pandas module and tried it, but it failed.

linefwf = pandas.read_fwf('File.txt', widths=[33,33,33], header=None, nwors = 3)

The gap between the categories (Chapter, Title, Assignment) is 33.

But the command (linefwf) prints every line of the page, including the horizontal categories such as Title, Date, and Reservation, as well as the blank lines.

Please, I would like to know how to export only the vertical data.

Answered by Jonathan Eunice

Let me take a stab in the dark: you wish to turn this table into a column (aka "vertical category"), ignoring the other columns?

I didn't have your precise text, so I guesstimated it. My column widths were different than yours ([11,21,31]) and I omitted the nwors argument (you probably meant to use nrows, but it's superfluous in this case). While the column spec isn't very precise, a few seconds of fiddling left me with a workable DataFrame:

[image]
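
Since the original file was only posted as a picture, here is a rough sketch of the read-in step on invented data. The sample rows, the separator lines, and the names WIDTHS, rows, and text are all assumptions made for illustration; only the widths [11, 21, 31] and header=None come from the answer above.

```python
import io

import pandas as pd

# Column widths quoted in the answer; everything else here is made up.
WIDTHS = [11, 21, 31]

# Hypothetical table: a header row, dashed separator rows, and five data rows.
rows = [
    ("Chapter", "Title", "Assignment"),
    ("-" * 10, "-" * 20, "-" * 30),
    ("1-1", "Introduction", "Read pages 1-10"),
    ("1-2", "Basics", "Exercise 1"),
    ("1-3", "Data types", "Exercise 2"),
    ("1-4", "Control flow", "Exercise 3"),
    ("1-5", "Functions", "Exercise 4"),
    ("-" * 10, "-" * 20, "-" * 30),
]

# Pad every cell to its column width so the text is truly fixed-width.
text = "\n".join(
    "".join(cell.ljust(width) for cell, width in zip(row, WIDTHS))
    for row in rows
)

# header=None keeps the header text as ordinary row 0, so the columns are
# labelled 0, 1, 2 and the rows 0 through 7.
df = pd.read_fwf(io.StringIO(text), widths=WIDTHS, header=None)
```

With a real file, the io.StringIO(text) argument would simply be replaced by the file path.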

This is pretty typical of read-in datasets. Let's clean it up slightly, by giving it real column names, and taking out the separator rows:

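df.columns = list(df.loc[0])  # use the first row as the column names
# .ix was removed in pandas 1.0; .loc slices by label, both ends inclusive
df = df.loc[2:6]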

This has the following effect:

[image]

Leaving us with df as:

[image]
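
For anyone who cannot see the screenshots, the cleanup can be sketched on a stand-in frame. The contents of raw below are invented, and the second variant (filtering out dashed rows with str.match) is not from the answer; it is just one possible way to avoid hard-coding the row labels 2 and 6.

```python
import pandas as pd

# Hypothetical stand-in for the frame that read_fwf returned with header=None.
raw = pd.DataFrame(
    [
        ["Chapter", "Title", "Assignment"],
        ["-------", "-----", "----------"],
        ["1-1", "Title A", "Task A"],
        ["1-2", "Title B", "Task B"],
        ["1-3", "Title C", "Task C"],
        ["1-4", "Title D", "Task D"],
        ["1-5", "Title E", "Task E"],
        ["-------", "-----", "----------"],
    ]
)

# Same two steps as in the answer, with .loc in place of the removed .ix.
df = raw.copy()
df.columns = list(raw.loc[0])  # row 0 holds the real column names
df = df.loc[2:6]               # label-based slice, both ends included

# Variant without hard-coded labels: skip the header row, then drop every
# row whose first cell consists only of dashes.
body = raw.iloc[1:]
body = body[~body[0].str.match(r"^-+$")]
body.columns = list(raw.loc[0])
```

Both df and body keep the original row labels 2 through 6, which is why the output further down starts at 2.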

We won't take the time to reindex the rows. Assuming we want the value of a column, we can get it by indexing:

df['Chapter']

Yields:

2    1-1
3    1-2
4    1-3
5    1-4
6    1-5
Name: Chapter, dtype: object

Or, if you want it not as a pandas.Series but as a native Python list:

list(df['Chapter'])

Yields:

['1-1', '1-2', '1-3', '1-4', '1-5']
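
To tie the last steps together, a small sketch under the assumption that df looks like the cleaned frame above (it is rebuilt by hand here so the snippet runs on its own). Series.tolist() is an alternative spelling of list(...), and reset_index(drop=True) is one way to do the renumbering that was skipped earlier; the names chapters, as_list, and renumbered are illustrative.

```python
import pandas as pd

# Hand-built stand-in for the cleaned frame, keeping the row labels 2..6.
df = pd.DataFrame(
    {"Chapter": ["1-1", "1-2", "1-3", "1-4", "1-5"]},
    index=range(2, 7),
)

chapters = df["Chapter"]       # a pandas.Series that keeps labels 2..6
as_list = chapters.tolist()    # plain Python list of the values

# Optional: renumber the rows from 0 instead of keeping the old labels.
renumbered = chapters.reset_index(drop=True)
```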