pandas: How to skip an unknown number of empty lines before the header in pandas.read_csv?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/39297878/

Date: 2020-09-14 01:56:48  Source: igfitidea

Tags: python, csv, pandas, file-io, data-import

Question by bmello

I want to read a DataFrame from a CSV file whose header is not on the first line. For example:

In [1]: import pandas as pd

In [2]: import io

In [3]: temp=u"""#Comment 1
   ...: #Comment 2
   ...: 
   ...: #The previous line is empty
   ...: Header1|Header2|Header3
   ...: 1|2|3
   ...: 4|5|6
   ...: 7|8|9"""

In [4]: df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", 
   ...:                  skiprows=4).dropna()

In [5]: df
Out[5]: 
   Header1  Header2  Header3
0        1        2        3
1        4        5        6
2        7        8        9

[3 rows x 3 columns]

The problem with the above code is that I don't know how many lines will precede the header, so I cannot hard-code skiprows=4 as I did here.

I am aware that I can iterate through the file, as in the question "Read pandas dataframe from csv beginning with non-fix header".

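For reference, the iterate-through-the-file approach can be sketched roughly as follows. This is a minimal sketch, assuming the input is available as a string and that every line before the header is either blank or starts with #; read_csv_after_preamble is a hypothetical helper, not a pandas function.

```python
import io

import pandas as pd


def read_csv_after_preamble(text, sep="|", comment="#"):
    """Hypothetical helper: skip leading blank/comment lines, then parse.

    Not part of pandas; it scans the lines by hand and hands the rest
    (header + data) to pandas.read_csv.
    """
    lines = text.splitlines()
    # Index of the first line that is neither blank nor a comment line.
    start = next(
        (i for i, line in enumerate(lines)
         if line.strip() and not line.lstrip().startswith(comment)),
        None,
    )
    if start is None:
        raise ValueError("no header line found")
    # Everything from the header line onward is passed to pandas.
    body = "\n".join(lines[start:])
    return pd.read_csv(io.StringIO(body), sep=sep)


temp = """#Comment 1
#Comment 2

#The previous line is empty
Header1|Header2|Header3
1|2|3
4|5|6
7|8|9"""

df = read_csv_after_preamble(temp)
print(df)
```

For a file on disk, the same scan could be run on the text returned by reading the file; the cost is an extra manual pass that pandas can avoid, which is why a read_csv-only solution is preferable.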

What I am looking for is a simpler solution, such as making pandas.read_csv disregard any empty lines and take the first non-empty line as the header.

Answer by ode2k

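You need to set skip_blank_lines=True (this is also the pandas default) and drop skiprows. Blank lines are then skipped, lines starting with # are ignored because of comment="#", and the first remaining line is taken as the header: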

df = pd.read_csv(io.StringIO(temp), sep="|", comment="#", skip_blank_lines=True).dropna()
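To check that this does not depend on how many lines precede the header, the sketch below (an illustration, not part of the original answer) parses the same data behind preambles of different lengths; the preamble strings are made up for the check.

```python
import io

import pandas as pd

data = "Header1|Header2|Header3\n1|2|3\n4|5|6\n7|8|9"

# Illustrative preambles of different lengths: none, comments only,
# the question's mix of comments and a blank line, and blank lines only.
preambles = [
    "",
    "#Comment 1\n",
    "#Comment 1\n#Comment 2\n\n#The previous line is empty\n",
    "\n\n\n\n\n",
]

frames = []
for preamble in preambles:
    # No skiprows: blank lines are skipped (skip_blank_lines=True) and
    # whole-line comments are ignored (comment="#"), so the first
    # remaining line becomes the header.
    df = pd.read_csv(io.StringIO(preamble + data), sep="|", comment="#",
                     skip_blank_lines=True)
    frames.append(df)
    print(len(preamble.splitlines()), df.columns.tolist(), df.shape)
```

One caveat of relying on comment="#": pandas also discards everything after a # inside a data line, so this only suits files where # never appears in the values.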