Python: reading in a multi-column TSV file with row numbers

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/25747985/

Time: 2020-08-18 23:34:56  Source: igfitidea

Tags: python, dataframe, tsv

Asked by 719016

What is the cleanest way to read a multi-column TSV file with headers in Python when the first column has no header and instead contains the row number of each row?

This is apparently a common format for files written from R data frames.

Example:

    A      B  C
1   a1     b1 c1
2   a2     b2 c2
3   a3     b3 c3

Any ideas?

Accepted answer by skyuuka

It depends on what you want to do with the data afterwards (and on whether the file is truly a TSV with a \t delimiter). If you just want it in a set of lists, you can use the csv module like so:

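import csv
# newline="" lets the csv module handle line endings itself
with open("tsv.tsv", newline="") as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter="\t")
    for line in tsvreader:
        # skip the first field, which holds the row number
        print(line[1:])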
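
The loop above treats the header like any other row. As a rough sketch of one way to separate the column names from the data, the snippet below reads the header first and keys each row by its row number. The sample text and the helper name read_rows are made up for illustration, and it assumes the header either starts with an empty field or simply omits the row-number column.

```python
import csv
import io

# Hypothetical sample mimicking the question's layout; the header starts
# with an empty field because the row-number column has no name.
SAMPLE = "\tA\tB\tC\n1\ta1\tb1\tc1\n2\ta2\tb2\tc2\n3\ta3\tb3\tc3\n"


def read_rows(tsvfile):
    """Return (column_names, {row_number: values}) from a tab-separated file."""
    reader = csv.reader(tsvfile, delimiter="\t")
    header = next(reader)
    # Drop the empty leading header cell, if present.
    column_names = header[1:] if header and header[0] == "" else header
    # The first field of every data row is the row number.
    rows = {int(line[0]): line[1:] for line in reader if line}
    return column_names, rows


column_names, rows = read_rows(io.StringIO(SAMPLE))
print(column_names)
print(rows[2])
```

For a file on disk, the same helper could be given a handle from open("tsv.tsv", newline="") instead of the in-memory sample.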

However, I'd also recommend the DataFrame class from pandas for anything beyond simple Python operations. It can be used as such:

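from pandas import DataFrame
# DataFrame.from_csv was deprecated in pandas 0.21 and removed in 1.0;
# on newer versions use pandas.read_csv instead.
df = DataFrame.from_csv("tsv.tsv", sep="\t")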

DataFrames allow for high-level manipulation of data sets, such as adding columns, finding averages, etc.
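
To make that concrete, here is a small sketch of adding a column and taking an average. The numeric sample data and the column name total are invented for illustration, and the file is loaded with pandas.read_csv rather than from_csv, on the assumption that a recent pandas version is installed.

```python
import io

import pandas as pd

# Hypothetical numeric data in the same layout as the question's example.
SAMPLE = "\tA\tB\n1\t1.0\t10\n2\t2.0\t20\n3\t3.0\t30\n"

# index_col=0 uses the unnamed first column (the row numbers) as the index.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", index_col=0)

df["total"] = df["A"] + df["B"]  # add a derived column
mean_a = df["A"].mean()          # average of one column

print(df)
print(mean_a)
```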

Answer by skyuuka

How about using plain Python code like the following:

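with open('tsvfilename') as f:
    # splitlines() keeps the last row even without a trailing newline
    lines = f.read().splitlines()
    for i, line in enumerate(lines):
        if i == 0:  # header
            column_names = line.split()
            # ...
        else:
            # split() breaks on any whitespace; data[0] is the row number
            data = line.split()
            # ...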
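
One possible way to fill in the elided parts is sketched below: it collects the values column by column. The helper name parse_table and the sample text are hypothetical, and because it splits on any whitespace it only works when no cell is empty or contains a space.

```python
# Hypothetical sample: the whitespace-aligned table from the question.
SAMPLE = """    A      B  C
1   a1     b1 c1
2   a2     b2 c2
3   a3     b3 c3
"""


def parse_table(text):
    """Return (row_numbers, {column_name: [values]}) from whitespace-separated text."""
    lines = [line for line in text.splitlines() if line.strip()]
    column_names = lines[0].split()
    columns = {name: [] for name in column_names}
    row_numbers = []
    for line in lines[1:]:
        fields = line.split()
        row_numbers.append(int(fields[0]))  # first field is the row number
        for name, value in zip(column_names, fields[1:]):
            columns[name].append(value)
    return row_numbers, columns


row_numbers, columns = parse_table(SAMPLE)
print(row_numbers)
print(columns["B"])
```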

Answer by Pil Kwon

df = DataFrame.from_csv("tsv.tsv", sep="\t") is deprecated.

pandas.read_csv("tsv.tsv", sep="\t") should work instead (read_csv is a function of the pandas module, not a DataFrame method).

Answer by Rohail

DataFrame.from_csv("tsv.tsv", sep="\t")

no longer works. Use

pandas.read_csv("tsv.tsv", sep="\t")

instead.

Answer by Roshan Salian

pandas.read_csv("file.tsv", sep="\t")

DataFrame.from_csv() doesn't work. DataFrame.read_csv() isn't right.
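
Putting the pieces of this thread together, a minimal sketch of the pandas.read_csv approach could look like the following. The file contents are a made-up copy of the question's example written to a temporary directory, and it assumes the header row begins with a tab so that the row-number column is simply unnamed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file contents matching the question's example.
CONTENT = "\tA\tB\tC\n1\ta1\tb1\tc1\n2\ta2\tb2\tc2\n3\ta3\tb3\tc3\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tsv.tsv")
    with open(path, "w") as handle:
        handle.write(CONTENT)
    # sep="\t" selects tab as the delimiter (the default is a comma);
    # index_col=0 turns the row-number column into the index.
    df = pd.read_csv(path, sep="\t", index_col=0)

print(df.columns.tolist())
print(df.loc[2, "B"])
```

With the row numbers as the index, df.loc[2, "B"] looks up a cell by its original row number rather than by position.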
