Original question: http://stackoverflow.com/questions/25747985/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
python reading in multi-column tsv file with row numbers
Asked by 719016
What is the cleanest way of reading in a multi-column tsv file in python with headers, but where the first column has no header and instead contains the row numbers for each row?
This is apparently a common format for files exported from R data frames.
Example:
A B C
1 a1 b1 c1
2 a2 b2 c2
3 a3 b3 c3
Any ideas?
Accepted answer by skyuuka
It depends on what you want to do with the data afterwards (and on whether the file is truly a TSV with a \t delimiter). If you just want it in a set of lists, you can use the csv module like so:
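import csv

with open("tsv.tsv", newline="") as tsvfile:
    tsvreader = csv.reader(tsvfile, delimiter="\t")
    header = next(tsvreader)  # column names; this line has no row-number field
    print(header)
    for line in tsvreader:
        print(line[1:])  # drop the leading row number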
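To keep the row numbers rather than discard them, the rows can be keyed by their first field. The following is a minimal sketch, not part of the original answer: it assumes the file is tab-separated, reads the question's sample data from an in-memory string instead of tsv.tsv, and uses a hypothetical helper name, read_r_style_tsv.

```python
import csv
import io

# Sample data in the R-style layout from the question (assumed tab-separated).
SAMPLE = "A\tB\tC\n1\ta1\tb1\tc1\n2\ta2\tb2\tc2\n3\ta3\tb3\tc3\n"


def read_r_style_tsv(handle):
    """Return (header, rows), where rows maps each row number to its values."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # column names only; no label for the row-number column
    rows = {}
    for line in reader:
        if not line:  # skip blank lines
            continue
        rows[int(line[0])] = line[1:]  # first field is the row number
    return header, rows


header, rows = read_r_style_tsv(io.StringIO(SAMPLE))
print(header)
print(rows[2])
```

For a real file, the same helper could be given a handle from open("tsv.tsv", newline="") in place of the io.StringIO object.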
However, I'd also recommend the DataFrame class from pandas for anything beyond simple Python operations. It can be used as follows:
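from pandas import DataFrame

# Note: DataFrame.from_csv was deprecated in pandas 0.21 and removed in 1.0;
# on current versions, use pandas.read_csv instead.
df = DataFrame.from_csv("tsv.tsv", sep="\t")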
DataFrames allow for high-level manipulation of data sets, such as adding columns, finding averages, etc.
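As a rough illustration of the kind of manipulation meant here, the sketch below adds a column and computes an average. The numeric values and the column name "total" are hypothetical and are not taken from the question's data, which contains only strings.

```python
import pandas as pd

# Hypothetical numeric data; the index 1..3 mimics R-style row numbers.
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]},
    index=[1, 2, 3],
)

df["total"] = df["A"] + df["B"]  # add a column derived from existing ones
mean_a = df["A"].mean()          # average of a single column

print(df)
print(mean_a)
```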
Answer by skyuuka
How about using the following plain Python code:
with open('tsvfilename') as f:
    lines = f.read().splitlines()
for i, line in enumerate(lines):
    # split() splits on any whitespace; use split('\t') if fields may contain spaces
    if i == 0:  # header
        column_names = line.split()
        # ...
    else:
        data = line.split()
        # ...
Answer by Pil Kwon
df = DataFrame.from_csv("tsv.tsv", sep="\t") is deprecated.
df.read_csv("tsv.tsv", sep="\t") probably works.
Answer by Rohail
DataFrame.from_csv("tsv.tsv", sep="\t")
no longer works. Use the following instead:
df.read_csv("tsv.tsv", sep="\t")
Answer by Roshan Salian
pandas.read_csv("file.tsv", sep="\t")
DataFrame.from_csv() doesn't work, and DataFrame.read_csv() isn't right either: read_csv is a function of the pandas module, not a DataFrame method.
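A minimal sketch of pandas.read_csv applied to the layout from the question follows. It assumes the file is tab-separated and reads the sample data from an in-memory string rather than a real file. It relies on pandas treating the first column as the index when the header has one fewer field than the data rows.

```python
import io

import pandas as pd

# Sample data in the R-style layout from the question (assumed tab-separated).
SAMPLE = "A\tB\tC\n1\ta1\tb1\tc1\n2\ta2\tb2\tc2\n3\ta3\tb3\tc3\n"

# read_csv is called on the pandas module itself, not on a DataFrame.
# The header has three names but each data row has four fields, so pandas
# uses the extra leading column (the row numbers) as the index.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t")

print(df)
print(df.loc[2, "B"])
```

If the header line instead begins with an empty field (a leading tab), passing index_col=0 is the usual way to select the first column as the index.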

