Original question: http://stackoverflow.com/questions/39339142/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
pandas read_csv and keep only certain rows (python)
Asked by dleal
I am aware of the skiprows parameter, which lets you pass a list of the row indices to skip. However, what I have is the indices of the rows I want to keep.
Say that my CSV file looks like this, continuing for millions of rows:
   A  B
0  1  2
1  3  4
2  5  6
3  7  8
4  9  0
The only indices I would like to load are 2 and 3, so
index_list = [2,3]
The input for the skiprows parameter would be [0,1,4]. However, I only have [2,3] available.
I am trying something like:
pd.read_csv(path, skiprows = ~index_list)
but no luck. Any suggestions?
Thanks, I appreciate all the help.
Answered by gabra
I think you would need to find the number of lines first, like this.
with open(path) as f:  # same file that is later passed to read_csv
    num_lines = sum(1 for line in f)  # total number of lines, header included
Then you would need to build the list of rows to skip, that is, every row index that is not in index_list:
# Assumes file line 0 is the header, so data row i is on file line i + 1.
to_exclude = [i for i in range(1, num_lines) if i - 1 not in index_list]
and then load your data:
pd.read_csv(path, skiprows = to_exclude)
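For what it's worth, here is a minimal end-to-end sketch of the approach above, run against a tiny temporary file. The sample data, the comma separator, and the assumption that the file has a single header line with no index column are all hypothetical, not taken from the question. It also shows a possible alternative: skiprows accepts a callable that is evaluated against each file line number, which would avoid counting the lines first.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data mirroring the question: one header line, five data rows.
csv_text = "A,B\n1,2\n3,4\n5,6\n7,8\n9,0\n"
index_list = [2, 3]  # 0-based positions of the data rows to keep

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w") as f:
        f.write(csv_text)

    # Approach from the answer: count the lines, then skip every unwanted line.
    with open(path) as f:
        num_lines = sum(1 for _ in f)
    keep = set(index_list)  # set lookups stay fast for large files
    # File line 0 is the header, so data row i sits on file line i + 1.
    to_exclude = [i for i in range(1, num_lines) if i - 1 not in keep]
    df_list = pd.read_csv(path, skiprows=to_exclude)

    # Alternative: the callable returns True for file lines that should be skipped.
    df_callable = pd.read_csv(path, skiprows=lambda i: i > 0 and i - 1 not in keep)

print(df_list)
```

Note that the resulting frame is re-indexed from 0. If the original positions matter, something like df_list.index = sorted(index_list) should restore them, assuming every entry in index_list is unique and exists in the file.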