Write comments in a CSV file with pandas
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/29233496/
Asked by Mathieu Dubois
I would like to write some comments in my CSV file created with pandas. I haven't found any option for this in DataFrame.to_csv (even though read_csv can skip comments), nor in the standard csv module. I can open the file, write the comments (lines starting with #) and then pass it to to_csv. Does anybody have a better option?
Answered by Vor
df.to_csv accepts a file object. So you can open a file in 'a' (append) mode, write your comments and pass it to the data frame's to_csv function.
For example:
In [36]: df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3]})
In [37]: f = open('foo', 'a')
In [38]: f.write('# My awesome comment\n')
In [39]: f.write('# Here is another one\n')
In [40]: df.to_csv(f)
In [41]: f.close()
In [42]: more foo
# My awesome comment
# Here is another one
,a,b
0,1,1
1,2,2
2,3,3
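The same idea can be sketched as a standalone script that also reads the file back, since the question notes that read_csv can skip comments. This is a minimal sketch, not part of the original answer: the temporary path and the name df_back are assumptions made for illustration, and the file is opened in 'w' mode with newline='' (which the to_csv documentation asks for when a text file object is passed).

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# Hypothetical output location; a temporary directory keeps the sketch self-contained
path = os.path.join(tempfile.mkdtemp(), 'foo.csv')

# Write the comment lines first, then hand the open file object to pandas
with open(path, 'w', newline='') as f:
    f.write('# My awesome comment\n')
    f.write('# Here is another one\n')
    df.to_csv(f)

# comment='#' makes read_csv ignore everything from '#' to the end of a line,
# so the two leading comment lines are skipped and ',a,b' is used as the header
df_back = pd.read_csv(path, comment='#', index_col=0)
print(df_back)
```

Note that comment='#' applies to the whole file, so this only round-trips cleanly when the data itself contains no '#' characters.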
Answered by joelostblom
An alternative approach to @Vor's solution is to first write the comment to a file, and then use mode='a' with to_csv() to add the content of the data frame to the same file. According to my benchmarks (below), this takes about as long as opening the file in append mode, adding the comment and then passing the file handle to pandas (as per @Vor's answer). The similar timings make sense considering that this is what pandas is doing internally: DataFrame.to_csv() calls CSVFormatter.save(), which uses _get_handles() to open the file via open().
On a separate note, it is convenient to work with file IO via the with statement, which ensures that opened files are closed when you're done with them and leave the with block. See examples in the benchmarks below.
Read in test data
import pandas as pd
# Read in the iris data frame from the seaborn GitHub location
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
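# Create a bigger data frame by doubling it until it has at least 100,000 rows
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
while iris.shape[0] < 100000:
    iris = pd.concat([iris, iris])
# `iris.shape` is now (153600, 5)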
1. Append with the same file handle
%%timeit -n 5 -r 5
# Open a file in append mode to add the comment
# Then pass the file handle to pandas
with open('test1.csv', 'a') as f:
    f.write('# This is my comment\n')
    iris.to_csv(f)
972 ms ± 31.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
2. Reopen the file with to_csv(mode='a')
%%timeit -n 5 -r 5
# Open a file in write mode to add the comment
# Then close the file and reopen it with pandas in append mode
with open('test2.csv', 'w') as f:
    f.write('# This is my comment\n')
iris.to_csv('test2.csv', mode='a')
949 ms ± 19.3 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
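Outside IPython the %%timeit magic is not available, so the comparison can be approximated with the standard timeit module. The following is only a sketch under stated assumptions: it uses a small synthetic data frame instead of the enlarged iris data, the function names write_via_handle and write_via_mode_a are hypothetical, and the first approach opens the file in 'w' rather than 'a' mode so that repeated runs do not keep growing the file. It also checks that both approaches produce the same file content.

```python
import os
import tempfile
import timeit

import pandas as pd

# Small synthetic frame (an assumption; the benchmarks above use the enlarged iris data)
df = pd.DataFrame({'x': range(1000), 'y': range(1000)})

tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, 'test1.csv')
path2 = os.path.join(tmpdir, 'test2.csv')


def write_via_handle(path):
    # Approach 1: write the comment, then pass the still-open handle to pandas.
    # 'w' is used instead of 'a' so each timed run starts from an empty file.
    with open(path, 'w', newline='') as f:
        f.write('# This is my comment\n')
        df.to_csv(f)


def write_via_mode_a(path):
    # Approach 2: write the comment, close the file,
    # then let pandas reopen the same path in append mode.
    with open(path, 'w', newline='') as f:
        f.write('# This is my comment\n')
    df.to_csv(path, mode='a')


# Total wall-clock time for 5 calls of each approach
t1 = timeit.timeit(lambda: write_via_handle(path1), number=5)
t2 = timeit.timeit(lambda: write_via_mode_a(path2), number=5)
print(f'file handle: {t1:.3f} s, mode="a": {t2:.3f} s')

# Both approaches should leave the same bytes on disk
with open(path1, newline='') as f1, open(path2, newline='') as f2:
    same_content = f1.read() == f2.read()
print('identical output:', same_content)
```

Timings from such a small frame are noisy, so they indicate at most whether the two approaches are in the same range.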

