Write comments in a CSV file with pandas

Question

Asked by Mathieu Dubois

I would like to write some comments in my CSV file created with pandas. I haven't found any option for this in DataFrame.to_csv (even though read_csv can skip comments), nor in the standard csv module. I can open the file, write the comments (lines starting with #) and then pass it to to_csv. Does anybody have a better option?

Answer 1

Answered by Vor

df.to_csv accepts a file object. So you can open a file in `a` (append) mode, write your comments and pass it to the data frame's to_csv function.

For example:

In [36]: df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,3]})

In [37]: f = open('foo', 'a')

In [38]: f.write('# My awesome comment\n')

In [39]: f.write('# Here is another one\n')

In [40]: df.to_csv(f)

In [41]: f.close()

In [42]: more foo
# My awesome comment
# Here is another one
,a,b
0,1,1
1,2,2
2,3,3
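The question notes that read_csv can skip comments, so it may be worth checking that a file written this way reads back cleanly. The following is a minimal sketch rather than part of the original answer: the temporary path is a hypothetical scratch location, the file is opened in 'w' mode so that reruns do not pile up, and newline='' follows the recommendation in the to_csv documentation for text file objects.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3]})

# Hypothetical scratch location so the sketch does not touch existing files
path = os.path.join(tempfile.mkdtemp(), 'foo.csv')

# 'w' starts from an empty file; newline='' lets pandas control its own line endings
with open(path, 'w', newline='') as f:
    f.write('# My awesome comment\n')
    f.write('# Here is another one\n')
    df.to_csv(f)

# comment='#' makes read_csv ignore everything from '#' to the end of a line,
# so the two leading comment lines are skipped before the header is parsed
round_trip = pd.read_csv(path, comment='#', index_col=0)
```

One caveat with this approach: any '#' inside the data itself would also be treated as the start of a comment when reading back.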

Answer 2

Answered by joelostblom

An alternative approach to @Vor's solution is to first write the comment to a file, and then use mode='a' with to_csv() to add the content of the data frame to the same file. According to my benchmarks (below), this takes about as long as opening the file in append mode, adding the comment and then passing the file handle to pandas (as per @Vor's answer). The similar timings make sense considering that this is what pandas is doing internally (DataFrame.to_csv() calls CSVFormatter.save(), which uses _get_handles() to open the file via open()).

On a separate note, it is convenient to work with file IO via the with statement, which ensures that opened files are closed when you are done with them and leave the with block. See the examples in the benchmarks below.

Read in test data

import pandas as pd
# Read in the iris data frame from the seaborn GitHub location
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
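# Create a bigger data frame by doubling it until it has at least 100000 rows
while iris.shape[0] < 100000:
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same doubling
    iris = pd.concat([iris, iris])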
# `iris.shape` is now (153600, 5)

1. Append with the same file handle

%%timeit -n 5 -r 5

# Open a file in append mode to add the comment
# Then pass the file handle to pandas
with open('test1.csv', 'a') as f:
    f.write('# This is my comment\n')
    iris.to_csv(f)

972 ms ± 31.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)

2. Reopen the file with `to_csv(mode='a')`

%%timeit -n 5 -r 5

# Open a file in write mode to add the comment
# Then close the file and reopen it with pandas in append mode
with open('test2.csv', 'w') as f:
    f.write('# This is my comment\n')
iris.to_csv('test2.csv', mode='a')

949 ms ± 19.3 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
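Since the two variants time about the same, either can be wrapped in a small helper. The sketch below uses the second pattern (write the comments in 'w' mode, then let pandas reopen the file with mode='a'). The function name to_csv_with_comments, the demo frame and the temporary path are all hypothetical and only for illustration; nothing like this helper exists in pandas itself.

```python
import os
import tempfile

import pandas as pd


def to_csv_with_comments(df, path, comments, **to_csv_kwargs):
    """Write each comment as a '# ...' line, then append the data frame as CSV.

    Hypothetical helper for illustration; it is not part of pandas.
    Do not pass mode in to_csv_kwargs, since it is fixed to 'a' here.
    """
    # 'w' truncates any existing file, so repeated calls do not accumulate
    with open(path, 'w') as f:
        for line in comments:
            f.write(f'# {line}\n')
    # pandas reopens the same file in append mode and adds the CSV content
    df.to_csv(path, mode='a', **to_csv_kwargs)


demo = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
to_csv_with_comments(demo, demo_path, ['This is my comment', 'Second line'], index=False)

# Read the raw lines back to inspect what was written
with open(demo_path, newline='') as f:
    written_lines = f.read().splitlines()
```

Writing the comments in 'w' mode matters if the code is run more than once: opening the file in 'a' mode, as in the first benchmark, keeps adding a new comment and a new copy of the data to whatever is already in the file.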
