Pandas: Continuously write from function to csv

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me), with a link to the original question: http://stackoverflow.com/questions/31090127/

Pandas: Continuously write from function to csv

python pandas

Asked by Winterflags

I have a function set up for Pandas that runs through a large number of rows in input.csv and inputs the results into a Series. It then writes the Series to output.csv.

However, if the process is interrupted (for example, by an unexpected event), the program terminates and all data that would have gone into the csv is lost.

Is there a way to write the data continuously to the csv, regardless of whether the function finishes for all rows?

Preferably, each time the program starts, a blank output.csv is created and then appended to while the function is running.

import pandas as pd

df = pd.read_csv("read.csv")

def crawl(a):
    #Create x, y
    return pd.Series([x, y])

df[["Column X", "Column Y"]] = df["Column A"].apply(crawl)
df.to_csv("write.csv", index=False)

Accepted answer by Winterflags

In the end, this is what I came up with. Thanks for helping out!

import pandas as pd

df1 = pd.read_csv("read.csv")

run = 0

def crawl(a):

    global run
    run = run + 1

    #Create x, y

    df2 = pd.DataFrame([[x, y]], columns=["X", "Y"])

    if run == 1:
        #first call: create/overwrite output.csv, writing the header
        df2.to_csv("output.csv")
    else:
        #subsequent calls: append rows without repeating the header
        df2.to_csv("output.csv", header=None, mode="a")

df1["Column A"].apply(crawl)
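
For reference, the same idea works without the global counter by checking whether the output file already exists. A minimal sketch, where append_row is a hypothetical helper and x, y stand for whatever the crawl step produces:

import os
import pandas as pd

def append_row(x, y, path="output.csv"):
    #write the header only when the file does not exist yet (or is empty)
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    pd.DataFrame([[x, y]], columns=["X", "Y"]).to_csv(
        path, mode="a", header=write_header, index=False)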

Answered by Tom Patel

This is a possible solution that will append the data to a new file as it reads the csv in chunks. If the process is interrupted, the new file will contain all the information up until the interruption.

import pandas as pd

#csv file to be read in
in_csv = '/path/to/read/file.csv'

#csv to write data to
out_csv = '/path/to/write/file.csv'

#get the number of lines of the csv file to be read
with open(in_csv) as f:
    number_lines = sum(1 for row in f)

#size of chunks of data to write to the csv
chunksize = 10

#loop through the data, writing each chunk to the new file
#(starting at 0 so the first row is not skipped)
for i in range(0, number_lines, chunksize):
    df = pd.read_csv(in_csv,
        header=None,
        nrows=chunksize,  #number of rows to read at each loop
        skiprows=i)  #skip rows that have already been read

    df.to_csv(out_csv,
        index=False,
        header=False,
        mode='a',  #append data to csv file
        chunksize=chunksize)  #size of data to write for each loop
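
For what it's worth, pandas can also do the chunking itself: passing chunksize to read_csv returns an iterator of DataFrames, which avoids re-opening and re-scanning the input file on every loop. A minimal sketch with the same placeholder paths:

import pandas as pd

in_csv = '/path/to/read/file.csv'
out_csv = '/path/to/write/file.csv'

#each chunk is a DataFrame of up to 10 rows; it is appended
#to the output as soon as it has been read
for chunk in pd.read_csv(in_csv, header=None, chunksize=10):
    chunk.to_csv(out_csv, index=False, header=False, mode='a')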

Answered by Ben K.

I would suggest this:

with open("write.csv", "a") as f:
    df.to_csv(f, header=False, index=False)

The argument "a" appends the new df to the existing file, and the file is closed once the with block finishes, so all of your intermediate results are kept.

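In the asker's setup this would run once per processed row, so each partial result lands on disk immediately. A sketch with hypothetical placeholder results:

import pandas as pd

results = [("x1", "y1"), ("x2", "y2")]  #hypothetical per-row crawl results

for x, y in results:
    partial = pd.DataFrame([[x, y]], columns=["X", "Y"])
    with open("write.csv", "a") as f:
        partial.to_csv(f, header=False, index=False)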

Answered by tmsss

I've found a solution to a similar problem by looping over the dataframe with iterrows() and saving each row to the csv file, which in your case could look something like this:

for ix, row in df.iterrows():
    # assign back into df itself; the `row` yielded by iterrows() is a
    # copy, so mutating it would not change what gets written below
    df.loc[ix, 'Column A'] = crawl(row['Column A'])

    # if you wish to maintain the header, write it only with the first row
    if ix == 0:
        df.iloc[ix:ix + 1].to_csv('output.csv', mode='a', index=False, sep=',', encoding='utf-8')
    else:
        df.iloc[ix:ix + 1].to_csv('output.csv', mode='a', index=False, sep=',', encoding='utf-8', header=False)
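
A small caveat: the positional slice df.iloc[ix:ix + 1] assumes the dataframe has a default RangeIndex. With an arbitrary index it is safer to count positions separately; a sketch of the same loop under that assumption, reusing df and crawl from above:

for pos, (ix, row) in enumerate(df.iterrows()):
    df.loc[ix, 'Column A'] = crawl(row['Column A'])
    # header=(pos == 0) writes the header only for the very first row
    df.iloc[pos:pos + 1].to_csv('output.csv', mode='a', index=False,
                                encoding='utf-8', header=(pos == 0))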