Writing a single CSV header with Pandas

Question

Asked by HelloToEarth

I'm parsing data into lists and using pandas to build a DataFrame and write it to a CSV file. First, my data is collected into a dict in which inv, name, and date are all lists with numerous entries. Then I use concat to combine the data from each iteration over the datasets I parse and write it to a CSV file, like so:

counter = True
data = {'Invention': inv, 'Inventor': name, 'Date': date}

if counter is True:
  df = pd.DataFrame(data)
  df = df[['Invention', 'Inventor', 'Date']]

else:
  df = pd.concat([df, pd.DataFrame(data)])
  df = df[['Invention', 'Inventor', 'Date']]

  with open('./new.csv', 'a', encoding='utf-8') as f:
    if counter is True:
      df.to_csv(f, index = False, header = True)
    else:
      df.to_csv(f, index = False, header = False)

counter = False

The counter = True statement resides outside of the loop that iterates over all the data I'm parsing, so it is not reset on every iteration.

This means the first branch runs only once, to build the first df, and every later pass concatenates onto it. The problem is that even though counter is True only on the first pass, and the first if statement for df behaves as expected, the same check does not work when writing to the file.

What happens is that the header is written over and over again, regardless of the fact that counter is True only once. When I swap in header = False for the case where counter is True, the header is never written at all.

I think this is because of the concatenation of df holding onto the header somehow but other than that I cannot figure out the logic error.

Is there perhaps another way I could also write a header once and only once to the same CSV file?

Answer 1

Answered by Tom Lynch

It's hard to tell what might be going wrong without seeing the rest of the code. I've developed some test data and logic that works; you can adapt it to fit your needs.

Please try this:

import pandas as pd

early_inventions = ['wheel', 'fire', 'bronze']
later_inventions = ['automobile', 'computer', 'rocket']

early_names = ['a', 'b', 'c']
later_names = ['z', 'y', 'x']

early_dates = ['2000-01-01', '2001-10-01', '2002-03-10']
later_dates = ['2010-01-28', '2011-10-10', '2012-12-31']

early_data = {'Invention': early_inventions,
    'Inventor': early_names,
    'Date': early_dates}

later_data = {'Invention': later_inventions,
    'Inventor': later_names,
    'Date': later_dates}

datasets = [early_data, later_data]

columns = ['Invention', 'Inventor', 'Date']
header = True
for dataset in datasets:
    df = pd.DataFrame(dataset)
    df = df[columns]
    mode = 'w' if header else 'a'
    df.to_csv('./new.csv', encoding='utf-8', mode=mode, header=header, index=False)
    header = False

Alternatively, you can concatenate all of the data in the loop and write out the dataframe at the end:

df = pd.DataFrame(columns=columns)
for dataset in datasets:
    df = pd.concat([df, pd.DataFrame(dataset)])
    df = df[columns]
df.to_csv('./new.csv', encoding='utf-8', index=False)
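
A variant of the concatenate-then-write approach above, offered as a minimal sketch: it collects one frame per dataset in a list and calls pd.concat once, which avoids concatenating onto an empty DataFrame. The sample data and the temporary output path are illustrative assumptions, and it presumes all datasets fit in memory.

```python
import os
import tempfile

import pandas as pd

columns = ['Invention', 'Inventor', 'Date']

# Illustrative stand-ins for the parsed datasets (not the asker's real data)
datasets = [
    {'Invention': ['wheel', 'fire'], 'Inventor': ['a', 'b'],
     'Date': ['2000-01-01', '2001-10-01']},
    {'Invention': ['rocket'], 'Inventor': ['x'], 'Date': ['2012-12-31']},
]

# Build one frame per dataset, then concatenate them in a single call
frames = [pd.DataFrame(dataset, columns=columns) for dataset in datasets]
combined = pd.concat(frames, ignore_index=True)

# Hypothetical output location; a temporary directory keeps the sketch side-effect free
out_path = os.path.join(tempfile.mkdtemp(), 'new.csv')

# A single write means the header is emitted exactly once
combined.to_csv(out_path, encoding='utf-8', index=False)
```

Because the file is written only once, no header flag is needed at all.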

If your code cannot be restructured along these lines, you can forgo writing the header in to_csv altogether. Instead, check whether the output file exists and, if it does not, write the header to it first:

import os

fn = './new.csv'
if not os.path.exists(fn):
    with open(fn, mode='w', encoding='utf-8') as f:
        f.write(','.join(columns) + '\n')
# now append the dataframe without a header
df.to_csv(fn, encoding='utf-8', mode='a', header=False, index=False)
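
The file-existence check can also be folded into the to_csv call itself. The following is a sketch rather than a definitive implementation: the helper name append_with_single_header, the sample batch, and the temporary path are all hypothetical, and the extra size check is an added assumption to cover a file that exists but is empty.

```python
import os
import tempfile

import pandas as pd


def append_with_single_header(df, path):
    """Append df to path, writing the header only when the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, encoding='utf-8', mode='a', header=needs_header, index=False)


# Hypothetical output location and sample batch, for illustration only
out_path = os.path.join(tempfile.mkdtemp(), 'new.csv')
batch = pd.DataFrame({'Invention': ['wheel'],
                      'Inventor': ['a'],
                      'Date': ['2000-01-01']})

# Call the helper several times, as a parsing loop would
for _ in range(3):
    append_with_single_header(batch, out_path)
```

Since the decision is based on the file rather than on a flag in the loop, it also holds up across separate runs of the program.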

Answer 2

Answered by LeninGF

I ran into the same problem. Writing a pandas DataFrame to CSV works fine when the DataFrame is complete; nothing beyond what any tutorial covers is needed.

However, if the program produces results incrementally and appends them to the file, the header ends up being written repeatedly.

In order to solve this consider the following function:

import os

import pandas as pd

def write_data_frame_to_csv_2(data, path, header_list):
    # Each top-level key of `data` becomes one row of the frame
    df = pd.DataFrame.from_dict(data=data, orient='index')
    filename = os.path.join(path, 'results_with_header.csv')
    if os.path.isfile(filename):
        # File already exists: append rows without repeating the header
        mode = 'a'
        header = False
    else:
        # First write: create the file and emit the header
        mode = 'w'
        header = header_list

    with open(filename, mode=mode) as f:
        df.to_csv(f, header=header, index_label='model')

If the file does not exist, we use write mode and header is set to the header list. Otherwise, when the file already exists, we use append mode and header is set to False.
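
As an alternative to checking os.path.isfile, the decision can be made from the open file handle. This is a sketch under stated assumptions: append_results and the temporary path are hypothetical names, and it relies on Python positioning an append-mode stream at the end of the file when it is opened, so that tell() returns 0 only for a new or empty file.

```python
import os
import tempfile

import pandas as pd


def append_results(results, filename):
    """Append one row per top-level key, writing the header only into an empty file."""
    df = pd.DataFrame.from_dict(results, orient='index')
    # newline='' leaves line-ending handling to the CSV writer
    with open(filename, mode='a', newline='', encoding='utf-8') as f:
        # In append mode the stream starts at the end of the file,
        # so a position of 0 means nothing has been written yet
        df.to_csv(f, header=(f.tell() == 0), index_label='model')


# Hypothetical output location and sample metrics, for illustration only
out_path = os.path.join(tempfile.mkdtemp(), 'results_with_header.csv')
model = {'model_name': {'acc': 0.9, 'loss': 0.3, 'tp': 840, 'tn': 450}}

for _ in range(3):
    append_results(model, out_path)
```

Here the header comes from the DataFrame's own column names instead of a separate header list.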

The function receives a simple dictionary as a parameter. In my case I used:

model = {
    'model_name': {
        'acc': 0.9,
        'loss': 0.3,
        'tp': 840,
        'tn': 450,
    }
}

Calling the function from the IPython console several times produces the expected result:

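# header_list is not defined in the original post; this value is inferred from the CSV output below
header_list = ['acc', 'loss', 'tp', 'tn']
write_data_frame_to_csv_2(model, './', header_list)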

CSV generated:

model,acc,loss,tp,tn
model_name,0.9,0.3,840,450
model_name,0.9,0.3,840,450
model_name,0.9,0.3,840,450
model_name,0.9,0.3,840,450

Let me know if it helps. Happy coding!
