Python pandas: Output dataframe to csv with integers

Question

Asked by xApple

I have a pandas.DataFrame that I wish to export to a CSV file. However, pandas seems to write some of the values as float instead of int types. I couldn't find how to change this behavior.

Building a data frame:

df = pandas.DataFrame(columns=['a','b','c','d'], index=['x','y','z'], dtype=int)
x = pandas.Series([10,10,10], index=['a','b','d'], dtype=int)
y = pandas.Series([1,5,2,3], index=['a','b','c','d'], dtype=int)
z = pandas.Series([1,2,3,4], index=['a','b','c','d'], dtype=int)
df.loc['x']=x; df.loc['y']=y; df.loc['z']=z

View it:

>>> df
    a   b    c   d
x  10  10  NaN  10
y   1   5    2   3
z   1   2    3   4

Export it:

>>> df.to_csv('test.csv', sep='\t', na_rep='0', dtype=int)
>>> for l in open('test.csv'): print(l.strip('\n'))
        a       b       c       d
x       10.0    10.0    0       10.0
y       1       5       2       3
z       1       2       3       4

Why do the tens have a trailing .0?

Sure, I could just stick this function into my pipeline to reconvert the whole CSV file, but it seems unnecessary:

def lines_as_integer(path):
    # Yield the header unchanged, then every data row with its values cast to int.
    with open(path) as handle:
        yield next(handle)
        for line in handle:
            fields = line.split()
            label = fields[0]
            values = [int(float(v)) for v in fields[1:]]
            yield label + '\t' + '\t'.join(map(str, values)) + '\n'

with open(path_table_int, 'w') as handle:
    handle.writelines(lines_as_integer(path_table_float))

Answer 1

Accepted answer by xApple

The answer I was looking for was a slight variation of what @Jeff proposed in his answer, so the credit goes to him. For reference, this is what solved my problem in the end:

    import pandas
    df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])
    df = df.fillna(0)
    df = df.astype(int)
    df.to_csv('test.csv', sep='\t')
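
Since `data` is not defined in the snippet above, here is a minimal self-contained sketch of the same steps. The sample values are taken from the question; building the frame from a dict and returning the CSV as a string instead of writing a file are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample values from the question; 'c' has a missing entry in row 'x'.
df = pd.DataFrame(
    {"a": [10, 1, 1], "b": [10, 5, 2], "c": [np.nan, 2, 3], "d": [10, 3, 4]},
    index=["x", "y", "z"],
)

# Replace missing values with 0, then cast every column to int.
df_int = df.fillna(0).astype(int)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df_int.to_csv(sep="\t")
print(csv_text)
```

Note that this turns the missing value into a real 0, so the CSV no longer distinguishes "missing" from "zero".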

Answer 2

Answered by Andy Hayden

This is a "gotcha" in pandas (see "Support for integer NA" in the documentation), where integer columns with NaNs are converted to floats.

This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be "numeric". One possibility is to use dtype=object arrays instead.
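
The difference can be seen in a small sketch. The Series values and labels here are made up for illustration; the point is only to compare the default float upcast with a dtype=object array holding the same values.

```python
import numpy as np
import pandas as pd

# An integer column that contains a missing value is stored as float64.
as_float = pd.Series([10, np.nan, 3], index=["x", "y", "z"])
print(as_float.dtype)

# The same values held in an object array keep the Python ints as they are.
as_object = pd.Series([10, np.nan, 3], index=["x", "y", "z"], dtype=object)
print(as_object.dtype)

# Compare how each one is written out (to_csv returns text when no path is given).
float_csv = as_float.to_frame("c").to_csv()
object_csv = as_object.to_frame("c").to_csv()
print(float_csv)
print(object_csv)
```

The object version writes plain integers, at the cost of the memory and performance penalties mentioned above.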

Answer 3

Answered by Jeff

The problem is that you are assigning values by rows while dtypes are grouped by columns, so everything gets cast to object dtype. That is not a good thing, as you lose all efficiency. One way around it is to convert each column, which coerces to float/int dtype as needed.

As we answered in another question, if you construct the frame all at once (or construct it column by column), this step will not be needed.

In [23]: def convert(x):
   ....:     try:
   ....:         return x.astype(int)
   ....:     except (ValueError, TypeError):
   ....:         return x
   ....:     

In [24]: df.apply(convert)
Out[24]: 
    a   b   c   d
x  10  10 NaN  10
y   1   5   2   3
z   1   2   3   4

In [25]: df.apply(convert).dtypes
Out[25]: 
a      int64
b      int64
c    float64
d      int64
dtype: object

In [26]: df.apply(convert).to_csv('test.csv')

In [27]: !cat test.csv
,a,b,c,d
x,10,10,,10
y,1,5,2.0,3
z,1,2,3.0,4
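
As a rough illustration of the "construct the frame all at once" advice, the sketch below builds the same frame from a dict of columns. The dict layout is an assumption; the values come from the question. Only the column with the missing value ends up as float, and no conversion step is involved.

```python
import numpy as np
import pandas as pd

# Each column is built as a whole, so pandas can pick one dtype per column.
df = pd.DataFrame(
    {"a": [10, 1, 1], "b": [10, 5, 2], "c": [np.nan, 2, 3], "d": [10, 3, 4]},
    index=["x", "y", "z"],
)
print(df.dtypes)

# to_csv returns the CSV text when no path is given.
csv_text = df.to_csv()
print(csv_text)
```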

Answer 4

Answered by Tad

If you want to preserve the NaN information in the exported CSV, do the following. P.S.: I'm concentrating on column 'c' in this case.

df['c'] = df['c'].fillna('')       # fill NaN with an empty string
df['c'] = df['c'].astype(str)      # convert the column to strings
>>> df
    a   b    c     d
x  10  10         10
y   1   5    2.0   3
z   1   2    3.0   4

df['c'] = df['c'].str.split('.')   # split each value into a list on '.'
>>> df
    a   b    c          d
x  10  10   ['']       10
y   1   5   ['2','0']   3
z   1   2   ['3','0']   4

df['c'] = df['c'].str[0]           # keep the first element of each list
>>> df
    a   b    c   d
x  10  10       10
y   1   5    2   3
z   1   2    3   4

Now, if you export the DataFrame to CSV, column 'c' will have no float values and the NaN information is preserved.
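
An alternative that is not part of this answer: pandas 0.24 and later offer a nullable integer dtype, spelled "Int64" with a capital I. The following is a minimal sketch, assuming such a version is available and that the non-missing values in the column are whole numbers. The two-column frame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10, 1, 1], "c": [np.nan, 2, 3]}, index=["x", "y", "z"])

# Cast only the column with missing values; NaN becomes <NA>, the rest become integers.
df["c"] = df["c"].astype("Int64")

# Missing entries are written as empty fields by default (na_rep='').
csv_text = df.to_csv()
print(csv_text)
```

Unlike the string-splitting approach, the column stays numeric, so it can still be used in arithmetic before export.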

Answer 5

Answered by appsdownload

You can use astype() to specify the data type of each column.

For example:

import pandas
df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])

df = df.astype({"a": int, "b": complex, "c" : float, "d" : int})
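
One caveat worth sketching: a column that still contains NaN cannot be cast to int this way. The frame below is a made-up example, not the `data` from the snippet above.

```python
import pandas as pd

# 'a' holds whole-number floats; 'c' has a missing value (None becomes NaN).
df = pd.DataFrame({"a": [10.0, 1.0, 1.0], "c": [None, 2.0, 3.0]}, index=["x", "y", "z"])

# Columns not named in the mapping keep their current dtype.
converted = df.astype({"a": int})
print(converted.dtypes)

# Casting a column that still contains NaN to int is rejected.
try:
    df.astype({"c": int})
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("cannot cast 'c':", err)
```

Filling or dropping the missing values first, as in the accepted answer, avoids the error.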

Answer 6

Answered by Bjorn Eriksson

You can convert your DataFrame to a NumPy array as a workaround:

import numpy as np
np.savetxt(savepath, np.array(df).astype(int), fmt='%i', delimiter=',', header='PassengerId,Survived', comments='')
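
A self-contained sketch of the same idea follows. The column names and values are made up, the output goes to an in-memory buffer instead of `savepath`, and the header is derived from the column names, all of which are assumptions for illustration. Missing values are filled first because NaN has no integer representation.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10.0, 1.0], "b": [np.nan, 5.0]})

# Fill missing values, then convert the whole table to an integer array.
values = df.fillna(0).to_numpy().astype(int)

# Write to an in-memory text buffer; a file path would work the same way.
buffer = io.StringIO()
np.savetxt(buffer, values, fmt="%i", delimiter=",",
           header=",".join(df.columns), comments="")
csv_text = buffer.getvalue()
print(csv_text)
```

Keep in mind that the row labels are dropped, since only the values are passed to NumPy.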
