pandas DataFrame values.tolist() data type

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34838378/

Date: 2020-09-14 00:31:24  Source: igfitidea


Tags: python, pandas, tolist

Asked by Meng Qian

I have a dataframe like this:

[image of the dataframe; not preserved in this copy]

This dataframe has several columns. Two are of type float: price and change, while volme and amount are of type int. I use df.values.tolist() to convert df to a list and get the data:

datatmp = df.values.tolist()
print(datatmp[0])

[20160108150023.0, 11.12, -0.01, 4268.0, 4746460.0, 2.0]

The int types in df all change to float types. My question is: why do the int types change to float types? How can I get the int data I want?

Accepted answer by Mike Müller

You can convert column-by-column:

by_column = [df[x].values.tolist() for x in df.columns]

This preserves the data type of each column.

Then convert to the structure you want:

list(list(x) for x in zip(*by_column))

You can do it in one line:

list(list(x) for x in zip(*(df[x].values.tolist() for x in df.columns)))
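To make the difference concrete, here is a minimal sketch that puts the whole-frame conversion next to the column-by-column one. The sample values are made up to resemble the row printed in the question, and the column names time and type are assumptions, since the original frame is only shown as an image.

```python
import pandas as pd

# Hypothetical sample data modelled on the row shown in the question.
df = pd.DataFrame({
    "time": [20160108150023, 20160108150024],
    "price": [11.12, 11.13],
    "change": [-0.01, 0.01],
    "volme": [4268, 100],
    "amount": [4746460, 111300],
    "type": [2, 1],
})

# Whole-frame conversion: df.values has one common dtype (float64 here),
# so every int is upcast to float.
as_floats = df.values.tolist()

# Column-by-column conversion: each column keeps its own dtype.
by_column = [df[x].values.tolist() for x in df.columns]
rows = [list(x) for x in zip(*by_column)]

print(as_floats[0])  # [20160108150023.0, 11.12, -0.01, 4268.0, 4746460.0, 2.0]
print(rows[0])       # [20160108150023, 11.12, -0.01, 4268, 4746460, 2]
```

The zip(*by_column) step transposes the list of columns back into a list of rows.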

You can check what datatypes your columns have with:

df.info()

Very likely your column amount is of type float. Do you have any NaN in this column? NaN values are always of type float and would make the whole column float.
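A quick way to check for this, sketched on a made-up frame (the values are illustrative, not the asker's data): a single NaN is enough to make an otherwise integer column float64.

```python
import numpy as np
import pandas as pd

# Hypothetical columns: one complete, one with a missing value.
df = pd.DataFrame({
    "volme": [4268, 100, 250],
    "amount": [4746460, np.nan, 111300],
})

# Per-column dtypes; a compact alternative to df.info().
print(df.dtypes)

# Number of missing values in the suspicious column.
print(df["amount"].isna().sum())
```

Here volme stays an integer column, while amount is stored as float because of the missing value.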

You can cast to int with:

df.values.astype(int).tolist()
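Note that the one-liner above casts every column, so a float column such as price would be truncated (11.12 becomes 11). If only some columns should be integers, one option is to cast just those columns and then convert column by column. This is a sketch with made-up data, and the choice of which columns to cast is an assumption:

```python
import pandas as pd

# Hypothetical frame in which the integer-valued columns were stored as floats.
df = pd.DataFrame({
    "price": [11.12, 11.13],
    "volme": [4268.0, 100.0],
    "amount": [4746460.0, 111300.0],
})

# Cast only the columns that should hold integers; price stays float.
fixed = df.astype({"volme": int, "amount": int})

# Convert column by column so the mixed dtypes survive the trip to lists.
rows = [list(x) for x in zip(*(fixed[c].values.tolist() for c in fixed.columns))]

print(rows[0])  # [11.12, 4268, 4746460]
```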

Answer by Pachelbel

I think the pandas documentation helps:

DataFrame.values

Numpy representation of NDFrame

The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
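The upcasting described in that passage can be observed directly. This is a small sketch on a made-up two-column frame, not the asker's data:

```python
import pandas as pd

# Hypothetical two-column frame: one float column, one int column.
df = pd.DataFrame({"price": [11.12, 11.13], "volme": [4268, 100]})

# Each column has its own dtype inside the DataFrame.
print(df.dtypes)

# The NumPy array behind .values has a single dtype for all columns: float64.
print(df.values.dtype)

# Selecting only integer columns keeps an integer dtype.
print(df[["volme"]].values.dtype)
```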

So here float is apparently chosen to accommodate all component types. A simple method would be the following (though there are most likely more elegant solutions around; I'm not too familiar with pandas):

datatmp = list(map(lambda row: list(row[1:]), df.itertuples()))

Here itertuples() gives an iterator with elements of the form (rownumber, column1_entry, column2_entry, ...). map takes each such tuple and applies the lambda function, which removes the first component (rownumber) and returns a list containing the components of a single row. You can also replace list(row[1:]) with row[1:] if you are fine working with a list of tuples.
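As a variation on the same idea, itertuples() accepts index=False to leave out the row label and name=None to return plain tuples, so no slicing is needed. This is a sketch on made-up data, offered as an alternative rather than as the answer's own method:

```python
import pandas as pd

# Hypothetical frame with one float column and one int column.
df = pd.DataFrame({"price": [11.12, 11.13], "volme": [4268, 100]})

# index=False drops the row label; name=None yields plain tuples
# instead of namedtuples.
rows = [list(row) for row in df.itertuples(index=False, name=None)]

print(rows[0])  # [11.12, 4268]
```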

[1] DataFrame.values property: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.values.html#pandas.DataFrame.values