dataframe values.tolist() datatype
Original question: http://stackoverflow.com/questions/34838378/
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me) on Stack Overflow.
Asked by Meng Qian
I have a dataframe like this:
This dataframe has several columns. Two are of type float: price and change, while volme and amount are of type int.
I use df.values.tolist() to convert df to a list and get the data:
datatmp = df.values.tolist()
print(datatmp[0])
[20160108150023.0, 11.12, -0.01, 4268.0, 4746460.0, 2.0]
The int types in df all change to float types.
My question is: why do the int types change to float types, and how can I get the int data I want?
Accepted answer by Mike Müller
You can convert column-by-column:
by_column = [df[x].values.tolist() for x in df.columns]
This will preserve the data type of each column.
Then convert to the structure you want:
list(list(x) for x in zip(*by_column))
You can do it in one line:
list(list(x) for x in zip(*(df[x].values.tolist() for x in df.columns)))
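As a minimal sketch of the difference, the snippet below builds a small dataframe and compares the two conversions. The data is hypothetical (only loosely modelled on the row in the question) and the names rows_upcast and rows are introduced here for illustration.

```python
import pandas as pd

# Hypothetical data; the asker's real dataframe is not shown in the question.
df = pd.DataFrame({
    "price": [11.12, 11.13],
    "change": [-0.01, 0.01],
    "volme": [4268, 100],
    "amount": [4746460, 111300],
})

# Row-wise conversion through .values: one array, one dtype, so ints become floats.
rows_upcast = df.values.tolist()

# Column-by-column conversion: each column keeps its own type,
# then zip(*...) transposes the columns back into rows.
by_column = [df[x].values.tolist() for x in df.columns]
rows = [list(x) for x in zip(*by_column)]

print(rows_upcast[0])
print(rows[0])
```

With this data, the first print should show every value as a float, while the second keeps volme and amount as integers.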
You can check what datatypes your columns have with:
df.info()
Very likely your column amount is of type float. Do you have any NaN in this column? NaN values are always of type float and would make the whole column float.
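One way to check this, sketched on a hypothetical two-column dataframe (the values are made up for illustration), is to look at the per-column dtypes and count the missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical columns: one clean integer column, one with a missing value.
df = pd.DataFrame({
    "volme": [4268, 100],
    "amount": [4746460, np.nan],
})

# dtypes lists the type of each column; the NaN forces amount to float.
print(df.dtypes)

# Count the missing values in the suspicious column.
print(df["amount"].isna().sum())
```

If the count is nonzero, the missing values have to be filled or dropped before the column can hold a plain int dtype.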
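You can cast to int with the line below. Note that it casts every column, so fractional values in float columns such as price are truncated: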
df.values.astype(int).tolist()
Answer by Pachelbel
I think the pandas documentation helps:
DataFrame.values
Numpy representation of NDFrame
The dtype will be a lower-common-denominator dtype (implicit upcasting); that is to say if the dtypes (even of numeric types) are mixed, the one that accommodates all will be chosen. Use this with care if you are not dealing with the blocks.
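The upcasting described in the quote can be observed directly. This is a small sketch on hypothetical data; the name column in the second dataframe is an assumption added only to show a non-numeric mix.

```python
import pandas as pd

# One float column and one int column, as in the question.
df = pd.DataFrame({"price": [11.12], "volme": [4268]})

print(df.dtypes)        # each column has its own dtype
print(df.values.dtype)  # the 2-D array has a single dtype for all columns

# Mixing numbers with strings falls back to the generic object dtype.
mixed = pd.DataFrame({"volme": [4268], "name": ["a"]})
print(mixed.values.dtype)
```

For the int and float mix, the common dtype is float64, which is why the integers come back as floats after tolist().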
So here float is apparently chosen to accommodate all component types. A simple method would be the following (most likely there are more elegant solutions around; I'm not too familiar with pandas):
datatmp = list(map(lambda row: list(row[1:]), df.itertuples()))  # outer list() is needed on Python 3, where map is lazy
Here itertuples() gives an iterator with elements of the form (rownumber, column1_entry, column2_entry, ...). The map takes each such tuple and applies the lambda function, which removes the first component (rownumber) and returns a list containing the components of a single row. You can also remove the list() call inside the lambda if it's OK for you to work with a list of tuples.
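A variant of the same idea, sketched here as an alternative rather than as the answerer's own code, passes index=False to itertuples() so that the row number is left out and no slicing is needed. The dataframe is hypothetical.

```python
import pandas as pd

# Hypothetical data with one float column and one int column.
df = pd.DataFrame({"price": [11.12, 11.13], "volme": [4268, 100]})

# index=False omits the index from each tuple; each row is then turned into a list.
datatmp = [list(row) for row in df.itertuples(index=False)]

print(datatmp)
```

Because the rows are assembled from the individual columns instead of from df.values, the integer column is not upcast to float.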
[1] DataFrame.values property: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.values.html#pandas.DataFrame.values