Python: convert a pandas DataFrame to a Series
Original question: http://stackoverflow.com/questions/33246771/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Convert pandas data frame to series
Asked by user1357015
I'm somewhat new to pandas. I have a pandas data frame that is 1 row by 23 columns.
I want to convert this into a series. I'm wondering what the most Pythonic way to do this is.
I've tried pd.Series(myResults) but it complains: ValueError: cannot copy sequence with size 23 to array axis with dimension 1. It's not smart enough to realize it's still a "vector" in math terms.
Thanks!
Accepted answer by DSM
It's not smart enough to realize it's still a "vector" in math terms.
Say rather that it's smart enough to recognize a difference in dimensionality. :-)
I think the simplest thing you can do is select that row positionally using iloc, which gives you a Series with the columns as the new index and the values as the values:
>>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df
   a0  a1  a2  a3  a4
0   0   1   2   3   4
>>> df.iloc[0]
a0    0
a1    1
a2    2
a3    3
a4    4
Name: 0, dtype: int64
>>> type(_)
<class 'pandas.core.series.Series'>
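One detail worth illustrating about the iloc approach above: the result depends on whether a scalar position or a list of positions is passed. The sketch below reuses the same one-row frame; the names row_as_series and row_as_frame are made up for illustration.

```python
import pandas as pd

# Same one-row frame as in the answer above.
df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])

# A scalar position drops the row axis and returns a Series
# indexed by the column labels.
row_as_series = df.iloc[0]

# A list of positions keeps the row axis, so the result is
# still a 1 x 5 DataFrame.
row_as_frame = df.iloc[[0]]

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__, row_as_frame.shape)
```

So if a Series is the goal, the scalar form df.iloc[0] is the one to use.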
Answer by themachinist
You can retrieve the series through slicing your dataframe using one of these two methods:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.randn(1,8))
series1=df.iloc[0,:]
type(series1)
pandas.core.series.Series
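The example above only shows iloc. As a minimal sketch of the loc counterpart, assuming a frame whose single row carries a label (the label "run_1" and the column names are invented for illustration), both accessors return the same Series:

```python
import pandas as pd

# Hypothetical one-row frame with a labelled index.
df = pd.DataFrame([[1.5, 2.5, 3.5]], index=["run_1"], columns=["x", "y", "z"])

by_position = df.iloc[0, :]     # first row, selected by integer position
by_label = df.loc["run_1", :]   # the same row, selected by its index label

# Both are Series indexed by the column labels and named after the row label.
print(by_position.equals(by_label))
print(by_label.name)
```

iloc is the safer choice when the row label is unknown, since it only depends on position.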
Answer by Alexander
You can transpose the single-row dataframe (which still results in a dataframe) and then squeeze the results into a series (the inverse of to_frame).
>>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df.T.squeeze()  # Or more simply, df.squeeze() for a single row dataframe.
a0    0
a1    1
a2    2
a3    3
a4    4
Name: 0, dtype: int64
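To make the squeeze/to_frame relationship concrete, here is a small sketch. It also shows how squeeze behaves on a 1 x 1 frame, where squeezing both axes yields a bare scalar unless an axis is given. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])

# Squeezing a 1 x 5 frame drops the length-1 row axis and yields a Series.
s = df.squeeze()

# to_frame() turns the Series into a 5 x 1 frame; transposing restores 1 x 5.
round_trip = s.to_frame().T

# With a 1 x 1 frame, both axes have length 1, so a plain squeeze()
# collapses all the way down to a scalar.
one_cell = pd.DataFrame([[42]], columns=["a0"])
as_scalar = one_cell.squeeze()

# Restricting squeeze to the row axis keeps a Series of length 1.
as_series = one_cell.squeeze(axis=0)

print(type(s).__name__, round_trip.shape)
print(as_scalar, type(as_series).__name__)
```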
Note: To accommodate the point raised by @IanS (even though it is not in the OP's question), test for the dataframe's size. I am assuming that df is a dataframe, but the edge cases are an empty dataframe, a dataframe of shape (1, 1), and a dataframe with more than one row, in which case the user should implement their desired functionality.
if df.empty:
    # Empty dataframe, so convert to empty Series.
    result = pd.Series()
elif df.shape == (1, 1):
    # DataFrame with one value, so convert to series with appropriate index.
    result = pd.Series(df.iat[0, 0], index=df.columns)
elif len(df) == 1:
    # Convert to series per OP's question.
    result = df.T.squeeze()
else:
    # Dataframe with multiple rows. Implement desired behavior.
    pass
This can also be simplified along the lines of the answer provided by @themachinist.
if len(df) > 1:
    # Dataframe with multiple rows. Implement desired behavior.
    pass
else:
    result = pd.Series() if df.empty else df.iloc[0, :]
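One way to package the simplified check above is a small helper. This is only a sketch: the function name single_row_to_series is invented here, and raising ValueError for multi-row input is an assumption about the desired behavior, not something the answer prescribes.

```python
import pandas as pd


def single_row_to_series(df):
    """Hypothetical helper: return the only row of df as a Series."""
    if len(df) > 1:
        # Assumed policy for multi-row frames: refuse rather than guess.
        raise ValueError("expected at most one row, got {}".format(len(df)))
    if df.empty:
        # An explicit dtype avoids relying on the default for an empty Series.
        return pd.Series(dtype=object)
    # iloc[0] also covers the (1, 1) case: it returns a length-1 Series.
    return df.iloc[0]


wide = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
print(single_row_to_series(wide).tolist())
print(len(single_row_to_series(pd.DataFrame())))
```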
Answer by Tauseef Malik
Another way:
Suppose myResult is the DataFrame that contains your data in the form of 1 column and 23 rows.
# Label your columns by passing a list of names.
myResult.columns = ['firstCol']
# Fetch the column by its label, which returns a Series.
myResult = myResult['firstCol']
print(type(myResult))
In a similar fashion, you can get a Series from a DataFrame with multiple columns.
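As a runnable sketch of the column-wise case described in this answer, assuming a 23 x 1 frame filled with placeholder integers (the data and variable names are invented for illustration), three selections give the same Series:

```python
import pandas as pd

# Hypothetical 23-row, 1-column frame standing in for myResult.
my_result = pd.DataFrame({"firstCol": range(23)})

by_label = my_result["firstCol"]         # select the column by its label
by_position = my_result.iloc[:, 0]       # select the first column by position
by_squeeze = my_result.squeeze(axis=1)   # drop the length-1 column axis

print(by_label.equals(by_position), by_label.equals(by_squeeze))
print(len(by_label), by_label.name)
```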
Answer by user12230680
data = pd.DataFrame({"a":[1,2,3,34],"b":[5,6,7,8]})
new_data = pd.melt(data)
new_data.set_index("variable", inplace=True)
This gives a dataframe whose index holds the column names of data, with all the data in the "value" column.
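Note that the result of the melt approach is still a DataFrame. A minimal sketch of the remaining step, selecting the "value" column to obtain an actual Series (the name series is illustrative):

```python
import pandas as pd

data = pd.DataFrame({"a": [1, 2, 3, 34], "b": [5, 6, 7, 8]})

# melt stacks the columns into two columns named "variable" and "value";
# set_index moves the original column names into the index.
new_data = pd.melt(data).set_index("variable")

# Selecting the single remaining column yields a Series.
series = new_data["value"]

print(type(new_data).__name__, type(series).__name__)
print(series.tolist())
```

The index of the resulting Series repeats each original column name once per row, so the labels are not unique.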