Pandas DataFrame: add a header without replacing the current header
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/19530708/
Pandas Dataframe add header without replacing current header
Asked by horatio1701d
How can I add a header to a DF without replacing the current one? In other words I just want to shift the current header down and just add it to the dataframe as another record.
*secondary question: How do I add tables (example dataframe) to stackoverflow question?
I have this (note the header, and how it should just be added as a row):
   0.213231  0.314544
0 -0.952928 -0.624646
1 -1.020950 -0.883333
I need this (all other records are shifted down and a new record is added). Also, I couldn't read the CSV properly because I'm using s3_text_adapter for the import, and I couldn't figure out how to pass an argument that ignores the header, similar to pandas read_csv:
          A         B
0  0.213231  0.314544
1 -1.020950 -0.883333
Answered by Andy Hayden
Another option is to add it as an additional level of the column index, to make it a MultiIndex:
In [10]: import numpy as np; import pandas as pd
In [11]: df = pd.DataFrame(np.random.randn(2, 2), columns=['A', 'B'])
In [12]: df
Out[12]:
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333
In [13]: df.columns = pd.MultiIndex.from_tuples(list(zip(['AA', 'BB'], df.columns)))
In [14]: df
Out[14]:
         AA        BB
          A         B
0 -0.952928 -0.624646
1 -1.020950 -0.883333
This has the benefit of keeping the correct dtypes for the DataFrame, so you can still do fast and correct calculations on your DataFrame, and allows you to access by both the old and new column names.
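To make the "access by both the old and new column names" point concrete, here is a minimal sketch. It assumes the same two-level columns as above; the variable names by_new and by_old are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 2), columns=['A', 'B'])
# Stack the new labels on top of the existing ones as a second column level.
df.columns = pd.MultiIndex.from_tuples(list(zip(['AA', 'BB'], df.columns)))

# The dtypes are untouched, so numeric operations still work as before.
print(df.dtypes)

# Select by the new (outer) label; the result keeps the inner label 'A'.
by_new = df['AA']

# Select by the old (inner) label with a cross-section on column level 1;
# the result keeps the outer label 'AA'.
by_old = df.xs('A', axis=1, level=1)
```

Both selections return the same underlying column, just labelled by the level that was not used for the lookup.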
For completeness, here's DSM's (deleted) answer, which makes the columns a row. As mentioned already, this is usually not a good idea:
In [21]: df_bad_idea = df.T.reset_index().T
In [22]: df_bad_idea
Out[22]:
              0         1
index         A         B
0     -0.952928 -0.624646
1      -1.02095 -0.883333
Note that the dtype may change, as it does in this case (because these are column names rather than proper values), so be careful if you actually plan to do any work on this: it will likely be slower and may even fail:
In [23]: df.sum()
Out[23]:
A   -1.973878
B   -1.507979
dtype: float64
In [24]: df_bad_idea.sum()  # doh!
Out[24]: Series([], dtype: float64)
If the column names are actually a row that was mistaken for a header row, then you should correct this when reading in the data (e.g. with read_csv, use header=None).
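When re-reading the data is not an option, the misread header can also be repaired in memory without losing the numeric dtypes. This is only a sketch under stated assumptions: the header labels are strings that parse as floats, the intended names are A and B, and the frame below is a hypothetical one built from the values in the question.

```python
import pandas as pd

# Hypothetical frame whose first record was mistaken for the header.
df = pd.DataFrame(
    [[-0.952928, -0.624646], [-1.020950, -0.883333]],
    columns=['0.213231', '0.314544'],
)

# Turn the misread header labels back into a one-row numeric frame.
first_row = pd.DataFrame([[float(c) for c in df.columns]], columns=['A', 'B'])

# Give the remaining rows the intended column names.
rest = df.copy()
rest.columns = ['A', 'B']

# Stack the recovered row on top and renumber the index from 0.
fixed = pd.concat([first_row, rest], ignore_index=True)
print(fixed)
print(fixed.dtypes)
```

Unlike the transpose trick, every column stays float64 here, because the labels are converted to numbers before they are added as a row.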
Answered by Brad123
The key is to specify header=None when reading and then assign to df.columns to add the header:
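import pandas as pd
data = pd.read_csv('file.csv', skiprows=2, header=None)  # skip two leading rows (e.g. blank lines), if applicable
df = pd.DataFrame(data)  # read_csv already returns a DataFrame, so this step is optional
df = df.iloc[:, [0, 1]]  # keep the first two columns (positions 0 and 1)
df.columns = ['A', 'B']  # assign the header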
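For a version that runs without a file on disk, the sketch below reads the same kind of data from an in-memory string. The contents of raw are an assumption (the question's values written as comma-separated text), and names= is used as an alternative to assigning df.columns afterwards.

```python
import io
import pandas as pd

# Hypothetical file contents: two numeric columns and no header line.
raw = "0.213231,0.314544\n-0.952928,-0.624646\n-1.020950,-0.883333\n"

# Default behaviour: the first record is consumed as the header.
wrong = pd.read_csv(io.StringIO(raw))

# header=None keeps every line as data; names= supplies the column labels.
right = pd.read_csv(io.StringIO(raw), header=None, names=['A', 'B'])

print(wrong.shape)  # one record short, because it became the header
print(right)
```

Comparing the two shapes shows the difference directly: the default read loses one record to the header, while header=None keeps all three.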

