How do I use dictionary keys and values to rename columns in a pandas DataFrame?
Original question: http://stackoverflow.com/questions/41783176/
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on StackOverflow.
Asked by ZacAttack
I am building functions to help me load data from the web. The problem I am trying to solve is that column names differ depending on the source. For example, Yahoo Finance column headings look like this: Open, High, Low, Close, Volume, Adj Close. Quandl.com has data sets with DATE, VALUE, date, value, etc. The mix of upper and lower case throws everything off, and Value and Adj Close for the most part mean the same thing. I want to associate columns that have different names but the same meaning with one value. For example, Adj Close and value both = AC; Open, OPEN, and open all = O.
So I have a CSV file ("Functions//ColumnNameChanges.txt") that stores the dict() keys and values for the column names.
Date,D
Open,O
High,H
I then wrote this function to populate my dictionary:
def DictKeyValuesFromText ():
    Dictionary = {}
    TextFileName = "Functions//ColumnNameChanges.txt"
    with open(TextFileName,'r') as f:
        for line in f:
            x = line.find(",")
            y = line.find("/")
            k = line[0:x]
            v = line[x+1:y]
            Dictionary[k] = v
    return Dictionary
This is the output of print(DictKeyValuesFromText()):
{'': '', 'Date': 'D', 'High': 'H', 'Open': 'O'}
The next function is where my problem is:
def ChangeColumnNames(DataFrameFileLocation):
    x = DictKeyValuesFromText()
    df = pd.read_csv(DataFrameFileLocation)
    for y in df.columns:
        if y not in x.keys():
            i = input("The column " + y + " is not in the list, give a name:")
            df.rename(columns={y:i})
        else:
            df.rename(columns={y:x[y]})
    return df
df.rename is not working. This is the output I get from print(ChangeColumnNames("Tvix_data.csv")):
The column Low is not in the list, give a name:L
The column Close is not in the list, give a name:C
The column Volume is not in the list, give a name:V
The column Adj Close is not in the list, give a name:AC
Date Open High Low Close Volume \
0 2010-11-30 106.269997 112.349997 104.389997 112.349997 0
1 2010-12-01 99.979997 100.689997 98.799998 100.689997 0
2 2010-12-02 98.309998 98.309998 86.499998 86.589998 0
The column names should be D, O, H, L, C, V. I am missing something; any help would be appreciated.
Answered by DeepSpace
df.rename works just fine, but it does not operate in place by default. Either re-assign its return value or use inplace=True. It expects a dictionary with old names as keys and new names as values, passed as the columns argument.
df = df.rename(columns={'col_a': 'COL_A', 'col_b': 'COL_B'})
or
df.rename(columns={'col_a': 'COL_A', 'col_b': 'COL_B'}, inplace=True)
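Applied to the function in the question, the fix could look like the following minimal sketch. It collects every replacement in one dictionary and calls rename once, keeping the returned DataFrame. The helper name rename_columns, its ask parameter, and the sample data are assumptions made for illustration; they are not part of the original post.

```python
import pandas as pd


def rename_columns(df, known_names, ask=input):
    """Return a copy of df with columns renamed via known_names.

    Columns missing from known_names are resolved by calling ask(prompt).
    (Hypothetical helper, named for illustration only.)
    """
    mapping = {}
    for col in df.columns:
        if col in known_names:
            mapping[col] = known_names[col]
        else:
            mapping[col] = ask("The column " + col + " is not in the list, give a name:")
    # rename returns a new DataFrame, so the result must be kept
    return df.rename(columns=mapping)


# Made-up sample data standing in for the CSV file from the question
sample = pd.DataFrame({"Date": ["2010-11-30"], "Open": [106.27], "Low": [104.39]})
known = {"Date": "D", "Open": "O", "High": "H"}

# A lambda replaces input() here so the example runs without prompting
renamed = rename_columns(sample, known, ask=lambda prompt: "L")
print(list(renamed.columns))
```

Because rename is not in place, the original sample frame keeps its old column names; only the returned frame carries the new ones.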
Answered by Loochie
Since you already have the dictionary, store it in a variable, say:
DC = {'': '', 'Date': 'D', 'High': 'H', 'Open': 'O'}
DC can now be mapped onto the DataFrame columns like this:
df.columns = df.columns.map(DC)
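One caveat with this approach, illustrated by the small sketch below: Index.map with a dictionary produces NaN for every label that has no key in the dictionary, so it is safest when DC covers all of the columns. The empty frame and its column names are made up for the example.

```python
import pandas as pd

DC = {'': '', 'Date': 'D', 'High': 'H', 'Open': 'O'}

# Made-up frame: 'Low' has no entry in DC
df = pd.DataFrame(columns=['Date', 'Open', 'Low'])

mapped = df.columns.map(DC)
# 'Date' and 'Open' are translated; 'Low' becomes NaN
print(mapped)

# True when at least one column name was lost in the mapping
has_missing = mapped.isna().any()
print(has_missing)
```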
If you want to use the rename() method instead, you can simply write:
df = df.rename(columns = DC)
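rename leaves any column without an entry in the dictionary unchanged. It also accepts a function in place of a dictionary, which is one possible way to treat Open, OPEN, and open as the same column, as the question asks. The sketch below assumes a lookup table keyed by lower-cased names; LOOKUP, short_name, and the sample columns are hypothetical.

```python
import pandas as pd

# Hypothetical lookup table keyed by lower-cased column names
LOOKUP = {'date': 'D', 'open': 'O', 'high': 'H', 'value': 'AC', 'adj close': 'AC'}


def short_name(col):
    # Normalize case and whitespace, then fall back to the original name
    return LOOKUP.get(col.strip().lower(), col)


df = pd.DataFrame(columns=['DATE', 'Open', 'high', 'Volume'])
df = df.rename(columns=short_name)
print(list(df.columns))
```

Here Volume passes through untouched because it has no entry in LOOKUP, while the three differently cased names are all shortened.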