Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/21470495/

Time: 2020-09-13 21:38:37  Source: igfitidea

IPython Notebook and Pandas autocomplete

python autocomplete pandas ipython-notebook

Asked by metersk

I noticed that if I type df.column_name(), I can autocomplete the column_name with Tab in IPython notebook.

Now, the proper syntax for doing something to a column would be df['column_name'], where I am unable to autocomplete (I am assuming because it is a string?). Is there any other notation or way to simplify typing out column names? I am essentially looking for a solution that would allow me to tab-autocomplete the column name within df['column_name'].

Accepted answer by Maturin

I've found the following method to be useful to me. It basically creates a namedtuple containing the names of all the variables in the data frame as strings.

For example, consider the following data frame containing 2 variables called "variable_1" and "variable_2":

from collections import namedtuple
from pandas import DataFrame
import numpy as np

df = DataFrame({'variable_1':np.arange(5),'variable_2':np.arange(5)})

The following code creates a namedtuple called "var":

def ntuples():
    list_of_names = df.columns.values
    list_of_names_dict = {x:x for x in list_of_names}

    Varnames = namedtuple('Varnames', list_of_names) 
    return Varnames(**list_of_names_dict)

var = ntuples()

In a notebook, when I write var. and press Tab, the names of all the variables in the dataframe df will be displayed. Writing var.variable_1 is equivalent to writing 'variable_1'. So the following would work: df[var.variable_1].

The reason I define a function to do it is that you will often add new variables to a data frame. To pick up the new variables in your namedtuple "var", simply call the function again, var = ntuples(), and you are good to go.
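
As a minimal sketch of the same idea, the helper below takes the column names as an argument instead of reading a global df. The name column_namespace and its filtering rules are my own assumptions, not part of pandas or of the code above: it skips names that cannot be namedtuple fields (not valid identifiers, Python keywords, names starting with an underscore, duplicates), since namedtuple would otherwise raise an error.

```python
import keyword
from collections import namedtuple


def column_namespace(column_names):
    """Return a namedtuple whose fields map each usable column name to itself.

    Hypothetical helper (not part of pandas). Names that cannot be
    namedtuple fields are skipped rather than raising an error.
    """
    usable = []
    for name in column_names:
        name = str(name)
        if (name.isidentifier()
                and not keyword.iskeyword(name)
                and not name.startswith('_')
                and name not in usable):
            usable.append(name)
    Varnames = namedtuple('Varnames', usable)
    return Varnames(*usable)


# With a DataFrame, pass its columns: var = column_namespace(df.columns)
var = column_namespace(['variable_1', 'variable_2', 'unit price'])
print(var.variable_1)  # variable_1
print(var._fields)     # ('variable_1', 'variable_2')
```

Columns that get skipped, such as 'unit price' here, still have to be typed out as strings, e.g. df['unit price'].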

Answer by gobrewers14

I'm not sure how your data is set up, but when I import a csv/txt file, I specify the names of the columns in a list, such as...

names = ['col_1', 'col_2', 'col_3']

etc... and then import my file as such...

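import pandas as pd
# header=0 assumes the file's first row is a header that `names` replaces;
# use header=None if the file has no header row (header=True is rejected by pandas).
data = pd.read_csv('./some_file.txt', header=0, delimiter='\t', names=names)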

You could then do tab completion like...

new_thing = data[names[1]]

where you would hit Tab as you start to type "names", and then all you would have to do is specify which item of names you want. I'm not sure if this is any more efficient than simply typing out the word.
