
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/38927230/

Date: 2020-08-19 21:38:04  Source: igfitidea

Pandas AssertionError: columns passed, passed data had 2 columns

Tags: python, pandas, dataframe, nltk, azure-machine-learning-studio

Asked by Sudheej

I am working on an Azure ML implementation of text analytics with NLTK, and the following code throws:


AssertionError: 1 columns passed, passed data had 2 columns
Process returned with non-zero exit code 1

Below is the code:


# The script MUST include the following function,
# which is the entry point for this module:
# Param<dataframe1>: a pandas.DataFrame
# Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):
    # import required packages
    import pandas as pd
    import nltk
    import numpy as np
    # tokenize the review text and store the word corpus
    word_dict = {}
    token_list = []
    nltk.download(info_or_id='punkt', download_dir='C:/users/client/nltk_data')
    nltk.download(info_or_id='maxent_treebank_pos_tagger', download_dir='C:/users/client/nltk_data')
    for text in dataframe1["tweet_text"]:
        tokens = nltk.word_tokenize(text.decode('utf8'))
        # pos_tag returns a list of (word, tag) tuples
        tagged = nltk.pos_tag(tokens)

    # convert feature vector to dataframe object
    dataframe_output = pd.DataFrame(tagged, columns=['Output'])
    return [dataframe_output]

The error is thrown here:


dataframe_output = pd.DataFrame(tagged, columns=['Output'])

I suspect the problem is the data type of tagged that is passed to the DataFrame. Can someone tell me the right way to add this to a DataFrame?

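A minimal reproduction, assuming only pandas is installed: nltk.pos_tag returns a list of (word, tag) tuples, so every row carries two fields while columns=['Output'] names only one. The tagged list below is hand-written to mimic that shape (the tags are illustrative, not real tagger output), and build_frame is a hypothetical helper that returns the error text instead of raising. Older pandas versions raise AssertionError here, and newer versions may raise ValueError with the same message, so the sketch catches both.

```python
import pandas as pd

# Hand-written stand-in for nltk.pos_tag output: a list of (word, tag) tuples.
# The tags are illustrative only, not produced by an actual tagger run.
tagged = [("I", "PRP"), ("love", "VBP"), ("pandas", "NNS")]


def build_frame(rows, columns):
    """Hypothetical helper: build a DataFrame, or return the error text on failure."""
    try:
        return pd.DataFrame(rows, columns=columns)
    except (AssertionError, ValueError) as err:
        # Older pandas raises AssertionError; newer versions may raise ValueError.
        return str(err)


# One column name for two-field rows reproduces the message from the question.
print(build_frame(tagged, ["Output"]))

# One column name per tuple field builds the frame successfully.
ok = build_frame(tagged, ["word", "tag"])
print(ok.shape)  # (3, 2)
```

The number of names in columns has to match the number of fields in each tuple; the names themselves are arbitrary.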

Answer by ragesz

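Each element returned by nltk.pos_tag is a (word, tag) tuple, so the data has two columns and the DataFrame needs two column names. Try this: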


dataframe_output = pd.DataFrame(tagged, columns=['Output', 'temp'])
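Separately, the loop in the question reassigns tagged on every iteration, so even with two column names only the last tweet's tags end up in the output. A sketch that keeps the tags for every tweet might look like the following. The function name tags_to_frame and the tweet_id column are hypothetical, and the tokenizer and tagger are passed in as parameters (in the question's setting they would be nltk.word_tokenize and nltk.pos_tag) so the sketch runs without downloading NLTK data.

```python
import pandas as pd


def tags_to_frame(texts, tokenize, pos_tag):
    """Hypothetical helper: tag every text and return one row per (word, tag) pair.

    tokenize: callable mapping a string to a list of tokens
    pos_tag:  callable mapping a token list to a list of (word, tag) tuples
    """
    rows = []
    for tweet_id, text in enumerate(texts):
        # Append to one list instead of overwriting a single variable,
        # so tags from every text are kept.
        for word, tag in pos_tag(tokenize(text)):
            rows.append((tweet_id, word, tag))
    # Three fields per row, so three column names.
    return pd.DataFrame(rows, columns=["tweet_id", "word", "tag"])


def fake_tag(tokens):
    """Stand-in tagger for the demo: tags every token "X" purely for illustration."""
    return [(token, "X") for token in tokens]


if __name__ == "__main__":
    # str.split stands in for a real tokenizer so the demo runs without NLTK.
    print(tags_to_frame(["good morning", "hello"], str.split, fake_tag))
```

Inside azureml_main, one would pass dataframe1["tweet_text"], nltk.word_tokenize and nltk.pos_tag, and return the result wrapped in a list as the original code does.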