Pandas with different length arrays

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28798504/

Time: 2020-09-13 22:59:51  Source: igfitidea

python arrays pandas dataframe

Asked by DIGSUM

This is the code I have. Because of the content of the raw data being parsed, I end up with the 'user list' and the 'tweet list' being of different lengths. When writing the lists as columns in a data frame, I get ValueError: arrays must all be same length. I understand why this happens, but I have been looking for a way to work around it by printing 0 or NaN in the right places of the shorter array. Any ideas?

import pandas
from bs4 import BeautifulSoup

# Parse the saved HTML page and collect every tweet container.
soup = BeautifulSoup(open('#raw.html'))
chunk = soup.find_all('div', class_='content')

userlist = []
tweetlist = []

# Collect usernames; a container may hold zero or several matches.
for tweet in chunk:
    username = tweet.find_all(class_='username js-action-profile-name')
    for user in username:
        user2 = user.get_text()
        userlist.append(user2)

# Collect tweet texts the same way (get_text() already returns str in Python 3).
for text in chunk:
    tweets = text.find_all(class_='js-tweet-text tweet-text')
    for tweet in tweets:
        tweet2 = tweet.get_text()
        tweetlist.append('|' + tweet2)

print(len(tweetlist))
print(len(userlist))

# MAKE A DATAFRAME WITH THIS
# This raises the ValueError when the two lists differ in length.
data = {'tweet': tweetlist, 'user': userlist}
frame = pandas.DataFrame(data)
print(frame)

# Export dataframe to csv
frame.to_csv('#parsed.csv', index=False)
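
The error can be reproduced without any scraping. The sketch below is only an illustration: the two sample lists are made up and stand in for the parsed data, and the exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped lists.
userlist = ['@alice', '@bob']
tweetlist = ['|first tweet', '|second tweet', '|third tweet']

try:
    pd.DataFrame({'tweet': tweetlist, 'user': userlist})
    error_message = None
except ValueError as exc:
    # pandas refuses to build columns of unequal length from plain lists.
    error_message = str(exc)

print(error_message)
```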

Answered by Dmitriy Kuznetsov

I'm not sure that this is exactly what you want, but anyway:

# Wrapping each list in a Series lets pandas align on the index and pad the shorter column with NaN.
d = dict(tweets=tweetlist, users=userlist)
pandas.DataFrame({k: pandas.Series(v) for k, v in d.items()})
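
As a rough, self-contained illustration of this approach, the sketch below uses made-up sample lists in place of the scraped data. The na_rep argument of to_csv is one possible way to write 0 instead of an empty field for the padded cells; it is an addition for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped lists.
userlist = ['@alice', '@bob']
tweetlist = ['|first tweet', '|second tweet', '|third tweet']

d = dict(tweets=tweetlist, users=userlist)

# Each Series carries its own 0..n-1 index; the DataFrame constructor takes
# the union of those indexes and leaves the missing positions as NaN.
frame = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
print(frame)

# With no path argument, to_csv returns the CSV text;
# na_rep controls how missing values are written.
csv_text = frame.to_csv(index=False, na_rep='0')
print(csv_text)
```

The frame has as many rows as the longest list, and only the shorter column is padded.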

Answered by Ekrem Gurdal

Try this:

# d is the dict of lists, e.g. {'tweet': tweetlist, 'user': userlist}
frame = pandas.DataFrame.from_dict(d, orient='index')

After that, you should transpose your frame with:

frame = frame.transpose()

Then you can export to CSV:

frame.to_csv('#parsed.csv', index=False)
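
Put together, these steps might look like the sketch below. The sample lists are invented for illustration, and the CSV is returned as a string rather than written to '#parsed.csv' so that the snippet has no side effects.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped lists.
userlist = ['@alice', '@bob']
tweetlist = ['|first tweet', '|second tweet', '|third tweet']
d = {'tweet': tweetlist, 'user': userlist}

# orient='index' turns each dict key into a row; rows of unequal length
# are padded with missing values up to the longest one.
wide = pd.DataFrame.from_dict(d, orient='index')

# Transposing turns the keys back into columns.
frame = wide.transpose()
print(frame)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = frame.to_csv(index=False)
print(csv_text)
```

Because the data passes through a transposed intermediate frame, the resulting columns may end up with a generic object dtype, which is usually fine for text data like this.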