pandas dataframe in multiple threads
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40939078/
pandas dataframe in multiple threads
Asked by Yasir Azeem
Can someone tell me a way to add data to a pandas DataFrame in Python when multiple threads use a function that appends rows to the same DataFrame?
My code scrapes data from a URL, and I was using df.loc[index]... to add each scraped row to the DataFrame.
I've now started multiple threads, with each URL assigned to its own thread, so in short many pages are being scraped at once...
How do I append those rows to the DataFrame?
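For reference, the single-threaded pattern described above looks roughly like this (a minimal sketch; the URL list, column names, and the way each page is parsed are placeholder assumptions, not from the original post):

import requests
import pandas as pd

df = pd.DataFrame(columns=['url', 'text'])  # placeholder columns

urls = ["http://example.com/page1", "http://example.com/page2"]  # placeholder URLs

for index, url in enumerate(urls):
    page_text = requests.get(url).text
    # df.loc[index] = ... appends one scraped row at a time; this is the
    # pattern that breaks down once several threads write to the same df
    df.loc[index] = [url, page_text[:100]]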
Accepted answer by exp1orer
Adding rows to a DataFrame one by one is not recommended. I suggest you build your data in lists, combine those lists at the end, and then call the DataFrame constructor only once, on the full data set.
Example:
# help from http://stackoverflow.com/a/28463266/3393459
# and http://stackoverflow.com/a/2846697/3393459
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool with the multiprocessing API
import requests
import pandas as pd

pool = ThreadPool(4)

# called by each thread: fetch one URL and return its row as a dict
def get_web_data(url):
    return {'col1': 'something', 'request_data': requests.get(url).text}

urls = ["http://google.com", "http://yahoo.com"]

# pool.map runs get_web_data on each URL across the worker threads and
# collects the return values into a single list of dicts
results = pool.map(get_web_data, urls)
print(results)

# build the DataFrame once, from the full list of rows
print(pd.DataFrame(results))
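If you are managing threads yourself rather than using a pool, the same idea applies: have each thread append its row dicts to a shared list (appending to a Python list is atomic in CPython), then build the DataFrame once at the end. A minimal sketch, assuming hypothetical URLs and a get_web_data function like the one above:

import threading
import requests
import pandas as pd

results = []  # each thread appends its row dicts here

def get_web_data(url):
    # scrape one page and stash the row; column names are placeholders
    results.append({'col1': 'something', 'request_data': requests.get(url).text})

urls = ["http://google.com", "http://yahoo.com"]
threads = [threading.Thread(target=get_web_data, args=(url,)) for url in urls]

for t in threads:
    t.start()
for t in threads:
    t.join()

# construct the DataFrame once, from all the collected rows
df = pd.DataFrame(results)
print(df)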