Original question: http://stackoverflow.com/questions/52209290/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How do I make a progress bar for loading pandas DataFrame from a large xlsx file?
Asked by user2303336
From https://pypi.org/project/tqdm/:
import pandas as pd
import numpy as np
from tqdm import tqdm
df = pd.DataFrame(np.random.randint(0, 100, (100000, 6)))
tqdm.pandas(desc="my bar!")
df.progress_apply(lambda x: x**2)
I took this code and edited it so that the DataFrame is created with pd.read_excel rather than from random numbers:
import pandas as pd
from tqdm import tqdm
import numpy as np
filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x**2)
This gave me an error, so I changed df.progress_apply to this:
df.progress_apply(lambda x: x)
Here is the final code:
import pandas as pd
from tqdm import tqdm
import numpy as np
filename="huge_file.xlsx"
df = pd.DataFrame(pd.read_excel(filename))
tqdm.pandas()
df.progress_apply(lambda x: x)
This produces a progress bar, but it doesn't show any actual progress: the bar appears, and when the operation is done it jumps to 100%, which defeats the purpose.
My question is this: How do I make this progress bar work?
What does the function inside of progress_apply actually do?
Is there a better approach? Maybe an alternative to tqdm?
Any help is greatly appreciated.
Answered by rocksportrocker
This will not work. pd.read_excel blocks until the file has been read, and there is no way to get information from this function about its progress while it is executing.
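To see why the bar in the question jumps straight to 100%, it may help to look at what progress_apply actually wraps. The sketch below is only an illustration: it uses a small random DataFrame as a stand-in for the result of pd.read_excel, and the `identity` function and `seen` list are names made up for this example. As far as I can tell, with the default axis the function is called with one column (a Series) at a time, so the bar only measures the apply step, which starts after the slow load has already finished.

```python
import numpy as np
import pandas as pd
from tqdm import tqdm

# Stand-in for the DataFrame that pd.read_excel would return. By the time
# this object exists, the slow file load is already over.
df = pd.DataFrame(np.random.randint(0, 100, (1000, 6)))

# Registers progress_apply on pandas objects.
tqdm.pandas(desc="apply step")

seen = []  # records what the applied function receives (illustration only)


def identity(col):
    # With the default axis, `col` is a pandas Series holding one column.
    seen.append(type(col).__name__)
    return col


# The bar shown here tracks only this apply call, not the file load.
result = df.progress_apply(identity)

print(set(seen))
print(result.shape)
```

Because the identity function does almost no work, the apply step finishes nearly instantly, which matches the "jumps to 100%" behaviour described in the question.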
It would work for read operations that you can do chunk-wise, like
chunks = []
for chunk in pd.read_csv(..., chunksize=1000):
    update_progressbar()
    chunks.append(chunk)
But as far as I understand, tqdm also needs the number of chunks in advance, so for a proper progress report you would need to read the full file first.
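For a CSV file, the "know the total in advance" problem can be worked around with a cheap first pass that only counts lines. The following is a minimal sketch under some assumptions: `read_csv_with_progress` is a hypothetical helper, not part of pandas or tqdm, and the line count is only accurate when each record sits on its own line (no newlines inside quoted fields). An xlsx file offers no comparably cheap line count, which is the limitation described above.

```python
import os
import tempfile

import pandas as pd
from tqdm import tqdm


def read_csv_with_progress(path, chunksize=1000):
    """Read a CSV in chunks while updating a tqdm bar (hypothetical helper)."""
    # First pass: count data rows so tqdm knows the total in advance.
    # Assumes one record per line, i.e. no newlines inside quoted fields.
    with open(path, "rb") as fh:
        n_rows = sum(1 for _ in fh) - 1  # subtract the header line

    chunks = []
    with tqdm(total=n_rows, desc="reading csv", unit="rows") as bar:
        # Second pass: pandas yields one DataFrame per chunk.
        for chunk in pd.read_csv(path, chunksize=chunksize):
            chunks.append(chunk)
            bar.update(len(chunk))  # advance by the rows just read
    return pd.concat(chunks, ignore_index=True)


if __name__ == "__main__":
    # Small stand-in file so the sketch can be run as-is.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.csv")
        pd.DataFrame({"a": range(2500), "b": range(2500)}).to_csv(
            demo_path, index=False
        )
        df = read_csv_with_progress(demo_path, chunksize=1000)
        print(df.shape)
```

The bar is advanced by rows rather than by chunks, so a short final chunk does not distort the percentage.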