JavaScript Puppeteer - scroll down until you can't scroll anymore

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/51529332/

Date: 2020-08-23 04:51:26  Source: igfitidea

Puppeteer - scroll down until you can't anymore

javascript, node.js, puppeteer

Asked by user1584421

I am in a situation where new content is created when I scroll down. The new content has a specific class name.

How can I keep scrolling down until all the elements have loaded? In other words, I want to reach the stage where, if I keep scrolling down, nothing new will load.

I was using code to scroll down, coupled with:

await page.waitForSelector('.class_name');

The problem with this approach is that after all the elements have loaded, the code keeps scrolling down, no new elements are created, and eventually I get a timeout error.

EDIT: This is the code

await page.evaluate(() => {
    window.scrollBy(0, window.innerHeight);
});
await page.waitForSelector('.class_name');

Answer by Cory

Give this a shot:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: false
    });
    const page = await browser.newPage();
    await page.goto('https://www.yoursite.com');
    await page.setViewport({
        width: 1200,
        height: 800
    });

    await autoScroll(page);

    await page.screenshot({
        path: 'yoursite.png',
        fullPage: true
    });

    await browser.close();
})();

async function autoScroll(page){
    await page.evaluate(async () => {
        await new Promise((resolve, reject) => {
            var totalHeight = 0;
            var distance = 100;
            var timer = setInterval(() => {
                var scrollHeight = document.body.scrollHeight;
                window.scrollBy(0, distance);
                totalHeight += distance;

                if(totalHeight >= scrollHeight){
                    clearInterval(timer);
                    resolve();
                }
            }, 100);
        });
    });
}

Source: https://github.com/chenxiaochun/blog/issues/38

Answer by kimbaudi

Scrolling down to the bottom of the page can be accomplished in 2 ways:

  1. use scrollIntoView (to scroll to the part of the page that can create more content at the bottom) and selectors (i.e., document.querySelectorAll('.class_name').length to check whether more content has been generated)
  2. use scrollBy (to incrementally scroll down the page) and either setTimeout or setInterval (to incrementally check whether we are at the bottom of the page)

Here is an implementation using scrollIntoView and a selector (assuming .class_name is the selector that we scroll into for more content) in plain JavaScript that we can run in the browser console:

Method 1: use scrollIntoView and selectors

const delay = 3000;
const wait = (ms) => new Promise(res => setTimeout(res, ms));
const count = async () => document.querySelectorAll('.class_name').length;
const scrollDown = async () => {
  document.querySelector('.class_name:last-child')
    .scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
}

let preCount = 0;
let postCount = 0;
do {
  preCount = await count();
  await scrollDown();
  await wait(delay);
  postCount = await count();
} while (postCount > preCount);
await wait(delay);

In this method, we compare the number of elements matching .class_name before scrolling (preCount) and after scrolling (postCount) to check whether we are at the bottom of the page:

if (postCount > preCount) {
  // NOT bottom of page
} else {
  // bottom of page
}
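
The count comparison above can be wrapped in a reusable helper. This is only a minimal sketch, not the answer author's code: scrollUntilCountStable, delayMs, and maxRounds are hypothetical names, and it scrolls the last matching element into view instead of relying on the :last-child selector, which matches only when the element is also the last child of its parent. The maxRounds cap is an added safeguard against pages that never stop loading.

```javascript
// Hypothetical helper (not part of Puppeteer): keep scrolling until the number
// of elements matching `selector` stops growing, or until `maxRounds` is hit.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrollUntilCountStable(page, selector, { delayMs = 3000, maxRounds = 50 } = {}) {
  let previous = -1;
  let current = await page.$$eval(selector, (els) => els.length);
  let rounds = 0;
  while (current > previous && rounds < maxRounds) {
    previous = current;
    // Bring the last matching element into view to trigger lazy loading.
    await page.$$eval(selector, (els) => {
      const last = els[els.length - 1];
      if (last) last.scrollIntoView({ block: 'end' });
    });
    await sleep(delayMs); // give the page time to append new content
    current = await page.$$eval(selector, (els) => els.length);
    rounds += 1;
  }
  return current; // final number of matching elements
}

// Possible usage inside an async function with an open Puppeteer page:
// const total = await scrollUntilCountStable(page, '.class_name', { delayMs: 3000 });
```

The loop stops on the first round in which the count does not increase, so delayMs should be longer than the slowest expected content load.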

And here are 2 possible implementations using either setTimeout or setInterval with scrollBy in plain JavaScript that we can run in the browser console:

Method 2a: use setTimeout with scrollBy

const distance = 100;
const delay = 100;
while (document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight) {
  document.scrollingElement.scrollBy(0, distance);
  await new Promise(resolve => { setTimeout(resolve, delay); });
}

Method 2b: use setInterval with scrollBy

const distance = 100;
const delay = 100;
const timer = setInterval(() => {
  document.scrollingElement.scrollBy(0, distance);
  if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
    clearInterval(timer);
  }
}, delay);

In this method, we compare document.scrollingElement.scrollTop + window.innerHeight with document.scrollingElement.scrollHeight to check whether we are at the bottom of the page:

if (document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight) {
  // NOT bottom of page
} else {
  // bottom of page
}
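
One caveat with the comparison above: scrollTop can be a fractional value on zoomed or high-density displays, while scrollHeight is rounded, so the sum may stop just short of scrollHeight. Below is a minimal sketch of the same check with a small tolerance. The names isAtBottom, viewportHeight, and tolerance are hypothetical, and the 1-pixel default is an assumption rather than a required value.

```javascript
// Hypothetical helper: the bottom-of-page test from above, with a pixel tolerance.
function isAtBottom({ scrollTop, viewportHeight, scrollHeight }, tolerance = 1) {
  return scrollTop + viewportHeight >= scrollHeight - tolerance;
}

// Possible usage with Puppeteer: read the metrics in the browser, decide in Node.
// const metrics = await page.evaluate(() => ({
//   scrollTop: document.scrollingElement.scrollTop,
//   viewportHeight: window.innerHeight,
//   scrollHeight: document.scrollingElement.scrollHeight,
// }));
// if (isAtBottom(metrics)) { /* stop scrolling */ }
```

Keeping the decision in Node avoids having to define the helper inside the page, since functions passed to page.evaluate do not see variables from the Node script.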

If any of the JavaScript snippets above scrolls the page all the way down to the bottom, then we know it is working and we can automate it using Puppeteer.

Here are the sample Puppeteer Node.js scripts that will scroll down to the bottom of the page and wait a few seconds before closing the browser.

Puppeteer Method 1: use scrollIntoView with selector (.class_name)

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

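  // page.waitFor was deprecated and later removed from Puppeteer, so use a plain timer instead
  const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));
  const delay = 3000;
  let preCount = 0;
  let postCount = 0;
  do {
    preCount = await getCount(page);
    await scrollDown(page);
    await sleep(delay);
    postCount = await getCount(page);
  } while (postCount > preCount);
  await sleep(delay);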

  await browser.close();
})();

async function getCount(page) {
  return await page.$$eval('.class_name', a => a.length);
}

async function scrollDown(page) {
  await page.$eval('.class_name:last-child', e => {
    e.scrollIntoView({ behavior: 'smooth', block: 'end', inline: 'end' });
  });
}

Puppeteer Method 2a: use setTimeout with scrollBy

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  await scrollToBottom(page);
  await sleep(3000);

  await browser.close();
})();

// page.waitFor was deprecated and later removed from Puppeteer, so use a plain timer instead
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

async function scrollToBottom(page) {
  const distance = 100; // should be less than or equal to window.innerHeight
  const delay = 100;
  while (await page.evaluate(() => document.scrollingElement.scrollTop + window.innerHeight < document.scrollingElement.scrollHeight)) {
    await page.evaluate((y) => { document.scrollingElement.scrollBy(0, y); }, distance);
    await sleep(delay);
  }
}

Puppeteer Method 2b: use setInterval with scrollBy

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    args: ['--window-size=800,600']
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');

  await page.evaluate(scrollToBottom);
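  // page.waitFor was deprecated and later removed from Puppeteer, so use a plain timer instead
  await new Promise(resolve => setTimeout(resolve, 3000));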

  await browser.close();
})();

async function scrollToBottom() {
  await new Promise(resolve => {
    const distance = 100; // should be less than or equal to window.innerHeight
    const delay = 100;
    const timer = setInterval(() => {
      document.scrollingElement.scrollBy(0, distance);
      if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
        clearInterval(timer);
        resolve();
      }
    }, delay);
  });
}

Answer by x-magix

Based on an answer from this URL:

await page.evaluate(() => {
  window.scrollBy(0, window.innerHeight);
});

Answer by guest

You need to ask yourself whether you're scrolling into an element that requires the page to lazy-load data before reaching this part of the DOM. For example, this Sephora page: https://www.sephora.com/search?keyword=clean%20at%20sephora

If so, you need to wait for the promise to resolve before getting to the footer, for example; scrollToElement-style solutions like the ones above that do not use a promise will not get you to the end of the element.

You need to inject a Promise inside page.evaluate in that case.

async function autoScroll(page) {
  await page.evaluate(async () => {
    await new Promise((resolve, reject) => {
      var totalHeight = 0;
      var distance = 100;
      var timer = setInterval(() => {
        var scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;

        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 100);
    });
  });
}
await autoScroll(page);

Answer by Vasul dubyuk

You can simply use the page.keyboard object:

await page.keyboard.press('ArrowDown');
await delay(2000); // wait for 2 seconds (without await, the delay is not actually waited for)
await page.keyboard.press('ArrowUp');

// Helper that resolves after the given number of milliseconds
function delay(milliseconds) {
  return new Promise(resolve => {
    setTimeout(resolve, milliseconds);
  });
}
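
The keyboard approach can also be turned into a loop that answers the original question. The following is a rough sketch under assumptions: it presses the End key rather than the arrow keys used above, it assumes the page itself (not an input field or an inner scrollable container) has keyboard focus, and pressEndUntilHeightStable, delayMs, and maxPresses are hypothetical names.

```javascript
// Hypothetical helper: press End repeatedly until the page height stops growing.
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pressEndUntilHeightStable(page, { delayMs = 2000, maxPresses = 30 } = {}) {
  const readHeight = () => page.evaluate(() => document.scrollingElement.scrollHeight);
  let height = await readHeight();
  for (let i = 0; i < maxPresses; i += 1) {
    await page.keyboard.press('End'); // jump to the current bottom of the page
    await pause(delayMs);             // wait for lazy-loaded content to arrive
    const next = await readHeight();
    if (next === height) return i + 1; // no growth: report how many presses were used
    height = next;
  }
  return maxPresses; // gave up after the safety cap
}

// Possible usage inside an async function with an open Puppeteer page:
// await pressEndUntilHeightStable(page, { delayMs: 2000 });
```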

Answer by nagy.zsolt.hun

Many solutions here assume the page height being constant. This implementation works even if the page height changes (e.g. loading new content as user scrolls down).

await page.evaluate(() => new Promise((resolve) => {
  var scrollTop = -1;
  const interval = setInterval(() => {
    window.scrollBy(0, 100);
    if(document.documentElement.scrollTop !== scrollTop) {
      scrollTop = document.documentElement.scrollTop;
      return;
    }
    clearInterval(interval);
    resolve();
  }, 10);
}));
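
The snippet above stops the first time scrollTop fails to change, which with a 10 ms interval may happen before lazy-loaded content has arrived. A possible variation, sketched here under assumptions, requires several consecutive unchanged ticks before stopping. The function name scrollUntilStuck and the parameters step, intervalMs, and stableTicks are hypothetical, and suitable values depend on how quickly the target page loads content.

```javascript
// Hypothetical variation: resolve only after scrollTop has stayed the same
// for `stableTicks` consecutive interval ticks.
function scrollUntilStuck({ step, intervalMs, stableTicks }) {
  return new Promise((resolve) => {
    let lastTop = -1;
    let unchanged = 0;
    const timer = setInterval(() => {
      window.scrollBy(0, step);
      const top = document.scrollingElement.scrollTop;
      unchanged = top === lastTop ? unchanged + 1 : 0; // reset when the page moved
      lastTop = top;
      if (unchanged >= stableTicks) {
        clearInterval(timer);
        resolve(top); // final scroll position
      }
    }, intervalMs);
  });
}

// Possible usage: Puppeteer serializes the function and runs it in the page.
// await page.evaluate(scrollUntilStuck, { step: 100, intervalMs: 100, stableTicks: 20 });
```

With the values in the usage comment, the page has to stay still for about two seconds before the promise resolves.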