javascript 想使用 Puppeteer 刮表。如何获取所有行,遍历行,然后为每一行获取“td”?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/49236981/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-29 08:29:38  来源:igfitidea点击:

Want to scrape table using Puppeteer. How can I get all rows, iterate through rows, and then get "td's" for each row?

javascripthtmlnode.jspuppeteerheadless-browser

提问by user838426

I have Puppeteer setup, and I was able get all of the rows using:

我有 Puppeteer 设置,并且能够使用以下方法获取所有行:

let rows = await page.$$eval('#myTable tr', row => row);

Now I want for each row to get "td's" and then get the innerTextfrom those.

现在我希望每一行都得到“ td's”,然后innerText从那些中得到。

Basically I want to do this:

基本上我想这样做:

var tds = myRow.querySelectorAll("td");

Where myRowis a table row, with Puppeteer.

myRow表格行在哪里,有 Puppeteer。

回答by Rippo

One way to achieve this is to use evaluate that first gets an array of all the TD'sthen returns the textContent of each TD

实现这一点的一种方法是使用评估,首先获取所有的数组,TD's然后返回每个的 textContentTD

const puppeteer = require('puppeteer');

const html = `
<html>
    <body>
      <table>
      <tr><td>One</td><td>Two</td></tr>
      <tr><td>Three</td><td>Four</td></tr>
      </table>
    </body>
</html>`;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`data:text/html,${html}`);

  const data = await page.evaluate(() => {
    const tds = Array.from(document.querySelectorAll('table tr td'))
    return tds.map(td => td.innerText)
  });

  //You will now have an array of strings
  //[ 'One', 'Two', 'Three', 'Four' ]
  console.log(data);
  //One
  console.log(data[0]);
  await browser.close();
})();

You could also use something like:-

你也可以使用类似的东西:-

const data = await page.$$eval('table tr td', tds => tds.map((td) => {
  return td.innerText;
}));

//[ 'One', 'Two', 'Three', 'Four' ]
console.log(data);

回答by Grant Miller

Two-Dimensional Array Method

二维数组法

You can also scrape the innerTextinto a two-dimensional arrayrepresenting your table.

您还可以将 刮取innerText到表示您的表的二维数组中。

[
  ['A1', 'B1', 'C1'], // Row 1
  ['A2', 'B2', 'C2'], // Row 2
  ['A3', 'B3', 'C3']  // Row 3
]


page.$$eval()

page.$$eval()

const result = await page.$$eval('#example-table tr', rows => {
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return Array.from(columns, column => column.innerText);
  });
});

console.log(result[1][2]); // "C2"

page.evaluate()

page.evaluate()

const result = await page.evaluate(() => {
  const rows = document.querySelectorAll('#example-table tr');
  return Array.from(rows, row => {
    const columns = row.querySelectorAll('td');
    return Array.from(columns, column => column.innerText);
  });
});

console.log(result[1][2]); // "C2"