PostgreSQL: invalid page header in block
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow.
Original question: http://stackoverflow.com/questions/5220344/
Asked by MrBubbles
I'm getting an error
ERROR: invalid page header in block 411 of relation "t_value_time"
in my PostgreSQL database. This keeps happening on different machines. Is there a way to prevent it from happening, or at least to tell PostgreSQL to ignore the data in the invalid block and move on?
I'd rather lose the data from that block and have PostgreSQL skip over it and read the rest of the data. Is there a way to tell it to skip this block?
Answered by Vlad
WARNING: You will lose some data!
We managed to get over it (crashed DEV VM) by issuing:
database=# SET zero_damaged_pages = on;
SET
database=# VACUUM FULL damaged_table;
WARNING: invalid page header in block xxx of relation base/yyy/zzz; zeroing out page
[..]
REINDEX TABLE damaged_table;
Fix via pwkg.ork.
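As a rough follow-up sketch (mydb and damaged_table are placeholder names, not from the original answer): SET zero_damaged_pages only affects the current session, so new connections run with it off again, and whatever rows lived in the zeroed pages are gone. It is therefore worth confirming the table is readable and taking a fresh dump of what survived:

psql mydb -c "SELECT count(*) FROM damaged_table;"         # sanity check: the table now scans without errors
pg_dump mydb -t damaged_table -Fc -f damaged_table.dump    # keep a clean copy of the surviving rows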
Answered by Mike Sherrill 'Cat Recall'
Same block every time?
From what I've read, the most common cause of invalid blocks is hardware. Red Hat has a utility, pg_filedump, that formats "PostgreSQL heap, index, and control files into a human-readable form". I don't think they support any PostgreSQL version greater than 8.4.0, but I could be wrong.
You want to prove your hardware is good by using tough, thorough disk, RAM, and NIC diagnostics.
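For illustration, a hedged sketch of pointing pg_filedump at the suspect block: the database name mydb and the path base/yyy/zzz are placeholders (the latter borrowed from the warning message above), block 411 comes from the question, and the exact flags can vary between pg_filedump versions:

psql mydb -c "SELECT pg_relation_filepath('t_value_time');"   # reports something like base/yyy/zzz under the data directory
pg_filedump -f -R 411 411 $PGDATA/base/yyy/zzz                # formatted dump limited to the suspect block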
Answered by araqnid
There's no simple way to do it, but it's reasonably easy to do just by editing the data file directly (relfilenode of the pg_class entry gives the filename).
Just copy a block from elsewhere in the file over the bad block. Ideally, synthesise an empty block or update the one you're overwriting to have no valid tuples in it.
Once you've got something that doesn't produce that error, dump the table and reload it for safety.
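A rough sketch of the "synthesise an empty block" route, assuming the default 8 kB block size and block 411 from the question; stop the server and copy the file aside before touching anything:

psql mydb -c "SELECT pg_relation_filepath('t_value_time');"                     # note the path (e.g. base/yyy/zzz), then stop the server
cp $PGDATA/base/yyy/zzz /safe/place/zzz.bak                                      # keep an untouched copy of the file
dd if=/dev/zero of=$PGDATA/base/yyy/zzz bs=8192 seek=411 count=1 conv=notrunc    # overwrite block 411 with zeroes

PostgreSQL treats an all-zero page as empty (essentially what zero_damaged_pages does for you automatically), so after restarting, dump the table and reload it as described above.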
Answered by Chris Travers
These are almost always hardware problems, by the way. Verify and test RAM, disk, and CPU. Make sure your environment is good (bad power input can cause problems, as can overheating). That's the best way to prevent it. The best way to address it is point-in-time recovery from a base backup.
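As a loose illustration of the base-backup half of that advice (the target directory here is an assumption, and WAL archiving or streaming must already be configured for point-in-time recovery to work):

pg_basebackup -D /backups/base -Fp -X stream -P    # plain-format base backup with the needed WAL streamed alongside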
Answered by Rajesh Vallarapu
If you have a slave, set hot_standby_feedback to 'on' on it if it isn't already. Then run pg_dump and write the output to /dev/null so that you don't consume any space:
nohup pg_dump db_name -v -Fc -f /dev/null &
If the dump succeeds, then your slave is fine. Do a failover; there will be no data loss.
Another way to validate your slave is to run explain select count(*) from table_name; if it succeeds and is using a sequential scan, then your slave is good. You may not have to consider this option if it is using an index scan (see the sketch after the note below for forcing a sequential scan).
Note: this works only if your master is affected by storage-level corruption.
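One hedged way to act on both checks, with db_name and table_name as placeholders: the enable_* settings are session-local and push the planner toward a sequential scan, and actually executing the count (not just EXPLAINing it) is what forces every heap page to be read:

nohup pg_dump db_name -v -Fc -f /dev/null &    # reads every row on the standby, stores nothing
psql db_name -c "SET enable_indexscan = off; SET enable_bitmapscan = off; EXPLAIN SELECT count(*) FROM table_name; SELECT count(*) FROM table_name;"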
I happened to face the same issue just today and I was able to fix it.