MySql 复制 - 从站落后于主站

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/8547827/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 22:04:48  来源:igfitidea点击:

MySql Replication - slave lagging behind master

mysqlreplicationmysql-management

提问by Ran

I have a master/slave replication on my MySql DB.

我的 MySql DB 上有一个主/从复制。

my slave DB was down for a few hours and is back up again (master was up all the time), when issuing show slave statusI can see that the slave is X seconds behind the master.

我的从数据库停机了几个小时,然后又恢复了(主数据库一直在运行),在发布时show slave status我可以看到从数据库比主数据库晚 X 秒。

the problem is that the slave dont seem to catch up with the master, the X seconds behind master dont seem to drop...

问题是slave好像跟不上master,master后面的X秒好像没掉...

any ideas on how I can help the slave catch up?

关于我如何帮助奴隶赶上的任何想法?

回答by RolandoMySQLDBA

Here is an idea

这是一个想法

In order for you to know that MySQL is fully processing the SQL from the relay logs. Try the following:

为了让您知道 MySQL 正在完全处理来自中继日志的 SQL。请尝试以下操作:

STOP SLAVE IO_THREAD;

This will stop replication from downloading new entries from the master into its relay logs.

这将阻止复制从 master 下载新条目到其中继日志。

The other thread, known as the SQL thread, will continue processing the SQL statements it downloaded from the master.

另一个线程,称为 SQL 线程,将继续处理它从 master 下载的 SQL 语句。

When you run SHOW SLAVE STATUS\G, keep your eye on Exec_Master_Log_Pos. Run SHOW SLAVE STATUS\Gagain. If Exec_Master_Log_Posdoes not move after a minute, you can go ahead run START SLAVE IO_THREAD;. This may reduce the number of Seconds_Behind_Master.

跑步时SHOW SLAVE STATUS\G,注意观察Exec_Master_Log_Pos。再跑SHOW SLAVE STATUS\G。如果Exec_Master_Log_Pos一分钟后不动,你可以继续跑START SLAVE IO_THREAD;。这可能会减少Seconds_Behind_Master.

Other than that, there is really nothing you can do except to:

除此之外,除了:

  • Trust Replication
  • Monitor Seconds_Behind_Master
  • Monitor Exec_Master_Log_Pos
  • Run SHOW PROCESSLIST;, take note of the SQL thread to see if it is processing long running queries.
  • 信任复制
  • 监视器 Seconds_Behind_Master
  • 监视器 Exec_Master_Log_Pos
  • 运行SHOW PROCESSLIST;,记下 SQL 线程以查看它是否正在处理长时间运行的查询。

BTW Keep in mind that when you run SHOW PROCESSLIST;with replication running, there should be two DB Connections whose user name is system user. One of those DB Connections will have the current SQL statement being processed by replication. As long as a different SQL statement is visible each time you run SHOW PROCESSLIST;, you can trust mysql is still replicating properly.

BTW 请记住,当您运行SHOW PROCESSLIST;复制运行时,应该有两个用户名为system user. 这些数据库连接之一将具有由复制处理的当前 SQL 语句。只要每次运行SHOW PROCESSLIST;时都能看到不同的 SQL 语句,您就可以相信 mysql 仍在正确复制。

回答by Moll

What binary log format are you using ? Are you using ROW or STATEMENT ?

你使用什么二进制日志格式?您使用的是 ROW 还是 STATEMENT ?

SHOW GLOBAL VARIABLES LIKE 'binlog_format';

If you are using ROW as a binlog format make sure that all your tables has Primary or Unique Key:

如果您使用 ROW 作为二进制日志格式,请确保所有表都具有主键或唯一键:

SELECT t.table_schema,t.table_name,engine
FROM information_schema.tables t
INNER JOIN information_schema .columns c
on t.table_schema=c.table_schema
and t.table_name=c.table_name
and t.table_schema not in ('performance_schema','information_schema','mysql')
GROUP BY t.table_schema,t.table_name
HAVING sum(if(column_key in ('PRI','UNI'), 1,0)) =0;

If you execute e.g. one delete statement on the master to delete 1 million records on a table without a PK or unique key then only one full table scan will take place on the master's side, which is not the case on the slave.

例如,如果您在 master 上执行一个 delete 语句以在没有 PK 或唯一键的情况下删除表上的 100 万条记录,那么在 master 一侧只会进行一次全表扫描,而在 slave 上则不是这种情况。

When ROW binlog_format is being used, MySQL writes the rows changes to the binary logs (not as a statement like STATEMENT binlog_format) and that change will be applied on the slave's side row by row, which means a 1 million full table scan will take place on the slave's to reflect only one delete statement on the master and that is causing slave lagging problem.

当使用 ROW binlog_format 时,MySQL 将行更改写入二进制日志(而不是像 STATEMENT binlog_format 这样的语句),并且该更改将逐行应用于从属端,这意味着将进行 100 万次全表扫描在从站上仅反映主站上的一个删除语句,这会导致从站滞后问题。

回答by Andreas Wederbrand

"seconds behind" isn't a very good tool to find out how much behind the master you really is. What it says is "the query I just executed was executed X seconds ago on the master". That doesn't mean that you will catch up and be right behind the master the next second.

“秒落后”并不是一个很好的工具来找出你真正落后于大师的程度。它说的是“我刚刚执行的查询是 X 秒前在主服务器上执行的”。这并不意味着你会在下一秒赶上并紧随其后。

If your slave is normally not lagging behind and the work load on the master is roughly constant you will catch up, but it might take some time, it might even take "forever" if the slave is normally just barely keeping up with the master. Slaves operate on one single thread so it is by design much slower than the master, also if there are some queries that take a while on the master they will block replication while running on the slave.

如果您的从站通常不会落后并且主站的工作负载大致恒定,您将赶上,但可能需要一些时间,如果从站通常只是勉强跟上主站,它甚至可能需要“永远”。从服务器在一个单线程上运行,因此设计上比主服务器慢得多,而且如果有一些查询需要在主服务器上运行一段时间,它们将在从服务器上运行时阻止复制。

回答by Abhijit Buchake

Just check if you have same time and timezones on both the servers, i.e., Master as well as Slave.

只需检查两台服务器(即主服务器和从服务器)上的时间和时区是否相同。

回答by user1769609

If you are using INNODB tables, check that you have innodb_flush_log_at_trx_commit to a value different that 0 at SLAVE.

如果您使用的是 INNODB 表,请检查您的 innodb_flush_log_at_trx_commit 值是否与 SLAVE 中的 0 不同。

http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

http://dev.mysql.com/doc/refman/4.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit

回答by 58k723f1

We had exactly the same issue after setting up our slave from a recent backup.

从最近的备份设置我们的从站后,我们遇到了完全相同的问题。

We had changed the configuration of our slave to be more crash-safe:

我们已经改变了我们的奴隶的配置,使其更安全:

sync_binlog = 1
sync_master_info = 1
relay_log_info_repository = TABLE
relay_log_recovery = 1

I think that especially the sync_binlog = 1 causes the problem, as the specs of this slave is not so fast as in the master. This config option forces the slave to store every transaction in the binary lo before they are executed (instead of the default every 10k transactions).

我认为特别是 sync_binlog = 1 会导致问题,因为这个从站的规格没有主站那么快。此配置选项强制从属设备在执行之前将每个事务存储在二进制 lo 中(而不是默认的每 10k 个事务)。

After disabling these config options again to their default values I see that the slave is catching up again.

再次将这些配置选项禁用为默认值后,我看到从站再次迎头赶上。

回答by Akhil

Just to add the findings in my similar case.

只是为了在我的类似案例中添加调查结果。

There were few bulk temporary table insert/update/delete were happening in master which occupied most of the space from relay log in slave. And in Mysql 5.5, since being single threaded, CPU was always in 100% and took lot of time to process these records.

master中很少有批量临时表插入/更新/删除发生,占用了slave中中继日志的大部分空间。而在Mysql 5.5中,由于是单线程,CPU一直在100%,处理这些记录需要很多时间。

All I did was to add these line in mysql cnf file

我所做的就是在 mysql cnf 文件中添加这些行

replicate-ignore-table=<dbname>.<temptablename1>
replicate-ignore-table=<dbname>.<temptablename2>

and everything became smooth again.

一切又变得顺利。

Inorder to figure out which tables are taking more space in relay log, try the following command and then open in a text editor. You may get some hints

为了找出中继日志中哪些表占用了更多空间,请尝试以下命令,然后在文本编辑器中打开。你可能会得到一些提示

cd /var/lib/mysql
mysqlbinlog relay-bin.000010 > /root/RelayQueries.txt
less /root/RelayQueries.txt

回答by shadow0359

If u have multiple schema's consider using multi threaded slave replication.This is relatively new feature.

如果您有多个架构,请考虑使用多线程从属复制。这是一个相对较新的功能。

This can be done dynamically without stopping server.Just stop the slave sql thread.

这可以在不停止服务器的情况下动态完成。只需停止从属 sql 线程。

STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE SQL_THREAD;

回答by Yuda Prawira

I have an issue similar to this. and both of my MySQL server hosted on AWS EC2 (master and replication). by increasing EBS disk size (which automatically increased IOPS) for MySQL slave server, its turned out the solution for me. R/W Throughput and bandwidth is increased R/W latency were decreased.

我有一个类似的问题。以及我在 AWS EC2 上托管的两个 MySQL 服务器(主服务器和复制服务器)。通过增加 MySQL 从服务器的 EBS 磁盘大小(自动增加 IOPS),这对我来说是解决方案。R/W 吞吐量和带宽增加 R/W 延迟减少。

 R/W Throughput and bandwidth is increased R/W latency were decreased, lagging behind master, Seconds_Behind_Master increased

 R/W 吞吐量和带宽增加 R/W 延迟减少,落后于 master,Seconds_Behind_Master 增加

now my MySQL database replication is catching up to the master. and Seconds_Behind_Master was decreased (it was got increased from day to day).

现在我的 MySQL 数据库复制正在赶上 master。并且 Seconds_Behind_Master 减少了(它每天都在增加)。

so if you have MySQL hosted on EC2. I suggest you tried to increase EBS disk size or its IOPS on the slave.

因此,如果您在 EC2 上托管了 MySQL。我建议您尝试增加 EBS 磁盘大小或其在从属设备上的 IOPS。