We have been in the process of upgrading our main MySQL servers from FreeBSD 10.3 to 11.1. First we started with our secondary server, which was verified to work as expected in its current role (replication and read-only queries). There was a slight hiccup after the upgrade completed, in that our database ZFS pool wasn't mounted automatically. This turned out to be more problematic than we initially thought.
On 2018-04-24 we set out to switch to the upgraded server as our primary MySQL server, thereby handling all the writes.
The switch in itself went fine, but shortly thereafter, we were alerted by our monitoring of an unusual amount of documents on DLQ. We quickly found out that the reason for this was exceptions from our service writing documents to MySQL.
Further investigation showed that the new MySQL primary wasn't configured with the correct
sql_mode, causing writes to fail. The problem stemmed from our earlier problem with the missing ZFS pool. When FreeBSD started MySQL on that boot, it could not find the MySQL
datadir, therefore it ran
mysql_install_db, which created an MySQL options file with an
sql_mode value different from the default value.
The issue did not manifest itself in our integration tests and required production write traffic to hit the database.
To ensure this does not happen again, we have explicitly defined
sql_mode in our ordinary options file, and included additional tasks in our deployment playbooks to ensure only our config file exist.