- GCP Cloud SQL for MySQL
- 64 vCPU, 128 GB RAM, 200 GB SSD (64 GB of actual data used)
- MySQL 8.0.26-google
- sort_buffer_size = 1348976000
A caveat before I describe this: it was a production server at a company I was working for at the time, so any screenshots that would identify it can't be used, and much of this is from memory.
Recently, an issue popped up that I hadn't seen in a while, which made it somewhat unexpected. The application started timing out: normally quick queries began exceeding their limits. innodb_lock_wait_timeout was at 50 seconds (the default), so anything waiting on a lock longer than that failed.
Reviewing the query, I did a quick EXPLAIN. The plan showed a key being used, but the Extra column also reported "Using where; Using index; Using temporary", and the access type was "index_merge". To verify, I ran SHOW CREATE TABLE, mainly to see whether there were any other indexes that could be used. After confirming there weren't, I ran SHOW INDEXES FROM to look at the existing indexes in detail. The cardinality was very low relative to the number of rows (roughly 1:17,000 and 1:866,666) on the two columns being filtered in the WHERE clause, so I surmised that the index merge was producing around three temporary tables per query, and that this might be exhausting memory and pushing the temporary tables to disk.
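To give a flavor of the diagnosis (table and column names here are made up, since the real schema can't be shared), the steps were roughly:

```sql
-- Hypothetical table/column names; the real schema can't be shared.
EXPLAIN
SELECT *
FROM orders
WHERE status_id = 3
  AND region_id = 12;
-- type: index_merge, Extra: Using where; Using temporary (roughly what we saw)

-- Check which indexes already exist on the table
SHOW CREATE TABLE orders\G

-- Inspect per-index cardinality versus total rows
SHOW INDEXES FROM orders;
```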
Initial solution: Create a composite index on the two columns so that the optimizer would stop using index merge and fewer temporary tables would be created.
A quick test on DEV: the ALTER (with ALGORITHM=INPLACE, LOCK=NONE) took 0.22 seconds. I didn't try ALGORITHM=INSTANT, because I sometimes forget it exists (and in any case, adding a secondary index isn't supported with INSTANT, so INPLACE was the right choice, and the ALTER is relatively fast anyway). After writing out the change requests, obtaining approvals, and notifying customers (FYI, this is a 45 minute to 1 hour process), I attempted the ALTER, thinking it would be a very quick action. My estimate: "10 minutes for worst case, 5 minutes if you ask me, less than 1 is more likely."
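The change itself was along these lines (again with made-up names):

```sql
-- Hypothetical names; composite index covering both WHERE-clause columns.
ALTER TABLE orders
  ADD INDEX idx_status_region (status_id, region_id),
  ALGORITHM=INPLACE, LOCK=NONE;
```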
The ALTER failed due to metadata locks: threads started to climb while waiting for the lock to release, and the application headed toward an outage.
Problem: Metadata lock on the ALTER, blocking all incoming threads.
We thought it might be a problem with volume, and it was also a peak database usage time, so we ended the incident with a workaround and rescheduled as a night deployment. The metadata lock tripped me up, and the sheer number of people on the call who asked questions, interrupted my train of thought, or had other ideas that led to rabbit-hole hypotheses kept me from working out what the real issue was. I continued to research but didn't find much additional information. The roughly three relevant hits on Google suggested:
- Create an index
- Flush Tables
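Neither of those tells you who is actually holding the lock. For what it's worth, on MySQL 8.0 the waiting and blocking sessions can usually be identified with something along these lines (a sketch; the sys view relies on performance_schema metadata-lock instrumentation, which is enabled by default in 8.0):

```sql
-- Sessions stuck in "Waiting for table metadata lock"
SELECT id, user, state, time, info
FROM information_schema.processlist
WHERE state LIKE '%metadata lock%';

-- Who blocks whom on table-level metadata locks (MySQL 8.0 sys schema)
SELECT object_schema, object_name,
       waiting_pid, waiting_query,
       blocking_pid, blocking_lock_type
FROM sys.schema_table_lock_waits;
```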
Problem 2: Metadata locks continue even at low volume
We reconvene. A different group shows up with additional change requests and requirements for creating the index, as well as a new communication to customers. An hour passes, we try again, and the metadata locks return. There is an unwritten "no self-inflicted outages" policy, so I say we'd need a more complex change with a second server, a failover, and so on. On a whim, I suggest a FLUSH TABLES; someone else suggests a failover. We receive permission to do both if needed. I agree that the failover is the better option, the failover is completed, the memory caches clear, and I am able to add the index.
Resolution: The resolution is false (to a point).
Upon further review, it looks like we fixed a symptom, which gave us a workaround rather than a fix for the underlying problem. Digging further, I found that sort_buffer_size had been raised to an inordinate value of about 1.26 GiB (1,348,976,000 bytes). This buffer is allocated per session for each sort, so only about 50 connections doing this kind of work could be accommodated before memory was exhausted and MySQL fell back to on-disk.
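As a rough back-of-envelope check: 50 sessions each allocating a ~1.26 GiB sort buffer is on the order of 63 GiB, on top of the InnoDB buffer pool and other global buffers, which on a 128 GB instance doesn't leave much headroom. The settings involved can be inspected with something like this:

```sql
-- Per-session sort buffer, in-memory temp table limits, and the buffer pool
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('sort_buffer_size',
                        'tmp_table_size',
                        'max_heap_table_size',
                        'innodb_buffer_pool_size');
```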
According to the MySQL documentation:
If an internal in-memory temporary table becomes too large (as determined using the tmp_table_size and max_heap_table_size system variables), MySQL automatically converts the table from in-memory to on-disk format. As of MySQL 8.0.16, on-disk temporary tables always use the InnoDB storage engine. (Previously, the storage engine employed for this purpose was determined by the internal_tmp_disk_storage_engine system variable, which is no longer supported.) You can increase the permissible temporary table size as described in Section 8.4.4, "Internal Temporary Table Use in MySQL".
https://dev.mysql.com/doc/refman/8.0/en/memory-use.html
I believe this was a combination of the reserved sort_buffer_size and the creation of large temporary tables caused by the low-cardinality indexes.
When a thread is no longer needed, the memory allocated to it is released and returned to the system unless the thread goes back into the thread cache. In that case, the memory remains allocated.
https://dev.mysql.com/doc/refman/8.0/en/memory-use.html
Because threads were being reused, the temporary tables may have remained on disk, causing a perpetual state of on-disk activity. One of the symptoms we saw was primary-key lookups taking up to 30 seconds to complete.
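One rough way to see how often temporary tables are spilling to disk is via the server-wide counters below; they are cumulative since startup, so it's the delta during the incident that matters, not the absolute values:

```sql
-- Compare disk-based temp tables to total temp tables, and check sort spills
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Created_tmp_tables',
                        'Created_tmp_disk_tables',
                        'Sort_merge_passes');
```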
A FLUSH TABLES statement or mysqladmin flush-tables command closes all tables that are not in use at once and marks all in-use tables to be closed when the currently executing thread finishes. This effectively frees most in-use memory. FLUSH TABLES does not return until all tables have been closed.
https://dev.mysql.com/doc/refman/8.0/en/memory-use.html
I was hesitant to do just a FLUSH TABLES, because it has to wait for in-use tables to be released before it finishes (and it's anxiety-inducing to sit there waiting for the flush to complete while it looks like nothing is happening), but if they had not allowed the failover, it would have been my next action.
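Had it come to that, the plan was roughly the following; the second query is what I'd run from another session to watch the flush and anything queued behind it:

```sql
-- Issue the flush; it blocks until every table has been closed
FLUSH TABLES;

-- From another session, watch what is waiting on the flush or on metadata locks
SELECT id, user, state, time, LEFT(info, 80) AS query
FROM information_schema.processlist
WHERE state LIKE '%flush%' OR state LIKE '%metadata lock%';
```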
Ultimate Resolution: We performed a failover and immediately added the index.
Once the memory had been cleared and we were off disk, the index still had to be added, since its absence may have been contributing to the overall increase in memory usage. I told the development team that this was a stopgap measure and not a fix, as the query needed more selective filters to compensate for the low cardinality.
Root Cause
I believe the root cause to be the sort_buffer_size, but for now no one wants me to change it because it has been in place for at least six months (I wasn't the person who originally changed it, and based on the number chosen, it feels like they merely picked values at random). A setting that large increases per-connection memory reservations and inevitably causes the spillover from in-memory to on-disk temporary tables. While this was occurring, thread reuse meant that, for some queries, the same threads kept hitting allocations that had already spilled to disk, so queries on those threads continued to run against on-disk storage. Additionally, several non-application users had logged in to be helpful, increasing memory reservations at the worst possible time.
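If they ever do let me change it, the low-risk way to validate a smaller value is per session before touching the global setting (262144 bytes is the MySQL 8.0 default, used here only as a starting point, not a recommendation for this workload):

```sql
-- Try a sane per-session value before touching the global setting
SET SESSION sort_buffer_size = 262144;  -- 256 KB, the MySQL 8.0 default

-- Re-run the problem query, then check whether the sort had to spill to disk
SHOW SESSION STATUS LIKE 'Sort_merge_passes';
```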
For now, I've sent over a lot of best-practice information, possible variable changes, and a rundown of what I think occurred, but I haven't heard much since. At least I have a workaround if they call again for the same reason.