You can put lipstick on a .cnf – Part 2
The foundation for any change to server variables is the hardware, the OS, and the file system. After that, you start looking at the workload – which means understanding what you have, what you want, and what you are actually doing, over time.
Tools:
- pt-summary
- sysbench
- fio
- hdparm
- dd
Let’s start with the hardware:
Each MariaDB server runs Ubuntu 16.04 on an HP ProDesk 600 G1. Running pt-summary from the Percona Toolkit gives the following (parts removed for compactness):
sysadmin@tpd81:~$ sudo pt-summary
# Percona Toolkit System Summary Report ######################
Hostname | tpd81
System | Hewlett-Packard; HP ProDesk 600 G1 DM; vNot Specified (Desktop)
Platform | Linux
Release | Ubuntu 16.04.6 LTS (xenial)
Kernel | 4.4.0-169-generic
Architecture | CPU = 64-bit, OS = 64-bit
Threading | NPTL 2.23
SELinux | No SELinux detected
Virtualized | No virtualization detected
# Processor ##################################################
Processors | physical = 1, cores = 4, virtual = 4, hyperthreading = no
Speeds | 1x2000.156, 1x2000.390, 1x2059.531, 1x2079.296
Models | 4xIntel(R) Core(TM) i5-4590T CPU @ 2.00GHz
Caches | 4x6144 KB
# Memory #####################################################
Total | 7.7G
Free | 6.5G
Used | physical = 394.2M, swap allocated = 976.0M, swap used = 0.0, virtual = 394.2M
Buffers | 820.6M
Caches | 7.0G
Swappiness | 60
Locator Size Speed Form Factor Type Type Detail
========= ======== ================= ============= ============= ===========
DIMM1 4096 MB 1600 MHz SODIMM DDR3 Synchronous
DIMM3 4096 MB 1600 MHz SODIMM DDR3 Synchronous
# Mounted Filesystems ########################################
Filesystem Size Used Type Opts Mountpoint
/dev/sda1 511M 1% vfat rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro /boot/efi
/dev/sda2 457G 1% ext4 rw,relatime,errors=remount-ro,data=ordered /
/*SNIP*/
# Disk Schedulers And Queue Size #############################
sda | [deadline] 128
# Disk Partioning ############################################
Device Type Start End Size
============ ==== ========== ========== ==================
/dev/sda Disk 500107862016
/dev/sda1 Part 2048 1050623 0
/dev/sda2 Part 1050624 974772223 0
/dev/sda3 Part 974772224 976771071 0
# Kernel Inode State #########################################
dentry-state | 59370 46232 45 0 0 0
file-nr | 896 0 777238
inode-nr | 55090 399
# RAID Controller ############################################
Controller | No RAID controller detected
# Network Config #############################################
Controller | Intel Corporation Ethernet Connection I217-LM (rev 04)
FIN Timeout | 60
Port Range | 60999
# Interface Statistics #######################################
interface rx_bytes rx_packets rx_errors tx_bytes tx_packets tx_errors
========= ========= ========== ========== ========== ========== ==========
eno1 1750000000 6000000 0 2000000000 1500000 0
# The End ####################################################
This is the starting point for my configuration, and really – the important parts are memory, CPU, and filesystem type.
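Note: Hyperthreading is off because MariaDB 10 on Ubuntu 14.04 and below, at least, had issues with hyperthreading. On this particular box the point is moot: pt-summary reports hyperthreading = no, and the i5-4590T is a 4-core, 4-thread CPU without Hyper-Threading.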
Note: I know about the common advice to set “vm.swappiness = 0” on every MySQL server. However, I believe the underlying problem may have been fixed, and I am still researching it.
Sysbench:
We’re going to use sysbench 0.4 to get a better idea of the I/O the hard drive provides. This is the older version of sysbench available in the Ubuntu repositories; for the newer version, add the Percona repository. I plan to use sysbench 1.0 soon. Using the fileio test, I test with a 20GB total file size. The total file size matters because it should be bigger than your RAM (as seen above, 8GB): if the test files fit in RAM, the OS page cache can serve the requests and you end up measuring memory instead of the disk. Complete the prepare:
sysbench --test=fileio --file-total-size=20G prepare
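To make the sizing rule concrete, here is a minimal Python sketch that checks whether a sysbench --file-total-size value is larger than the machine's RAM. It assumes sysbench's K/M/G/T suffixes are powers of 1024. That assumption matches the "128 files, 160Mb each" line the 20G run prints. The sketch reads MemTotal, which /proc/meminfo reports in kB. parse_sysbench_size, mem_total_bytes and fileset_exceeds_ram are hypothetical helper names, not part of sysbench.

```python
import re

# Assumption: sysbench size suffixes are powers of 1024; this matches the
# "128 files, 160Mb each" line printed for --file-total-size=20G.
_SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_sysbench_size(value: str) -> int:
    """Convert a size such as '20G' into bytes (hypothetical helper)."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", value.strip().upper())
    if m is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(m.group(1)) * _SUFFIX[m.group(2)]


def mem_total_bytes(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo contents (the kernel reports kB, i.e. KiB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def fileset_exceeds_ram(file_total_size: str, meminfo_text: str) -> bool:
    """True when the sysbench test files are larger than physical RAM,
    so the page cache cannot hold all of them."""
    return parse_sysbench_size(file_total_size) > mem_total_bytes(meminfo_text)
```

On the database host you would pass the real file, for example fileset_exceeds_ram("20G", open("/proc/meminfo").read()). With the 7.7G that pt-summary reports, 20G is more than twice the RAM.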
Once the prepare is complete, we can do a test on random read/write to get an idea of the numbers we want to look at:
sysadmin@tpd31:~$ sysbench --test=fileio --file-total-size=20G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Initializing random number generator from timer.
Extra file open flags: 0
128 files, 160Mb each
20Gb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!
Time limit exceeded, exiting...
Done.
Operations performed: 26557 Read, 17704 Write, 56576 Other = 100837 Total
Read 414.95Mb Written 276.62Mb Total transferred 691.58Mb (2.3053Mb/sec)
147.54 Requests/sec executed
Test execution summary:
total time: 300.0007s
total number of events: 44261
total time taken by event execution: 147.3784
per-request statistics:
min: 0.00ms
avg: 3.33ms
max: 56.85ms
approx. 95 percentile: 10.95ms
Threads fairness:
events (avg/stddev): 44261.0000/0.00
execution time (avg/stddev): 147.3784/0.00
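This output has some interesting details in it. For example, fsync() is called every 100 requests, and the block size is 16KB, which matches InnoDB's default page size. The headline numbers for this disk are 147.54 random requests/sec (about 2.3 Mb/sec), with an average of 3.33ms per request and a 95th percentile of 10.95ms.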
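The summary numbers can be cross-checked with a little arithmetic. This Python sketch recomputes requests/sec, throughput, mean latency and the read/write ratio from the raw counts in the run above. The final comparison tests a hypothesis, not documented sysbench behaviour: that "Other" counts fsync() calls, one per file (128 files) for every 100 requests.

```python
# Figures copied from the sysbench 0.4.12 fileio rndrw run above.
reads, writes, other = 26557, 17704, 56576
total_time_s = 300.0007        # "total time"
event_exec_time_s = 147.3784   # "total time taken by event execution"
transferred_mb = 691.58        # "Total transferred" (1024-based Mb)
num_files = 128                # "128 files, 160Mb each"
block_kb = 16                  # "Block size 16Kb"
fsync_every = 100              # "calling fsync() each 100 requests"

requests = reads + writes                             # 44261 read+write events
req_per_sec = requests / total_time_s                 # random IOPS
throughput_mb_s = transferred_mb / total_time_s       # Mb/sec as sysbench prints it
iops_x_block = req_per_sec * block_kb / 1024          # same rate derived from IOPS
avg_latency_ms = event_exec_time_s / requests * 1000  # mean time per request
rw_ratio = reads / writes                             # configured as 1.50

# Hypothesis, not documented behaviour: "Other" counts fsync() calls,
# one per test file for every 100 completed requests.
fsync_calls = (requests // fsync_every) * num_files

print(f"requests/sec:    {req_per_sec:.2f}")
print(f"Mb/sec:          {throughput_mb_s:.4f} (from IOPS x 16Kb: {iops_x_block:.4f})")
print(f"avg latency ms:  {avg_latency_ms:.2f}")
print(f"read/write:      {rw_ratio:.2f}")
print(f"fsync estimate:  {fsync_calls} vs Other = {other}")
```

The reported 147.54 requests/sec is simply (reads + writes) divided by the 300-second run. Multiplying that IOPS figure by the 16KB block size gives the same ~2.3 Mb/sec. This is why random I/O on a spinning disk looks so small next to the ~115 MB/s sequential figures from hdparm.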
FIO Tests
hdparm and dd
(This next section is probably not useful for NVMe drives or a SAN!)
OK. I’m only including this part to show what I do when working out the configuration of a server instance, and it is really only useful for a sandbox system. If you’re using NVMe drives or a SAN, or you have a RAID controller with 8 hard drives in a RAID 10+5 configuration . . . you have to figure out what that layer is doing, and you’ll probably want to use dd. Keep scrolling.
Info and READ Test
The second piece of information I like to have is about the disk itself. This is an ATA (SATA) disk, so I’m using hdparm, which is useful for getting information from, and controlling, ATA/IDE controllers and hard drives. Note the ATA/IDE part: NVMe drives have a similar utility called nvme-cli. As for other types of drives . . . /shrug . . . smartctl (from smartmontools) is probably the place to start.
sysadmin@tpd31:~$ sudo hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: TOSHIBA MQ01ACF050
Serial Number: 98EZC1XVT
Firmware Revision: AV0A3E
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 0
heads 16 0
sectors/track 63 0
--
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 976773168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 476940 MBytes
device size with M = 1000*1000: 500107 MBytes (500 GB)
cache/buffer size = unknown
Form Factor: 2.5 inch
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 254
DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* IDLE_IMMEDIATE with UNLOAD
Write-Read-Verify feature set
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* Idle-Unload when NCQ is active
* Host automatic Partial to Slumber transitions
* Device automatic Partial to Slumber transitions
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
* DOWNLOAD MICROCODE DMA command
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
92min for SECURITY ERASE UNIT. 92min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50000398c4405499
NAA : 5
IEEE OUI : 000039
Unique ID : 8c4405499
Checksum: correct
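This also contains a boatload of information. The details most relevant here are the 512-byte logical and 4096-byte physical sector sizes, the 7200 rpm rotation rate, the NCQ queue depth of 32, and the fact that the write cache is enabled. Then I run the read timing tests: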
sysadmin@tpd31:~$ sudo hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 22436 MB in 1.99 seconds = 11254.34 MB/sec
Timing buffered disk reads: 350 MB in 3.02 seconds = 116.01 MB/sec
sysadmin@tpd31:~$ sudo hdparm -t --direct /dev/sda
/dev/sda:
Timing O_DIRECT disk reads: 348 MB in 3.00 seconds = 115.82 MB/sec
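FYI: The cached-read figure from -T measures reads from the Linux buffer cache, so it mostly reflects CPU and memory throughput rather than the disk. The buffered (-t) and O_DIRECT (--direct) reads are the disk numbers, and 115 MB/s isn’t bad for an HDD.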
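Back to the hdparm -I output. It reports 512-byte logical and 4096-byte physical sectors, so partitions should start on a 4096-byte boundary. Otherwise, small writes can turn into read-modify-write cycles inside the drive. This Python sketch makes two cross-checks using values copied from the output above. First, it compares the device size against pt-summary's partition table. Second, it tests each partition start sector for 4K alignment. is_aligned is a hypothetical helper.

```python
# Values copied from the hdparm -I and pt-summary output above.
LOGICAL_SECTOR = 512       # "Logical Sector size: 512 bytes"
PHYSICAL_SECTOR = 4096     # "Physical Sector size: 4096 bytes"
LBA48_SECTORS = 976773168  # "LBA48 user addressable sectors"
partition_starts = {       # "Start" column of pt-summary's partition table
    "/dev/sda1": 2048,
    "/dev/sda2": 1050624,
    "/dev/sda3": 974772224,
}

size_bytes = LBA48_SECTORS * LOGICAL_SECTOR
# Should agree with "/dev/sda Disk 500107862016" and hdparm's "device size" lines.
print(f"device size: {size_bytes} bytes, "
      f"{size_bytes // 1024**2} MiB, {size_bytes // 1000**2} MB")


def is_aligned(start_lba: int) -> bool:
    """A partition is 4K-aligned when its byte offset is a multiple of the
    physical sector size (start LBA divisible by 4096 / 512 = 8)."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0


for dev, start in partition_starts.items():
    print(f"{dev}: start sector {start}, 4K-aligned: {is_aligned(start)}")
```

All three start sectors are multiples of 8, so the partitions are aligned. An old-style start at sector 63 would not be.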
WRITE Tests (and read too, but I like the above better)
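Write tests are done with dd and are pretty simple. As with most tests, run them a few times and then take an average. One caveat: without conv=fdatasync (or oflag=direct), dd reports its rate as soon as the data has been copied into the page cache. The sync after dd is not included in dd’s own timing. Any figure well above what the disk delivered in the hdparm tests (about 115 MB/s) is cache speed, not disk speed.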
/* *** tempfile does not exist on the disk WRITE is fast (sequential) *** */
sysadmin@tpd31:~$ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.29691 s, 828 MB/s
/* *** tempfile exists on the disk - write speed drops to 134 MB/s *** */
sysadmin@tpd31:~$ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.01597 s, 134 MB/s
sysadmin@tpd31:~$ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.80114 s, 138 MB/s
sysadmin@tpd31:~$ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.91887 s, 136 MB/s
sysadmin@tpd31:~$ sudo rm tempfile
sysadmin@tpd31:~$ sync; dd if=/dev/zero of=tempfile bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.5276 s, 2.0 GB/s
/** READ TEST: tempfile is still in the page cache **/
sysadmin@tpd31:~$ sync; dd if=tempfile of=/dev/null bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.205524 s, 5.2 GB/s
/** DROP the caches and run it again. This is the real READ speed **/
sysadmin@tpd31:~$ sudo /sbin/sysctl -w vm.drop_caches=3
vm.drop_caches = 3
sysadmin@tpd31:~$ sync; dd if=tempfile of=/dev/null bs=1M count=1024; sync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.10899 s, 118 MB/s
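For the "take an average" step, dividing total bytes by total seconds is slightly more accurate than averaging the MB/s figures dd prints. For equal-sized runs, that is the harmonic mean of the rates. This Python sketch parses the dd summary lines from the overwrite runs and from the cold-cache read. It deliberately leaves out the 828 MB/s, 2.0 GB/s and 5.2 GB/s results, which are far above what this disk delivered in the hdparm tests and reflect the page cache. The regular expression assumes the GNU dd summary format shown here, and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# dd summary lines copied from the runs above where tempfile already existed.
overwrite_runs = [
    "1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.01597 s, 134 MB/s",
    "1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.80114 s, 138 MB/s",
    "1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.91887 s, 136 MB/s",
]
# Read after dropping the caches.
cold_read = "1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.10899 s, 118 MB/s"

# Assumes GNU dd's "<bytes> bytes (...) copied, <secs> s, <rate>" format.
DD_LINE = re.compile(r"^(\d+) bytes .* copied, ([\d.]+) s")


def bytes_and_seconds(line: str) -> tuple[int, float]:
    """Pull the byte count and elapsed seconds out of a dd summary line."""
    m = DD_LINE.match(line)
    if m is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    return int(m.group(1)), float(m.group(2))


def mb_per_s(lines: list[str]) -> float:
    """Aggregate rate (total bytes / total seconds) in decimal MB/s, as dd reports."""
    pairs = [bytes_and_seconds(line) for line in lines]
    total_bytes = sum(b for b, _ in pairs)
    total_secs = sum(s for _, s in pairs)
    return total_bytes / total_secs / 1e6


per_run = [mb_per_s([line]) for line in overwrite_runs]
print("per-run write MB/s:", [round(r, 1) for r in per_run])
print(f"aggregate write:    {mb_per_s(overwrite_runs):.1f} MB/s")
print(f"cold-cache read:    {mb_per_s([cold_read]):.1f} MB/s")
```

Here the difference from a simple average of 134, 138 and 136 is negligible (about 135.7 vs 136 MB/s). It matters more when runs vary widely.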
The above is going to take some discussion, so the real calculations will happen in Part 3.