Presented by

  • Trent Lloyd
    @lathiat
    http://lathiat.net

    Trent Lloyd is a Software Engineer at Canonical (Ubuntu) in the Sustaining Engineering team, which acts as an escalation point for support requests requiring both in-depth analysis and software fixes. He has 15 years of Linux systems administration and networking experience supporting enterprise customers, including roles at MySQL, Sun Microsystems, Oracle and ISPs in his hometown of Perth, Australia. With a special interest in networking, Ceph and ZFS, he is also the upstream maintainer of the Avahi Multicast DNS Service Discovery stack.

Abstract

A rapidly scaling private OpenStack + hybrid HDD/SSD Ceph cloud began to experience very slow I/O performance for its Windows guests, making them practically unusable. This is the in-depth technical story of how the issue was found and fixed, including the surprising conclusion that this I/O was always going to be slow on an OpenStack Ceph cloud with a large Windows guest footprint, until the fixes that were since developed are deployed at the storage, host and guest image layers. Spoiler alert: the underlying cause is that Windows guests by default align I/O to 512-byte boundaries, while Linux and Ceph generally work best with (and, crucially, usually only submit) I/O aligned to 4096-byte boundaries. The story doesn't end there, though. I will go in depth on the fixes and changes needed in Ceph, Nova, Cinder and the Windows VirtIO drivers to get everything working smoothly.
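
To make the spoiler concrete, here is a minimal sketch (my own illustration, not material from the talk; the only assumption taken from the abstract is a 4096-byte backend block size) of why a 512-byte-aligned guest write is expensive for a backend that operates on 4096-byte blocks:

    # Hypothetical illustration: cost of 512-byte-aligned guest writes on a
    # backend that reads and writes whole 4096-byte blocks. The 4096-byte
    # figure comes from the abstract; the cost model is a simplification.
    BACKEND_BLOCK = 4096

    def is_aligned(offset, length, block=BACKEND_BLOCK):
        """True if the request starts and ends on backend block boundaries."""
        return offset % block == 0 and length % block == 0

    def bytes_rewritten(offset, length, block=BACKEND_BLOCK):
        """Every backend block the request overlaps must be rewritten whole;
        if the request is unaligned, those blocks must also be read first
        (a read-modify-write cycle)."""
        first_block = offset // block
        last_block = (offset + length - 1) // block
        return (last_block - first_block + 1) * block

    # A 512-byte write, perfectly aligned from the guest's point of view:
    print(is_aligned(512, 512))         # False for the backend
    print(bytes_rewritten(512, 512))    # 4096 -> 8x write amplification
    # The same data volume, 4096-byte aligned, touches only what it writes:
    print(is_aligned(4096, 4096))       # True
    print(bytes_rewritten(4096, 4096))  # 4096, no amplification

The 8x write amplification shown above, plus the extra read before each unaligned write, is the kind of penalty that compounds across a large Windows guest footprint.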