Getting Good IO from Amazon's EBS
The performance characteristics of Amazon's Elastic Block Store are moody, technically opaque and, at times, downright confounding. At Heroku we've spent a lot of time managing EBS disks, and I recently spent a very long night trying to figure out how to get the best performance out of them; little did I know I was testing on a night when they were particularly ornery. On a good day an EBS disk can give you 7,000 seeks per second, and on a not-so-good day it will give you only 200. On the good days you'll be singing its praises, and on the bad days you'll be cursing its name. What I think I stumbled on that night is a list of techniques that seem to even out the bumps and get decent performance out of EBS disks even when they are behaving badly.
Under perfect circumstances a totally untweaked EBS drive running an ext3 filesystem will get you about 80 MB/s of read or write throughput and 7,000 seeks per second. Two disks in a RAID 0 configuration will get you about 140 MB/s read or write and about 10,000 seeks per second. Those are the best numbers I've been able to get out of an EBS setup, and they appear to saturate the IO channel on the EC2 instance (which makes sense, as it is about what you'd expect from gigabit ethernet). However, when the EBS drives are NOT running their best, which is often, you need a lot more tweaking to get good performance out of them.
The tool I used to benchmark was bonnie++, specifically:
bonnie++ -u nobody -fd /disk/bonnie
Saturating reads and writes was not very hard, but seeks per second – which is CRITICAL for databases – was much more sensitive and is what I was optimizing for in my tests.
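For reference, a slightly fuller invocation along the same lines looks like this; the working-set size and label are placeholders, not the exact parameters from that night:
# run as an unprivileged user against a directory on the mounted raid;
# -f skips the slow per-character tests, -s is the file size in MB
# (pick something larger than RAM so the page cache doesn't hide the disks)
bonnie++ -u nobody -d /disk/bonnie -s 16384 -f -m ebs-raid0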
In my tests I built RAIDs, using mdadm RAID 0:
mdadm --create /dev/md0 --metadata=1.1 --level=0 ...
Each EBS disk is claimed to be internally redundant to begin with, so I felt safe just striping for speed.
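Spelled out in full, with the four volumes attached as /dev/sdh1 through /dev/sdh4 (see the attach commands at the end of this post) and the 256k chunk size discussed below, the command looks roughly like this; the device names and disk count are just the ones from my setup:
# stripe four EBS volumes into one md device
mdadm --create /dev/md0 --metadata=1.1 --level=0 --chunk=256 \
    --raid-devices=4 /dev/sdh1 /dev/sdh2 /dev/sdh3 /dev/sdh4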
Now, I just need to take a moment to point something out. Performance testing on EBS is very hard. The disks speed up and slow down on their own. A lot. Telling when your tweak is helping vs it just being luck is not easy. It feels a bit like trying to clock the speed of passing cars with a radar gun from the back of a rampaging bull. I fully expect to find that some of my discoveries here are just a mare’s nest, but hopefully others will prove enduring.
After testing, what I found surprised me:
- More disks are better than fewer. I’ve had people tell me that performance maxed out for them at 20 to 30 disks. I could not measure anything above 8 disks. Most importantly, lots of disks seem to smooth out the flaky performance of a single EBS disk that might be busy chewing on someone else’s data.
- Your IO scheduler matters (but not as much as I thought it would). Do not use noop. Use cfq or deadline. I found deadline to be a little better, but YMMV. (A rough sketch of applying this and the other settings follows this list.)
- Larger chunk sizes on the raid made a (shockingly) HUGE difference in performance. The sweet spot seemed to be at 256k.
- A larger read ahead buffer on the raid also made a HUGE difference. I bumped it from 256 bytes to 64k.
- Use XFS or JFS. The biggest surprise to me was how much better XFS and JFS performed on these moody disks. I am used to seeing only minimal improvements in disk performance when using them, but something about the way XFS and JFS group reads and writes plays very nicely with EBS drives.
- Mounting noatime helps but only by about 5%.
- Different EC2 instance sizes, much to my surprise, did not make a noticeable difference in disk IO.
- I was not able to reproduce Ilya’s results where a disk performed poorly when newly created but faster after being zeroed out with dd (due to lazy allocation of sectors).
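Putting the list above into commands, here is a rough sketch of how those settings get applied; the device names, mount point and read-ahead value are illustrative, so benchmark your own workload before trusting any of them:
# use the deadline scheduler on each member EBS device
for d in sdh1 sdh2 sdh3 sdh4; do echo deadline > /sys/block/$d/queue/scheduler; done
# raise the read-ahead on the raid device (blockdev counts in 512-byte sectors)
blockdev --setra 65536 /dev/md0
# format the array as XFS and mount it with noatime
mkfs.xfs /dev/md0
mkdir -p /disk
mount -o noatime /dev/md0 /disk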
I’ve included my notes from that day below. I was not running tests three times in a row and taking the standard deviation into account (although I wish I had), and these aren’t easy to reproduce because it’s been a while since the EBS drives were having such a bad day.
On bonnie, I don't have a lot of hands-on experience with it, but I have come across articles saying that it's not to be trusted on NAS installations. I don't have anything concrete to back this up, so just take it as a heads up. Having said that...
- IO Scheduler: CFQ is the default under 2.6.x, correct?
- Larger chunk sizes = great performance: to be expected, but this depends very much on the data you're storing. If you have large contiguous chunks then tweaking the FS and your raid config to larger values will definitely lead to better performance. Having said that, one thing I haven't tried is giving InnoDB a raw disk instead of a filesystem. I wonder if it performs better with that layer removed?
- XFS: once again, depends on your files. XFS does dynamic allocation for inodes, which means really poor performance if you're storing A LOT of very small files (something that I was trying to do).
- Different EC2 instance sizes: great to hear. Even with small vs medium? They advertise IO differences on the site. I've never set up a definitive test for this.
- Performance increase after DD: This is based on hearsay, and I haven't been able to reproduce it reliably either. The theory is that EBS allocates blocks on demand, so you can either DD the drive and force that cost up front, or swallow it as the drive grows. Having said that, I've seen more variation in performance based on simply unmounting/remounting the EBS drive. Unfortunately, we have no visibility into how the system is set up, and for all we know they're changing the hardware configuration underneath without us knowing a thing. It's a little bit frustrating, but such is life in the clouds. ;-)
Maybe it's even just a matter of more conservative default options in ext3 (e.g. try setting data=writeback).
I don't know how bonnie++ works, but if it's just accessing a pre-allocated file, then I would expect that there is not much performance to be gained over a dumb filesystem like ext2.
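(For reference, that suggestion looks something like the following at mount time; the device and mount point are just placeholders.)
mount -t ext3 -o noatime,data=writeback /dev/md0 /disk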
Earlier in the post you reference "64k" bytes as being an optimal read ahead, yet this doesn't match up with the blockdev statements in the table.
Thanks for posting this data, very useful.
http://www.google.com/search?q...
http://www.google.com/search?q...
I can't tell what the actual takeaway from this thread is:
http://developer.amazonwebserv...
But I think it says that nobarrier is either a) likely to be very helpful or b) harmless.
Also, here are some recent benchmarks for XFS on local storage with a whole different group of magic words:
http://recoverymonkey.net/word...
and another set of benchmarks from Vadim at the MySQL Performance Blog:
http://www.mysqlperformanceblo...
In that thread, the 'first write is slower' is confirmed by a link to the AWS docs:
http://docs.amazonwebservices....
"Due to how Amazon EC2 virtualizes disks, the first write to any location on an instance's drives performs slower than subsequent writes. For most applications, amortizing this cost over the lifetime of the instance is acceptable. However, if you require high disk performance, we recommend initializing drives by writing once to every drive location before production use. " Flag
I also blogged about some speed improvements I noticed with different filesystems and configs. http://af-design.com/blog/2009... According to a commenter there, Amazon states that under certain circumstances EBS stores will fail at an "annual failure rate (AFR) of between 0.1% – 0.5%." So be careful about assuming the drives are more robust than "real" drives.
I will see if I can replicate this when I have time.
ec2-attach-volume vol-111 -i i-1111 -d /dev/sdh1
ec2-attach-volume vol-222 -i i-1111 -d /dev/sdh2
ec2-attach-volume vol-333 -i i-1111 -d /dev/sdh3
ec2-attach-volume vol-444 -i i-1111 -d /dev/sdh4
Now the instance (i-1111) has four disks, sdh1, sdh2, sdh3 and sdh4, which you can make into a software raid with the mdadm tool just as if you had physical disks connected to a physical computer.
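Once the array is created you can confirm the stripe came up as expected; these checks assume the md device name used in the example earlier in the post:
cat /proc/mdstat
mdadm --detail /dev/md0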