Evan Jones' Scratch Pad
Intel SSDs: Not appropriate for databases
I have been testing an Intel SSD (X25-M G2) to see if it stores data durably, meaning that if the disk claims the data has been written, it actually survives a power failure. This is important because you want your airline ticket to stay purchased after your buy it, even if the system crashes. The conclusion of my testing is that this SSD can lose data in power failures, even with the write cache disabled. This means that if you are using it for a database, committed transactions could be lost (meaning lost airline tickets, forgotten bank deposits, etc.). This is a "bug" in this device, as this is not how disks are supposed to work. In this article, I'll describe how I tested this, and what problems I found. I'm still contacting some experts to see if this observation agrees with what they have found, so I'll update this if I discover new information. ... (1262 words)
[ 2010-August-25 10:49 | Permanent Link ]
Write Latency and Block Size
Today's episode of how to get good performance from disks looks at the impact of block size and alignment on write latency. There are two layers involved in writes. The first layer is the disk cache or page cache, which stores disk pages in memory temporarily. On Linux x86 and x86-64, the page cache stores 4 kB chunks of files from disk (other architectures may use a different size, for example 8 kB on Sparc or IA64). Thus, performing I/O in 4 kB chunks will probably be more efficient. The second layer is the disk interface itself. In the dark ages, someone decided that disk sectors should be 512 bytes long (although this is now increasing to 4 k). This means that every access to the disk must perform I/O in 512 byte chunks. Again, this means that if you perform I/O in 512 byte chunks, you should get more efficient I/O. Combining these two seems to suggest that performing I/O in 4 kB chunks is the best. However, I wanted to find out exactly how much this matters, so I performed a simple experiment. (Short version: align writes on a 4 k boundary, and always write a multiple of 4 k bytes). ... (877 words)
[ 2010-August-09 10:11 | Permanent Link ]
More Than You Wanted to Know About Checksums
A checksum is a function that computes an integer value from a string of bytes that is used to detect errors. Each checksum function has slightly different speed or robustness properties, so which one should you use? In my opinion, new applications should use the CRC32C algorithm, as it is very robust and supported in hardware in newer Intel CPUs. This article discusses checksums in general, then provides an implementation and some performance advice. ... (732 words)
[ 2010-July-16 14:55 | Permanent Link ]
Linux Write Caching
I just spent a couple days trying to understand the performance of a small test program that constantly overwrites a file, treating it like a circular buffer. The key is understanding how Linux caches dirty pages in memory and writes them back. The brief summary is that Linux attempts to write dirty pages out in the background, without blocking the process doing the writing. However, once there are enough dirty pages, the process will be blocked, actively pages out to disk. The details are described elsewhere, but I will briefly summarize what I learned here. This may not be completely correct, because I didn't look at the code in any detail. Note: All the variables are files in /proc/sys/vm that are adjustable. ... (518 words)
[ 2010-July-07 11:31 | Permanent Link ]
Java String Encoding Internals
I have written about my experiments with Java string encoding performance before. However, someone asked me some questions about why my code can do better than the JDK. I did not have a good answer, so I dug into the JDK source code to find out. If you want to know what happens when you call String.getBytes("UTF-8"), read on. The short version for UTF-8 is: ... (521 words)
[ 2010-May-31 16:15 | Permanent Link ]
Debugging with a Circular Buffer
I just spent the last day tracking down an annoying bug in a multi-threaded event-driven program. This bug would cause the program to get stuck on rare occasions, after running for an hour or so. Examining the stuck process's state with GDB simply showed that it was stuck in a state that I didn't think was possible. This led me in the right direction, but I didn't understand how it could possibly end up in that state. I first tried adding printf statements to my program to trace the execution, but generated tons of output (although in retrospect, tail probably could have solved that part of the problem), and ended up slowing the program down enough that the bug did not happen. To solve this, I ended up logging data to a per-thread circular buffer. This was fast enough to still hit the bug, and then I could look at the last entries in the circular buffer to figure out how the program got stuck. ... (325 words)
[ 2010-May-22 12:53 | Permanent Link ]
Ken Thompson's "Trusting Trust" Quine
One of the papers I discussed this week as part of teaching MIT's Computer Systems Engineering was Ken Thompson's Reflections on Trusting Trust [doi]. This short paper discusses an interesting security attack via the compiler. However, it starts with a quine, a program that prints its own source code. Russ Cox recently described quines in great detail, including zip files that contain themselves. However, after reading this paper, I wanted to reproduce Ken Thompson's original quine. This documents how I did it, and includes source code. ... (461 words)
[ 2010-May-09 14:51 | Permanent Link ]
Buffer Overflows 101
I am a recitation instructor for MIT's 6.033 course this semester, which basically involves discussing "notable" computer science papers with approximately 50 students every week. This week we discussed buffer overflows. To refresh my memory, I dug up my previous work with a buffer exploit challenge, simplifying it and revisiting it for modern Linux. Here is a brief introduction to how to exploit a buffer overflow on a Linux system. This works on both 32-bit and 64-bit systems, although 64-bit systems may need an offset to be manually updated. ... (507 words)
[ 2010-April-27 16:41 | Permanent Link ]
Weak Isolation in Relational Databases
The ACID model is a key feature of traditional relational database systems. The I stands for Isolation, meaning that transactions cannot see intermediate results from other transactions. The traditional definition of isolation is serializability, where the results of processing a set of transactions is equivalent to executing them one at a time in some order. Effectively, each transaction can pretend to be executing by itself, when in reality multiple transactions can run concurrently. This is very useful, since applications do not need to worry about concurrency, making it much easier to write correct code. However, most database systems do not actually provide this model. Many systems, notably Postgres and Oracle, provide Snapshot Isolation, which is weaker. The primary difference is that snapshot isolation only checks for write/write conflicts, while serializability enforces read/write conflicts. Even worse, MySQL with InnoDB by default uses what it calls the REPEATABLE READ isolation level, but it has reads which are not repeatable. Similarly, Postgres defaults to the READ COMMITTED isolation level. The good news is that if you explicitly tell MySQL to use the SERIALIZABLE isolation level, it really is serializable. This article will provide a few concrete examples of where these isolation levels differ. ... (1336 words)
[ 2009-December-10 09:18 | Permanent Link ]
Java String Encoding Performance
When writing Java String objects to the external world, either to a file or over the network, they must be converted to bytes in some specific encoding. Personally, I recommend UTF-8, but that is another issue. No matter what encoding is used, some API must be called to do the conversion. The simplest is String.getBytes(), which returns a new byte[] array with the data in a given character set. Unfortunately, when buffering writes for high performance, this typically results in an extra copy and an array that promptly needs to be garbage collected, which is wasteful. CharsetEncoder provides a mechanism to serialize into a ByteBuffer, which can be reused. This is more complicated, but should avoid extra copies and memory allocations. Interestingly, it doesn't improve performance when used in a straightforward fashion. It turns out that if you are only encoding a few strings, use String.getBytes(). If you need to encode many strings into a separate buffer, copy the string into a char[] array and use that as the input to a CharsetEncoder. This article describes the details of these approaches, and shows some performance numbers. [Update: For more detail, read how the JDK encodes strings]. ... (633 words)