Dr. David DeWitt
“I’m not a doctor.” This is going to be good. “From 1 to 1000 MIPS” He’s doing great. He was a fantastic last year and I knew this was going to be good. 32 years in academia and only in MS for 1.5 years. He’s the blue sky guy, and beleive me when I tell you, you get smarter from being in the room with him. And a huge ovation because he told us that he’s going technical and not covering a marketing pitch.
Right. This one is going to be hard to blog. He’s going through information quick. It’s all good. He’s giving an academic talk about the 30 years of technology trends in databases. He’s going over how the trends have affected OLTP and why the trends are forcing DBMS to evolve and some future trends to keep an eye on.
Back in 1980 we had a Digital VAX with a 1MIP CPU, 8mb of memory, 80mb disk drive… all for $250K. Yeah, the “good old days” Not. More or less 1000 times better for Cache & memory, 2000 times better for CPU and 10000 times for disks. But transfer rates have only increased by 65X. And for seeks there has only been a 10x improvement. 1985, 100TPS & 400 I/O/second. Today, 25,000 TPS & 100,000 I/O/second. His key point is that it took 14 drives to keep a single processor busy in 1985 but it takes 300 drives to keep a cpu busy today. He believes that SSD and phase change memory are the only things that are going to make a difference here.
His comparison of the problem is that today we’re in the situation where we have a gigantic water tank to feed water to the hole town, but only a garden hose to drain it. That’s scary to think about. When thinking about it, he compared the disk to cpu ratio and came to the conclusion that drives are 150x SLOWER today than they were 30 years ago.
So his basic conclusion is that we need to avoid random I/O in our database applications…. uh, yeah, good advice, I guess, but how the heck do we do that? Obviously we don’t and that’s not what he’s saying. He’s making a great point. This is just flat out fantastic.
In the old days it was 6 cycles to access memory. Now it’s 200 cycles to access memory. Despite all these improvements, we’re really messing up our scalability on the hardware because of limitations in the design. Hard to believe when you consider that everything is, in fact, faster, but it’s just not faster the right way. It’s scaling too hard.
He’s showing how the queries work within the system, what happens, literally, what is waiting on what, all the way down at the lowest level of the process. Scan queries vs. join queries. So why so many stalls? It’s the L2 data cache stalls, an artifact of the last 30 years of design within systems. Why? Well, you’d need to be here to really understand. I’m not going to repeat it correctly. For each record, there’s an L2 & L1 cache miss from each page as it reads from the disk, row-by-flipping-row. So basically, we’re getting RBAR within the CPU when it has to read from the disk.
There is a new memory configuration called a column store where the columns are broken down instead of rows. The columns are stored on files/pages. He’s showing how ID would be one page, Name on 2, City on 12, etc. because otherwise the rows are stored across pages. Without compression, it changes the amount of data on a page. If they also add compression. It’s going to make a huge difference. BUT, it’s not updateable. They’re going to make the DB a perfect decision support engine. Showing some drill down on the details, he showed about a 7x improvement in speed, without compression.
Then, you put compression on top of it and stuff really takes off. But remember, this is a read win. It’s write lose.
Really, I’m going to stop trying to keep up. I’m starting to lose track of everything anyway. I think this might be available online or on the DVD. I sure hope it is. If you were at the Summit and you didn’t attend this… you messed up, big time.
The key to compression is to remember that you need to get more data onto the disk because the CPU is 1000x faster but the disk is only 65x faster. That’s why you compress. Not because you’re saving space, but because you’re saving READS. I’m going back to the office to turn on compression on everything (after testing & verification).
I can’t describe it all, but the early/late materialization performance differences are amazing.
Key points for the quiz. yes, I’m giving out the quiz answers, call Bluto and D-Day:
- At first glance hardware folks would appear to be our friends
- Huge inexpensive disks have enabled us to cod-effectively store vast quantities of data
- On the other hand ONLY a 10x improvement in random disk access and 65x improvement in through point
- Two pronged solution for “read” intesnsive data waroehouse workloads – parallel db technology & column stores
- Column stores, minimize transfer, facilitate compression, minimize memory stalls
- But column stores don’t work for OLTP, AT ALL
- Hardware trends and demands are forcing DB systems to evolve through specialization
But, I missed to two bullets because I couldn’t type fast enough. So you might still fail the quiz.
This will be on the DVD. Thank you PASS. Thank you Dr. DeWitt.