In-memory Database Systems: Hype or Reality?

Typical arguments about in-memory database systems go as follows:

From: http://nosql.mypopescu.com/post/505511044/in-memory-elastic-databases

This is indeed an interesting argument and one that I’m not going to argue against. But it still feels like elastic caching or in-memory elastic databases will remain just a part of the software equation:

– Even if the price of RAM has continued to decrease, the machines mentioned do not sound like commodity hardware, so you’ll have to balance the costs against the value of the data.
– It still sounds like vertical scaling (NB: not saying that vertical scaling is always bad)
– There will always be data that fits better on disk (e.g. video)
– The more data accumulates, the more you’ll want to make sure that querying it (NB: online or offline) is not expensive

These are just a few of the many examples of taking the points to extremes. Before answering them, we should revisit these lines from 2008.

Steven Robbins published an interesting article on InfoQ titled “RAM is the new disk”. In the comment thread, Steven Robbins quoted Tim Bray and others comparing file system performance to memory:

Memory is the new disk! With disk speeds growing very slowly and memory chip capacities growing exponentially, in-memory software architectures offer the prospect of orders-of-magnitude improvements in the performance of all kinds of data-intensive applications. Small (1U, 2U) rack-mounted servers with a terabyte or more of memory will be available soon, and will change how we think about the balance between memory and disk in server architectures…

It raises the following questions:

What if the disk were RAM-based? Does that mean that all we need to do is replace the current disks with RAM technology to gain speed? The title of the article leads people to think along those lines.

It’s not just the speed of memory compared to disk that makes a difference. It’s not even the extra benefit of collocating CPU and memory. What’s really important is that disk is a sequential storage medium, designed primarily to store a stream of bytes, not tables of data. If you want to store data objects, you need to serialize them into bytes and maintain file-system mappings from sectors to the location of those bytes. Maintaining an index on this data is relatively expensive, because every additional index is stored as a copy of the original data; there is no real option to access data by reference, and so on.

If you think about it, existing RDBMSs are basically a mapping layer between a table representation of data and sequential storage. A large part of existing database implementations is devoted to addressing the impedance mismatch between these two representation models. All this complexity doesn’t really exist when we’re dealing with memory. That means that if we take existing databases and run them on memory-based devices, we’re basically going to force the limitations of sequential storage representations onto memory.
To exploit the real value of memory-based resources, we need a different approach and implementations that assume data can be accessed by reference, so that objects can be accessed directly from our application, in its native domain model, without a complex mapping layer.
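
To make the contrast above concrete, here is a minimal Python sketch; it is not taken from the original discussion, and all class and field names are hypothetical. A disk-style store serializes each record into one sequential byte buffer and finds it through an offset map, so every read pays for deserialization and produces a fresh copy. A memory-native store keeps the objects themselves, and its secondary index points at the very same objects by reference.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical illustration only; not the API of any real database.


@dataclass
class Customer:
    id: int
    name: str
    city: str


class SerializedStore:
    """Disk-style store: records live as bytes in one sequential buffer."""

    def __init__(self) -> None:
        self.buffer = bytearray()  # stands in for a file on disk
        self.offsets: dict[int, tuple[int, int]] = {}  # id -> (offset, length)

    def put(self, c: Customer) -> None:
        data = json.dumps(asdict(c)).encode("utf-8")  # object -> bytes
        self.offsets[c.id] = (len(self.buffer), len(data))
        self.buffer.extend(data)  # append, like a sequential write

    def get(self, cid: int) -> Customer:
        offset, length = self.offsets[cid]
        raw = bytes(self.buffer[offset:offset + length])
        # Every read pays for deserialization and returns a new copy.
        return Customer(**json.loads(raw))


class InMemoryStore:
    """Memory-native store: objects stay objects; indexes hold references."""

    def __init__(self) -> None:
        self.by_id: dict[int, Customer] = {}
        self.by_city: dict[str, list[Customer]] = {}  # secondary index

    def put(self, c: Customer) -> None:
        self.by_id[c.id] = c
        self.by_city.setdefault(c.city, []).append(c)  # same object, no copy

    def get(self, cid: int) -> Customer:
        return self.by_id[cid]


if __name__ == "__main__":
    alice = Customer(1, "Alice", "Istanbul")
    disk, mem = SerializedStore(), InMemoryStore()
    disk.put(alice)
    mem.put(alice)
    print(disk.get(1) is disk.get(1))  # False: two deserialized copies
    print(mem.get(1) is mem.by_city["Istanbul"][0])  # True: one shared object
```

The sketch only shows where the cost sits: the serialized store pays a mapping step on every read, while the in-memory store gets a second index for the price of one more reference. Persistence, concurrency control, and eviction are deliberately left out.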

At this point I’d like to end with Tim’s last remark:

Disk will become the new tape, and will be used in the same way, as a sequential storage medium (streaming from disk is reasonably fast) rather than as a random-access medium (very slow). Tons of opportunities there to develop new products that can offer 10x-100x performance improvements over the existing ones.

RAMCloud

A second important credit goes to the paper that changed our point of view:

http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf

The Case for RAMClouds:
Scalable High-Performance Storage Entirely in DRAM
Department of Computer Science
Stanford University

You can also read:

http://nms.csail.mit.edu/~stavros/pubs/hstore.pdf

The End of an Architectural Era
(It’s Time for a Complete Rewrite)
Michael Stonebraker (MIT)

So, what is the conclusion?

Just hype? Are you sure? Watch out for more…

COMMENTS

  • Hi Ugur
    I’m glad that you found my quotes and slides useful.
    I’m interested to know how you see the next step on that path.

  • Thanks, Nati, I appreciate your work. I think the next step is the CPU architecture. We will see some radical changes in the BI landscape: more specialized CPUs and hardware platforms. I will write about them soon. If you are interested, you might want to check out my last IDC presentation.
    Regards from Istanbul