Distributed Replicated Blob Server (drbs) is a basic file server with distributed capabilities.
Drbs keeps a large set of immutable blobs available in environments where storage components are expected to fail. Each blob is identified by a simple number, the blobid, which is chosen by the server and cannot be influenced by the client.
Drbs consists of three components: the blobclient, which is the client library used to access the blobs; a number of blobservers, which actually store the blobs; and a single blobmaster, which coordinates where the blobs are stored and tells the blobclient where they can be found.
Each blob is stored on several different blobservers, which provides redundancy and compensates for failures. A sensible setup would require at least 10 blobservers, though they could all be run on the same host. For more redundancy, spread them across multiple hardware units.
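The division of labour between blobmaster and blobservers can be illustrated with a small sketch. It is not drbs code: the class name BlobMaster, the methods allocate and locate, the replication factor of 3, and the random placement policy are all assumptions made for illustration. It only shows the idea that the master chooses the blobid, records which blobservers hold each blob, and can still point a client at surviving replicas when one server fails.

```python
import itertools
import random


class BlobMaster:
    """Toy in-memory coordinator (hypothetical, not the drbs implementation)."""

    def __init__(self, servers, replicas=3, seed=None):
        if replicas > len(servers):
            raise ValueError("need at least as many blobservers as replicas")
        self.servers = list(servers)
        self.replicas = replicas           # assumed replication factor
        self.locations = {}                # blobid -> list of blobserver names
        self._next_id = itertools.count(1)
        self._rng = random.Random(seed)

    def allocate(self):
        """Choose a new blobid and the blobservers that should store the blob."""
        blobid = next(self._next_id)       # the client has no say in the id
        # Assumed placement policy: distinct servers picked at random.
        self.locations[blobid] = self._rng.sample(self.servers, self.replicas)
        return blobid, self.locations[blobid]

    def locate(self, blobid, failed=()):
        """Return the replicas of a blob that are not known to have failed."""
        return [s for s in self.locations[blobid] if s not in failed]


master = BlobMaster([f"blobserver{i}" for i in range(10)], replicas=3, seed=0)
blobid, holders = master.allocate()
# Even with one holder down, the blob can still be found on the others.
survivors = master.locate(blobid, failed={holders[0]})
```

With three replicas, losing a single blobserver still leaves two copies that the client can be directed to.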
The blobmaster never sees the actual blob, only its metadata. This metadata includes an MD5 checksum, which ensures that failing disks and human mistakes are detected. Because it holds only metadata on the blobs, the blobmaster keeps all of its data in RAM.
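The checksum check can be sketched in a few lines. This is a minimal illustration using Python's standard hashlib module, not the drbs code; the helper names md5_hex and verify_blob are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a blob as a hex string."""
    return hashlib.md5(data).hexdigest()


def verify_blob(data: bytes, expected_md5: str) -> bool:
    """Compare a blob read from disk against the checksum kept as metadata."""
    return md5_hex(data) == expected_md5


blob = b"hello blob"
recorded = md5_hex(blob)                   # what the master would remember
intact = verify_blob(blob, recorded)
damaged = verify_blob(b"hellp blob", recorded)   # a single flipped byte
```

A mismatch tells the reader that this replica is damaged, so the blob should be fetched from another blobserver instead.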
The blobserver keeps all of its metadata in RAM and stores the blobs as files in the ordinary file system. It logs all changes in a logfile, which allows it to restart quickly by replaying the logged actions until it reaches its previous state. Since the logfile is simply mmap'ed, it can be read and interpreted quickly.
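The restart-by-replay idea can be sketched as follows. The log format here (text lines of the form "ADD blobid md5 size" and "DEL blobid") is purely an assumption for illustration, as are the function names; the actual drbs logfile layout is not described above. The sketch maps the logfile read-only with Python's mmap module and replays it line by line to rebuild the in-memory table.

```python
import mmap
import os
import tempfile


def append_record(path, *fields):
    """Append one whitespace-separated record as a line to the logfile."""
    with open(path, "a", encoding="ascii") as log:
        log.write(" ".join(str(f) for f in fields) + "\n")


def replay(path):
    """Rebuild the blobid -> (md5, size) table by replaying the log in order."""
    state = {}
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return state                       # an empty file cannot be mmap'ed
    with open(path, "rb") as log:
        with mmap.mmap(log.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # readline() returns b"" once the end of the mapping is reached.
            for raw in iter(view.readline, b""):
                op, blobid, *rest = raw.decode("ascii").split()
                if op == "ADD":
                    state[int(blobid)] = (rest[0], int(rest[1]))
                elif op == "DEL":          # assumed record type
                    state.pop(int(blobid), None)
    return state


EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"   # MD5 of zero bytes
with tempfile.TemporaryDirectory() as tmp:
    logfile = os.path.join(tmp, "blobserver.log")
    append_record(logfile, "ADD", 1, EMPTY_MD5, 0)
    append_record(logfile, "ADD", 2, EMPTY_MD5, 0)
    append_record(logfile, "DEL", 1)
    restored = replay(logfile)
```

Because the state is derived entirely from the log, a restarted server only needs the logfile and the blob files on disk to resume.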
While it would be possible to implement a similar solution on top of an ordinary database, drbs follows the Google File System's concept that this can be done with much lower overhead, which makes drbs a more cost-efficient solution. Because the software assumes that hardware will fail, cheaper and less reliable hardware can be used.
Although drbs can work on a single machine, it is intended to scale up for storing larger sets of blobs on many machines. Google's paper on the subject even suggests the use of hundreds of machines!
Drbs is not yet suitable for handling production data. Its design nonetheless offers redundancy, compensation for expected storage component failures, and fast restarts after a failure, which makes it a promising tool for keeping a large set of immutable blobs available as it continues to develop.
Version 20040804: N/A