Redo Log – Resource Journaling

Background

Redo log (also called write-ahead log or commit log) journaling is required whenever a system supports in-place updates of a serialized resource that is backed by a datastore for durability. Changes must be captured atomically so that they can be replayed correctly in the event of a system crash or abrupt shutdown.

Interestingly, this feature is not required for append-only databases or applications, since their changes can be serialized to disk atomically in a single write.

In the past this was mostly used in relational databases. But now, as a new wave of applications demanding high redundancy, durability, and horizontal scalability on commodity hardware picks up, this feature may be useful to many of today's applications beyond NoSQL databases.

Today's hardware configurations are also very different from the past. Commodity hardware can have 10+ CPUs, 100+ GB of RAM, and 10-40 TB of hard disk with ~1 GB of SSD. If your application needs to store a lot of changing data in the backend, in-place editing of the resource may be more scalable than append-only serialization. In that case a redo log, or resource journaling, will be very handy.

High-level requirements

Redo log journaling should not only manage redo logs but also apply the changes to the backend data store. Applying changes to the backend should guarantee ordering and be multithreaded.

Redo log management should include log file rotation as well as cleanup of old log files whose changes have already been applied.

High-level design

It exposes the following four APIs.

APIs

startTransaction

It returns a transaction id that is used in subsequent log, commit, or rollback calls.

log (transaction id, byte[] serializedObject)

It keeps the serializedObject in a local ordered list against the transaction id so that it can later be written to the redo log file atomically.

commit (transaction id)

It writes the list of all serialized objects for the given transaction into the redo log file. After writing, if the log file size exceeds the configured threshold, it creates a new one. And if more than one log folder is provided, it creates the new log file in another folder.

rollback (transaction id)

It removes the list of all serialized objects for the given transaction and never writes them to the redo log.
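
Taken together, the four calls could be modeled as the Java interface below. This is a minimal sketch for illustration; the interface name and the use of a plain long transaction id are assumptions, not the actual WonderDB API.

import java.io.IOException;

// Hypothetical sketch of the four redo-logger APIs described above;
// the interface and method names are illustrative assumptions.
public interface RedoTransactionLogger {

    /** Begins a transaction; the returned id is passed to the other calls. */
    long startTransaction();

    /** Buffers the serialized object in an ordered list for this transaction. */
    void log(long transactionId, byte[] serializedObject);

    /** Atomically writes all buffered objects for the transaction to the
     *  current redo log file, rotating to a new file (possibly in another
     *  configured folder) once the size threshold is crossed. */
    void commit(long transactionId) throws IOException;

    /** Discards the buffered objects; nothing is written to the redo log. */
    void rollback(long transactionId);
}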

Apply types

This product not only maintains redo log files but also applies changes to the backend store. The following apply types are supported.

Every

It applies changes to the backend for every transaction. This may not be the most suitable option from a performance point of view in a running system, since it writes every transaction not only to the redo log file but also to the backend store.

The only advantage of this type is that it doesn't have to flush pending transactions to the backend store during shutdown, so shutdown time is shortened.

Every so often (configurable number)

It is similar to Every, except that it syncs transactions to the backend after a set number of transactions. This is a little better from a performance point of view since it batches up the transactions written to the backend.

On Log switch

This is the preferred mode. It syncs transactions to the backend after a log file switch. If it is configured with multiple log files, it can provide the best performance and throughput, since it can be writing redo logs on one disk while syncing them to the backend from another.

None

It doesn't apply changes to the datastore. The application generating transactions takes on this responsibility.

In WonderDB we use this setting, since WonderDB syncs its caches back to the data store at regular intervals.
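
As a rough sketch, the apply types could be expressed as a configuration enum like the one below. The class, enum, and field names are assumptions made for illustration, not WonderDB's actual configuration API.

// Illustrative sketch only: the class, enum, and field names are
// assumptions, not the actual WonderDB configuration API.
public class RedoLoggerConfig {

    public enum ApplyType {
        EVERY,          // apply to the backend after every transaction
        EVERY_SO_OFTEN, // apply after a configurable number of transactions
        ON_LOG_SWITCH,  // apply after a redo log file switch (preferred)
        NONE            // the application applies changes itself
    }

    private ApplyType applyType = ApplyType.ON_LOG_SWITCH;
    private int applyBatchSize = 1000; // used only with EVERY_SO_OFTEN

    public ApplyType getApplyType() { return applyType; }
    public void setApplyType(ApplyType type) { this.applyType = type; }
    public int getApplyBatchSize() { return applyBatchSize; }
    public void setApplyBatchSize(int size) { this.applyBatchSize = size; }
}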

Should support multiple log files in multiple folders

Today's commodity hardware has multiple hard drives (mostly for Hadoop-like workloads). For high performance, we assume logs will be stored in multiple folders (not just one), so that write-ahead logs can be written to a folder on one disk while, at the same time, logs are read from a folder on another disk and applied to the backend. This reduces contention on disk heads (and disks), since reads and writes happen in two separate folders, possibly on separate disks.

For example, suppose redo logs are configured to be written to two folders, A and B, where folder A resides on disk 1 and folder B on disk 2, and the maximum redo log file size is set to 100K. Once a log file reaches 100K or more, a new log file is created and new logs are written into it on disk 2. Meanwhile, we can start applying logs to the backend from log file 1 on disk 1, so that reads and writes happen on different disks.
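
The folder switch can be as simple as round-robin selection. The sketch below shows one way to pick the file for the next write; all names are hypothetical and the real implementation may differ.

import java.io.File;
import java.util.List;

// Hypothetical round-robin choice of the folder for the next redo log file.
public class RedoLogRotator {

    private final List<File> logFolders;  // e.g. [/disk1/A, /disk2/B]
    private final long maxFileSizeBytes;  // e.g. 100 * 1024 for 100K
    private int currentFolder = 0;
    private int fileSequence = 0;

    public RedoLogRotator(List<File> logFolders, long maxFileSizeBytes) {
        this.logFolders = logFolders;
        this.maxFileSizeBytes = maxFileSizeBytes;
    }

    /** Returns the file to write to, switching to the next folder once the
     *  current file has reached the configured size threshold. */
    public File currentLogFile(File current) {
        if (current != null && current.length() < maxFileSizeBytes) {
            return current; // still under the threshold, keep writing
        }
        // Rotate: the next folder is likely on another disk, so new writes
        // proceed there while the applier drains the file just closed.
        currentFolder = (currentFolder + 1) % logFolders.size();
        fileSequence++;
        return new File(logFolders.get(currentFolder), "redo-" + fileSequence + ".log");
    }
}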

How it works

The Redo Transaction Manager, shown in the diagram, is the implementation of redo log journaling. Applications can use it simply by adding its dependency to their Maven project.

As described above, it provides four APIs.

Here is the flow an application takes to integrate with the redo logger. First, the application has some resource that it needs to serialize to disk or some other data store. It performs the following four steps, sketched in code after the list, to make sure the resource is serialized in an atomic/transactional way into the data store.

  1. The application updates the resource.
  2. It calls the redo transaction logger to log the change into the redo log.
  3. The redo transaction logger logs/serializes it into the redo log. It also manages the life cycle of redo log files.
  4. Eventually, in another thread, the redo transaction applier, which is also part of the redo logger, picks up this transaction from the redo log and applies it to the actual data files.
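
In code, that flow might look like the sketch below, reusing the hypothetical RedoTransactionLogger interface from earlier; MyResource is a stand-in for whatever the application serializes.

import java.io.IOException;

// Illustrative integration flow only; all names are assumptions.
public class RedoLoggerUsage {

    // Stand-in for the application's serializable resource.
    interface MyResource {
        void applyChange();
        byte[] serialize();
    }

    void updateResource(RedoTransactionLogger redoLogger, MyResource resource)
            throws IOException {
        // 1. The application updates the resource in memory.
        resource.applyChange();

        long txnId = redoLogger.startTransaction();
        try {
            // 2-3. Log the serialized change; the logger appends it to the
            //      redo log and manages log-file life cycle on commit.
            redoLogger.log(txnId, resource.serialize());
            redoLogger.commit(txnId);
            // 4. A separate applier thread inside the redo logger later picks
            //    the transaction up and applies it to the actual data files.
        } catch (IOException e) {
            redoLogger.rollback(txnId);
            throw e;
        }
    }
}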

For more information on its usage, please refer to the Getting Started page.
