SQLite for beginners: Journal Modes

SQLite for beginners: Journal Modes

SQLite is really easy to start using, you just need to add the library to your project and start issuing SQL statements.

Now, because it's so easy to start using SQLite, it might feel like it's configuration free, but that's **not exactly true**.

There are a few key configurations you should be familiar with, and in this video we'll look at one of the most important configuration options, the `journal_mode` option.


"One of the nice things about SQLite is that you don't need to have Database Administrator (DBA) skills to use it, you can just add the SQLite library to your project and start issuing SQL statements.

Now, because of how easy it is to start using SQLite, it might feel like it's configuration free, but that's not exactly true. I mean, there's not a lot of configuration needed to start using SQLite, but there are a few key configuration options that you should be aware of, and in this video, we're going to talk about one of the most important configuration options, that I think every SQLite developer needs to understand.

So what configuration option am I talking about?

The journal_mode option.

Ok, before we dive in, I want to set some expectations.

The topic we're about to discuss is pretty complicated, and there are lots of details we could get into, but I'm going to gloss over many of the details, because my goal in this video, is to give you a big-picture view of how things work, and hopefully a useful mental model.

Ok, let's get back to it.

So, the journal mode configuration option can be set with the following pragma statement.

Ok, so what values can you set the journal_mode to?

Well, there are basically 2 main journal modes that you can use in SQLite:

  • Rollback Journal mode, which is the default
  • and WAL mode, which is short for Write-Ahead Log

Ok, technically there are a few additional variations of the Rollback Journal option, but, for the vast majority of people, these are the two options to consider.

So, which journal mode should you use?

Well, to answer this question, you need to know what each of these modes do, so let's take a big-picture look, at how these modes work.

The first mode we'll look at is the rollback journal mode, which again is the default mode in SQLite.

So, imagine this is the computer that's running your SQLite-powered application.

Now, for discussion's sake, we'll say that this application is a web server, but keep in mind, SQLite works in plenty of other use cases, such as mobile, desktop, IOT apps and much, much more.

Let's peek inside this computer, and we'll consider how things function.

In particular, we're going to focus on the disk drive, which stores the main database file, and a couple of journal files, which you'll see in a moment here.

Next, in memory, we'll consider the operating system buffers.

So how do these buffers get used?

Well, when a database connection is established, and a query is made, data from the disk file is first copied into these buffers, and then the query data is copied back to the connections memory, in userland.

Now, when a connection needs to write some data, it will make the changes to its memory, then when the change is saved, the updated pages, get copied back to the buffers, and eventually, they get persisted to disk.

So, is this last part of my description OK?

In other words, is it ok, for a database to simply write its changes to the buffers, and leave them there for the OS to eventually flush to disk, at its convenience?

Well, no, it's not really ok, if you're wanting an acid-compliant database, because you could lose the data that's sitting in the buffers, if the system crashes or loses power, so for durabilities sake, writes to the buffers should get pushed to disk at critical times, with a call to fsync. You'll see how this works in a moment here.

Ok, let's walk through a more detailed look at how read operations work in SQLite.

So, when a connection is made to query some data, the first thing that happens is a shared lock is obtained on the database file, but technically the lock is stored in the volatile, OS Buffers, for the file.

So what's this shared lock do, I mean, what's it used for?

Well, when there's a shared lock it means at least one read operation is happening in SQLite, but there could be more than one read operation, because a shared lock doesn't prevent concurrent read operations, but, it does prevent concurrent write operations.

Now, I want to re-iterate this subtle, but important fact, which is that in rollback mode, which is what we're talking about right now, you can have many concurrent readers, but you can't have concurrent readers and writers, which might be problematic for you, but you'll see how you can work around this limitation, by choosing WAL mode, but I'll talk more about WAL mode in a moment here.

Ok, so to read data from the SQLite database file, the content on the disk, gets copied over to the OS buffers, and then the data being read is copied into the process or connections memory, where it can be used by the application.

Now, by copying the data to the OS Buffers, subsequent reads of the same data will potentially be faster, because you're avoiding slow disk IO.

Ok, now let's consider how a database write happens in SQLite, while in rollback journal mode.

The first step in writing data to disk is obtaining a reserved lock on the database file, and again this happens in the OS Buffers.

Now, reserved locks can exist with the shared lock you see here, in other words, the reserved lock doesn't stop read operations from occurring, but it does signal to SQLite that a write operation is going to occur in the near future, so SQLite needs to be prepared.

One other thing about this reserved lock is that there can only be one reserved lock overall, so if another process wants to do a write, it will need to wait for the first writing process to complete.

The next thing that happens is a rollback journal file is created, which will get used in a moment here.

So what exactly goes into the rollback journal? I mean what data gets stored in this file?

Well, the name rollback, is kind of a hint, but you'll see what I mean in a moment here.

Now, as I said earlier, SQLite is an ACID-compliant database, and the A in ACID stands for atomicity. So, what does atomicity mean in this context?

It means that when you've got a transaction, that contains many statements like you see here, the transaction will either completely work, or it will completely fail, in other words, you'll never end up with a half-applied transaction, it's an all-or-nothing scenario.

So back to my question from a moment ago, which was "what data get's stored in this journal"?

Well, to achieve the atomicity effect, SQLite has to be able to undo a partially applied transaction, aka rollback, when things go wrong. So, you know how I called atomicity an all-or-nothing scenario? Well, this journal file is necessary to achieve the "nothing" part of all or nothing, let me explain.

You can think of this rollback journal file, as a partial copy or partial snapshot of the database, taken just before any changes are made to the database.

So which parts of the database get copied into this file?

Well, when a write is about to happen to the database, first the pages in the database that need to be updated, get pulled off disk, then the untouched pages and some metadata, get saved to the journal.

So, to summarize, this data here, is the data necessary to perform a rollback, or an undo, if something goes wrong.

The next thing that happens when writing to the database is that the database pages that need updating are mutated in the database connections memory.

So, basically, this modified data here, needs to get persisted to disk, somehow, but are we ready to write this data to disk?

No, not really, because this data in the journal buffers, hasn't necessarily been saved to disk yet, so we could lose this data if the app crashes, so to support atomicity, SQLite will call fsync which forces a physical write of data, from the buffers to disk.

By the way, fsyncs are pretty expensive, and I want you to be on the lookout for how many fsyncs happen in each journal mode, because it's a significant factor in terms of overall performance, between the two modes we're looking at.

Ok, now that the rollback journal has been persisted to disk, and the necessary changes have occurred to pages in user space here, we're ready to commit these changes to the actual database file.

So, the next step in writing data to SQLite is getting an exclusive lock on the database because as I said earlier, in Rollback mode, reads cannot be occurring when a write is occurring, thus the need for an exclusive lock by the writing process.

Ok, so how do you get an exclusive lock?

Well, first the reserved lock changes to a pending lock, then at an opportune time, it will escalate to an exclusive lock.

So, what's with the pending lock? I mean, why is it needed?

Well, let's go back to the reserved lock for a moment.

Now, while the writing connection has a reserved lock, and it's doing some of the preliminary work, reader connections can be coming and going as needed.

But, by switching the lock to a pending lock, two things happen:

  • First, existing read connections are still able to finish their read operations
  • but, with the pending lock, no new reader connections are allowed.

You can think of this pending lock as a way of draining any read connections down to zero, essentially preventing the writer from starvation.

What I mean is, if we didn't use this pending lock to stop new readers' connections, then in a high throughput system, the readers might continuously block the writer, essentially starving it from doing its work.

However, while in pending mode, any existing connections will complete their operations, and disconnect, and eventually all the readers will be gone, and the pending lock will escalate to an exclusive lock.

Now that the writer has an exclusive lock, it can make its changes to the OS buffers, and then SQLite will call fsync for the second time, to persist the pages to disk.

The last couple steps in the writer process are to first delete the rollback journal, at which point the write transaction is committed, and finally the exclusive lock is released, at which point other reader or writer processes can start interacting with the database.

Ok, so I just walked you through the happy path, where everything works out as planned, but now let's look at what happens when things go wrong.

So, imagine we're in the middle of a write operation, and we're writing to disk, and the system crashes or maybe loses power. So, in this case, it's very possible that some portion of the transaction was applied, but not all of it.

Ok, so what happens now?

Well, when the system comes up again, and the first connection is made to the database, SQLite notices a rollback journal, which is abnormal at startup because usually, the last SQLite connection deletes the rollback journal before disconnecting.

So now, SQLite checks to see if the journal is a "hot journal"

So, what's a "hot journal"?

Well, it's a journal that needs to be played back, or rolled back, onto the database file, to restore the database.

The next question you probably have is, what makes a journal a "hot journal"?

Well, a journal is a hot journal if all of the following conditions apply:

  • The rollback journal exists
  • The journal isn't empty
  • There is no reserved locks on the main database
  • The header of the journal is well-formed
  • And the journal doesn't reference super-journals, which are used for transactions across multiple databases

If these criteria are met, then SQLite knows it needs to fix the database, by applying the data in the rollback journal, back onto the database file.

So, to deal with the "hot journal", an exclusive lock is obtained on the database.

Next, the journal is read and reapplied to the database, which essentially rolls back the incomplete transaction, and puts the database back into its previous state.

And finally, the "hot journal" is deleted and the exclusive lock is removed.

Ok, so what we just talked about should hopefully give you a sense of how the rollback journal mode works, in SQLite.

All right, so what's the other journal mode option?

Well, it's the Write-Ahead log or WAL mode.

Ok, so how does WAL mode work?

Well, you know how the Rollback journal was a snapshot of the pre-existing data, and the mutations or writes were made directly in the database file?

Well, WAL mode does just the opposite of this.

In other words, in WAL mode, the mutations only happen in the WAL file initially, and then, at a later point and time, the changes get moved from the WAL file to the database file.

Let me walk you through an example.

This time I'll start by discussing how a write transaction works in WAL mode.

So, first, the pages to modify get copied into the connections memory.

Next, the pages are modified.

Then, this modified data is appended to the WAL file, and of course, one fsync call occurs.

And this is all that happens, at least for the moment.

Now when another writer comes along, it follows the same process of just appending the updated data to the WAL file.

And every subsequent writer continues to append to the WAL file until the WAL file grows to a certain size threshold, at which point, a special operation called a checkpoint, is triggered.

By default, the checkpoint occurs when 1000 or more pages are written to the WAL file.

So for example, let's say the WAL file has 999 pages in it, then another writer appends its data to the WAL file, pushing the page size to 1000, thus triggering a checkpoint.

So what happens now is, the connection with the transaction that triggered the checkpoint will then start moving the pages in the WAL file, to the database file, but here's one of the interesting things about WAL mode, which is the fact that a writer doesn't have to get exclusive access to the database. In other words, a single writer can be doing its work concurrently with many different readers.

But this sort of begs the question, how do you make sure the writer doesn't interfere with the readers?

Well, to understand how readers and writers stay out of each other's way, let's look at how readers work when using WAL mode.

So, here's a connection that needs to read some data from the database, so how does this reader know where to look for its data?

I mean, remember in WAL mode, you've potentially got valid data, or pages, in both the database file and the WAL file.

Well, first of all, when a reader starts, it marks the last valid commit in the WAL file, which is called the connection's "end mark".

Next, when the reader needs a page of content, it first checks the WAL file, to see if the page appears, somewhere before its "end mark". If it finds the page, it uses it. If however, the page it needs isn't in the WAL file, it will pull the page from the database file.

Now, it's important to keep in mind, that a writer could concurrently write pages to the WAL file, while the reader is concurrently running, but the reader will never consider any pages that come after its "end mark".

Additionally, writers will never checkpoint any pages that are after any of the reader's end marks.

This means that writers don't interfere with readers, so readers and a writer can be working at the same time, without any problem. This is great because it increases the maximum possible read-throughput.

But I will point out, that in WAL mode, you're still limited to only 1 writer at a time.

So, how does WAL mode work, when there's a problem?

Let's look at some examples.

Consider the scenario where a writer connection is writing to the WAL file when the system crashes.

In this scenario, the writer may have partially written data to the WAL file, but it never finished the write, by appending what's called a commit record.

So, when the system comes back up, it will notice the WAL file, which is abnormal at startup.

Next, SQLite looks at the WAL file and notices the partially written data, which is invalid, because there is no commit record. So, SQLite disregards the invalid data in the WAL file, at which point, the attempted transaction is effectively rolled back.

Ok, let's consider another scenario.

What if a crash happens while the checkpointing process is running.

In other words, what if data is getting moved from the WAL file to the database file, and a crash occurs?

So, let me ask you, does any sort of recovery process need to happen to fix the database file?

Well, no not exactly, because even though these pages are in a suspect state, well, they won't be referenced anyway, because the valid version of these pages, still exists in the WAL file, and these updated pages take precedence over, these database file pages.

Now, at some later point in time, checkpointing will start again, and it will attempt to move the pages from the WAL file to the database file, which is an idempotent operation, and this checkpointing process, essentially fixes the suspect pages in the database file, and then, of course, the WAL file content is removed.

So, what are the advantages and disadvantages of using WAL mode?

Well, WAL mode is faster, and you've got better concurrency, and you end up writing less data to disk, and when it does write data, the appends to the WAL file tend to be fast, sequential writes and WAL mode has less Fsyncs, and the checkpointing process, is able to better amortize the cost of fsyncs because it's batching transaction writes back to the database file.

There are a few disadvantages to WAL mode as well, such as:

  • WAL mode doesn't work over network file systems
  • It Doesn't offer atomic commits across multiple separate database files
  • and WAL mode might be slightly slower (1% or 2%) in applications that do mostly reads and very occasional writes.

So, which mode should you use?

Well, it depends on your situation, you'll have to weigh to pros and cons, but for most scenarios, you should strongly consider using WAL mode.

Hey, here at Mycelial, we're big fans of SQLite for reasons like we just talked about, and we're building tools around SQLite that will allow you to synchronize an SQLite database to multiple computers without conflict.

If you are interested in SQLite and what we're doing with it, please subscribe to our channel, and consider joining our newsletter to learn more."