Designing Multi-Tenanted Databases

Designing database tables for many customers is a surprisingly foreign concept for some programmers. I’ve been in interviews where a programmer will sketch out a normal object model, and then proceed to suggest that for the database, each customer has their own set of tables, each prefixed with the customer name. What I’d like to do in this post is introduce the concept of multi-tenancy and then show methods you can use to do it right, instead of hacky ways like the one I just mentioned. Multi-tenancy is when several customers (tenants) share the same database and codebase but can only see their own data.

The basic idea

The core method of adding multi-tenancy to your database is adding a column to every table you’d like to segment that indicates the owner of the data. I’ll call this column a tenant id. We’ll start with a basic example: a todo app. Let’s say you write a program that you’re going to use on your own to track your tasks. The basic model is this: the task description, when it was created, due date, and when it was completed. This is the SQL to create a basic Tasks table:

Create Table Tasks
(
     TaskID Int Auto_Increment Not Null Primary Key
    ,Description Varchar(50) Not Null
    ,CreatedDate Timestamp Default Now() Not Null
    ,DueDate Datetime Null
    ,CompletedDate Datetime Null
);

You write your app and start using it. Your boss notices that you are 100% more productive than your other team members, so he asks what your secret is and you tell him about this app. He then asks you to create a Web-based todo system for the whole team.

So, you add a new table called Users:

Create Table Users
(
     UserID Int Auto_Increment Not Null Primary Key
    ,Username Varchar(50) Not Null
    ,FirstName Varchar(50) Not Null
    ,LastName Varchar(50) Not Null
    ,Password Varbinary(50) Not Null
    ,Salt Varbinary(50) Not Null
);

Then you add a UserID column to tasks to tie it back to the user:

Create Table Tasks
(
     TaskID Int Auto_Increment Not Null Primary Key
    ,UserID Int Not Null
    ,Description Varchar(50) Not Null
    ,CreatedDate Timestamp Default Now() Not Null
    ,DueDate Datetime Null
    ,CompletedDate Datetime Null
    ,Constraint Tasks_fk_UserID Foreign Key (UserID) References Users (UserID)
);

Now, when we get the user’s tasks, we just specify which user in the where clause:

Select Description From Tasks Where UserID = 1;

You see that the UserID segments the data, and allows multiple users to track their tasks in the system, but they only see their own task. The application in this case would get a user id from a session and then pass it to this query. Something to note is that since MySQL implicitly creates an index on foreign key’d columns, this query already has an index.

Connected data

Tables are not always directly segmented by user or customer. Say you have a table for orders. Here’s the info you’d like to store about an order: an order id, the date, and the address. We’ll also want to have line items with a product id, a quantity, and a unit cost, but we’ll keep those in another table.

Create Table Orders
(
     OrderID Int Auto_Increment Not Null Primary Key
    ,CustomerID Int Not Null
    ,OrderDate Timestamp Default Now() Not Null
    ,Address1 Varchar(50) Not Null
    ,Address2 Varchar(50) Not Null
    ,City Varchar(50) Not Null
    ,State Varchar(2) Not Null
    ,Constraint Orders_fk_CustomerID Foreign Key (CustomerID) References Customers(CustomerID)
);
Create Table LineItems
(
     OrderID Int Not Null Primary Key
    ,ProductID Int Not Null
    ,Quantity Int Not Null
    ,UnitCost Decimal(10,2) Not Null
    ,Constraint LineItems_fk_OrderID Foreign Key (OrderID) References Orders(OrderID)
    ,Constraint LineItems_pk Primary Key (OrderID, ProductID)
);

Now, we want to get all of a customer’s line items in order to calculate the lifetime customer spend. The LineItems table is segmented by customer because each line item is tied to an order and each order is tied to a customer.

Select Sum(Quantity * UnitCost) As LifetimeSpend
From LineItems LI
Join Orders O On O.OrderID = LI.OrderID
Where O.CustomerID = @CustomerID

We didn’t have to put a CustomerID column on the LineItems table, because its parent table already had one. So, Orders was segmented by CustomerID and LineItems was segmented by OrderID.

Not every table in the system will need to be segmented by the tenant id. There’s a base set of data that’s used by everyone in your system. Amazon has data on your orders and credit cards, but they have a lot of data about products, inventory, etc. Another example is lookup tables like order statuses or zipcodes. Not only is it extra work to maintain these types of tables for each customer, it’s a very bad idea: you want one version of truth for every customer.

When to use the multi-tenant pattern

In almost every business system, you have the company, its customers, and the employee users. In an application like Excel, you have a solitary and implicit company, customer, and user all rolled into one, so a multi-tenanted pattern makes no sense. You need to design for multi-tenancy when you have more than one company, customer, or user.

In the next post, we’ll cover hierarchical data, where the customer has access to data based on where they are in a hierarchy.

Want to hear more about database design, specifically with MySQL? Well, I’m writing a book! You can subscribe below and I’ll send you updates. I’ll also send out stuff on MySQL like tips and tutorials.



 No spam


Posted in Databases, IT
4 comments on “Designing Multi-Tenanted Databases
  1. Valerie Parham-Thompson says:

    Noel,

    This design can result in contention on tables in a highly scaled system. Is there a way to mitigate that contention with this design?

    Valerie

  2. Noel says:

    Sure! As I said above, foreign keys will automatically create an index which could be used in queries that reference the tenant id. If you need to go beyond that, MySQL supports partitioning, creating several physical files for each logical table. Once you grow beyond that, you’d have to start thinking about sharding or aggressive service-oriented architecture. What are your thoughts?

  3. Valerie Parham-Thompson says:

    I like having separate schema for each user, with shared schema for the zipcodes, etc. Makes it easier to drop out old users, too. Another benefit is you can move the whole schema more easily than a set of rows if you get a “hot” user or run against disk space issues.

    Lots of ways to do it, no? :)

  4. Tony Marston says:

    I have already implemented this as an optional feature in my own framework. Take a look at http://www.tonymarston.net/php-mysql/virtual-private-database.html

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>