Inner vs. Outer Joins

join

I want to teach you the difference between an inner and an outer join. We first need to think about what a join is. Simply, it’s when you combine two tables to make a new one. You’re not physically creating a new table when you join them together, but for the purposes of the query, you are creating a new virtual table. Every row now has the columns from both tables. So if TableA has columns Col1 and Col2 and TableB has columns Col3 and Col4, when you join these two tables, you’ll get Col1, Col2, Col3, and Col4. Just as with any query, you have the option of including all columns or excluding some, as well as filtering out rows.

Inner join. A join is combining the rows from two tables. An inner join attempts to match up the two tables based on the criteria you specify in the query, and only returns the rows that match. If a row from the first table in the join matches two rows in the second table, then two rows will be returned in the results. If there’s a row in the first table that doesn’t match a row in the second, it’s not returned; likewise, if there’s a row in the second table that doesn’t match a row in the first, it’s not returned.

Let’s take a look at an example, without using SQL. Say you have this table called CatOwners:

Name

Occupation

Alice

Sanitation Engineer

Bob

Inventory Coordinator

Charlie

Theater Admissions Specialist

and this table named Cats:

Name

OwnerName

Fluffy

Alice

Mittens

Alice

Snuggles

Bob

Simba

We inner join these tables by matching the Name column from CatOwners to the OwnerName column in Cats to produce this new table (renaming the Name column from Cats to CatName):

OwnerName

Occupation

CatName

Alice

Sanitation Engineer

Fluffy

Alice

Sanitation Engineer

Mittens

Bob

Inventory Coordinator

Snuggles

Alice is included twice since she matched two cats. Charlie doesn’t own a cat, so he’s not listed. Let’s say we wanted to see all the people, even if they didn’t happen to own a cat at the present. This would be a outer join.

Outer Join. A left join attempts to find match up the rows from the first table to rows in the second table. If it can’t find a match, it will return the columns from the first table and leave the columns from the second table blank (null).

Let’s join those two tables with an outer join:

OwnerName

Occupation

CatName

Alice

Sanitation Engineer

Fluffy

Alice

Sanitation Engineer

Mittens

Bob

Inventory Coordinator

Snuggles

Charlie

Theater Admissions Specialist

We now see Charlie, but he doesn’t have a cat, so the CatName column is blank (null). This is an outer join.

Now remember how I used the terms first and second? We can reverse the order of the join. Imagine that to join two tables, you write “take CatOwners join Cats if CatOwners’ Name equals Cats’ OwnerName”. You read that sentence from left to right. The first table is CatOwners and the second table is Cats. When you say left vs. right join, that’s what it means: read this join starting with the left (left join) or starting from the right (right join). Let’s right join those tables and see how it’s different:

OwnerName

Occupation

CatName

Alice

Sanitation Engineer

Fluffy

Alice

Sanitation Engineer

Mittens

Bob

Inventory Coordinator

Snuggles

Simb

Charlie is now missing since he didn’t match any rows in Cats, but Simba is shows up. Since we right outer joined, Simba’s now in the first table so he shows up even if he doesn’t have a matching record in the CatOwners table, which is now the second table.

Developers often use the shorthand left join to mean left outer join. Let’s think about why that is. Specifying right vs. left for an inner join makes no difference (and you can’t do that anyway with MySQL), since the query will only show a row if and only if there’s a match between the two tables. It doesn’t matter which table you start from, the results will always be the same. If you’re specifying a direction for the join, you want it to be an outer join. Most developers are used to code, including SQL, that reads from the left to right, so the left outer join is the default.

In SQL

I’ve given some examples in plain English with simple tables. Let’s dive into SQL to see how it looks with MySQL. You can follow along either using your local development instance of MySQL or by using SQL Fiddle. First, we create our tables and fill it with data:

Create Table CatOwners
(
 Name Varchar(50) Primary Key Not Null
,Occupation Varchar(50) Not Null
);

Create Table Cats
(
 Name Varchar(50) Primary Key Not Null
,CatOwnerName Varchar(50) Null
,Constraint Cats_Fk_CatOwnerName
    Foreign Key (CatOwnerName)
    References CatOwners(Name)
);

Insert Into CatOwners (Name, Occupation)
Values
 ('Alice', 'Sanitation Engineer')
,('Bob', 'Inventory Coordinator')
,('Charlie', 'Theater Admissions Specialist');

Insert Into Cats (Name, CatOwnerName)
Values
 ('Mittens', 'Alice')
,('Fluffy', 'Alice')
,('Snuggles', 'Bob')
,('Simba', null);

We can run a simple inner join two ways in MySQL. This first way is ANSI syntax or explicit join notation:

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Inner Join Cats C
    On C.CatOwnerName = CO.Name
;

+-----------+-----------------------+----------+
| OwnerName | Occupation            | CatName  |
+-----------+-----------------------+----------+
| Alice     | Sanitation Engineer   | Fluffy   |
| Alice     | Sanitation Engineer   | Mittens  |
| Bob       | Inventory Coordinator | Snuggles |
+-----------+-----------------------+----------+
3 rows in set (0.00 sec)

You can also use implicit join syntax by putting the matching columns in the Where clause.

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO, Cats C
Where
    C.CatOwnerName = CO.Name
;

These produce the same query plan, so there’s no performance difference. Stick to explicit syntax since it’s more common (I showed you implicit in case you encountered it in the wild).

We also don’t need the Inner since it’s implied:

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Join Cats C
On
    C.CatOwnerName = CO.Name
;

How do you do a left outer join in SQL? Just like this:

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Left Outer Join Cats C
    On C.CatOwnerName = CO.Name
;

+-----------+-------------------------------+----------+
| OwnerName | Occupation                    | CatName  |
+-----------+-------------------------------+----------+
| Alice     | Sanitation Engineer           | Fluffy   |
| Alice     | Sanitation Engineer           | Mittens  |
| Bob       | Inventory Coordinator         | Snuggles |
| Charlie   | Theater Admissions Specialist | NULL     |
+-----------+-------------------------------+----------+
4 rows in set (0.00 sec)

You can’t use implicit join syntax with outer joins since MySQL expects there to be an on clause. I should also note that if you reference the second table in the where clause, you are basically converting it to an inner join since every column will be null:

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Left Outer Join Cats C
    On C.CatOwnerName = CO.Name
Where
    C.Name <> 'Frank'
;

+-----------+-----------------------+----------+
| OwnerName | Occupation            | CatName  |
+-----------+-----------------------+----------+
| Alice     | Sanitation Engineer   | Fluffy   |
| Alice     | Sanitation Engineer   | Mittens  |
| Bob       | Inventory Coordinator | Snuggles |
+-----------+-----------------------+----------+
3 rows in set (0.00 sec)

You could avoid this by putting that criteria in the On clause or dealing with nulls in your criteria:

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Left Outer Join Cats C
    On C.CatOwnerName = CO.Name And C.Name <> 'Frank'
;

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Left Outer Join Cats C
    On C.CatOwnerName = CO.Name
Where
    (C.Name <> 'Frank' Or C.Name Is Null)
;

If we want the query to read the other way, from right to left (or bottom to top), we can specify a right outer join:

Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Right Outer Join Cats C
    On C.CatOwnerName = CO.Name
;

+-----------+-----------------------+----------+
| OwnerName | Occupation            | CatName  |
+-----------+-----------------------+----------+
| Alice     | Sanitation Engineer   | Fluffy   |
| Alice     | Sanitation Engineer   | Mittens  |
| Bob       | Inventory Coordinator | Snuggles |
| NULL      | NULL                  | Simba    |
+-----------+-----------------------+----------+
4 rows in set (0.00 sec)

With both right and left join, you don’t need to use the keyword outer:

# Left (Outer) Join
Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Left Join Cats C
    On C.CatOwnerName = CO.Name
;

# Right (Outer) Join
Select
     CO.Name As OwnerName
    ,CO.Occupation
    ,C.Name As CatName
From CatOwners CO
Right Join Cats C
    On C.CatOwnerName = CO.Name
;

Warning in the right

If you’re reading this article, you’re probably an English speaker. Most technical documentation is written in left to right, top to bottom languages. Most programming languages are written the same way. That all to say, the mental model for most people reading your SQL will be left to right, top to bottom. Stay away from right outer joins. It’s unexpected, like the Spanish Inquisition.

If you enjoyed this article, I have good news. I’m writing a book for developers who use MySQL. Sign-up below.

Want to learn the good parts of MySQL? You can subscribe below and I’ll send you updates on a new book I’m writing that helps you find the goodies in MySQL. If you sign-up, I’ll also send out stuff on MySQL like tips and tutorials.



 No spam


Advanced Example

I’m not going to get into this example too much, but you’re welcome to try it on your own. Here’s some sample data.

Create Table Professors
(
 ProfessorID Int Primary Key Auto_Increment
,Name Varchar(50) Not Null
);

Create Table Students
(
 StudentID Int Primary Key Auto_Increment
,Name Varchar(50) Not Null
);

Create Table Courses
(
 CourseID Int Primary Key Auto_Increment
,ProfessorID Int Not Null
,Name Varchar(50) Not Null
,Constraint CourseEnrollments_Fk_ProfessorID Foreign Key (ProfessorID) References Professors(ProfessorID)
);

Create Table CourseEnrollments
(
 CourseID Int Not Null
,StudentID Int Not Null
,Constraint CourseEnrollments_Pk Primary Key (CourseID, StudentID)
,Constraint CourseEnrollments_Fk_CourseID Foreign Key (CourseID) References Courses(CourseID)
,Constraint CourseEnrollments_Fk_StudentID Foreign Key (StudentID) References Students(StudentID)
);

Insert Into Professors (Name)
Values
 ('Plum')
,('Friedman')
,('Jones');

Insert Into Students (Name)
Values
 ('Brill')
,('McGowan')
,('Boyer')
,('Snyder')
,('Phillips')
,('Cowan')
,('Hopkins')
,('Franklin')
,('Cooper')
;

Insert Into Courses (ProfessorID, Name)
Values
 (1, 'Economics 101')
,(1, 'Quantitative Trading 301')
,(2, 'Databases 102')
,(3, 'Archaeology 102')
,(3, 'Ancient South American Cultures 204');

Insert Into CourseEnrollments (CourseID, StudentID)
Values
 (1, 1)
,(1, 3)
,(1, 5)
,(2, 2)
,(2, 4)
,(2, 8)
,(3, 8)
,(3, 7)
,(3, 3)
,(4, 1)
,(4, 4)
,(4, 6)
,(4, 2)
;

What happens when I write:

Select
 C.Name As CourseName
,S.Name As StudentName
From Courses C
Right Outer Join CourseEnrollments CE On CE.CourseID = C.CourseID
Right Outer Join Students S On S.StudentID = CE.StudentID
;

What about:

Select
 C.Name As CourseName
,S.Name As StudentName
From Courses C
Join CourseEnrollments CE On CE.CourseID = C.CourseID
Right Outer Join Students S On S.StudentID = CE.StudentID
;

Interesting, right?

References

Wikipedia
Reference.com

Tagged with: ,
Posted in Databases, IT, Open-Source
3 comments on “Inner vs. Outer Joins
  1. Lalit says:

    Nice explanation Noel, I always like Post with test case or examples which helps people to test the same and learn more about it.

    Thanks

  2. Vincent says:

    “These produce the same query plan, so there’s no performance difference.”

    Just a note from the trenches:
    implicite joins do not always give the same performance as explicit ones. The database will choose an queryplan as best it can by looking at the table content and the WHERE clause, to see which order will eliminate as many rows as possible early on. Sometimes it will simply get this wrong and leave you with a slow query that seems to have no real reason for being slow.

    A side effect of the plan being based on table statistics is that the queryplan may change as the table content changes, so a queryplan that works just fine today, can be much slower (of faster) tomorrow because the table reached a number of rows that triggered the planner to swap the join order.

    So I would also say: stick with explicit joins, if not for readability then for predictability.

  3. Hi,

    A comment to Vincent’s comment:

    The MySQL optimizer does not handle what is here called implicit join differently from explicit (INNER) JOIN. In both cases, the optimizer may switch the order in which it processes the tables. In order to force a particular order, you need to use STRAIGHT_JOIN instead of INNER JOIN. For outer joins, on the other hand, there is only one possible join order. E.g., for LEFT JOIN, the left-hand table needs to be processed before the right-hand table.

    While it is true that forcing a specific join order (e.g., using STRAIGHT_JOIN) may give you predictable performance, it may also give non-optimal plans when your data or your server configuration changes. Also, you may miss out on optimizer improvements when upgrading to newer versions.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>