Calculus

Set Theory

Set Theory Logo

Relational Databases

Nowadays, data is the most valuable asset of any organization. The way how data is processed and managed often determines the success or failure of a project or initiative.

One of the most effective approaches to managing data is the relational data model. It is based on the concept of relation and first-order predicate logic.

Relational Model

Recall that a binary relation R from set A to set B is a subset of the Cartesian product A × B. We can expand this definition and introduce n-ary relations. If A1, A2, ..., An are sets, an n-ary relation is a subset of the Cartesian product A1 × A2 × ⋯ × An.

A binary relation is then a special case of an n-ary relation. It involves 2 sets and can be represented by ordered pairs.

When the degree is n = 3, we have a ternary relation involving 3 sets. It can be described by a set of triples.

An n-ary relation involves n sets and is described by a set of n-tuples.

In a relational database, the n-tuples are simply called tuples or records. The relations themselves are called tables. Each table consists of columns which are called attributes or fields.

A relational schema is defined by the names of all relations (tables) and their attributes. For example, the Student relation is given by STUDENT (studentID, firstName, lastName, major, age, email, phone):

A DB table describing Student entities
Figure 1.

Keep in mind that the columns of the table (attributes) in this figure are listed vertically. The mandatory attributes on the diagram are indicated in bold. The studentID attribute is the primary key \(\left(PK\right),\) i.e. the field that uniquely identifies each tuple.

The number of attributes in a relation is known as degree of the relation. The Student relation in this example has degree \(7.\)

Each row (tuple or record) of a relation represents a single object, fact, or any other entity. So, the Student relation contains basic data about all students of a college. Records in this table might look like this:

Few Tuples of the Student table
Figure 2.

The number of tuples in a relation is known as cardinality of the relation.

A relational model can have from several to thousands of tables depending on the complexity of the problem. The tables can be connected by various relationships such as "one-to-one", "one-to-many", etc. For example, the following entity relationship diagram shows relationships between Student, Exam, Course, and Teacher:

Student-Course-Teacher-Exam relationships
Figure 3.

Examples of "one-to-many" relationships:

A foreign key \(\left(FK\right)\) is one or several attributes in one table that refer to the primary key of another table. So, the studentID field is a foreign key in the Exam_Result table since it is linked to the Student table where this field is the primary key.

The relational data model was proposed in \(1970\) by English computer scientist Edgar Frank Codd \(\left({1923 - 2003}\right)\) when he worked for IBM.

The fundamental assumption of the relation model is that all data is stored in database and represented as \(n\text{-ary}\) relations. As with any other relations, we can perform a number of operations on them. In the next section, we consider the most important relational operations.

Basic Relational Operations

A set of operations used to manipulate and extract data from relations is called relational algebra. Each operation takes one or more input relations and transform them to produce an output relation. There are \(6\) fundamental operations on relations:

Select Operation

A selection is a unary operation, denoted \({\sigma _\varphi }\left( R \right).\) It takes the input relation \(R\) and selects from it all those tuples (records) that satisfy the proposition \(\varphi.\)

Example

Let \(R\) be the Student relation above. Suppose we want to find all students who major in CS and under the age of \(20.\) The logical condition \(\varphi\) is given by

\[\varphi : \text{major} = \text{"CS"} \wedge \text{age} \lt 20.\]

The selection \({\sigma _\varphi }\left( R \right)\) returns only one record:

Selection from the Student table with major=CS, age < 20.
Figure 4.

Project Operation

A projection is a unary operation, denoted \({\Pi_{a_1,a_2,\ldots,a_n}}\left( R \right).\) This operation is similar to selection operation, but filters attributes instead of records. As a result, the output tuples contain only those attributes that match the set \(\left\{ {{a_1},{a_2}, \ldots ,{a_n}} \right\}.\)

Example

The projection of Student relation with attributes {studentID, major}

\[{\Pi_{{a_1},{a_2}, \ldots ,{a_n}}}\left( R \right) = {\Pi _{\text{studentID},\text{major}}}\left( \text{Student} \right)\]

produces a new relation:

Projection of Student table restricted to ID and Major attributes.
Figure 5.

Union Operation

The union of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \cup R_2\) containing all records that appear in \(R_1,\) \(R_2,\) or both. The input relations \(R_1\) and \(R_2\) must have the same schema.

Example

Let \(R_1\) and \(R_2\) be two relations that are selections from the Teacher table:

1st Selection from Teacher table
Figure 6.
2nd Selection from Teacher table
Figure 7.

The union of the relations \(R_1 \cup R_2\) contains all records from either \(R_1,\) or \(R_2,\) or both of them.

Union of two relations
Figure 8.

Set Difference Operation

The difference of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \backslash R_2\) containing all records that appear in \(R_1,\) but not in \(R_2.\) The input relations \(R_1\) and \(R_2\) must have the same schema.

Example

Using the above relations \(R_1\) and \(R_2,\) we find their difference \(R_1 \backslash R_2:\)

Difference of two relations
Figure 9.

Cartesian Product Operation

The Cartesian product of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \times R_2\) that combines the records of the \(1\text{st}\) relation with all the records of the \(2\text{nd}\) relation. The Cartesian product is also referred to as Cross product or Cross Join.

Example

Suppose we have two relations \(R_1\) and \(R_2:\)

A projection of the Teacher relation.
Figure 10.
A projection of the Course relation.
Figure 11.

The Cartesian product \(R_1 \times R_2\) is represented by the following combined relation:

Cartesian product of two relations
Figure 12.

Rename Operation

A rename is a unary operation, written as \({\rho_{a \backslash b}}\left( R \right).\) The rename operator returns the relation \(R\) in which the \(b\) attribute is renamed to an \(a\) attribute.

The relation \(R\) can represent a relational algebra expression. Using the rename operation, we can refer to a relational expression by name.

Example

To rename the courseName attribute to courseTitle in the Course relation, we use the following operator:

\[{\rho_{a \backslash b}}\left( R \right) = {\rho_{\text{courseTitle} \backslash \text{courseName}}}\left( {\text{Course}} \right).\]

Additional Relational Operations

There are a few additional operations that can simplify database queries. We consider two of them - Natural Join and Outer Join.

Natural Join Operation

The Natural Join operation, denoted \(\Join,\) is used to combine records from two (or more) tables based on the common attributes. Common attributes must have the same names and same datatypes. The resulting joined relation contains all combinations of records from both tables that have equal values on the common attributes. All other records which do not match will be eliminated from the joined relation.

Example

There are two relations - Course and Test.

Course relation
Figure 13.
Test relation
Figure 14.

Both relations have the common attribute - courseName. The matching value is only Linear Algebra. Hence, the natural join of these relations is given by

Natural join of Course and Test relations.
Figure 15.

Outer Join Operation

When applying a natural join, all records with unmatched values are eliminated from the resulting relation. An outer join retains such records by replacing missing data with null and thus avoids lost of information. The null means that the value is unknown or does not exist.

There are two relations - Course and Test.

There are a few types of outer joins:

Examples

The left outer join of the relations Course and Test above looks as follows:

Left Outer Join of two relations.
Figure 16.

Similarly, we can build the right outer join of these relations:

Right Outer Join of two relations.
Figure 17.