# Relational Databases

Nowadays, data is the most valuable asset of any organization. The way how data is processed and managed often determines the success or failure of a project or initiative.

One of the most effective approaches to managing data is the relational data model. It is based on the concept of relation and first-order predicate logic.

## Relational Model

Recall that a binary relation \(R\) from set \(A\) to set \(B\) is a subset of the Cartesian product \(A \times B.\) We can expand this definition and introduce \(n\text{-ary}\) relations. If \(A_1, A_2,\dots,A_n\) are sets, an \(n\text{-ary}\) relation is a subset of the Cartesian product \(A_1 \times A_2 \times \cdots \times A_n.\)

A binary relation is then a special case of an \(n\text{-ary}\) relation. It involves \(2\) sets and can be represented by ordered pairs.

When the degree is \(n = 3,\) we have a ternary relation involving \(3\) sets. It can be described by a set of triples.

An \(n\text{-ary}\) relation involves \(n\) sets and is described by a set of \(n\text{-tuples}.\)

In a relational database, the \(n\text{-tuples}\) are simply called tuples or records. The relations themselves are called tables. Each table consists of columns which are called attributes or fields.

A relational schema is defined by the names of all relations (tables) and their attributes. For example, the **Student** relation is given by **STUDENT** (**studentID**, **firstName**, **lastName**, **major**, **age**, **email**, **phone**):

Keep in mind that the columns of the table (attributes) in this figure are listed vertically. The mandatory attributes on the diagram are indicated in bold. The **studentID** attribute is the primary key \(\left(PK\right),\) i.e. the field that uniquely identifies each tuple.

The number of attributes in a relation is known as degree of the relation. The **Student** relation in this example has degree \(7.\)

Each row (tuple or record) of a relation represents a single object, fact, or any other entity. So, the **Student** relation contains basic data about all students of a college. Records in this table might look like this:

The number of tuples in a relation is known as cardinality of the relation.

A relational model can have from several to thousands of tables depending on the complexity of the problem. The tables can be connected by various relationships such as "one-to-one", "one-to-many", etc. For example, the following entity relationship diagram shows relationships between **Student**, **Exam**, **Course**, and **Teacher**:

Examples of "one-to-many" relationships:

- A
**Teacher**may have several**Courses**. - A
**Student**sometimes takes an**Exam**for the same**Course**several times.

A foreign key \(\left(FK\right)\) is one or several attributes in one table that refer to the primary key of another table. So, the **studentID** field is a foreign key in the **Exam_Result** table since it is linked to the **Student** table where this field is the primary key.

The relational data model was proposed in \(1970\) by English computer scientist Edgar Frank Codd \(\left({1923 - 2003}\right)\) when he worked for IBM.

The fundamental assumption of the relation model is that all data is stored in database and represented as \(n\text{-ary}\) relations. As with any other relations, we can perform a number of operations on them. In the next section, we consider the most important relational operations.

## Basic Relational Operations

A set of operations used to manipulate and extract data from relations is called relational algebra. Each operation takes one or more input relations and transform them to produce an output relation. There are \(6\) fundamental operations on relations:

- Select: \(\sigma\)
- Project: \(\Pi\)
- Union: \(\cup\)
- Difference: \(-\)
- Product: \(\times\)
- Rename: \(\rho\)

### Select Operation

A selection is a unary operation, denoted \({\sigma _\varphi }\left( R \right).\) It takes the input relation \(R\) and selects from it all those tuples (records) that satisfy the proposition \(\varphi.\)

#### Example

Let \(R\) be the **Student** relation above. Suppose we want to find all students who **major** in CS and under the **age** of \(20.\) The logical condition \(\varphi\) is given by

The selection \({\sigma _\varphi }\left( R \right)\) returns only one record:

### Project Operation

A projection is a unary operation, denoted \({\Pi_{a_1,a_2,\ldots,a_n}}\left( R \right).\) This operation is similar to selection operation, but filters attributes instead of records. As a result, the output tuples contain only those attributes that match the set \(\left\{ {{a_1},{a_2}, \ldots ,{a_n}} \right\}.\)

#### Example

The projection of **Student** relation with attributes {**studentID**, **major**}

produces a new relation:

### Union Operation

The union of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \cup R_2\) containing all records that appear in \(R_1,\) \(R_2,\) or both. The input relations \(R_1\) and \(R_2\) must have the same schema.

#### Example

Let \(R_1\) and \(R_2\) be two relations that are selections from the **Teacher** table:

The union of the relations \(R_1 \cup R_2\) contains all records from either \(R_1,\) or \(R_2,\) or both of them.

### Set Difference Operation

The difference of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \backslash R_2\) containing all records that appear in \(R_1,\) but not in \(R_2.\) The input relations \(R_1\) and \(R_2\) must have the same schema.

#### Example

Using the above relations \(R_1\) and \(R_2,\) we find their difference \(R_1 \backslash R_2:\)

### Cartesian Product Operation

The Cartesian product of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \times R_2\) that combines the records of the \(1\text{st}\) relation with all the records of the \(2\text{nd}\) relation. The Cartesian product is also referred to as Cross product or Cross Join.

#### Example

Suppose we have two relations \(R_1\) and \(R_2:\)

The Cartesian product \(R_1 \times R_2\) is represented by the following combined relation:

### Rename Operation

A rename is a unary operation, written as \({\rho_{a \backslash b}}\left( R \right).\) The rename operator returns the relation \(R\) in which the \(b\) attribute is renamed to an \(a\) attribute.

The relation \(R\) can represent a relational algebra expression. Using the rename operation, we can refer to a relational expression by name.

#### Example

To rename the **courseName** attribute to **courseTitle** in the **Course** relation, we use the following operator:

## Additional Relational Operations

There are a few additional operations that can simplify database queries. We consider two of them - Natural Join and Outer Join.

### Natural Join Operation

The Natural Join operation, denoted \(\Join,\) is used to combine records from two (or more) tables based on the common attributes. Common attributes must have the same names and same datatypes. The resulting joined relation contains all combinations of records from both tables that have equal values on the common attributes. All other records which do not match will be eliminated from the joined relation.

#### Example

There are two relations - **Course** and **Test**.

Both relations have the common attribute - **courseName**. The matching value is only **Linear Algebra**. Hence, the natural join of these relations is given by

### Outer Join Operation

When applying a natural join, all records with unmatched values are eliminated from the resulting relation. An outer join retains such records by replacing missing data with null and thus avoids lost of information. The null means that the value is unknown or does not exist.

There are two relations - **Course** and **Test**.

There are a few types of outer joins:

- Left Outer Join returns records from the left table.
- Right Outer Join returns records from the right table.
- Full Outer Join returns data from both tables.

#### Examples

The left outer join of the relations **Course** and **Test** above looks as follows:

Similarly, we can build the right outer join of these relations: