Relational Databases
Nowadays, data is the most valuable asset of any organization. The way how data is processed and managed often determines the success or failure of a project or initiative.
One of the most effective approaches to managing data is the relational data model. It is based on the concept of relation and first-order predicate logic.
Relational Model
Recall that a binary relation R from set A to set B is a subset of the Cartesian product A × B. We can expand this definition and introduce n-ary relations. If A1, A2, ..., An are sets, an n-ary relation is a subset of the Cartesian product A1 × A2 × ⋯ × An.
A binary relation is then a special case of an n-ary relation. It involves 2 sets and can be represented by ordered pairs.
When the degree is n = 3, we have a ternary relation involving 3 sets. It can be described by a set of triples.
An n-ary relation involves n sets and is described by a set of n-tuples.
In a relational database, the n-tuples are simply called tuples or records. The relations themselves are called tables. Each table consists of columns which are called attributes or fields.
A relational schema is defined by the names of all relations (tables) and their attributes. For example, the Student relation is given by STUDENT (studentID, firstName, lastName, major, age, email, phone):
Keep in mind that the columns of the table (attributes) in this figure are listed vertically. The mandatory attributes on the diagram are indicated in bold. The studentID attribute is the primary key \(\left(PK\right),\) i.e. the field that uniquely identifies each tuple.
The number of attributes in a relation is known as degree of the relation. The Student relation in this example has degree \(7.\)
Each row (tuple or record) of a relation represents a single object, fact, or any other entity. So, the Student relation contains basic data about all students of a college. Records in this table might look like this:
The number of tuples in a relation is known as cardinality of the relation.
A relational model can have from several to thousands of tables depending on the complexity of the problem. The tables can be connected by various relationships such as "one-to-one", "one-to-many", etc. For example, the following entity relationship diagram shows relationships between Student, Exam, Course, and Teacher:
Examples of "one-to-many" relationships:
- A Teacher may have several Courses.
- A Student sometimes takes an Exam for the same Course several times.
A foreign key \(\left(FK\right)\) is one or several attributes in one table that refer to the primary key of another table. So, the studentID field is a foreign key in the Exam_Result table since it is linked to the Student table where this field is the primary key.
The relational data model was proposed in \(1970\) by English computer scientist Edgar Frank Codd \(\left({1923 - 2003}\right)\) when he worked for IBM.
The fundamental assumption of the relation model is that all data is stored in database and represented as \(n\text{-ary}\) relations. As with any other relations, we can perform a number of operations on them. In the next section, we consider the most important relational operations.
Basic Relational Operations
A set of operations used to manipulate and extract data from relations is called relational algebra. Each operation takes one or more input relations and transform them to produce an output relation. There are \(6\) fundamental operations on relations:
- Select: \(\sigma\)
- Project: \(\Pi\)
- Union: \(\cup\)
- Difference: \(-\)
- Product: \(\times\)
- Rename: \(\rho\)
Select Operation
A selection is a unary operation, denoted \({\sigma _\varphi }\left( R \right).\) It takes the input relation \(R\) and selects from it all those tuples (records) that satisfy the proposition \(\varphi.\)
Example
Let \(R\) be the Student relation above. Suppose we want to find all students who major in CS and under the age of \(20.\) The logical condition \(\varphi\) is given by
The selection \({\sigma _\varphi }\left( R \right)\) returns only one record:
Project Operation
A projection is a unary operation, denoted \({\Pi_{a_1,a_2,\ldots,a_n}}\left( R \right).\) This operation is similar to selection operation, but filters attributes instead of records. As a result, the output tuples contain only those attributes that match the set \(\left\{ {{a_1},{a_2}, \ldots ,{a_n}} \right\}.\)
Example
The projection of Student relation with attributes {studentID, major}
produces a new relation:
Union Operation
The union of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \cup R_2\) containing all records that appear in \(R_1,\) \(R_2,\) or both. The input relations \(R_1\) and \(R_2\) must have the same schema.
Example
Let \(R_1\) and \(R_2\) be two relations that are selections from the Teacher table:
The union of the relations \(R_1 \cup R_2\) contains all records from either \(R_1,\) or \(R_2,\) or both of them.
Set Difference Operation
The difference of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \backslash R_2\) containing all records that appear in \(R_1,\) but not in \(R_2.\) The input relations \(R_1\) and \(R_2\) must have the same schema.
Example
Using the above relations \(R_1\) and \(R_2,\) we find their difference \(R_1 \backslash R_2:\)
Cartesian Product Operation
The Cartesian product of two relations \(R_1\) and \(R_2\) is the relation \(R_1 \times R_2\) that combines the records of the \(1\text{st}\) relation with all the records of the \(2\text{nd}\) relation. The Cartesian product is also referred to as Cross product or Cross Join.
Example
Suppose we have two relations \(R_1\) and \(R_2:\)
The Cartesian product \(R_1 \times R_2\) is represented by the following combined relation:
Rename Operation
A rename is a unary operation, written as \({\rho_{a \backslash b}}\left( R \right).\) The rename operator returns the relation \(R\) in which the \(b\) attribute is renamed to an \(a\) attribute.
The relation \(R\) can represent a relational algebra expression. Using the rename operation, we can refer to a relational expression by name.
Example
To rename the courseName attribute to courseTitle in the Course relation, we use the following operator:
Additional Relational Operations
There are a few additional operations that can simplify database queries. We consider two of them - Natural Join and Outer Join.
Natural Join Operation
The Natural Join operation, denoted \(\Join,\) is used to combine records from two (or more) tables based on the common attributes. Common attributes must have the same names and same datatypes. The resulting joined relation contains all combinations of records from both tables that have equal values on the common attributes. All other records which do not match will be eliminated from the joined relation.
Example
There are two relations - Course and Test.
Both relations have the common attribute - courseName. The matching value is only Linear Algebra. Hence, the natural join of these relations is given by
Outer Join Operation
When applying a natural join, all records with unmatched values are eliminated from the resulting relation. An outer join retains such records by replacing missing data with null and thus avoids lost of information. The null means that the value is unknown or does not exist.
There are two relations - Course and Test.
There are a few types of outer joins:
- Left Outer Join returns records from the left table.
- Right Outer Join returns records from the right table.
- Full Outer Join returns data from both tables.
Examples
The left outer join of the relations Course and Test above looks as follows:
Similarly, we can build the right outer join of these relations: