Relational Model

INLS 256, Database I

A table may represent an entity or a relationship from an entity-relationship diagram..

The rows are also called tuples, which represent an occurrence of the relation.

The columns are the attributes of the relation.

Each attribute is associated with a set of values that can fill the attribute. This is called the domain of the attribute.

The values in a simple domain are atomic – basic elements that can’t be broken down further.

Advantage of domain as a concept. Many types of database commands only make sense if the domains of the attribute(s) being operated on is the same

A database that supports the definition of domains does 2 things:

1. Prevent nonsensical operations (unless overridden)
2. Provide error checking (E.g., a value couldn’t be assigned if it weren’t a member of the domain of the attribute.

Composite domain: Cartesian product of two or more simple domains. All members of one set combined with all members of next set.

To fully support domains, you must:

1. Support their definition.
2. Support defining attributes as having domains.
3. Define operations that can be done within a domain, e.g., want to support the comparison of members of the domain of names (for alphabetical order), don’t want to support adding them together.
4. Support definition of composite domains.
5. Support definition of operations between domains, e.g., calculations of time and money.

Relational Schemas

A relational schema is an intensional definition of a relation that lists the attributes in the relation, e.g. R(A1, A2, A3, . . ., An). This is also the heading of the table.

Each attribute in the schema is the name of a role played by a domain in the schema.

The extension of the relation R is the set of n-tuples, where each n-tuple has a value for each of the attributes in the schema

In set terminology, a relation is a subset of the Cartesian product of the domains of the attributes that define R.

r(R) Í (dom(A1) X dom(A2) X . . . X dom(An))

# Properties of Relations

1. Tuples are unordered (members of sets are unordered).
2. Attributes are unordered – the columns are unordered.
3. Values in a tuple are atomic – no composite or multivalued attributes allowed in implementation.

We talked about the problems of null values, with their ambiguous interpretations:

a.       value unknown

b.      attribute does not apply

c.       has no value

4.      No duplicate tuples.

Qualified Attribute Names:

In order to specify precisely which attribute one means, you can qualify the attribute name by prefixing the table name.

Keys:

The primary key is a unique identifier of a tuple (record

There may be more than one field or combination of fields that are unique identifiers. Any combination of attributes is a candidate key if:

1.      It is a unique identifier; and

2.      It contains the minimum number of fields necessary to make a unique key.

One of the set of candidate keys is chosen to be the primary key, the others are alternate keys.

Foreign Keys:

The idea of a foreign key is to identify references to one tuple by its primary key that is made inside another tuple.

A foreign key is an attribute or combination of attributes in relation R2 whose values must match the primary key values in R1. In referring to the R1 tuple by its primary key in R2, you must be sure it exists in R1.

Integrity Rules/Constraints:

These have to do with ensuring that the information in the DB is correct for the DB and its world.

Entity Integrity: No attribute in the primary key of a base table (relation) can accept a null value (e.g., unknown, inapplicable

Referential Integrity: If base relation R2 contains a foreign key referring to the primary key of R1, then every value of the foreign key must be equal to the primary key values of some tuple in R1, or else every value must be null.

Specifically, what type of enforcement is needed?

1.      Restrictions on duplicating primary key values on addition or update.

2.      Requirement of no null values of primary key, and perhaps of foreign key, depending on semantics of domain.

3.      What happens on update operations – insert, delete, modify.

a.       What happens on insert (the easy case).

i.      If add primary key tuple. Must enforce the entity integrity constraint, and make sure that it has a unique primary key.

ii.      If add tuple in which a foreign key is needed, must make sure it refers to an existing key, or is null (the referential constraint).

b.      What happens on delete.

i.      If you delete the primary key tuple, what should happen in the tuples in which it is a foreign key?

2.      Restrict: you can delete a primary key tuple only if it isn’t a foreign key in any other tuple.

3.      Nullify: when you delete a primary key, all those tuples in which it was a foreign key now get nulls (e.g. unknown) for the values of those attributes.

4.      Change: when you delete a primary key, all those tuples in which it was a foreign key have the values changed for those attributes to refer to some other tuple.

ii.      If you delete a record which contains a foreign key, the primary entity can still exist.

c.       What happens on update (modify)?

i.      If you update the primary key values, what happens to the foreign key values?

1.      Cascade: when primary key value changes, all its foreign key appearances also change.

2.      Restricted: only allow primary key update if it has no foreign keys.

3.      Nullify: when primary key value changes, nullify all its foreign key appearances.

ii.      If you update the tuple in which a foreign key appears, cannot change foreign key values unless you want to refer to a new foreign entity

Mapping between ER and EER diagrams and relational model. There are some basic steps that make this a fairly straightforward process.

1. For each entity, create a relation (table), with all its attributes in their simple (non-composite) form. Identify a primary key (1 or more attributes).
2. For any weak entity, create a relation, include its attributes. Also include as a foreign key the primary key of its owner entity. The primary key of this relation will be the primary key of the owner and the partial key of the weak entity.
3. 1:1 binary relationships will be modeled as foreign keys in one of the participating entity tables. So the primary key of one entity will be one of the attributes in the other entity’s table. In addition, any attributes of the relation will also be an attribute in that table. If one of the entities has total participation in the relationship, that entity should be the “home table” of the relationship.

Note: If both entities have total participation, and don’t participate in any other relationship, you could merge the entities into one table.

4.      1:N binary relationships are also modeled using a foreign key. The primary key of the entity at the 1 side of the relationship is included as  a foreign key in the table of the other entity. Also include any attributes of the relationship.

1. M:N binary relationships are modeled in a separate relation. This is because each instance in one entity is related to possibly many in the other, and vice versa. Since we can’t have repeating fields in the the relational model, this information must be extracted. The new table will contain the primary keys of each participating entity, along with any attributes of the relationship. The primary key of the table will be the combined primary keys of the participating entities.
2. Multi-valued attributes must also be pulled out to form their own table. Again, this is because we can’t have repeating fields. This table will include the attribute and the primary key (as foreign key) of the entity. The primary key will be the foreign key and the attribute.
3. N-ary relationships greater than binary are represented by a new table containing the primary keys from each of the participating entities, along with any attributes of the relationship. The primary key of the table will be the combined primary keys.
4. There are 4 choices for representing superclass/subclass (specialization) structures.
1. Create one tahble for the superclass, with its attributes and primary key. Create a table for each subclass, containing its attributes and the primary key of the superclass. The primary key of the subclass tables will be the primary key of the superclass.
2. Create a table for each subclass, containing all the attributes of the superclass and of the subclass. The primary key is the primary key of the superclass.
3. If you have disjoint subclasses, create one table containing all the attributes of the superclass and of each subclass. Include the type attribute that distinguishes the members of the subclasses. The primary key is the primary key of the superclass. Note that you may end up with lots of null values if each subclass has attributes that don’t apply to the other subclasses.
4. If you have overlapping subclasses, create one table containing all the attributes of the superclass and of each subclass. Add a Boolean attribute for each subclass that indicates whether the tuple is a member of each subclass. The primary key is the primary key of the superclass.
5. A category whose superclasses have the same primary key is modeled as a table containing the primary key and any attributes of the category. Note that these are attributes that all the superclasses share.

A category whose superclasses do not have the same primary key requires the addition of a new key for the category, called the surrogate key. The category table contains the surrogate key, and any attributes of the category. The surrogate key is also added to the tables of the superclasses as a foreign key.

## Relational Algebra

Relational algebra represents the formal underpinnings for relational theory.

Overview of the Operations:

1.      select: it chooses specified tuples from a relation (a subset of rows).

2.      project: this chooses specified attributes from a relation (a subset of attributes).

3.      cartesian product: builds a new relation from two existing relations, where each tuple in the new relation is the concatenation of a tuple from one relation with a tuple from the other, creating all possible combinations.

R1 X R2

4.      union: builds a new relation from two existing relations, where the new relation contains all the tuples appearing in either (or both) of the old relations.

R1 È R2

5.      intersect: builds a new relation from two existing relations, where the new relation contains all tuples that appear in both of the old relations.

R1 Ç R2

6.      difference: builds a new relation from two existing relations, where the new relation contains all the tuples that appear in the first of the old relations but not in the second.

R1 – R2

7.      join: builds a new relation from two existing relations, where the new relation contains all the tuples from the product of the old relations, such that the new tuples satisfy some condition.

R1 |X| R2

8.      divide: builds a new relation from two existing relations, where the one old relation is binary, and the other is unary. The new relation contains all values of one attribute of the binary relation that match all values in the unary relation.

R1 ¸ R2

Note that all these operations have the property of closure.

Details of Operations:

Select – Select a subset of the tuples in a relation. The selection condition specifies which ones will be chosen.

senrolled = capacity (COURSE)

Project – Choose a subset of the columns in a relation.

(SID, sname, bdate, major, instate?)

pstudent-name, bdate (STUDENT)

Combining Operations: You create more complex queries by doing several basic operations in a row, operating on the results; nesting operations. To do this, you may want to give the intermediate table a name.

pfname, lname, bdate sbdate>1/1/74 (STUDENT)

is the same as

YOUNG-STUDENTS <- sbdate>1/1/74 (STUDENT)

RESULT <- pfname, lname, bdate (YOUNG-STUDENTS)

Both of these give the names and birthdates of young students.

When you make a new table like this, you may also want to rename the attributes in the table, which you can also do.

RESULT (young-fname, young-lname, bdate) <- pfname, lname, bdate (YOUNG-STUDENTS)

Union Compatibility: For some set operations, the two relations must be union compatible.

Union – A UNION B. All tuples belonging to A or B.

Intersection – A INTERSECT B. All tuples belonging to both A and B

Difference – A MINUS B. All tuples in A that aren’t in B.

Cartesian Product – A TIMES B. Concatenate each tuple in A with each tuple in B.

DEFINE ALIAS A for S

means to “rename” S as A, with all the same attributes.

Join:

This is used to “connect” different tables using one or more attributes on each that are defined on the same domain.

RESULT <- A |X| x=y B

Equijoin: where the comparison is =. This is the most common join. The result will contain two identical attributes, e.g., S join T on bdate header =

(s.id#, s.name, s.bdate, t.emp#, t.name, t.bdate) where s.bdate = t.bdate

You could eliminate the duplicate columns by filtering it through a projection, and only showing one of the bdates. This is called the natural join. Its label is *.

Division: divide A by B. If A is of degree (m+n), and B is of degree (n), then the result, D, will be of degree (m).

Aggregate Functions are included for convenience – e.g., sum, min, max, avg, count.

Outer Join is used for queries where you want some information on each tuple, and some additional information if it applies. E.g., List all students names, and if they have already done their field experience, where they did it. For those tuples that the condition doesn’t apply to, that attribute will be null.

STUDENTS (ss#, sname, . . . ) S-FE (ss#, fe-loc, . . . )

RESULT <- psname, fe-loc (STUDENTS |X|SS#=SS# S-FE)

This is a left outer join.

Outer Union takes the union of relations that aren’t union compatible.