INLS 256, Database I
A table may represent an entity or a relationship from an entity-relationship diagram..
The header represents the metadata about what is represented by the table.
The rows are also called tuples, which represent an occurrence of the relation.
The columns are the attributes of the relation.
Each attribute is associated with a set of values that can fill the attribute. This is called the domain of the attribute.
The values in a simple domain are atomic – basic elements that can’t be broken down further.
Advantage of domain as a concept. Many types of database commands only make sense if the domains of the attribute(s) being operated on is the same
A database that supports the definition of domains does 2 things:
Composite domain: Cartesian product of two or more simple domains. All members of one set combined with all members of next set.
To fully support domains, you must:
A relational schema is an intensional definition of a relation that lists the attributes in the relation, e.g. R(A1, A2, A3, . . ., An). This is also the heading of the table.
Each attribute in the schema is the name of a role played by a domain in the schema.
The extension of the relation R is the set of n-tuples, where each n-tuple has a value for each of the attributes in the schema
In set terminology, a relation is a subset of the Cartesian product of the domains of the attributes that define R.
r(R) Í (dom(A1) X dom(A2) X . . . X dom(An))
We talked about the problems of null values, with their ambiguous interpretations:
a. value unknown
b. attribute does not apply
c. has no value
4. No duplicate tuples.
Qualified Attribute Names:
In order to specify precisely which attribute one means, you can qualify the attribute name by prefixing the table name.
The primary key is a unique identifier of a tuple (record
There may be more than one field or combination of fields that are unique identifiers. Any combination of attributes is a candidate key if:
1. It is a unique identifier; and
2. It contains the minimum number of fields necessary to make a unique key.
One of the set of candidate keys is chosen to be the primary key, the others are alternate keys.
The idea of a foreign key is to identify references to one tuple by its primary key that is made inside another tuple.
A foreign key is an attribute or combination of attributes in relation R2 whose values must match the primary key values in R1. In referring to the R1 tuple by its primary key in R2, you must be sure it exists in R1.
These have to do with ensuring that the information in the DB is correct for the DB and its world.
Entity Integrity: No attribute in the primary key of a base table (relation) can accept a null value (e.g., unknown, inapplicable
Referential Integrity: If base relation R2 contains a foreign key referring to the primary key of R1, then every value of the foreign key must be equal to the primary key values of some tuple in R1, or else every value must be null.
Specifically, what type of enforcement is needed?
1. Restrictions on duplicating primary key values on addition or update.
2. Requirement of no null values of primary key, and perhaps of foreign key, depending on semantics of domain.
3. What happens on update operations – insert, delete, modify.
a. What happens on insert (the easy case).
i. If add primary key tuple. Must enforce the entity integrity constraint, and make sure that it has a unique primary key.
ii. If add tuple in which a foreign key is needed, must make sure it refers to an existing key, or is null (the referential constraint).
b. What happens on delete.
i. If you delete the primary key tuple, what should happen in the tuples in which it is a foreign key?
1. Cascade: delete those tuples.
2. Restrict: you can delete a primary key tuple only if it isn’t a foreign key in any other tuple.
3. Nullify: when you delete a primary key, all those tuples in which it was a foreign key now get nulls (e.g. unknown) for the values of those attributes.
4. Change: when you delete a primary key, all those tuples in which it was a foreign key have the values changed for those attributes to refer to some other tuple.
ii. If you delete a record which contains a foreign key, the primary entity can still exist.
c. What happens on update (modify)?
i. If you update the primary key values, what happens to the foreign key values?
1. Cascade: when primary key value changes, all its foreign key appearances also change.
2. Restricted: only allow primary key update if it has no foreign keys.
3. Nullify: when primary key value changes, nullify all its foreign key appearances.
ii. If you update the tuple in which a foreign key appears, cannot change foreign key values unless you want to refer to a new foreign entity
Mapping between ER and EER diagrams and relational model. There are some basic steps that make this a fairly straightforward process.
Note: If both entities have total participation, and don’t participate in any other relationship, you could merge the entities into one table.
4. 1:N binary relationships are also modeled using a foreign key. The primary key of the entity at the 1 side of the relationship is included as a foreign key in the table of the other entity. Also include any attributes of the relationship.
A category whose superclasses do not have the same primary key requires the addition of a new key for the category, called the surrogate key. The category table contains the surrogate key, and any attributes of the category. The surrogate key is also added to the tables of the superclasses as a foreign key.
Relational algebra represents the formal underpinnings for relational theory.
Overview of the Operations:
1. select: it chooses specified tuples from a relation (a subset of rows).
2. project: this chooses specified attributes from a relation (a subset of attributes).
3. cartesian product: builds a new relation from two existing relations, where each tuple in the new relation is the concatenation of a tuple from one relation with a tuple from the other, creating all possible combinations.
R1 X R2
4. union: builds a new relation from two existing relations, where the new relation contains all the tuples appearing in either (or both) of the old relations.
R1 È R2
5. intersect: builds a new relation from two existing relations, where the new relation contains all tuples that appear in both of the old relations.
R1 Ç R2
6. difference: builds a new relation from two existing relations, where the new relation contains all the tuples that appear in the first of the old relations but not in the second.
R1 – R2
7. join: builds a new relation from two existing relations, where the new relation contains all the tuples from the product of the old relations, such that the new tuples satisfy some condition.
R1 |X| R2
8. divide: builds a new relation from two existing relations, where the one old relation is binary, and the other is unary. The new relation contains all values of one attribute of the binary relation that match all values in the unary relation.
R1 ¸ R2
Note that all these operations have the property of closure.
Details of Operations:
Select – Select a subset of the tuples in a relation. The selection condition specifies which ones will be chosen.
senrolled = capacity (COURSE)
Project – Choose a subset of the columns in a relation.
(SID, sname, bdate, major, instate?)
pstudent-name, bdate (STUDENT)
Combining Operations: You create more complex queries by doing several basic operations in a row, operating on the results; nesting operations. To do this, you may want to give the intermediate table a name.
pfname, lname, bdate sbdate>1/1/74 (STUDENT)
is the same as
YOUNG-STUDENTS <- sbdate>1/1/74 (STUDENT)
RESULT <- pfname, lname, bdate (YOUNG-STUDENTS)
Both of these give the names and birthdates of young students.
When you make a new table like this, you may also want to rename the attributes in the table, which you can also do.
RESULT (young-fname, young-lname, bdate) <- pfname, lname, bdate (YOUNG-STUDENTS)
Union Compatibility: For some set operations, the two relations must be union compatible.
Union – A UNION B. All tuples belonging to A or B.
Intersection – A INTERSECT B. All tuples belonging to both A and B
Difference – A MINUS B. All tuples in A that aren’t in B.
Cartesian Product – A TIMES B. Concatenate each tuple in A with each tuple in B.
DEFINE ALIAS A for S
means to “rename” S as A, with all the same attributes.
This is used to “connect” different tables using one or more attributes on each that are defined on the same domain.
RESULT <- A |X| x=y B
Equijoin: where the comparison is =. This is the most common join. The result will contain two identical attributes, e.g., S join T on bdate header =
(s.id#, s.name, s.bdate, t.emp#, t.name, t.bdate) where s.bdate = t.bdate
You could eliminate the duplicate columns by filtering it through a projection, and only showing one of the bdates. This is called the natural join. Its label is *.
Division: divide A by B. If A is of degree (m+n), and B is of degree (n), then the result, D, will be of degree (m).
Aggregate Functions are included for convenience – e.g., sum, min, max, avg, count.
Outer Join is used for queries where you want some information on each tuple, and some additional information if it applies. E.g., List all students names, and if they have already done their field experience, where they did it. For those tuples that the condition doesn’t apply to, that attribute will be null.
STUDENTS (ss#, sname, . . . ) S-FE (ss#, fe-loc, . . . )
RESULT <- psname, fe-loc (STUDENTS |X|SS#=SS# S-FE)
This is a left outer join.
Outer Union takes the union of relations that aren’t union compatible.