Entity Relation Data Modeling
Extended Entity
Relation Data Modeling
Moody, 1996.
The Moody article is a somewhat light-hearted view of the entire modeling process, but there are some nuggets of very good advice there. Notice that this isn’t concerned just with DB modeling, but with modeling for an entire information system.
1.
See for yourself.
2.
Generate alternatives.
3.
Test the model.
4.
Know when to stop.
5.
Remember the big picture.
6.
Following through.
E-R
modeling is a way of determining what information you need to have in your DB,
and how it should be organized.
The
ER diagram is a model of the conceptual level of the DB – everything that is in
the DB from a “neutral” perspective. You may want to show a basic version to
the user to check on your assumptions, and use a more complex one,
incorporating things we’ll talk about in the next section on extended entity
relationship modeling, for yourself.
Notation: There are several different
types of notation, which are roughly equivalent. You need to be able to work
with any of them. The book uses one type; we’ll be using a different type that
is supported by ABC Flowcharter a little better.
Quick Definitions and
Review:
Entity – A “thing” in the real
world that will be modeled in the DB.
Attribute – Each entity has
attributes. These are the “facts” that we know about the entity and want to
record.
Each
entity type must have at least one attribute or combination of attributes that
has a unique value for each instance. This is the key attribute. A key may be simple,
consisting of only one attribute (e.g. SS#), or it may be compound or composite,
consisting of more than one attribute, (e.g. date and seat# for theater
tickets).
An
entity type may have more than one unique identifier. All unique identifiers
are candidate keys – generally you
will choose one of the candidate keys to be the primary key.
Composite attributes are made of two or more
other attributes.
Values:
Each
attribute is associated with a value set
or domain, which is the set of
possible values for that attribute in the world.
A single-valued attribute is one where
each entity will have at most one value for it.
A multi-valued attribute is one where an
entity may have more than one value for it at the same time.
A derived attribute is one whose value
can be determined or calculated from one or more other attributes.
A null value is when an entity does not
have a value for an attribute. There are a couple of situations in which this
can occur.
1.
The
attribute doesn’t apply to that entity instance, e.g., middle initial for
someone who doesn’t have a middle name.
2.
The
value for the attribute is not known, although it is known to exist. E.g., the
telephone number for someone whom you know does have a telephone.
3.
Asking
for something that doesn’t exist – doesn’t really apply to the class. (This can
happen when you are tring to “squeeze” 2 different classes into one entity.)
E.g., number of hours worked per week for a student who is not a GA.
Relationships:
A
relationship is an association between entities. Like entities, relationships
also come in types and instances, where the type is the definition,
and the instance is an actual occurrence of a relationship between entity
instances.
The
relationship type defines the relationship in terms of the participating entity types, those entities between which the
relationship exists.
Degree of relationship:
Degree of a relationship is the
number of participating entity types, most common are binary and ternary.
Relationship roles:
Each
entity type in a relationship plays a particular role in the relationship. It is
often helpful to label the roles, to give specific information on what the
entity is doing.
Cardinality:
The
cardinality of a relationship
specifies how many relationship instances any entity instance can participate
in.
One-to-one, 1:1, means that each entity can
only participate once.
One-to-many, 1:M, means that one entity can
participate once, the other can participate many times.
Many-to-many, M:N, means that every entity
can participate many times.
Participation:
The
participation constraints on a relationship indicate whether each entity
instance must participate in a
relationship instance in order to exist. This comes in two types.
Total participation means that each entity must participate.
Partial participation (or optional participation)
means that only a subset of the entity instances are expected to participate;
the instance can still exist without participating.
Relationship attributes:
In
some sense, we can think of a relationship as being an intangible “thing” in
the world too; and it can have its own attributes.
Weak entity types:
These
are entity types that do not have a unique identifier of their own – they can
be uniquely identified only by taking into account their identifying owner. They have a partial key, that uniquely
identifies them within the
identifying owner, but the full key must include information from the owner.
There
are many ties between expert systems and databases – both need to have
information about the world. This more complex world requires more complex data
modeling, and the simple ER model that we start with may not be enough. Other
useful concepts:
subclass/superclass
specialization,
generalization
inheritance
A subclass is a subset of another class.
The members of a subclass are also members of the superclass. This type of relationship is also called the is-a relation, in some cases.
1.
There
may be a defining attribute, where
each instance falls into a class based on the value of that attribute.
2.
There
may be a defining predicate, which
is a constraint which all members of the class must satisfy in order to be in
the class.
Subclasses
may be overlapping or disjoint.
The
division of a class into subclasses may be total
or partial. Total means that all
members of the superclass must be in
at least one of the subclasses, e.g., all students must be either full- or
part-time.
Partial means that not all entities that belong to the superclass must belong to one of the subclasses. E.G., a student may not enjoy any hobbies.
Aggregation allows you to build
composite objects from their component objects.
A
category is a special type of classification that allows dissimilar entities to
be grouped together for the purposes of a specific relation. Entities in a
category inherit from their own classes. This is called selective inheritance.
Adding
the concepts of super/subclasses, inheritance, aggregation, and categories
gives us what the book calls the Enhanced ER model. Entities, classes,
subclasses, superclasses, aggregates, and categories can all participate in
relationships.
In
designing categories, it is important to consider the key of the category. If
the different entities have the same key, then just use that as the key of the
category. If they all have different types of keys, you need to create a key
especially to represent the category. This surrogate
key is then the key of the category relation, and appears as an attribute
in each of the member classes.
Getting started with an ER
or EER diagram.
1.
Identify
the entity types in your world – the things you want to represent.
Determine the attributes, or information you need to
record, for each entity type.
2.
Determine
the nature of the attributes – composite, multivalues, candidate keys, etc.
3.
Identify
relationships between entities.
4.
For
each relationship, determine its cardinality, participation, and attributes.
Home and Garden Tour In Class Exercise (paper handouts)
Knowledge
Representation:
Many of the concepts needed to capture the broader
world of KR have just been incorporated into the EER model. Others are still
missing, and require either a different type of model, or more radical changes
to the model.
Aggregation
can be roughly approximated by the “part-of” relation.
When
we first talked about ER models, we modeled fairly simple aspects of systems.
These reflect the basic type of information that DBs traditionally hold.
Several things are causing DBs to change in the type of information they hold,
and the type of queries that are asked of them – in fact, query may be too
simple a word.
1.
Expert
systems
2.
Non-numerical
data, text, non-text, graphics.
In
the earlier days of expert systems, the KB was hand-crafted and had to be in a
special format and contain special information for processing, search, and
inference. While the ES were still small and experimental, this was allowable,
but as the size and range of applications started to expand, this became the knowledge acquisition bottleneck.