Entity Relation Data Modeling

Extended Entity Relation Data Modeling

 

 

Moody, 1996.

The Moody article is a somewhat light-hearted view of the entire modeling process, but there are some nuggets of very good advice there. Notice that this isn’t concerned just with DB modeling, but with modeling for an entire information system.

 

1.      See for yourself.

2.      Generate alternatives.

3.      Test the model.

4.      Know when to stop.

5.      Remember the big picture.

6.      Following through.

 

E-R Modeling

 

E-R modeling is a way of determining what information you need to have in your DB, and how it should be organized.

 

The ER diagram is a model of the conceptual level of the DB – everything that is in the DB from a “neutral” perspective. You may want to show a basic version to the user to check on your assumptions, and use a more complex one, incorporating things we’ll talk about in the next section on extended entity relationship modeling, for yourself.

 

Notation: There are several different types of notation, which are roughly equivalent. You need to be able to work with any of them. The book uses one type; we’ll be using a different type that is supported by ABC Flowcharter a little better.

 

Quick Definitions and Review:

 

Entity – A “thing” in the real world that will be modeled in the DB.

Attribute – Each entity has attributes. These are the “facts” that we know about the entity and want to record.

 

Each entity type must have at least one attribute or combination of attributes that has a unique value for each instance. This is the key attribute. A key may be simple, consisting of only one attribute (e.g. SS#), or it may be compound or composite, consisting of more than one attribute, (e.g. date and seat# for theater tickets).

 

An entity type may have more than one unique identifier. All unique identifiers are candidate keys – generally you will choose one of the candidate keys to be the primary key.

 

Composite attributes are made of two or more other attributes.

 

Values:

 

Each attribute is associated with a value set or domain, which is the set of possible values for that attribute in the world.

 

A single-valued attribute is one where each entity will have at most one value for it.

A multi-valued attribute is one where an entity may have more than one value for it at the same time.

A derived attribute is one whose value can be determined or calculated from one or more other attributes.

 

A null value is when an entity does not have a value for an attribute. There are a couple of situations in which this can occur.

 

1.      The attribute doesn’t apply to that entity instance, e.g., middle initial for someone who doesn’t have a middle name.

2.      The value for the attribute is not known, although it is known to exist. E.g., the telephone number for someone whom you know does have a telephone.

3.      Asking for something that doesn’t exist – doesn’t really apply to the class. (This can happen when you are tring to “squeeze” 2 different classes into one entity.) E.g., number of hours worked per week for a student who is not a GA.

 

Relationships:

 

A relationship is an association between entities. Like entities, relationships also come in types and instances, where the type is the definition, and the instance is an actual occurrence of a relationship between entity instances.

 

The relationship type defines the relationship in terms of the participating entity types, those entities between which the relationship exists.

 

Degree of relationship:

 

Degree of a relationship is the number of participating entity types, most common are binary and ternary.

 

Relationship roles:

 

Each entity type in a relationship plays a particular role in the relationship. It is often helpful to label the roles, to give specific information on what the entity is doing.

 

Cardinality:

 

The cardinality of a relationship specifies how many relationship instances any entity instance can participate in.

One-to-one, 1:1, means that each entity can only participate once.

One-to-many, 1:M, means that one entity can participate once, the other can participate many times.

Many-to-many, M:N, means that every entity can participate many times.

 

Participation:

 

The participation constraints on a relationship indicate whether each entity instance must participate in a relationship instance in order to exist. This comes in two types.

Total participation means that each entity must participate.

Partial participation (or optional participation) means that only a subset of the entity instances are expected to participate; the instance can still exist without participating.

 

Relationship attributes:

 

In some sense, we can think of a relationship as being an intangible “thing” in the world too; and it can have its own attributes.

 

Weak entity types:

 

These are entity types that do not have a unique identifier of their own – they can be uniquely identified only by taking into account their identifying owner. They have a partial key, that uniquely identifies them within the identifying owner, but the full key must include information from the owner.

 

In Class Example (Sports)    

Extended ER Design

 

There are many ties between expert systems and databases – both need to have information about the world. This more complex world requires more complex data modeling, and the simple ER model that we start with may not be enough. Other useful concepts:

subclass/superclass

specialization, generalization

inheritance

 

A subclass is a subset of another class. The members of a subclass are also members of the superclass. This type of relationship is also called the is-a relation, in some cases.

 

How do you determine membership in a subclass?

1.      There may be a defining attribute, where each instance falls into a class based on the value of that attribute.

2.      There may be a defining predicate, which is a constraint which all members of the class must satisfy in order to be in the class.

 

Subclasses may be overlapping or disjoint.

 

The division of a class into subclasses may be total or partial. Total means that all members of the superclass must be in at least one of the subclasses, e.g., all students must be either full- or part-time.

 

Partial means that not all entities that belong to the superclass must belong to one of the subclasses. E.G., a student may not enjoy any hobbies.

 

Aggregation allows you to build composite objects from their component objects.

 

Categories

 

A category is a special type of classification that allows dissimilar entities to be grouped together for the purposes of a specific relation. Entities in a category inherit from their own classes. This is called selective inheritance.

 

Adding the concepts of super/subclasses, inheritance, aggregation, and categories gives us what the book calls the Enhanced ER model. Entities, classes, subclasses, superclasses, aggregates, and categories can all participate in relationships.

 

In designing categories, it is important to consider the key of the category. If the different entities have the same key, then just use that as the key of the category. If they all have different types of keys, you need to create a key especially to represent the category. This surrogate key is then the key of the category relation, and appears as an attribute in each of the member classes.

 

Getting started with an ER or EER diagram.

 

1.      Identify the entity types in your world – the things you want to represent.

Determine the attributes, or information you need to record, for each entity type.

2.      Determine the nature of the attributes – composite, multivalues, candidate keys, etc.

3.      Identify relationships between entities.

4.      For each relationship, determine its cardinality, participation, and attributes.

 

In Class Example (QuizShow)    

Home and Garden Tour In Class Exercise (paper handouts)

Knowledge Representation:

 

Many of the concepts needed to capture the broader world of KR have just been incorporated into the EER model. Others are still missing, and require either a different type of model, or more radical changes to the model.

 

  1. Instantiation.
  2. Default values are values you may assume an instance has for an attribute in the absence of other information; you make this assumption on the basis of its class membership.
  3. Meta-knowledge. This is knowledge about knowledge. It is useful for reasoning, and other types of queries. It includes the type of information the book calls class properties, e.g., name, #members, etc.
  4. Identification really has two sides to it: 1. distinguishing between different instances and 2. associating the same instance with different “descriptions.” The former problem is solved in the relational model by requiring a key for each relation.

Aggregation can be roughly approximated by the “part-of” relation.

 

Why do we need more complex modeling tools?

 

When we first talked about ER models, we modeled fairly simple aspects of systems. These reflect the basic type of information that DBs traditionally hold. Several things are causing DBs to change in the type of information they hold, and the type of queries that are asked of them – in fact, query may be too simple a word.

 

1.      Expert systems

2.      Non-numerical data, text, non-text, graphics.

 

In the earlier days of expert systems, the KB was hand-crafted and had to be in a special format and contain special information for processing, search, and inference. While the ES were still small and experimental, this was allowable, but as the size and range of applications started to expand, this became the knowledge acquisition bottleneck.