Wikinotes

Maintainer: admin

1 ER Model
- 1.1 ISA Hierarchies: Subclasses
  - 1.1.1 Constraints
- 1.2 Relationship
2 Relational Model
- 2.1 SQL
- 2.2 Translating ER to R
3 Relational Algebra
- 3.1 Types of Joins
  - 3.1.1 Division
  - 3.1.2 Rules of Relational Algebra
4 Functional Dependicies
5 SQL
- 5.1 WHERE clause
- 5.2 Multirelational Queries
6 Advanced SQL
7 SQL Integrity
- 7.1 Constraints
- 7.2 Triggers
8 XML
9 Transactions
- 9.1 ACID
- 9.2 Buffers and the Disk
  - 9.2.1 FORCE flush
  - 9.2.2 NOSTEAL flush
10 Indexing
11 Query Evaluation
12 NoSQL

Entity: real-world object distinguishable from other objects. An entity is described using a set of attributes
Entity set: a collection of similar entities; eg: all employees; each entity set has a key, a minimum set of attributes whose values uniquely identify an entity in the set; each attribute has a domain

Diagram 1

A ISA B, then every A is also a B entity. Key is only in B

Diagram 2

E/R Viewpoint: An entity has a component in each entity set to which it logically belongs. Its properties are the union of these entity sets

Overlap Constraint: can an entity be in more than one subclass?
Covering Constraint: must every entity of the superclass be in one of the subclasses?

Relationship: association among two or more entities; eg: Employee A works in department X
Relationship set: Collection of similar relationships; An n-ary relationship set R relates n entities set E1...En. Each relationship in R involves entites $e1 \in E1, \ldots , en \in En$

Diagram 3

Each entity can have relationships with multiple entities

note: a relationship is uniquely defined by the primary keys of participating entities. That is Employee A cannot work twice in the department X

Diagram 4

eg: manages. One employee manages several departments. Each department is managed by only one employee

Key Constraint: If an entity in an entity set can only have one other entity involved in the relationship, this is a key constraint; Depicted with an arrow from the one to the one-to-many relationship set

Diagram 5

Represented with key constraints in both directions

Diagram 6

Participation Constraint: Requires that every entity in the entity set participating in the relationship MUST have a relation; represented with a thick line from the constrained entity set to the relationship

Diagram 7

Can also combine key constraints with participation constraints

Diagram 8

Weak entity: can be indentified uniquely ONLY by considering the primary key of another entity; The key in this kind of set is the union of the keys of the owner entity set and the set of its own attribtues.; represented with a bold rectangle

Diagram 9

Diagram 10

Aggregation: Allows us to treat a reltionship set R as an entity set so that R can participate in other relationships

Diagram 11

ER is a semantic model. Relational model is a DATA model

Relation: Consists of a schema and an instance
Schema: specifies name of relation, plus a set of attributes, plus the domain/type of each attribute
Instance: set of all tuples; tuples are distinct
Database Schema: Collection of relation schemas

A relation can be seen as a table. Column headers = attribute names, rows = tuples/records, columns/fields = attribute values.

The number of rows = cardinality. Number of fields = degree / arity

All rows are distinct. Rows are not ordered

DDL: Data defintion language; defines the schema of a databse
DML: Data manipulation language; manipulates the data (instances of the relations); eg: Insert, update, delete; Query the relationship; aka Query Language

Structured Query Language (SQL)

Used to define relations and to write/query data

Name	Description
CHAR(n)	A character string of fixed length n
VARCHAR(n)	denotes a string of up to n charaters
INT or INTEGER	An integer
SHORTINT	Shorter version of an integer
FLOAT or REAL	floating point number
DOUBLE PRECISION	eg: Double in java
DECIMAL(n,d)	real number with fixed decimal pooint. Value of consits of n digits with a decimal point d positions from the right
DATE	In the form YYYY-MM-DD
TIME	Form is 15:00:02 or 15:00:02.5
Bit strings	???

CREATE TABLE Students
(sid INT,
 name VARCHAR(20),
 login CHAR(10),
 major VARCHAR(20)
     DEFAULT 'undefined')

Defines all attributes of the relation and the type/domain of each.

SQL is case insensitive

DROP TABLE Students

Destroys the schema AND the tuples are deleted

ALTER TABLE Students
    ADD COLUMN firstYear:integer

Adding a new row results in all existing tuples gettings a new field with the null value

Inserting a single tuple

INSERT INTO Students (sid, name, faculty)
    VALUES (53688, 'Chang', 'Eng')

Deleting Multiple Tuples

DELETE FROM Students WHERE name = 'Chang'

Updating Tuples

UPDATE Students SET faculty = 'Science' WHERE sid = 53688

An Integrity Constraint Must be true for any instance of the database. There are domain specific constraints that are already enforced (eg on INT)

DB System will enforce any constraint and reject a modification that violates it

CREATE TABLE Students
(sid INT NOT NULL,
 name VARCHAR(20),
 login CHAR(10),
 gpa REAL
     DEFAULT 0.0)

requires an attribute to always have a proper value

A set of fields is a key for a relation if:

No two distinct tuples can have the same values of all key fields
This is no true for any subset of the key

Any key that satisfies these properties is considered a candidate key. One is chosen to be the primary key

Primary key attributes may not be NULL

CREATE TABLE Students
(sid INT NOT NULL PRIMARY KEY,
 name VARCHAR2(20))

CREATE TABLE Enrolled
(sid INT NOT NULL,
 cid CHAR(20) NOT NULL,
 grade CHAR(2) NOT NULL,
 PRIMARY KEY (sid,cid),
 UNIQUE (cid,grade))

sid makes a good primary key in the first one. In Enrolled we want each student to have ONE grade per class!

Foreign Key

set of fields in one relation that is used ot "refer" to a tuple in another relation

must correspond to the primary key of the second relation

represents a "logical pointer"

CREATE TABLE Enrolled
(sid INT NOT NULL,
PRIMARY KEY(sid)
FOREIGN KEY (sid)
REFERENCES Students)

Foreign Key Constraint

foreign key value of a tuple must represent an existing tuple in the referred relation.

Tuples that refer to nonexistent foreign keys are not allowed to be inserted.

What happens when a refered tuple is deleted?: Delete all tuples that reference it; Disallow deletion as long as something refers to it; set the foreign key to a feault value
When the primary key is updated: updated all tuples that reference it; or do something similar to deletion (disallow, delete referencing tuples)

SQL default to disallowing. Can use CASCDE

eg:

CREATE TABLE Enrolled
 (sid INT NOT NULL,
  cid CHAR(8) NOT NULL,
  grade CHAR(2),
  PRIMARY KEY (sid,cid)
  FOREIGN KEY (sid)
  REFERENCES Students
  ON DELETE CASCADE
  ON UPDATE CASCADE)

Integrity Constraints are based on the semantics of the realworld. You can check that an IC holds by looking at the whole table, but you can't assume from examing one or some tuples

Diagram 11

One relation per entity set

Always translated as an individual table. Attributes for the table are the keys for each participating set (as foreign keys) and all descriptive attributes

Diagram 12

If one-to-many make the one the primary key, hence guaranteeing uniqueness. Entity set and relationship each gets a table

Include relationship set in table of the entity set with the key constraint as a field.

Include relationship set in table of the entity set with the constraints

Usually can't be done UNLESS also key constraint

Might happen if a relation involves to entities from the same set.

Diagram 13

Weak identity set and identifying relationship set are translated into a single table. Must cascade on deletion of primary foreign key

Diagram 14

distribute information among relations. Super class defines general attributes and key. Subclasses are relations with additional attributes and superclass key. subclasses must be altered when superclass altered.

Diagram 15

sub-classes have all the attributes

Pro	Con
Query asking about just subclass entities only has one table to search	Query on attributes of all superclass entities have to traverse all tables of subclasses
	Entities that are in two subclasses are stored twice

One big table with all the attribtues of all subclasses (can be null)

Diagram 16

Option 1: No key contraints: Basically just create a table for the aggregation relation that references the keys of all involved

Projects(pid,started_on,pbudget)
Departments(did,dname,budget)
Employees(eid,name,salary)
Sponsors(pid,did,since)
Monitors(pid,did,eid,until)

Option 2: Use inner relationship: Key constraint on relationship within the aggregation and the other entity set; eg: Sponsor(pid,did, eid, since, until)
Option 3: Use Outter Relationship: No Sponsors; Project(pid, started_on, pbudget, did, since); Monitors(pid,eid,until)

Basics

Input: input one or two relations; schema of each relation is known; instance can be arbritrary
Output: a relation; schama of output relation depends on operator and input relations

Relational algebra is closed

Name	Symbol	Description
Selection	$\sigma_{condition}(R_{in})$	Selects a subset of tuples from a relation
Projection	$\Pi_{attribute}(R_{in})$	Projects to a subset of attributes from a relation
Renaming	$\rho(R_{out}(B1, \ldots, Bn), R_{in}(A1, \ldots, AN))$	Rename relations or attribtues
Cross product	$\times$	Combines two relations. Each row of first input combined with each row of second input
Join	$\bowtie$	Combination of cross product and selection
Division	\	TODO
Intersection	$\cap$
Union	$\cup$
Set Different	-	Tuples that are in the first, but not the second

Operators can be composed

Set operators must be on relations with the same number of attribtues and same types

Condition/Theta Join $R_{out} = R_{in1} \bowtie_c R_{in2} = \sigma_c(R_{in1} \times R_{in2})$
Equi Join: $R_{out} = R_{in1} \bowtie_{Rin1a1 = Rin2b1, \ldots, Rin1an = Rin2bn} R_{in2}$ Condition join where condition contains ONLY equalities
Natural Join: Equijoin on all common attributes

Let A have have 2 fields, x and y, have only field y.

$A/B = \{ x | \forall y \in B \exists x,y \in A\}$

A/B containsall x typles such athf ro every y tuplein B there is an xy tuple in A

Let $R,S,T$ be relations. $C, C1, C2$ are conditions. $L, L1, L2$ are projection lists on R and S.

Communitivity: order of operands does not matter (except renaming, division and difference); $\Pi_L (\sigma_C (R)) = \sigma_C (\Pi_L (R))$; $R1 \bowtie R2 = R2 \bowtie R1$
Associativity: You can move brackets around; $R1 \bowtie (R2 \bowtie R3) = (R1 \bowtie R2) \bowtie R3$
Idempotence: $\Pi_{L2} (\Pi_{L1}(R)) = \Pi_{L2} (R)$ if $L2 \subset L1$; $\sigma_{C2} (\sigma_{c1}(R)) = \sigma_{C1 \wedge C2}(R)$

Suppose we have Hourly_Emps(sin, rating, hourly_wages, hourly_worked)

Diagram 16

Wages is related to rating

Update Anomly: Can we change the wages in only the first tuples?
Insertion Anamoly: What if we want to insert an employee and don't know the hourly wage for his/her rating?
Insertion Anamoly II: We can only introduce a new rating scheme when we enter an emplyee with this rating into the relation
Deletion Anamoly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!
Storage Redundancy

A functional dependency $X \to Y$ ($X$ and $Y$ are sets of attributes; $X,Y \subset$schema($R$)) holds over relation $R$ if for every allowable instance $r$ of $R$:
* Given tuples $t1$ and $t2$ in instance $r$ of $R$, if the X values of $r1$ and $t2$ agree, then the $Y$ values must also agree
* $t1 \in r, t2 \in r, \Pi_x(t1) = \Pi_x(t2) \to \Pi_y(t1) = \Pi_y(t2)$

FDs are determined by the semantics of the application.

Key Candidates: Let schema$(R) = (a1, a2, ... an)$; a set of attribtues $X \subset$ schema$(R)$ is a candidate key for a relation $R$ if $X \to R (= X \to a1,a2, \ldots, an)$; That is some set of attributes of R (could be a single element set) implies all of the attributes of the relation

In the first example, we would move rating -> wage into it's own table and then just store rating!

($X,Y,Z$ are sets of attributes)

pname	Description
Reflexivity	If $Y \subset X$ then $X \to Y$
Augmentation	$X \to Y$ then $\forall Z, XZ \to YZ$
Transitivity	If $X \to Y, Y \to Z$ then $X \to Z$
Union	If $X \to Y, X \to Z$ then $X \to YZ$
Decomposition	If $X \to YZ$, then $X \to Y, X \to Z$

Armstrong's Axioms

Reflexivity

Augmentation

Transitivity

$F^+$

closure of F

set of all DFs that are implied by F through Armstrong's Axioms

Algorithm for finding the attribute closure of $X \to X^+$ wrt F:

Input: set of all attributes $A$ such that $X \to A$ is in $F^+$

Basis: $X^+$
Induction: If $Z\subset X^+$ and $Z \to A$ is in $F$, add $A$ to $X^$
End when $X^+$ cannot be changed

Finding all key candidates

X is akey candidate if X+ = R AND there is no subset Y of X such that Y+ = R

X has to imply everything and it has do do so minimally

If only key-induced FDs hold there is no redundancy in the database

Boyce-Codd Normal Form

A relation R with FDs F is BCNF if for all $X \to A$ in F+:
* $A \in X$ (trivial FDs)
* $X$ contains a key for $R$

R is in BNCF if the only non-trividal FDs that hold over R are key constraints

A relation R with FDs F is in 3NF (3rd Normal Form) if for all $X\ \to A$ in F+

$A \in X$ (trivial FDs)
X Contains a key for R (X is a superkey) or
A is a part of some key for R

If a relation is not in the desired normal form we decompose it.

Given a relation R with attributes A1, ..., An, a decomposition of r consists of replacing R by two or more relations such that:
each new scheme contains a subset of attributes of R and nothing more
every attribute of R appears as an attribute on one of the new relations
* The ruples of R are split such that corresponding subtuples appear in the enw relations

Disadvantages: some queries now have to look at more tables and are thus more expensive

Loseless Join decompositions: The decomposition of R into R1 and R2 is lossless-join wrt F if and only if the closure of F contains: Let {a1, ..., an} be the set of attribute sin R1 and R2: then {a1, ..., an} $\to$ R1 or {a1,..., an} $\to$ R2; In particular if $U \to V$ (non trivial) holds over R then the decomposotion of R into UV and R - V is lossless-join (rejoin does not create new tuples because U is key in UV and foreign key in R-V); Denoted as a decomposition along FD $U \to V$

Given a relation R with FDs F and $X \to Y \in FD$ violating BCNF:

Compute X+
Decompose R into X+ and $(R - X+) \cup X$
Repeat until all relations in BCNF

If $X^+ = \{X,Y\}$ we denote this as a decomposition along $X \to Y$

Dependency Preserving Decomposition: Given a relation R with FDs F. A depdendency preserving decomposition allows us to check each dependency in F by looking at one of the decomposed relations; Not all BCNF decompositions are dependency preserving

Any relation R can be decomposed into relations that are 3NF and dependency preserving

SELECT desired attribtues
FROM list of relations
WHERE qualification (optional)

Conversion to relational algebra

SQL	Algebra
SELECT	projection
WHERE	selection and join

Sample: Diagram 17

SQL statement:

SELECT rating, age
FROM Skaters
WHERE rating >= 10 OR age > 15

Relational Agebra:

$\Pi_{rating, age}(\sigma_{rating >= 10 \lor age > 15}(Skaters))$

The only difference is that SQL does not eliminate duplicates

Comparision: attribute1 op constant; atttribute1 op attribute2

op is one of <, =, >, <>, <=, >=, LIKE

Can use arithematic operations on numbers

Qualification/Condition: name = 'Cheng' AND NOT age = 18; name LIKE '%e_g' %: any string, _: any character; there are other string operators like string-length, concatentation

SQL uses multiset semantic, which allows duplicates, except on primary keys

DISTINCT``` is an optional keyword that will eliminate duplicates

eg:
```sql
SELECT DISTINCT sname FROM Skaters

Renaming Columns: SELECT sid, sname AS skater FROM Skaters WHERE raing < 9
Expressions: SELECT sname, rating+1 AS updgrade FROM Skaters
Constants: SELECT rating AS reality, '10' AS dream FROM Skaters; Will add colum dream with value 10 to output
Ordered Output: SELECT * FROM Skaters ORDER BY age, rating; will wort by age THEN rating

You can have MULTIPLE relations in the FROM clause. Relation.Attribute is used to disambiguate

Cross Product: SELECT * FROM Skaters, Participates
Join: SELECT sname FROM Skaters, Participates WHERE Skaters.sid = Participates.sid; OR SELECT sname FROM Skaters JOIN Participates ON Skaters.sid = Participates.sid

Input relations for set operators must be set-comaptible. Result relation is a set (not a multiset)

Find skaters that have participated in a regional or a local competition

SELECT P.sid
FROM Participates P, Competition C
WHERE P.cid = C.cid AND C.type = 'local'
UNION
Select P.sid
FROM Participates P, Competition C
WHERE P.cid = C.cid AND C.type = 'regional'

Multiset version: UNION ALL

Find skaters that have participated in a regional AND a local competition

SELECT P.sid
FROM Participates P, Competition C
WHERE P.cid = C.cid AND C.type = 'local'
INTERSECT
Select P.sid
FROM Participates P, Competition C
WHERE P.cid = C.cid AND C.type = 'regional'

Multiset Version: Intersect ALL

Find skaters that have participated in a local but not in a regional

SELECT P.sid
FROM Participates P, Competition C
WHERE P.cid = C.cid AND C.type = 'local'
EXCEPT
Select P.sid
FROM Participates P, Competition C
WHERE P.cid = C.cid AND C.type = 'regional'

Multiset version: EXCEPT ALL

A where clause and contain an SQL query which is called a subquery

Find names of skaters who ahve participated in competition #103

SELECT S.sname
FROM Skaters S
WHERE S.sid IN (SELECT P.sid
                FROM Participates P
                WHERE P.cid = 103)

Can invert selection using NOT IN

Basical it is WHERE (a1, a2) in (SELECT a3, a4...)

EXISTS is TRUE iff the relation is non-empty

Find names of skaters who have participated in competion 103

SELECT S.sname
FROM Skaters S
WHERE EXISTS (SELECT *
              FROM Participates P
              WHERE P.cid = 103 AND
                    P.sid = S.sid)

Can use ANY and ALL (like universal and existential quantifiers)

Syntax: WHERE attr op ANY (Select ...); WHERE attr op ALL (SELECT ...)

Find highested rated skater

SELECT *
FROM SKATERS
WHERE rating >= ALL (SELECT rating FROM Skaters)

Find skaters who have reserved all boats

SELECT sname
FROM Skaters S
WHERE NOT EXISTS ((SELECT C.cid
    FROM Competition C)
    EXCEPT
    (SELECT P.cid
    FROM Participates P
    WHERE P.sid=S.sid))
SELECT sname
FROM Skaters S
WHERE NOT EXISTS (SELECT C.cid
    FROM Competition C
    WHERE NOT EXISTS (SELECT P.cid
        FROM Participates P
        WHERE P.cid = C.cid AND
            P.sid = S.sid))

Counting: SELECT count(*) FROM Skaters
Averages: SELECT avg(age) FROM Skaters WHERE rating = 10
Distinct: SELECT count(DISTINCT sid) FROM Participates

Also, SUM, MAX, MIN

SELECT target-list
FROM relation list
WHERE qualification
GROUP BY grouping listn

A group is a set of tuples that ahve the same value for all attributes in the grouping list. One answer tuples is generated per group. Target-liust contains attributes and/or aggregation terms

eg:

SELECT rating, avg(age)
FROM Skaters
GROUP BY rating

If any aggregation is used, then each element in the attribute list of the SELECT clause must either be aggregated or appear in a group-by clause

These are just WHERE claues on GROUP BY

eg: Find average age of skaters with at least two skaters at each rating level

SELECT rating, avg(age)
FROM Skaters
GROUP BY rating
HAVING COUNT(*) >= 2

Aggregate operations cannot be nested!

A view is an umaterialized relation: we store a definition rather than tuples, and get the tuples from existing tables

CREATE VIEW ActiveSkaters (sid, sname)
    AS SELECT DISTINCT S.sid, S.sname
    FROM Skaters S, Participates P
    WHERE S.sid = P.sid

Results in issues for comparisons and arithemtic. Need 3-valued logic (true, false, unknown)

NOT unknown = unknown
A or B = true if either A = true of B = true
A or B = false if A = false and B = false
A or B = unknow if A = false and B = unknown, or vice versa, or A = B = unknown
A and B = true if A = true and B = true
A and B = false if either A or B = false
A and B = unknown if A = true and B = unknown, or vice versa, or A = B = unknown

In SQL the WHERE clause eliminates rows that are not true (ie: false or unknown)

Same as an inner join, but with dangling tuples paadded with NULL

A regular join on Skaters and Participates only includes result tuples that are in both Skaters and Participates.

An (left) outer join on Skaters (left) and Participates includes Skater tuples that weren't in Participates and pads the missing information with nulls

SELECT S.sid, Sname, P.cid
FROM Skaters S LEFT OUTER JOIN Participates P
ON S.sid =  P.sid

A single conceptual/Logical Schema defines the logical strucre. Conceptual db design

A Physical Schema describes the files and indexes used

Views describe how users see the data

Physical Data Independence: The conceptual schema protects from changes in the physical structure of data

Logical data independence: External schema proctects from changes in conceptual schema of data

INSERT INTO SKATERS VALUES (68, 'Jacky, 10, 10)

INSERT INTO Skaters (sid, name) VALUES (68, 'Jacky')

Inserting the result of a query

INSERT INTO ActiveSkaters (
    SELECT Skaters.sid Skaters.name
    FROM Skaters, Participates
    WHERE Skaters.sid = Participates.sid)

DELETE FROM Competitions WHERE cid = 103

DELETE FROM Competition

Will delete EVERYTHING

UPDATE Skaters
SET ranking = 10, age = age + 1
WHERE name = 'debby' OR name = 'lilly'

Domain Constraints (data type, UNIQUE, NOT NULL)
Primary Key Constraints
Foreign Key Constraints
Attributes and tuple based checks
SQL2 Assertions: global constraints (just check that conditions aren't violated, rejection only behaviour)
SQL3 Triggers: upon user modification, checks are performed and reactions specified

CREATE TABLE Skaters (
sid INTEGER PRIMARY KEY NOT NULL,
sname VARCHAR(20),
rating INTEGER CHECK(rating > 0 AND rating < 11),
age INTEGER)

Condition is check when the attribute is changed. Condition violations result in rejections. Can be anything that would follow a WHERE clause including subqueries

eg: sname VARCHAR CHECK (sname NOT IN (SELECT name FROM forbidden))

DB2 does not allow subqueries or refences to other attributes...

CREATE TABLE Skaters (
sid INTEGER PRIMARY KEY NOT NULL,
sname VARCHAR(20),
rating INTEGER,
age INTEGER
CHECK (rating <= OR age > 5))

Checked on update and insert

May need to add or delete constraints laters. So we can NAME THEM

CREATE TABLE Skaters (
sid INTEGER NOT NULL,
sname VARCHAR(20),
rating INTEGER CONSTRAINT rat CHECK(rating > 0 AND rating < 11),
age INTEGER
CONSTRAINT pk PRIMARY KEY (sid),
CONSTRAINT ratage CHECK
    (rating <= 4 OR age >5))

Allows us to do ALTER TABLE Skaters DROP CONSTRAINT ratage or ALTER TABLE Skaters ADD CONSTRAINT rateage CHECK (rating <= 5 OR age > 5)

predicate expressing a condition that we wish the database to always satisfy and what to do when it fails

CREAT ASSETION smallClub CHECK
    ((SELECT COUNT(*) FROM SKATERS) + (SELECT COUNT(*) FROM Competitions) < 100)

Check whenever any relations involved are modified

Attribute based checks happen on inserts and upates of that just one tuples. Tuple based checks same thing (cheap)

Assertion, insert or update or delte of tuples involved in any relation (expensive)

Trigger: Prodcued that starts automatically if specified changes occur to the DBMS

Made up of three parts

Event (activates the trigger): usually insert/update/delete
Condition (test whether the trigger should run)
Actions (what happens if the trigger runs)

CREATE TRIGGER updateSkater
    AFTER DELETE ON Skaters
REFERENCING NEW_TABLE AS DeletedSkaters
FOR EACH STATEMENT
    INSERT
        INTO StatisticsTable(ModTable, ModType, Count)
        SELECT 'Skaters', 'delete', COUNT(*)
        FROM DeletedSkaters

CREATE TRIGGER ratingIncrease
    AFTER UPDATE OF rating on Skaters
    REFERENCING
        OLD AS o
        NEW AS n
FOR EACH ROW
    WHEN (n.rating > 1 + o.rating)
        UPDATE Skaters
        SET skating = 1 + o.rating
        WHERE sid = n.sid

When?

AFTER UPDATE OF
BEFORE UPDATE OF
INSTEAD OF UPDATE

(could be UPDATE, INSERT, DELETE)

Scopes?

Row level trigger: references OLD/ NEW
Statement level: OLD_TABLE / NEW_TABLE

Indicated in the RERFENCING command

Can execute SEVERAL actions

FOR EACH ROW/STATEMENT
WHEN ...
BEGIN ATOMIC
    do thing1
    do thing2
END

Can also set SQL States. eg: SIGNAL SQLSTATE '75000' SET MESSAGE_TEXT='Rating increase must be by one'

XML is built with elements

Elements components: Name; Text (eg: foundations); Child elements (nesting); Attribtues (included in starttag)

Attribtues are made of names and values eg: <p id="first one">

Eache element is a node. Nested elements build a path. The leaves are attributes, text or just elements with neither. Element siblings are ordered

Each element can have an ID attribute allowing elements to references other elements. This produces a graph

A valid xml documents conforms to a Document Type Description (DTD) although they are optional. Has a BNF-life grammar which established constraints on element structure, attributes and content

Sample:

<!DOCTYPE people[
    <!ELEMENT people(person*)>
    <!ELEMENT person(name*, (lastname|familyname)?)>
    <!ATTLIST person PID ID #REQUIRED
        age CDATA #IMPLIED
        children IDREFS #IMPLIED
        mother IDREF #IMPLIED
    >
    <!ELEMENT name(#PCDATA)>
    <!ELEMENT lastname(#PCDATA)>
    <!ELEMENT familyname (#PCDATA)>
]>

Data types: PCDATA (parsed character data) or CDATA (unparsed)

Attributes: ID unique identifier (similar to primary key); IDREF: reference to single ID; IDREFS: space-seperated list of references
Values: can give a default value; #REQUIRED must exist; #IMPLIED optional

Specified in an XML file with <!DOCTYPE name SYSTEM "path/to/thing.dtd">

Can use regex style things too. * is 0 or more. + is 1 or more, (a | b)? is one or the other

Specificy a path expresion to match XML data by navigating down (or accross) the XML tree. returns a node-set

Examples: /bibliography/book/author all author elements by root navigating through those elements; /bibliography/book/@ISBN All ISBN attributes; //title all title elements anywhere in the document; /bibliography/*/title titles of bibliography entries assuming that there could be books, journals, reports, etc...; /bibliography/book[@year>1995] returns books where the year > 1995; /bibliography/book[author='FooBar']/@Year returns the years of books written by FooBar; /bibliography/book[count(author) <2]; /bibliography/book/author[position()=1]/name position is the location of the node in the node set

FLOWR = For/let-where-orderby/return

Diagram 18

For vs Let

For binds node variables for iteration

Let binds collection variables for one value

Diagram 19

Diagram 20

Aggregation

let $a = avg(document("bib.xml")/bib/book/price)
for $b in document("bib.xml")/bib/book
where $b/price > $a
return $b

Can use joins with where $b1/author = $b2/author

subqueries, sorting, if-then-else, existential and universally quantified

eg: ```where some $b in $a/book satisfies count($b/author) > 10$

Storage as extra attribute CREATE TABLE MyXml (id INT NOT NULL PRIMARY KEY, Info XML)

Insert with:

INSERT INTO MyXML(id, INFO) VALUES (1000,
    '<customerinfo cid="1000">
    <name>Kathy Jones</name>
    <addr country =Canada">
        <street>123 fake</street>
        <city>Ottawa</city>
        <prov-state>Ontario</prov-state>
        <pcode-zip>H0H 0H0</pcode-zip>
    </addr>
    </customerinfo>')

db2-fin:xmlcolumn

XQUERY
    for $d in db2-fn:xmlcolum('MYXML.INFO')/customerinfo
    where $d/addr[city="Toronto"]
    return $d/name

SELECT and Xquery

SELECT XMLQUERY (
'for $d in $doc/customerinfo
    return $d/name'
    passing INFO as "doc")
FROM MyXml m
WHERE XMLEXISTS(
    '$i/customerinfo/addr[city="Toronto"]' passing m.info as "i")

Imagine a money transfer Transfer(id1, id2, value)

SQL

SELECT balance into :balance
FROM accounts
WHERE accoundid = :id1

balance += value

UPDATE accounts
SET balance = :balance
WHERE accountid = :id1

SELECT balance into :balance
FROM accounts
WHERE accountid = :id2

balance -= value

UDPATE accounts
SET balance = :balance
WHERE accountid = :id2

read(A)
A := A + 50
write(A)
read(B)
B := B - 50
write(B)

A transaction is the DBMS'sabstract view of a user program. It is a sequence of read operations, r(x) and write operations w(X) on objects x (tuple, relation) of the db

Read: bring obkect into main memory from disk, send variable to application

Write: bring object into main memory from disk and modify it. Write it back to disk (might be done later)

ACID: Atomicity: A transaction is atomic if it executed in it entirety or not at all; Consistency: A transaction must preserve the consistency of the data; Isolation: In case that transaction is executed concurrently, the result must be the same as if they were executed serially; Durability: The changes made by a transaction must be permanent
Consistency: Each transaction must leave the database if the DB is consistent before the tranasction started. Hard to enforce at the db level unless using integrity constraints
Isolation: DBMS can execute transactions concurrently. Enforced by a concurrency control protocol.
Atomicity: A transaction either commits or aborts; In the case of an abort then you have to undo all the operations done; Local Recovery ensures that partial results can be done. Writes keep a pre-image
Durability: There must be a guarantee that changees introduced by a transaction will surive failures

In the event of a crash we have to do Global Recovery. Involves ensuring commited transactions are in the db, that aborted changes are undone and that tranasctions at the time of the crash are undone.

When are changes written from main memory to the disk?

Diagram 20

Force all changes of transaction are flushed to the disk BEFORE commit. Eg writes are done straight to disk

Easy recovery for commited or aborted tranasctions, but need cleverness to deal with inprogress transactions

Changes are never flushed before a commit. Once committed, changes can be flushed at leasure. Inprogress and aborted transactionsa are easyt to deal with, nothing on disk. Need to deal with commited but potentially not written transactions.

Ideally we would flush at commit time, but this isn't possible due to I/O not being atomic.

Cost model for data access. We just oncsider number of disk reads.

Types of opertions: scan over all records SELECT * FROM Skaters; Equality search SELECT * FROM Skaters WHERE sid = 100; Range Search SELECT * FROM Skaters WEHRE age > 5 and age <= 10; Insert INSERT INTO Skaters VALUES (23, 'lilly, 10, 8); Delete could be equality or range; udpate is just delete + insert

Linked, unordered list of all pages of the file

scan retriving all records = good because all pages retrieved anyway
equality search on primary key = bad have to read ~1/2 of pages
range search = bad
insert = good, low cost
delete/update same as search

Records ordered according to one or more attributes of the relation

scan retriving all records = good
equality search on sort attribute = good about log2(number of pages)
range search = good find first one, then scan along
insert = bad, have to find proper page, need to reorganize
delete/update same as search. update may be expensive
sorted output: good if on sorted attribute

Sorted files can only be ordered by one attribute. Solution is to build an index for any attribute that is frequently queried. Created on a search key (different from primary key)

CREATE INDEX name ON Skaters(sid)
DROP INDEX name

B+tree is the most widely used index. Each ndoe represents on page (the transfer unit to/from disk). Leafs contain data entries which consist of value of the search key and the record identifier (rid = page-id, slot). Data entries are POINTERS. Root and inner ndoes have auxiliary index entries. They are height balanced.

fanout = number of children for each node. Minimum 50% occupancy fornot the root.

Indirect indexes can be in the form <key, rid> but could be in form <key, list of matching rid>

Direct indexes can contain the actual tuple, but expensive.

Clustered: relatin in file sorted by search key attributes of the idnex
Unclustered: relation in heap file or sorted by attribute different to the search key attribute of the idnex

Can be clustered by at most one index

Diagram 21

Relation R(A,B,C,D,F)

A and B are int (6 byte), C-F are char[40] (1 byte per char). Tuple = 172 bytes.

200,000 tuples

Each data page has 4000 bytes and is around 80% full.

Number of pages = (number of tuples * tuple size)/(fill rate * page size). in this case 10,750 pages

B values are uniformilly distributed

Non-clustered index B-tree with <k, list of rid>

Index pages are 4K and between 50%-100% fulll

rid = 10 bytes

size of pointer in intermediate page = 8 bytes

Index entry size in root/intermediate pages = size(key) + size(pointer) 6 + 8 = 14 bytes

Average number of rids per data entry = number of uples / different values (if uniform). eg: 200,000/20,000 = 10

Average length per data entry = size of key + (number of rids * size of rid). eg: 6 + 10*10 = 106

Average number of date entrier per page = fill-rate * page size / length of entry. eg 0.75*5000/106 = 28 entries per page

Estimate number of leaf page = number of different values / number of entrier per page. eg: 20,000 / 28 = 715

Number of entries in intermediate pages = fill-rate * page size / lneght of index enty. eg: 0.5 * 4000 / 15 = 143, 1*4000/14 = 285

Height of tree = Number of leaf pages / (min | max)? number of entries in intermediate pages

parser translate into internal query representation (syntax, references existing stuff, view translation)
query optimizer turns the query into an efficient execution plan
plan executor executes the execution plan (nowai!) and delivers data

Query decomposition is variable because order can change, need to pick the best / most efficient

Access Path: method used to retrieve set of tuples from a relation

Cost models

number of pages
number of I/O (one page can be read from disk multiple times)

Query Processing operations

Selection
- Scan without an index
- scan with an index
Sorting
joining
- nested loop join
- Sort-merge join
- etc...
grouping and duplicate elimination

Number of tuples in table P = CARD(P).

No index: Search on arbitrary attribute: scan the whole relation. Cost = number of pages; Search on primary key attribute: Cost = 1/2 number of pages on average
With index on selection attribtues: Cost = number of index pages + number of data pages

Clustered index number of data pages = (number of matching tuples / tuples per page)

Unclustered number of data pages = number of tuples!

N pages, B

Pass 0: bring B pages into buffer, sort in main memory; cost N/B
Pass 1: ...

Diagram 21

Diagram 22

Simple Nested Loop join
Page oriented nested loop join
Block oriented nested loop join
index nexted loop join
Sort-Merge Join

for each tuple in the outer relation P, scan the entire inner relation S

Cost = Number of outer page + number of outer tuples*(number of inner pages)

for each page of outer, get each page of inner and write out matching pairs.

Cost = Number of outer pages + Number of Outer Pages * Number of Inner Pages

If the smaller table fits in main memory + 2 extra buffer pages: load smaller into main and compare page by page on larger

Cost = Inner Pages + Outer Pages

If no relation fits in main meory:

For each block of B-2 Pages of Outer

load block into main memory
get each page of inner and join with B-2 pages of outer. write output to remaining page

Cost = Number of Outer Pages + (Number of outer pages / (number of buffer pages -2) * number of inner pages)

If there is an index, we can exploit this

Cost = non-index number of pages + (number of non-indexed tuples * cost of finding indexed tuple)

If clustered, then can compare all tuples in found pages (cheap). If unclustered, one page per indexed tuples

Sort inner and outer on join column, then scan them to do a merge (join on that column).

Advance scan of outer until current outer-tuple >= current inner tuple, then advance scan of inner until current inner-tuple >= current P tuple, until current inner-tuple = current outer-tuple

Cost: (number of passes for sorting + 1 for merge) * (number of inner pages + number of outer pages)

Very expensive. Need to sort to eliminate duplicates. Done at very very end or when results are already sorted. Use with care.

Try and avoid intermediate complete steps and perform all operations on current page/block.

relational model is powerful, but very rigid. Expressive at the expense of performance.

NoSQL focuses on scalability via distribution, instead of scaling-up

Trade offs:
1. expressiveness vs performance
2. consistency vs performance
3. consistency vs availability in a distributed systems

Types of NoSQL by data model: key-value; document; extensible-record, column; graph

data item has a key and some value. Value can be complex, but transparent to DBMBS. No relationships between different key/value pairs

Interface: get(key); put(key, value); versioned usually so get(key, version), put(key,value, version)
Pros: scalability is easy accross many nodes; highly available; partitioned, replication, multi-versioned with weak consistency

ie: XML/JSON/ key-value entries

Interface: path expression if nexted; key searches within document colelctions; update operations on entire document

store by column first, not by record. Vertically partioned. that is, split a record by attributes, each colum stores primary key value + attribute value.

Crib Sheet

1ER Model¶

1.1ISA Hierarchies: Subclasses¶

1.1.1Constraints¶

1.2Relationship¶

1.2.1Many-to-many¶

1.2.2One-to-many¶

1.2.3One-to-one¶

1.2.4Participation Constraints¶

1.2.5Weak Entities¶

1.2.6Ternary Relationship¶

1.2.7Aggregation¶

2Relational Model¶

2.1SQL¶

2.1.1Data Types¶

2.1.2Table Creation¶

2.1.3Destorying/Altering Relations¶

2.1.4Insert/Delete/Update Tuples¶

2.1.5Integrity Constraints¶

2.1.5.1NOT NULL¶

2.1.5.2Primary Key Constraints¶

2.1.5.3Foreign Key¶

2.2Translating ER to R¶

2.2.1Entity Sets to Relations¶

2.2.2Many-to-Many relationship sets¶

2.2.3Key Contraints¶

2.2.4Key Contraints and participation contraints¶

2.2.5Partitipation Conatraints¶

2.2.6Renaming¶

2.2.7Weak Entity Sets¶

2.2.8ISA Hierarchies¶

2.2.8.1General approach¶

2.2.8.2object-oriented approach¶

2.2.8.3One big relation approach¶

2.2.9Aggregation¶

3Relational Algebra¶

3.1Types of Joins¶

3.1.1Division¶

3.1.2Rules of Relational Algebra¶

4Functional Dependicies¶

4.1Types of Problems¶

4.2Functional Dependencies¶

4.3FD Axioms¶

4.4Normalization¶

4.4.1BCNF¶

4.4.23NF¶

4.5Decomposition¶

4.5.1Decomposing into BCNF¶

4.5.2Decomposing into 3NF¶

5SQL¶

5.1WHERE clause¶

5.2Multirelational Queries¶

5.2.1Union, Intersection, Difference¶

5.2.1.1Union¶

5.2.1.2Intersection¶

5.2.1.3Diference¶

5.2.2Nested Queries (IN)¶

5.2.3Exists¶

5.2.4Quantifiers¶

5.2.5Division¶

6Advanced SQL¶

6.1Aggregation¶

6.1.1Grouping¶

6.1.1.1HAVING clause¶

6.2Views¶

6.3NULL Values¶

6.4Out Join¶

6.5Levels of Abstraction¶

6.6DB Modification¶

6.6.1Inserts¶

6.6.2Deletion¶

6.6.3Updates¶

7SQL Integrity¶

7.1Constraints¶

7.1.1Attribute Based Checks¶

7.1.2Tuples Based Checks¶

7.1.2.1Naming Constraints¶

7.1.3Assertions¶

7.1.4When and How much?¶

7.2Triggers¶