How Do You Know When a Database Is Totally Normalized
Database normalization is the process of structuring a database, usually a relational database, in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model.
Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).
Objectives
A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.[1] (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.[2])
The objectives of normalization beyond 1NF (first normal form) were stated as follows by Codd:
- To free the collection of relations from undesirable insertion, update and deletion dependencies.
- To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs.
- To make the relational model more informative to users.
- To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
– E.F. Codd, "Further Normalization of the Data Base Relational Model"[3]
When an attempt is made to modify (update, insert into, or delete from) a relation, the following undesirable side-effects may arise in relations that have not been sufficiently normalized:
- Update anomaly. The same information can be expressed on multiple rows; therefore updates to the relation may result in logical inconsistencies. For example, each record in an "Employees' Skills" relation might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee may need to be applied to multiple records (one for each skill). If the update is only partially successful – the employee's address is updated on some records but not others – then the relation is left in an inconsistent state. Specifically, the relation provides conflicting answers to the question of what this particular employee's address is.
- Insertion anomaly. There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" relation might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code. Therefore, the details of any faculty member who teaches at least one course can be recorded, but a newly hired faculty member who has not yet been assigned to teach any courses cannot be recorded, except by setting the Course Code to null.
- Deletion anomaly. Under certain circumstances, deletion of data representing certain facts necessitates deletion of data representing completely different facts. The "Faculty and Their Courses" relation described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, the last of the records on which that faculty member appears must be deleted, effectively also deleting the faculty member, unless the Course Code field is set to null.
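The update anomaly can be reproduced in a few lines. The following is a minimal sketch using Python's standard sqlite3 module; the table name employee_skills, the column names, and the sample rows are illustrative assumptions, not data from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical un-normalized relation: the address is repeated once per skill.
conn.execute(
    "CREATE TABLE employee_skills (employee_id INTEGER, address TEXT, skill TEXT)"
)
conn.executemany(
    "INSERT INTO employee_skills VALUES (?, ?, ?)",
    [
        (426, "87 Sycamore Grove", "Typing"),
        (426, "87 Sycamore Grove", "Shorthand"),
    ],
)

# A partially successful update: only one of the two rows is changed.
conn.execute(
    "UPDATE employee_skills SET address = ? WHERE employee_id = ? AND skill = ?",
    ("23 Hawthorne Avenue", 426, "Typing"),
)

# The relation now gives conflicting answers for the same employee.
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT address FROM employee_skills WHERE employee_id = ?",
        (426,),
    )
)
print(addresses)
```

If the address lived in its own table keyed by employee ID, a single-row update could not leave the data in this state.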
Minimize redesign when extending the database structure
A fully normalized database allows its structure to be extended to accommodate new types of data without changing existing structure too much. As a result, applications interacting with the database are minimally affected.
Normalized relations, and the relationship between one normalized relation and another, mirror real-world concepts and their interrelationships.
Normal forms
Codd introduced the concept of normalization and what is now known as the first normal form (1NF) in 1970.[4] Codd went on to define the second normal form (2NF) and third normal form (3NF) in 1971,[5] and Codd and Raymond F. Boyce defined the Boyce–Codd normal form (BCNF) in 1974.[6]
Informally, a relational database relation is often described as "normalized" if it meets third normal form.[7] Most 3NF relations are free of insertion, update, and deletion anomalies.
The normal forms (from least normalized to most normalized) are:
- UNF: Unnormalized form
- 1NF: First normal form
- 2NF: Second normal form
- 3NF: Third normal form
- EKNF: Elementary key normal form
- BCNF: Boyce–Codd normal form
- 4NF: Fourth normal form
- ETNF: Essential tuple normal form
- 5NF: Fifth normal form
- DKNF: Domain-key normal form
- 6NF: Sixth normal form
Constraint (informal description in parentheses) | Normal form that introduces it (year) |
---|---|
None of the constraints below is required | UNF (1970) |
Primary key (no duplicate tuples)[4] | 1NF (1970) |
Atomic columns (cells cannot have tables as values)[5] | 1NF (1970) |
Every non-trivial functional dependency either does not begin with a proper subset of a candidate key or ends with a prime attribute (no partial functional dependencies of non-prime attributes on candidate keys)[5] | 2NF (1971) |
Every non-trivial functional dependency either begins with a superkey or ends with a prime attribute (no transitive functional dependencies of non-prime attributes on candidate keys)[5] | 3NF (1971) |
Every non-trivial functional dependency either begins with a superkey or ends with an elementary prime attribute | EKNF (1982) |
Every non-trivial functional dependency begins with a superkey | BCNF (1974) |
Every non-trivial multivalued dependency begins with a superkey | 4NF (1977) |
Every join dependency has a superkey component[8] | ETNF (2012) |
Every join dependency has only superkey components | 5NF (1979) |
Every constraint is a consequence of domain constraints and key constraints | DKNF (1981) |
Every join dependency is trivial | 6NF (2003) |
Example of a step-by-step normalization
Normalization is a database design technique, which is used to design a relational database table up to a higher normal form.[9] The process is progressive, and a higher level of database normalization cannot be achieved unless the previous levels have been satisfied.[10]
That means that, having data in unnormalized form (the least normalized) and aiming to achieve the highest level of normalization, the first step would be to ensure compliance with first normal form, the second step would be to ensure second normal form is satisfied, and so forth in the order mentioned above, until the data conform to sixth normal form.
However, it is worth noting that normal forms beyond 4NF are mainly of academic interest, as the problems they exist to solve rarely appear in practice.[11]
The data in the following example were intentionally designed to contradict most of the normal forms. In real life, it is quite possible to skip some of the normalization steps because the table doesn't contain anything contradicting the given normal form. It also commonly occurs that fixing a violation of one normal form also fixes a violation of a higher normal form in the process. Also, one table has been chosen for normalization at each step, meaning that at the end of this example process, there might still be some tables not satisfying the highest normal form.
Initial data
Consider a database table with the following structure:[10]
Title | Author | Author Nationality | Format | Price | Subject | Pages | Thickness | Publisher | Publisher Country | Publication Type | Genre ID | Genre Name |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Beginning MySQL Database Design and Optimization | Chad Russell | American | Hardcover | 49.99 | MySQL, Database, Design | 520 | Thick | Apress | USA | E-book | 1 | Tutorial |
For this example, it is assumed that each book has only one author.
As a prerequisite to conform to the relational model, a table must have a primary key, which uniquely identifies a row. Two books could have the same title, but an ISBN uniquely identifies a book, so it can be used as the primary key:
ISBN# | Title | Author | Author Nationality | Format | Price | Subject | Pages | Thickness | Publisher | Publisher Country | Publication Type | Genre ID | Genre Name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1590593324 | Beginning MySQL Database Design and Optimization | Chad Russell | American | Hardcover | 49.99 | MySQL, Database, Design | 520 | Thick | Apress | USA | E-book | 1 | Tutorial |
Satisfying 1NF
To satisfy first normal form, each column of a table must have a single value. Columns which contain sets of values or nested records are not allowed.
In the initial table, Subject contains a set of subject values, meaning it does not comply.
To solve the problem, the subjects are extracted into a separate Subject table:[10]
ISBN# | Title | Format | Author | Author Nationality | Price | Pages | Thickness | Publisher | Publisher Country | Genre ID | Genre Name |
---|---|---|---|---|---|---|---|---|---|---|---|
1590593324 | Beginning MySQL Database Design and Optimization | Hardcover | Chad Russell | American | 49.99 | 520 | Thick | Apress | USA | 1 | Tutorial |

ISBN# | Subject name |
---|---|
1590593324 | MySQL |
1590593324 | Database |
1590593324 | Design |
A foreign key column is added to the Subject table, which refers to the primary key of the row from which the subject was extracted. The same information is therefore represented, but without the use of non-simple domains.
Instead of one table in unnormalized form, there are now two tables conforming to 1NF.
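The extraction step can be sketched with Python's standard sqlite3 module. The table and column names (book, subject, isbn, subject_name) and the dictionary layout of the un-normalized input are assumptions made for illustration; only the ISBN, title and subject values come from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE subject ("
    " isbn TEXT NOT NULL REFERENCES book(isbn),"
    " subject_name TEXT NOT NULL,"
    " PRIMARY KEY (isbn, subject_name))"
)

# Un-normalized input: one book whose Subject cell holds a set of values.
unnormalized = {
    "isbn": "1590593324",
    "title": "Beginning MySQL Database Design and Optimization",
    "subjects": ["MySQL", "Database", "Design"],
}

conn.execute(
    "INSERT INTO book VALUES (?, ?)",
    (unnormalized["isbn"], unnormalized["title"]),
)
# One row per subject, each carrying the foreign key back to its book.
conn.executemany(
    "INSERT INTO subject VALUES (?, ?)",
    [(unnormalized["isbn"], name) for name in unnormalized["subjects"]],
)

subject_rows = conn.execute(
    "SELECT isbn, subject_name FROM subject ORDER BY subject_name"
).fetchall()
print(subject_rows)
```

Every cell now holds a single value, and the foreign key keeps each subject tied to the book it was extracted from.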
Satisfying 2NF
The Book table has one candidate key (which is therefore the primary key), the composite key {Title, Format}.[12] Consider the following table fragment:
Title | Format | Author | Author Nationality | Price | Pages | Thickness | Genre ID | Genre Name | Publisher ID |
---|---|---|---|---|---|---|---|---|---|
Beginning MySQL Database Design and Optimization | Hardcover | Chad Russell | American | 49.99 | 520 | Thick | 1 | Tutorial | 1 |
Beginning MySQL Database Design and Optimization | E-book | Chad Russell | American | 22.34 | 520 | Thick | 1 | Tutorial | 1 |
The Relational Model for Database Management: Version 2 | E-book | E.F.Codd | British | 13.88 | 538 | Thick | 2 | Popular science | 2 |
The Relational Model for Database Management: Version 2 | Paperback | E.F.Codd | British | 39.99 | 538 | Thick | 2 | Popular science | 2 |
All of the attributes that are not part of the candidate key depend on Title, but only Price also depends on Format. To conform to 2NF and remove duplication, every non-candidate-key attribute must depend on the whole candidate key, not just part of it.
To normalize this table, make {Title} a (simple) candidate key (the primary key) so that every non-candidate-key attribute depends on the whole candidate key, and move Price into a separate table so that its dependency on Format can be preserved:
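Title | Author | Author Nationality | Pages | Thickness | Genre ID | Genre Name | Publisher ID |
---|---|---|---|---|---|---|---|
Beginning MySQL Database Design and Optimization | Chad Russell | American | 520 | Thick | 1 | Tutorial | 1 |
The Relational Model for Database Management: Version 2 | E.F.Codd | British | 538 | Thick | 2 | Popular science | 2 |

Title | Format | Price |
---|---|---|
Beginning MySQL Database Design and Optimization | Hardcover | 49.99 |
Beginning MySQL Database Design and Optimization | E-book | 22.34 |
The Relational Model for Database Management: Version 2 | E-book | 13.88 |
The Relational Model for Database Management: Version 2 | Paperback | 39.99 |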
Now, the Book table conforms to 2NF.
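The partial dependency that 2NF removes can be checked mechanically on the sample rows. The sketch below is plain Python; the helper name holds is hypothetical, and only a subset of the columns is included. Note that a check against sample data can refute a functional dependency but cannot prove that it holds for all possible data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs but differ on rhs violate the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


T1 = "Beginning MySQL Database Design and Optimization"
T2 = "The Relational Model for Database Management: Version 2"
rows = [
    {"Title": T1, "Format": "Hardcover", "Author": "Chad Russell", "Price": 49.99},
    {"Title": T1, "Format": "E-book", "Author": "Chad Russell", "Price": 22.34},
    {"Title": T2, "Format": "E-book", "Author": "E.F.Codd", "Price": 13.88},
    {"Title": T2, "Format": "Paperback", "Author": "E.F.Codd", "Price": 39.99},
]

# Author depends on Title alone: a partial dependency on the key {Title, Format}.
author_on_title = holds(rows, ["Title"], ["Author"])
# Price is not determined by Title alone...
price_on_title = holds(rows, ["Title"], ["Price"])
# ...but it is determined by the whole key.
price_on_key = holds(rows, ["Title", "Format"], ["Price"])
print(author_on_title, price_on_title, price_on_key)
```

This is why Price ends up in a table keyed by {Title, Format} while the remaining attributes stay in a table keyed by {Title}.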
Satisfying 3NF
The Book table still has a transitive functional dependency ({Author Nationality} is dependent on {Author}, which is dependent on {Title}). A similar violation exists for genre ({Genre Name} is dependent on {Genre ID}, which is dependent on {Title}). Hence, the Book table is not in 3NF. To bring it into 3NF, let's use the following table structure, thereby eliminating the transitive functional dependencies by placing {Author Nationality} and {Genre Name} in their own respective tables:
Title | Author | Pages | Thickness | Genre ID | Publisher ID |
---|---|---|---|---|---|
Beginning MySQL Database Design and Optimization | Chad Russell | 520 | Thick | 1 | 1 |
The Relational Model for Database Management: Version 2 | E.F.Codd | 538 | Thick | 2 | 2 |

Title | Format | Price |
---|---|---|
Beginning MySQL Database Design and Optimization | Hardcover | 49.99 |
Beginning MySQL Database Design and Optimization | E-book | 22.34 |
The Relational Model for Database Management: Version 2 | E-book | 13.88 |
The Relational Model for Database Management: Version 2 | Paperback | 39.99 |

Author | Author Nationality |
---|---|
Chad Russell | American |
E.F.Codd | British |

Genre ID | Genre Name |
---|---|
1 | Tutorial |
2 | Popular science |
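No information is lost by this split: a join over the foreign keys recovers the nationality and genre name for every book. The following sketch uses Python's standard sqlite3 module; the lower-case table and column names are assumptions chosen for the example, and only some of the Book columns are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (author TEXT PRIMARY KEY, nationality TEXT);
CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, genre_name TEXT);
CREATE TABLE book (
    title TEXT PRIMARY KEY,
    author TEXT REFERENCES author(author),
    genre_id INTEGER REFERENCES genre(genre_id)
);
INSERT INTO author VALUES ('Chad Russell', 'American'), ('E.F.Codd', 'British');
INSERT INTO genre VALUES (1, 'Tutorial'), (2, 'Popular science');
INSERT INTO book VALUES
    ('Beginning MySQL Database Design and Optimization', 'Chad Russell', 1),
    ('The Relational Model for Database Management: Version 2', 'E.F.Codd', 2);
""")

# Rebuild the pre-3NF view: each book with its author's nationality and genre name.
joined = conn.execute("""
    SELECT b.title, a.nationality, g.genre_name
    FROM book AS b
    JOIN author AS a ON a.author = b.author
    JOIN genre AS g ON g.genre_id = b.genre_id
    ORDER BY b.title
""").fetchall()
print(joined)
```

Each nationality and genre name is now stored exactly once, so it can be changed in one place.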
Satisfying EKNF
The elementary key normal form (EKNF) falls strictly between 3NF and BCNF and is not much discussed in the literature. It is intended "to capture the salient qualities of both 3NF and BCNF" while avoiding the issues of both (namely, that 3NF is "too forgiving" and BCNF is "prone to computational complexity"). Since it is rarely mentioned in literature, it is not included in this example.[13]
Satisfying 4NF
Assume the database is owned by a book retailer franchise that has several franchisees that own shops in different locations. The retailer therefore decided to add a table that contains information about the availability of the books at different locations:
Franchisee ID | Title | Location |
---|---|---|
1 | Beginning MySQL Database Design and Optimization | California |
1 | Beginning MySQL Database Design and Optimization | Florida |
1 | Beginning MySQL Database Design and Optimization | Texas |
1 | The Relational Model for Database Management: Version 2 | California |
1 | The Relational Model for Database Management: Version 2 | Florida |
1 | The Relational Model for Database Management: Version 2 | Texas |
2 | Beginning MySQL Database Design and Optimization | California |
2 | Beginning MySQL Database Design and Optimization | Florida |
2 | Beginning MySQL Database Design and Optimization | Texas |
2 | The Relational Model for Database Management: Version 2 | California |
2 | The Relational Model for Database Management: Version 2 | Florida |
2 | The Relational Model for Database Management: Version 2 | Texas |
3 | Beginning MySQL Database Design and Optimization | Texas |
As this table structure consists of a compound primary key, it doesn't contain any non-key attributes and it's already in BCNF (and therefore also satisfies all the previous normal forms). However, assuming that all available books are offered in each area, the Title is not unambiguously bound to a certain Location and therefore the table doesn't satisfy 4NF.
That means that, to satisfy the fourth normal form, this table needs to be decomposed as well:
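Franchisee ID | Title |
---|---|
1 | Beginning MySQL Database Design and Optimization |
1 | The Relational Model for Database Management: Version 2 |
2 | Beginning MySQL Database Design and Optimization |
2 | The Relational Model for Database Management: Version 2 |
3 | Beginning MySQL Database Design and Optimization |

Franchisee ID | Location |
---|---|
1 | California |
1 | Florida |
1 | Texas |
2 | California |
2 | Florida |
2 | Texas |
3 | Texas |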
Now, every record is unambiguously identified by a superkey, therefore 4NF is satisfied.[14]
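Whether such a decomposition is safe can be verified by projecting the table and joining the projections back together. The sketch below does this in plain Python with sets of tuples; the variable names are illustrative, and the rows are those of the availability table above.

```python
B = "Beginning MySQL Database Design and Optimization"
R = "The Relational Model for Database Management: Version 2"
STATES = ("California", "Florida", "Texas")

# (franchisee_id, title, location) rows of the availability table.
availability = (
    {(1, t, s) for t in (B, R) for s in STATES}
    | {(2, t, s) for t in (B, R) for s in STATES}
    | {(3, B, "Texas")}
)

# Project onto the two pairs of columns proposed by the decomposition.
franchisee_title = {(f, t) for f, t, _ in availability}
franchisee_location = {(f, s) for f, _, s in availability}

# Natural join of the two projections on the franchisee ID.
rejoined = {
    (f1, t, s)
    for f1, t in franchisee_title
    for f2, s in franchisee_location
    if f1 == f2
}

lossless = rejoined == availability
print(len(availability), len(franchisee_title), len(franchisee_location), lossless)
```

For this data the join gives back exactly the original rows, so the two smaller tables carry the same information with fewer stored facts.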
Satisfying ETNF
Suppose the franchisees can also order books from different suppliers. Let the relation also be subject to the following constraint:
- If a certain supplier supplies a certain title
- and the title is supplied to the franchisee
- and the franchisee is being supplied by the supplier,
- then the supplier supplies the title to the franchisee.[15]
Supplier ID | Title | Franchisee ID |
---|---|---|
1 | Beginning MySQL Database Design and Optimization | 1 |
2 | The Relational Model for Database Management: Version 2 | 2 |
3 | Learning SQL | 3 |
This table is in 4NF, but it is equal to the join of its projections: {{Supplier ID, Title}, {Title, Franchisee ID}, {Franchisee ID, Supplier ID}}. No component of that join dependency is a superkey (the sole superkey being the entire heading), so the table does not satisfy ETNF and can be further decomposed:[15]
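Supplier ID | Title |
---|---|
1 | Beginning MySQL Database Design and Optimization |
2 | The Relational Model for Database Management: Version 2 |
3 | Learning SQL |

Title | Franchisee ID |
---|---|
Beginning MySQL Database Design and Optimization | 1 |
The Relational Model for Database Management: Version 2 | 2 |
Learning SQL | 3 |

Franchisee ID | Supplier ID |
---|---|
1 | 1 |
2 | 2 |
3 | 3 |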
The decomposition produces ETNF compliance.
Satisfying 5NF
To spot a table not satisfying 5NF, it is usually necessary to examine the data thoroughly. Take the table from the 4NF example with a small modification in the data and examine whether it satisfies 5NF:
Franchisee ID | Title | Location |
---|---|---|
1 | Beginning MySQL Database Design and Optimization | California |
1 | Learning SQL | California |
1 | The Relational Model for Database Management: Version 2 | Texas |
2 | The Relational Model for Database Management: Version 2 | California |
Decomposing this table lowers redundancies, resulting in the following two tables:
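Franchisee ID | Title |
---|---|
1 | Beginning MySQL Database Design and Optimization |
1 | Learning SQL |
1 | The Relational Model for Database Management: Version 2 |
2 | The Relational Model for Database Management: Version 2 |

Franchisee ID | Location |
---|---|
1 | California |
1 | Texas |
2 | California |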
The query joining these tables would return the following data:
Franchisee ID | Title | Location |
---|---|---|
1 | Beginning MySQL Database Design and Optimization | California |
1 | Learning SQL | California |
1 | The Relational Model for Database Management: Version 2 | California |
1 | The Relational Model for Database Management: Version 2 | Texas |
1 | Learning SQL | Texas |
1 | Beginning MySQL Database Design and Optimization | Texas |
2 | The Relational Model for Database Management: Version 2 | California |
The join returns three more rows than it should; adding another table to clarify the relation results in three separate tables:
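Franchisee ID | Title |
---|---|
1 | Beginning MySQL Database Design and Optimization |
1 | Learning SQL |
1 | The Relational Model for Database Management: Version 2 |
2 | The Relational Model for Database Management: Version 2 |

Franchisee ID | Location |
---|---|
1 | California |
1 | Texas |
2 | California |

Location | Title |
---|---|
California | Beginning MySQL Database Design and Optimization |
California | Learning SQL |
Texas | The Relational Model for Database Management: Version 2 |
California | The Relational Model for Database Management: Version 2 |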
What will the join return now? It still does not reproduce the original table: joining the three tables returns one row that was never recorded (franchisee 1 offering The Relational Model for Database Management: Version 2 in California). That means it wasn't possible to decompose the Franchisee - Book - Location table without data loss, therefore the table already satisfies 5NF.[14]
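The row counts in this section can be checked by computing the projections and their joins directly. This is a sketch in plain Python with sets of tuples; it assumes the third table is the projection onto Location and Title, and the variable names are illustrative.

```python
B = "Beginning MySQL Database Design and Optimization"
L = "Learning SQL"
R = "The Relational Model for Database Management: Version 2"

# (franchisee_id, title, location) rows of the modified table.
original = {
    (1, B, "California"),
    (1, L, "California"),
    (1, R, "Texas"),
    (2, R, "California"),
}

ft = {(f, t) for f, t, _ in original}  # Franchisee - Title
fl = {(f, s) for f, _, s in original}  # Franchisee - Location
lt = {(s, t) for _, t, s in original}  # Location - Title (assumed third table)

# Join of the first two projections on the franchisee ID.
two_way = {(f, t, s) for f, t in ft for g, s in fl if f == g}
# Adding the third projection keeps only rows whose (location, title) pair exists.
three_way = {row for row in two_way if (row[2], row[1]) in lt}

print(len(two_way - original), sorted(three_way - original))
```

The two-way join produces three spurious rows and the three-way join still produces one, so neither decomposition is lossless for this data.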
C.J. Date has argued that only a database in 5NF is truly "normalized".[16]
Satisfying DKNF
Let's take a look at the Book table from previous examples and see if it satisfies the domain-key normal form:
Title | Pages | Thickness | Genre ID | Publisher ID |
---|---|---|---|---|
Beginning MySQL Database Design and Optimization | 520 | Thick | 1 | 1 |
The Relational Model for Database Management: Version 2 | 538 | Thick | 2 | 2 |
Learning SQL | 338 | Slim | 1 | 3 |
SQL Cookbook | 636 | Thick | 1 | 3 |
Logically, Thickness is determined by the number of pages. That means it depends on Pages, which is not a key. Let's set an example convention saying a book up to 350 pages is considered "slim" and a book over 350 pages is considered "thick".
This convention is technically a constraint but it is neither a domain constraint nor a key constraint; therefore we cannot rely on domain constraints and key constraints to keep the data integrity.
In other words, nothing prevents us from putting, for example, "Thick" for a book with only 50 pages, and this makes the table violate DKNF.
To solve this, a table holding an enumeration that defines the Thickness is created, and that column is removed from the original table:
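Thickness | Pages |
---|---|
Slim | up to 350 |
Thick | over 350 |

Title | Pages | Genre ID | Publisher ID |
---|---|---|---|
Beginning MySQL Database Design and Optimization | 520 | 1 | 1 |
The Relational Model for Database Management: Version 2 | 538 | 2 | 2 |
Learning SQL | 338 | 1 | 3 |
SQL Cookbook | 636 | 1 | 3 |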
That way, the domain integrity violation has been eliminated, and the table is in DKNF.
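With the Thickness column gone, the label is looked up from the page count rather than stored, so a 50-page book can no longer be recorded as "Thick". The sketch below is plain Python; the name THICKNESS_ENUM, the lower bound of 1 page, and the use of None for "no upper bound" are assumptions made for illustration, while the 350-page threshold comes from the convention above.

```python
# Hypothetical enumeration: (label, min_pages, max_pages); None means no upper bound.
THICKNESS_ENUM = [("Slim", 1, 350), ("Thick", 351, None)]


def thickness(pages):
    """Look up the thickness label for a page count in the enumeration."""
    for label, low, high in THICKNESS_ENUM:
        if pages >= low and (high is None or pages <= high):
            return label
    raise ValueError(f"no thickness defined for {pages} pages")


# Book table without the Thickness column: title -> pages.
books = {
    "Beginning MySQL Database Design and Optimization": 520,
    "The Relational Model for Database Management: Version 2": 538,
    "Learning SQL": 338,
    "SQL Cookbook": 636,
}

derived = {title: thickness(pages) for title, pages in books.items()}
print(derived)
```

The derived labels match the Thickness values that were previously stored by hand.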
Satisfying 6NF
A simple and intuitive definition of the sixth normal form is that "a table is in 6NF when the row contains the Primary Key, and at most one other attribute".[17]
That means, for example, that the Publisher table designed while creating the 1NF
Publisher_ID | Name | Country |
---|---|---|
1 | Apress | USA |
needs to be further decomposed into two tables:
Publisher_ID | Name |
---|---|
1 | Apress |

Publisher_ID | Country |
---|---|
1 | USA |
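The decomposition into one table per attribute, and the join needed to read a whole record back, can be sketched in plain Python. The function names to_6nf and from_6nf are hypothetical, and each 6NF table is modelled here as a dictionary from key value to attribute value.

```python
publisher = [{"Publisher_ID": 1, "Name": "Apress", "Country": "USA"}]


def to_6nf(rows, key):
    """Split rows into one {key value: attribute value} table per non-key attribute."""
    tables = {}
    for row in rows:
        for attr, value in row.items():
            if attr != key:
                tables.setdefault(attr, {})[row[key]] = value
    return tables


def from_6nf(tables, key):
    """Rejoin the per-attribute tables on the key to rebuild whole records."""
    merged = {}
    for attr, table in tables.items():
        for k, value in table.items():
            merged.setdefault(k, {key: k})[attr] = value
    return list(merged.values())


tables = to_6nf(publisher, "Publisher_ID")
restored = from_6nf(tables, "Publisher_ID")
print(tables)
print(restored)
```

Reading or writing one conceptual record touches every per-attribute table, which is the cost of this layout.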
The obvious drawback of 6NF is the proliferation of tables required to represent the information on a single entity. If a table in 5NF has one primary key column and N attributes, representing the same information in 6NF will require N tables; multi-field updates to a single conceptual record will require updates to multiple tables; and inserts and deletes will similarly require operations across multiple tables. For this reason, in databases intended to serve Online Transaction Processing needs, 6NF should not be used.
However, in data warehouses, which do not permit interactive updates and which are specialized for fast query on large data volumes, certain DBMSs use an internal 6NF representation, known as a columnar data store. In situations where the number of unique values of a column is far less than the number of rows in the table, column-oriented storage allows significant savings in space through data compression. Columnar storage also allows fast execution of range queries (e.g., show all records where a particular column is between X and Y, or less than X).
In all these cases, however, the database designer does not have to perform 6NF normalization manually by creating separate tables. Some DBMSs that are specialized for warehousing, such as Sybase IQ, use columnar storage by default, but the designer still sees only a single multi-column table. Other DBMSs, such as Microsoft SQL Server 2012 and later, let you specify a "columnstore index" for a particular table.[18]
See also
- Denormalization
- Database refactoring
- Lossless join decomposition
Notes and references
- [1] "The adoption of a relational model of data ... permits the development of a universal data sub-language based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in first normal form. Such a language would provide a yardstick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command- or problem-oriented)." Codd, "A Relational Model of Data for Large Shared Data Banks" Archived June 12, 2007, at the Wayback Machine, p. 381
- [2] Codd, E.F. Chapter 23, "Serious Flaws in SQL", in The Relational Model for Database Management: Version 2. Addison-Wesley (1990), pp. 371–389
- [3] Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34
- [4] Codd, E. F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM. 13 (6): 377–387. doi:10.1145/362384.362685. S2CID 207549016. Archived from the original on June 12, 2007. Retrieved August 25, 2005.
- [5] Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
- [6] Codd, E. F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974), N.Y.: North-Holland (1974).
- [7] Date, C. J. (1999). An Introduction to Database Systems. Addison-Wesley. p. 290.
- [8] Darwen, Hugh; Date, C. J.; Fagin, Ronald (2012). "A Normal Form for Preventing Redundant Tuples in Relational Databases" (PDF). Proceedings of the 15th International Conference on Database Theory. EDBT/ICDT 2012 Joint Conference. ACM International Conference Proceeding Series. Association for Computing Machinery. p. 114. doi:10.1145/2274576.2274589. ISBN 978-1-4503-0791-8. OCLC 802369023. Retrieved May 22, 2018.
- [9] Kumar, Kunal; Azad, S. K. (October 2017). Database normalization design pattern. 2017 4th IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (UPCON). IEEE. doi:10.1109/upcon.2017.8251067. ISBN 9781538630044. S2CID 24491594.
- [10] "Database normalization in MySQL: Four quick and easy steps". ComputerWeekly.com. Archived from the original on August 30, 2017. Retrieved March 23, 2021.
- [11] "Database Normalization: 5th Normal Form and Beyond". MariaDB KnowledgeBase. Retrieved January 23, 2019.
- [12] The table fragment itself has several candidate keys (simple key {Price}, and compound keys of Format together with any column except Price or Thickness), but we assume that in the complete table only {Title, Format} will be unique.
- [13] "Additional Normal Forms - Database Design and Relational Theory - page 151". what-when-how.com. Retrieved January 22, 2019.
- [14] "Normalizace databáze", Wikipedie (in Czech), November 7, 2018, retrieved January 22, 2019
- [15] Date, C. J. (December 21, 2015). The New Relational Database Dictionary: Terms, Concepts, and Examples. O'Reilly Media, Inc. p. 138. ISBN 9781491951699.
- [16] Date, C. J. (December 21, 2015). The New Relational Database Dictionary: Terms, Concepts, and Examples. O'Reilly Media, Inc. p. 163. ISBN 9781491951699.
- [17] "normalization - Would like to Understand 6NF with an Example". Stack Overflow. Retrieved January 23, 2019.
- [18] Microsoft Corporation. Columnstore Indexes: Overview. https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview. Accessed Mar 23, 2020.
Further reading
- Date, C. J. (1999), An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120–125
- H.-J. Schek, P. Pistor Information Structures for an Integrated Data Base Management and Information Retrieval System
External links
- Kent, William (February 1983). "A Simple Guide to Five Normal Forms in Relational Database Theory". Communications of the ACM. 26 (2): 120–125. doi:10.1145/358024.358054. S2CID 9195704.
- Database Normalization Basics by Mike Chapple (About.com)
- Database Normalization Intro, Part 2
- An Introduction to Database Normalization by Mike Hillyer.
- A tutorial on the first three normal forms by Fred Coulson
- Description of the database normalization basics by Microsoft
- Normalization in DBMS by Chaitanya (beginnersbook.com)
- A Step-by-Step Guide to Database Normalization
- ETNF – Essential tuple normal form
Source: https://en.wikipedia.org/wiki/Database_normalization