Before you set up a new database, you usually spend a lot of time at the whiteboard. Here are some basic tips: my Dos and Don'ts of database design. Most probably they will reduce your effort and help you arrive at a clean database design. I didn't write the book on database design, but I think the experience I gained in many projects could be helpful in some cases. My examples refer to Progress® databases and Progress Software® 4GL, but you'll get the idea even if you use another database system.
Let's start with a few naming conventions. Using dashes, spaces, digits and special characters is a bad idea, although your database and operating system might handle these characters (Cobol semantics like CUST-NAME-1 are ugly and outdated). Ensure that each name (this applies to tables, prefixes, attributes, sequences etc.) remains unique within the scope of your enterprise-wide databases after conversion to uppercase or lowercase. Check your spelling; renaming tables and attributes afterwards is a PITA.
CamelCase all names. Acronyms and abbreviations should not be used in names unless they are well known to your users. If you can't avoid them, capitalize only the first character, especially in composite names like UpsServiceTypes or Customer.VatId. Well, no rule without exception: ID (unique tuple identifier in a table) as well as OID (enterprise-wide unique object identifier) should always be written in capital letters, as long as the abbreviation is part of the name of a technical attribute (e.g. VatId, the Value Added Tax Identifier, vs. CustID or CustOID, the primary key of Customers).
Avoid language mixups, especially if you're not a native speaker and/or your application has no English user interface. Names like Buddies.BudVerjaardag sound plain silly, but Maat.MtVerjaardag is understandable (at least if your understanding of Dutch is flawed). Check your spelling. Once your application is running, it's hard to live with typos.
Table names and labels designate the business object. Use neither technical wording nor geek speech. Persistent instances of customers live in a table named Customers, assigned UPC numbers in AssignedUpcNumbers, UPS shipments in UpsShipments and UPS parcels in UpsParcels. Since you store more than one instance of a business object in each table, use plural only.
Each table can be identified by an (enterprise-wide) unique prefix. Never use a prefix twice. If you have both Invoices and Inventories, assign different prefixes like Inv for Invoices and Ivt for Inventories. The prefix is part of each attribute name and should be used in related sequences and index names as well.
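Rules like these are easy to check automatically. Here is a minimal sketch in Python (not Progress 4GL); the function names and the sample prefixes are made up for illustration, and the name check only covers the "letters only, leading capital" part of the conventions.

```python
import re

# Leading capital letter, then letters only: no dashes, spaces, digits or
# special characters. This is a simplification of the naming rules above.
NAME_PATTERN = re.compile(r"^[A-Z][A-Za-z]*$")


def find_duplicate_prefixes(prefixes):
    """Return prefixes used by more than one table, ignoring case.

    `prefixes` maps table name -> table prefix.
    """
    seen = {}
    for table, prefix in prefixes.items():
        seen.setdefault(prefix.upper(), []).append(table)
    return {p: tables for p, tables in seen.items() if len(tables) > 1}


def find_bad_names(names):
    """Return the names that violate the letters-only, leading-capital rule."""
    return [name for name in names if not NAME_PATTERN.match(name)]


# Hypothetical sample data: 'Inv' and 'INV' collide after case conversion.
prefixes = {"Invoices": "Inv", "Inventories": "INV", "Customers": "Cust"}
duplicates = find_duplicate_prefixes(prefixes)
bad = find_bad_names(["CustName", "CUST-NAME-1", "custName", "InvNetAmount"])

print(duplicates)
print(bad)
```

Running such a check before saving definitions is cheaper than renaming tables and attributes afterwards.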
So far, so easy. When it comes to attribute names, naming conventions become more complicated. Let's start with technical attributes, because they leave no room for interpretation.
In order to guarantee uniqueness, each table has a technical primary key (a surrogate primary key populated by the create trigger with a unique sequence value or, preferably, a UUID), which will never get a business meaning. Don't argue: primary keys with business meaning as well as composite keys are a bad idea. There is nothing to be said against additional unique columns with business meaning, but do not merge the underlying technical implementation with your business logic. Name the primary key table prefix + OID (or ID), e.g. CustOID or CustID. If an object has children or is an attribute of other objects, use the unchanged and unextended name of the parent table's primary key as the foreign key in the child table or referencing table.
Say you have a table Invoices and a table Addresses:
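To illustrate the split between the technical key and unique business columns, here is a small sketch in Python with SQLite standing in for the database. The key is generated in a helper function rather than in a create trigger, and CustNumber and CustName are assumed attributes.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Customers (
        CustOID    TEXT PRIMARY KEY,        -- technical key, no business meaning
        CustNumber INTEGER NOT NULL UNIQUE, -- unique column with business meaning
        CustName   TEXT NOT NULL
    )
    """
)


def create_customer(conn, number, name):
    """Insert a customer; the technical key is generated, never supplied."""
    oid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO Customers (CustOID, CustNumber, CustName) VALUES (?, ?, ?)",
        (oid, number, name),
    )
    return oid


first = create_customer(conn, 1001, "Acme")
second = create_customer(conn, 1002, "Globex")
customer_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchall()[0][0]
```

The customer number can change or be reassigned later without touching a single foreign key, because nothing references it.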
Addresses.AdrOID [primary key]
Invoices.InvOID [primary key]
Invoices.AdrOID [foreign key]
Index Invoices.AdrOID and you can code
FOR EACH Addresses OF Invoices:
FOR EACH Invoices WHERE Invoices.InvNetAmount >= 1000.00,
EACH Addresses OF Invoices WHERE Addresses.AdrZipCode BEGINS '34':
FOR EACH Addresses WHERE Addresses.AdrOID = Invoices.AdrOID:
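If you use another database system, the closest relative of OF is probably SQL's JOIN ... USING, which likewise relies on the foreign key carrying the unchanged name of the parent's primary key. The following sketch uses Python with SQLite; the sample rows and the index name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Addresses (
        AdrOID     INTEGER PRIMARY KEY,
        AdrZipCode TEXT NOT NULL
    );
    CREATE TABLE Invoices (
        InvOID       INTEGER PRIMARY KEY,
        AdrOID       INTEGER NOT NULL REFERENCES Addresses (AdrOID),
        InvNetAmount NUMERIC NOT NULL
    );
    -- Index the foreign key column.
    CREATE INDEX InvAdrOID ON Invoices (AdrOID);

    INSERT INTO Addresses VALUES (1, '34117'), (2, '80331');
    INSERT INTO Invoices VALUES (10, 1, 1500.00), (11, 2, 2500.00), (12, 1, 200.00);
    """
)

# The join itself names only the shared technical key; the WHERE clause
# carries the business logic.
rows = conn.execute(
    """
    SELECT InvOID, AdrZipCode
    FROM Invoices
    JOIN Addresses USING (AdrOID)
    WHERE InvNetAmount >= 1000.00
      AND AdrZipCode LIKE '34%'
    ORDER BY InvOID
    """
).fetchall()
print(rows)
```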
There is one exception to this rule. Sometimes an object is an attribute of another object multiple times, without being a class itself. Different roles are marked by a number sign '#'. The most important foreign key name is kept as is, other roles are extended by '#Role':
Invoices.InvOID [primary key]
Invoices.AdrOID [billing address]
Invoices.AdrOID#Delivery [delivery address]
Actually, this is far from a clean (normalized) database design. Also, most design tools will not handle such non-normalized structures. If possible, avoid attribute name extensions and normalize instead. To bring this point home, let's say your customers provide permanent delivery addresses. By the way, delivery addresses tend to have their own attributes and behavior. Most probably a bunch of shipping addresses is an attribute of Customers:
DeliveryAddresses.DelAdrOID [primary key]
DeliveryAddresses.CustOID [foreign key]
DeliveryAddresses.AdrOID [foreign key]
DeliveryAddresses.DelAdrDispatchType [another attribute, which in real life would be the reference to a carrier]
Invoices.InvOID [primary key]
Invoices.AdrOID [billing address]
Invoices.DelAdrOID [delivery address]
Let's come to attributes with business meaning. Besides technical attributes in different roles, I can think of other cases where it is necessary to extend attribute names, for example default values. As long as there is just one default value, put it in the attribute's definition. Otherwise you have a table storing those values:
Discounts.DiscAppliesToBusinessType [e.g. wholesale, distributors, retail...]
Since discounts given to customers are calculated individually, the percentage can vary from customer to customer and it makes no sense to reference Discounts in Customers. However, in the interest of a readable model it is good style to mark the source, therefore the attribute discount percent of Customers keeps its source:
There are other advantages of consistent naming rules. In commercial applications you're dealing with discount percentages in tons of objects. Imagine you need to analyze your enterprise-wide discount policy. Finding all instances of discount percentages can become a PITA in complex systems. Given consistent naming, you can search your system tables for 'DiscPercent*' and you get a complete list:
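As a rough sketch of such a catalog search outside Progress®, the following Python snippet scans the SQLite catalog for column names containing a fragment. The tables and the column names CustDiscPercent and OrdDiscPercent are hypothetical stand-ins, not the names from my model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustOID INTEGER PRIMARY KEY, CustDiscPercent NUMERIC);
    CREATE TABLE Orders    (OrdOID  INTEGER PRIMARY KEY, OrdDiscPercent  NUMERIC);
    CREATE TABLE Countries (CoOID   INTEGER PRIMARY KEY, CoIsoCode TEXT);
    """
)


def find_columns(conn, fragment):
    """Return sorted 'Table.Column' names whose column name contains `fragment`."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        # The table name comes from the catalog itself and is quoted.
        for column in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            if fragment in column[1]:
                hits.append(f"{table}.{column[1]}")
    return sorted(hits)


discount_columns = find_columns(conn, "DiscPercent")
print(discount_columns)
```

Without a consistent name component, the same question means reading every table definition by hand.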
If your application is to be used by a group of (affiliated) companies, where each company represents a separate client in the multi-client capable accounting system, things become difficult. The easiest solution would be to split your ERP database physically. Keep all common objects like countries, currencies, users, clients (=accounting clients) etc. in one database, and all company related objects in another database. Connect your users to the first ERP database and the accounting database, let them choose a client, then create an alias for the client's ERP database to ensure all client databases can share the same programs. Large operations tend to buy and sell subsidiary companies every once in a while. Using physical client databases makes this kind of move a simple and painless task.
Unfortunately, sometimes a developer's life is not that easy. In a multicorporate enterprise many subsidiary companies work on the same projects, billing their time and material partly within the group. That means subsidiary companies share access to a lot more business objects than just countries and currencies. Besides a ton of group-wide objects, templates to ensure enterprise-wide identical customer account numbers and the like, you need the attribute accounting client in many objects. Do not use the same attribute name in all tables, because database systems and design tools can't handle the primary relations if you do. Name the column client number (or client OID) differently in each table, using the source pointers explained before, e.g.
All of the above leads to the insight that consistent naming is a good idea in general. IOW: Without a strong naming convention your project will fail. Each and every name must be self-explanatory, and similar meanings must be expressed in identical wording. Some examples:
Invoices.InvPrinted says whether an invoice has been printed or not, Invoices.InvDatePrinted stores the date of the last printout, Invoices.InvPrintCounter tells us how many times an invoice has been printed so far and can be used to mark copies. The same goes for order confirmations and other forms:
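The alias idea carries over to other systems. As a sketch under assumptions, here is how it could look in Python with SQLite, where ATTACH DATABASE plays the role of the alias; the file names, the alias name 'client' and the helper functions are all made up for this example.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()  # scratch directory for the sample client databases


def make_client_db(name, invoice_count):
    """Create a hypothetical per-client database file holding some invoices."""
    path = os.path.join(workdir, name + ".db")
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE Invoices (InvOID INTEGER PRIMARY KEY)")
    db.executemany(
        "INSERT INTO Invoices (InvOID) VALUES (?)",
        [(i,) for i in range(1, invoice_count + 1)],
    )
    db.commit()
    db.close()
    return path


def select_client(conn, path):
    """Point the alias 'client' at the chosen client's database."""
    attached = [row[1] for row in conn.execute("PRAGMA database_list").fetchall()]
    if "client" in attached:
        conn.execute("DETACH DATABASE client")
    conn.execute("ATTACH DATABASE ? AS client", (path,))


def count_invoices(conn):
    # The same statement serves every client because it only knows the alias.
    return conn.execute("SELECT COUNT(*) FROM client.Invoices").fetchall()[0][0]


path_a = make_client_db("client_a", 3)
path_b = make_client_db("client_b", 5)

common = sqlite3.connect(":memory:")  # stands in for the common database
select_client(common, path_a)
count_a = count_invoices(common)
select_client(common, path_b)
count_b = count_invoices(common)
```

Selling a subsidiary then boils down to handing over one database file.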
OrderConfirmations.OrdConfPrinted, OrderConfirmations.OrdConfDatePrinted, OrderConfirmations.OrdConfPrintCounter and so on.
Look at the first attribute in my example. In common speech Invoices.InvPrinted can stand for a Boolean value as well as for a date. To avoid any confusion, you can make it even clearer by naming the logical attribute Invoices.InvIsPrinted, which leads to perfectly understandable code like ...
FOR EACH Invoices WHERE NOT Invoices.InvIsPrinted AND
         Invoices.InvDateCreate <= (TODAY - 10),
    EACH Customers OF Invoices,
    EACH Staff OF Customers:
    lOk = sendEmail(Staff.StEmailAddy,
          'Send out invoice # ' + STRING(Invoices.InvNumber) + ' $' +
          'To ' + crlf + getMailAddress(Customers.AdrOID) + crlf + ' immediately').
    Staff.StBrowniePoints = Staff.StBrowniePoints - 1.
END.
... and more examples. All types of amounts are addressed by the same name:
Invoices.InvNetAmount Orders.OrdNetAmount ...
Invoices.InvTaxAmount Orders.OrdTaxAmount ...
Invoices.InvGrossAmount Orders.OrdGrossAmount ...
All numbers are called 'Number' and not 'No', 'Num' ('Num' usually means 'number of') or whatever:
Customers.CustNumber (if there is a numeric customer number)
Countries.CoIsoNumber (ISO 3166 numeric code)
Alphanumeric codes are (usually) named 'Code' like
Countries.CoIsoCode (ISO 3166 alphanumeric code)
Products.PrdCode (or Products.PrdSku)
Borderline cases are 3rd party, non-unique technical keys with business meaning like the UPS 1Z Tracking Number, which contains both digits and letters. I'd call it UpsParcels.UpsP1zTrackingNumber, because the term is a matter of common knowledge and, technically speaking, '1Z' even indicates an alphanumeric value.
The same goes for all common name components like 'description', 'remarks', 'name', 'quantity', 'price' and so on; I guess you've got the idea. If possible, try to express the data type through the attribute name, and not only for attributes of type date and logical. 'Url' or 'Description' indicate a single-line character field, 'LongDescriptions', 'Remarks' or 'Notes' usually get stored in large text fields, 'Percent', 'Amount' and 'Price' imply decimal values, 'NumberOf' or 'PageNumber' represent integers and so on.
As for the visible parts of your model, there is not much more to say, except: check your spelling before you save definitions, and assign a help text to each attribute. Besides the above-mentioned object identifiers and one-to-many relationships, you need a policy for many-to-many relationships too. Those are a kind of technical class, making complex relationships persistent. Users will never see their names or attributes, so you may use geek speech. Here is a proven system: name those tables by combining your unique table prefixes, delimited by the digits '2' (to) and '4' (for). If your customers can belong to different groups, the table representing the relationship 'customers [belonging] to customer groups' is named Cust2CustGrp and contains only three keys:
Cust2CustGrp.Cust2CustGrpOID [primary key]
Cust2CustGrp.CustOID [foreign key]
Cust2CustGrp.CustGrpOID [foreign key]
To handle all customers of a group you code
FOR EACH Cust2CustGrp OF CustomerGroups,
EACH Customers OF Cust2CustGrp:
To get a list of all groups a customer belongs to you write:
FOR EACH Cust2CustGrp OF Customers,
EACH CustomerGroups OF Cust2CustGrp:
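For readers on other database systems, both directions of the relationship can be sketched in Python with SQLite. The attributes CustName and CustGrpName, the index names and the sample rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers      (CustOID    INTEGER PRIMARY KEY, CustName    TEXT);
    CREATE TABLE CustomerGroups (CustGrpOID INTEGER PRIMARY KEY, CustGrpName TEXT);
    CREATE TABLE Cust2CustGrp (
        Cust2CustGrpOID INTEGER PRIMARY KEY,
        CustOID    INTEGER NOT NULL REFERENCES Customers (CustOID),
        CustGrpOID INTEGER NOT NULL REFERENCES CustomerGroups (CustGrpOID)
    );
    CREATE INDEX Cust2CustGrpCustOID    ON Cust2CustGrp (CustOID);
    CREATE INDEX Cust2CustGrpCustGrpOID ON Cust2CustGrp (CustGrpOID);

    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO CustomerGroups VALUES (1, 'Wholesale'), (2, 'Key accounts');
    INSERT INTO Cust2CustGrp VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
    """
)


def customers_of_group(conn, group_name):
    """All customers of a group, walking group -> link table -> customer."""
    rows = conn.execute(
        """
        SELECT CustName
        FROM CustomerGroups
        JOIN Cust2CustGrp USING (CustGrpOID)
        JOIN Customers USING (CustOID)
        WHERE CustGrpName = ?
        ORDER BY CustName
        """,
        (group_name,),
    ).fetchall()
    return [row[0] for row in rows]


def groups_of_customer(conn, customer_name):
    """All groups of a customer, walking customer -> link table -> group."""
    rows = conn.execute(
        """
        SELECT CustGrpName
        FROM Customers
        JOIN Cust2CustGrp USING (CustOID)
        JOIN CustomerGroups USING (CustGrpOID)
        WHERE CustName = ?
        ORDER BY CustGrpName
        """,
        (customer_name,),
    ).fetchall()
    return [row[0] for row in rows]


wholesale = customers_of_group(conn, "Wholesale")
acme_groups = groups_of_customer(conn, "Acme")
```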
In some rare cases these predominantly technical classes have other attributes. Pragmatically, I'd go for a descriptive table label here and stick with the geeky table name. Actually, most probably those attributes are simple connections, keeping the table itself invisible to users. E.g. if you have a table storing Xmas present types, you could assign the type (or value) of presents depending on one of the groups assigned to your customers:
Cust2CustGrp4Xpt.Cust2CustGrp4XptOID [primary key]
Cust2CustGrp4Xpt.Cust2CustGrpOID [foreign key]
Cust2CustGrp4Xpt.XptOID [foreign key]
or, alternatively:
Cust2CustGrp4Xpt.Cust2CustGrp4XptOID [primary key]
Cust2CustGrp4Xpt.CustOID [foreign key]
Cust2CustGrp4Xpt.CustGrpOID [foreign key]
Cust2CustGrp4Xpt.XptOID [foreign key]
Pick whatever fits your needs best.
Now let's come to another important rule: Separate all technical stuff from your business logic. You can't avoid technical attributes in tables representing business objects, but you can and you should handle them separately. For example you can assign values like
Table.PrefixUserLastUpdate (if you don't log user activities, probably you need to store these data on creation too)
Table.PrefixIsActive || Table.PrefixIsDeleted
in database triggers. Be aware that in n-tier architectures database triggers usually do not know the user. If you need to log user activities, you can implement this feature in your key wrapping widgets. Since your technical primary keys can't be used in user interfaces, you create a key wrapping widget for each primary key. This widget knows the invisible primary key and enables the user to choose or enter one or more attributes with business meaning, which can be used to identify an object. Looking at a data viewer, those widgets appear just like fill-in fields with search button or combo boxes. In the background they pass values of technical keys as well as screen values of their visible attributes with business meaning to an application server, or another process handling your persistent objects.
Back to logging. Since every data viewer must contain at least one key wrapping widget (one handling the primary key and probably a few others handling foreign keys), you can determine the current user here. Just pass another hidden value to your persistence handler. Then in the database trigger you compare the old and new buffer, logging changes only. With a Progress® database, you can fully automate user activity logging using generated includes in write triggers, produced by a tool accessing the virtual system tables (VST). By the way, you should assign values to primary keys in create triggers only. At this point, recall another important rule of state-of-the-art software design: Do not put any business logic into the user interface code. Think SOA and encapsulate technical services as well as audit trail requirements.
Another rule of thumb is: Do not delete physically. Admittedly, deletions are technically possible, but they are way too expensive and not really necessary, and furthermore you destroy information which, as a rule, you will need some day. Deleting logically, on the other hand, keeps your referential integrity perfectly intact, and it is way faster because your database servers update just one column in a parent table, instead of bothering with often almost endless cascading deletes along with RI checks. Adding a WHERE clause [NOT] ParentTable.PrefixIsDeleted or, much better, [NOT] ParentTable.PrefixIsActive is cheap in comparison with all the nasty side effects of physical deletion. Tell your delete button to set a logical attribute isDeleted to true, or even dump the button and use a check box instead, which allows your users to reactivate inactive objects.
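The "compare old and new, log changes only" step can be sketched with an SQLite trigger driven from Python. This is not the generated Progress® include, just an illustration: the AuditLog table, its attributes and the helper function are assumptions, and the user arrives as the hidden value CustUserLastUpdate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (
        CustOID            INTEGER PRIMARY KEY,
        CustName           TEXT NOT NULL,
        CustUserLastUpdate TEXT NOT NULL   -- hidden value passed by the caller
    );
    CREATE TABLE AuditLog (
        LogOID      INTEGER PRIMARY KEY,
        LogTable    TEXT,
        LogColumn   TEXT,
        LogOldValue TEXT,
        LogNewValue TEXT,
        LogUser     TEXT
    );
    -- Fires only when the old and the new value of CustName differ.
    CREATE TRIGGER CustomersWrite AFTER UPDATE ON Customers
    WHEN OLD.CustName IS NOT NEW.CustName
    BEGIN
        INSERT INTO AuditLog (LogTable, LogColumn, LogOldValue, LogNewValue, LogUser)
        VALUES ('Customers', 'CustName', OLD.CustName, NEW.CustName,
                NEW.CustUserLastUpdate);
    END;
    INSERT INTO Customers VALUES (1, 'Acme', 'setup');
    """
)


def rename_customer(conn, oid, name, user):
    """Persistence handler: writes the new name together with the acting user."""
    conn.execute(
        "UPDATE Customers SET CustName = ?, CustUserLastUpdate = ? WHERE CustOID = ?",
        (name, user, oid),
    )


rename_customer(conn, 1, "Acme Corp", "alice")
rename_customer(conn, 1, "Acme Corp", "bob")  # name unchanged, nothing is logged
log = conn.execute(
    "SELECT LogOldValue, LogNewValue, LogUser FROM AuditLog"
).fetchall()
print(log)
```

The user interface never touches the log; it only hands the user over to the persistence layer.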
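A minimal sketch of logical deletion, again in Python with SQLite and with assumed names (CustIsActive, set_active, active_customers):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (
        CustOID      INTEGER PRIMARY KEY,
        CustName     TEXT NOT NULL,
        CustIsActive INTEGER NOT NULL DEFAULT 1  -- 1 = active, 0 = logically deleted
    );
    INSERT INTO Customers (CustOID, CustName) VALUES (1, 'Acme'), (2, 'Globex');
    """
)


def set_active(conn, oid, active):
    """What the check box does: flip one column, delete nothing."""
    conn.execute(
        "UPDATE Customers SET CustIsActive = ? WHERE CustOID = ?",
        (1 if active else 0, oid),
    )


def active_customers(conn):
    """Every business query carries the cheap IsActive filter."""
    rows = conn.execute(
        "SELECT CustName FROM Customers WHERE CustIsActive = 1 ORDER BY CustName"
    ).fetchall()
    return [row[0] for row in rows]


set_active(conn, 2, False)
after_delete = active_customers(conn)
set_active(conn, 2, True)
after_restore = active_customers(conn)
stored_rows = conn.execute("SELECT COUNT(*) FROM Customers").fetchall()[0][0]
```

The row, and every child row referencing it, stays where it is, so reactivating the customer costs one more update.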
Large projects can easily exceed the physical limits set by your database system. If you deal with very large amounts of data in particular entities, ensure that primary keys of (physically sliced) mega entities are never used as foreign key in other tables. Only the (logical) mega 'table' keeps knowledge about relations to other entities. That should not lead to problems, because these entities are usually children of others (for example sales transactions of sales slips of POS terminals of shops). Implement a smart data access layer handling the requests from higher application levels. Depending on key value ranges and/or date-time attributes, the data access layer can determine in which table a requested tuple is located and in which table a new tuple must be stored, while from the higher level's perspective this conglomerate of tables comes into view as one logical table.
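To make the data access layer idea a bit more tangible, here is a deliberately small sketch in Python with SQLite. It assumes yearly slices named SalesTransactions<year> with the hypothetical prefix Stx; real slicing criteria, table names and attributes will differ.

```python
import sqlite3
from datetime import date


class SalesTransactionStore:
    """Presents yearly slice tables as one logical table (names are assumed)."""

    def __init__(self, conn):
        self.conn = conn

    def _table_for(self, day):
        # The slice is derived from the date attribute alone. Missing slices
        # are created on first use, for writes as well as for reads.
        table = f"SalesTransactions{day.year}"
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "StxOID INTEGER PRIMARY KEY, "
            "StxDate TEXT NOT NULL, "
            "StxNetAmount NUMERIC NOT NULL)"
        )
        return table

    def add(self, day, net_amount):
        """Store a new tuple in the slice its date belongs to."""
        table = self._table_for(day)
        self.conn.execute(
            f"INSERT INTO {table} (StxDate, StxNetAmount) VALUES (?, ?)",
            (day.isoformat(), net_amount),
        )
        return table

    def total(self, first_day, last_day):
        """Sum net amounts across every slice the date range touches."""
        result = 0
        for year in range(first_day.year, last_day.year + 1):
            table = self._table_for(date(year, 1, 1))
            rows = self.conn.execute(
                f"SELECT COALESCE(SUM(StxNetAmount), 0) FROM {table} "
                "WHERE StxDate BETWEEN ? AND ?",
                (first_day.isoformat(), last_day.isoformat()),
            ).fetchall()
            result += rows[0][0]
        return result


store = SalesTransactionStore(sqlite3.connect(":memory:"))
slice_2004 = store.add(date(2004, 12, 30), 10)
slice_2005 = store.add(date(2005, 1, 2), 20)
store.add(date(2005, 6, 1), 5)
turn_of_year = store.total(date(2004, 12, 1), date(2005, 1, 31))
```

Callers only see add and total; which physical table holds a tuple stays a private matter of the layer.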
The next warning has, like the two rules above, the potential for a bunch of articles: Avoid array fields. Most persistent arrays I have seen were the work of lazy code monkeys who weren't capable of looking a step further. Although some database systems like Progress® can handle array fields, most database systems do not (why should they support tables in tables for database designers not able to normalize properly?). Furthermore, lots of front ends and underlying components as well as development tools will not handle extended attributes. When migrating applications, it's hard enough to handle these constructs in settled (legacy) databases, so don't create new trouble. As for Progress® word indexes, which work like a charm with character arrays, there is an alternative compatible with other databases: just add a word-indexed large text field and populate it with a string of the attributes in question in your write trigger.
Modelers and developers who treat relational theory as set in stone will most probably be offended by some of the code examples above. In former paradigms it was, politely expressed, not the best practice to use syntax like ChildTable OF ParentTable, because (using attributes with business meaning as primary and foreign keys) it was not obvious which attribute pair got used to join the objects. However, we got rid of that incredibly stupid concept in the meantime. OF has evident advantages:
Given a clean database design, those misunderstandings caused by omission cannot occur, because each and every join uses a single pair of indexed technical keys in both tables. The technical implementation of relationships has nothing to do with business logic, thus the consistent usage of OF increases code readability. Actually, technical attributes should not appear in any code handling business logic (exceptions like Table.PrefixIsActive, standing for not logically deleted, and other technical attributes with at least a portion of business meaning admitted).
If OF fails, you have a technical problem like a missing index on a foreign key column, or (indexed) attribute names being equal in both tables; neither must happen. Fortunately the compiler will quit with an error message in this case. That means the consistent usage of OF, followed by a WHERE clause expressing business logic by testing attributes with business meaning, protects you from logical errors as well as errors and omissions in the physical database design.
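The trigger part of this alternative might look like the following sketch in Python with SQLite. The attributes PrdName, PrdDescription and PrdSearchText are assumptions, and a plain LIKE search stands in for the word index, which every database system implements in its own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (
        PrdOID         INTEGER PRIMARY KEY,
        PrdCode        TEXT NOT NULL,
        PrdName        TEXT NOT NULL,
        PrdDescription TEXT NOT NULL DEFAULT '',
        PrdSearchText  TEXT NOT NULL DEFAULT ''  -- filled by triggers only
    );
    CREATE TRIGGER ProductsCreate AFTER INSERT ON Products
    BEGIN
        UPDATE Products
        SET PrdSearchText =
            lower(NEW.PrdCode || ' ' || NEW.PrdName || ' ' || NEW.PrdDescription)
        WHERE PrdOID = NEW.PrdOID;
    END;
    -- Restricted to the source columns, so it ignores writes to PrdSearchText.
    CREATE TRIGGER ProductsWrite
    AFTER UPDATE OF PrdCode, PrdName, PrdDescription ON Products
    BEGIN
        UPDATE Products
        SET PrdSearchText =
            lower(NEW.PrdCode || ' ' || NEW.PrdName || ' ' || NEW.PrdDescription)
        WHERE PrdOID = NEW.PrdOID;
    END;
    """
)


def search_products(conn, word):
    """Return the codes of all products whose search text contains `word`."""
    rows = conn.execute(
        "SELECT PrdCode FROM Products "
        "WHERE PrdSearchText LIKE '%' || lower(?) || '%' ORDER BY PrdCode",
        (word,),
    ).fetchall()
    return [row[0] for row in rows]


conn.execute(
    "INSERT INTO Products (PrdOID, PrdCode, PrdName, PrdDescription) "
    "VALUES (1, 'W1', 'Widget', 'Blue gadget')"
)
found_by_description = search_products(conn, "Gadget")
conn.execute("UPDATE Products SET PrdName = 'Sprocket' WHERE PrdOID = 1")
found_by_new_name = search_products(conn, "sprocket")
found_by_old_name = search_products(conn, "widget")
```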
As I said in the beginning, my intention was not to write a book explaining each and every aspect of database design. Most probably that's impossible, because different business requirements do need different solutions. I wrote this article off the top of my head on a rainy Saturday afternoon, so please don't expect completeness. And since I make a living with IT consulting, you'll agree that it would be a bad idea to publish all my business secrets ;)
Published: December 2004 LastUpdate: May 2005