DataCruxWeb Data Modeling Overview

Since we're not too far away from a formal DataCruxWeb release, I thought I'd start getting some details out there. Here's a snippet of a model file from a real, working DataCruxWeb app:


@entity Task

properties are id, name, description, priority, assignedTo, dueDate, complete

  name is a string
  description is a freeform text block, optional
  priority is an item from a static list, list name is 'priorities'
  assignedTo is an object, entity is 'StaffMember'
  dueDate is a date, uses day/month/year
  complete is a flag, optional

sort by increasing complete, increasing dueDate, decreasing priority



Only the first two lines are required. It can also be written like this, which I prefer:


@entity Task

properties are id, name, description, priority, assignedTo, dueDate, complete

  name         is a string
  description  is a freeform text block, optional
  priority     is an item from a static list, list name is 'priorities'
  assignedTo   is an object, entity is 'StaffMember'
  dueDate      is a date, uses day/month/year
  complete     is a flag, optional

sort by increasing complete, increasing dueDate, decreasing priority



Data modeling in both the original Objective-C version of DataCrux and early versions of DataCruxWeb was tricky, so I really wanted to get this right in a public release of DataCruxWeb.

I actually considered a few approaches. XML never seemed like a good choice as it's just too verbose for something like this. I considered writing a desktop or web-based tool to generate the model files, but that seemed like perhaps a bit too much overhead for now.

I also tried one other custom file format before settling on the style above. While writing the parser for that format, it occurred to me that I wasn't really solving the problem, which is that most config files look like they were written by and for a computer.

Finally, I decided that I wanted a format that could read like an informal outline -- something similar to an informal specification a developer receives in an email.


How the Data Model is Used

When DataCruxWeb is fired up, it looks for its database. If the database can't be found, it automatically creates it along with all of the associated tables. As you'd expect, it uses the model file to figure out what tables and columns to create.
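To make the table-creation step concrete, here is a minimal sketch in Python (not DataCruxWeb's actual PHP code). The `SQL_TYPES` mapping, the `create_table_sql` function, the choice of column types, and the treatment of `id` as a primary key are all assumptions made for illustration.

```python
# Hypothetical mapping from descriptive model types to SQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "freeform text block": "TEXT",
    "item from a static list": "VARCHAR(255)",
    "object": "INTEGER",  # assumed: stores the related object's id
    "date": "DATETIME",
    "flag": "BOOLEAN",
}


def create_table_sql(entity, properties, types):
    """Build a CREATE TABLE statement for one entity.

    `properties` is the ordered list of property names and `types` maps
    a property name to its descriptive type. A property with no entry in
    `types` is treated as a plain string.
    """
    columns = []
    for name in properties:
        if name == "id":
            # Assumption: "id" is the primary key of every table.
            columns.append("id INTEGER PRIMARY KEY")
        else:
            columns.append(f"{name} {SQL_TYPES[types.get(name, 'string')]}")
    return f"CREATE TABLE {entity} ({', '.join(columns)})"


print(create_table_sql("Task", ["id", "name", "complete"], {"complete": "flag"}))
```

The table takes its name from the entity and each column takes its name from a property, which is the part the post actually describes; everything about the column types is guesswork.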

This isn't the only time the model is used, though. Every DataCruxWeb data object has a sense of which entity it belongs to. If you try to set a value on a DataCruxWeb object using a property name that doesn't exist, the request will just be ignored. In addition, properties can be disabled on a per-page (or potentially, per-request) basis. This is helpful if you want to wall off sensitive data from the end user.
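A rough Python sketch of that behavior, under the assumption that a disabled property is ignored in the same way as an unknown one. The `DataObject` class and its method names are hypothetical and are not the DataCruxWeb API.

```python
class DataObject:
    """Hypothetical stand-in for a DataCruxWeb data object."""

    def __init__(self, entity, properties):
        self.entity = entity
        self._properties = set(properties)  # names the model knows about
        self._disabled = set()              # names walled off for this page
        self._values = {}

    def disable(self, name):
        self._disabled.add(name)

    def set_value(self, name, value):
        # Unknown or disabled property names are silently ignored.
        if name in self._properties and name not in self._disabled:
            self._values[name] = value

    def get_value(self, name):
        return self._values.get(name)


task = DataObject("Task", ["id", "name", "priority"])
task.disable("priority")
task.set_value("name", "Write docs")
task.set_value("priority", "high")  # disabled for this page: ignored
task.set_value("isAdmin", True)     # not in the model: ignored
```

After these calls only `name` holds a value, so a form field that the page never offered cannot sneak data into the object.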

Before an object is saved to the database, you can ask it to validate its contents. The validation is (obviously) governed by the model. Of course, validation can be more than checking for valid data. A filter may actually transform a value from what the web browser sends to a better internal representation.
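Here is one way the two halves (checking and transforming) might look, sketched in Python rather than the framework's PHP. The function names, the accepted flag spellings, and the skipping of `id` are assumptions; the only outside fact used is that browsers submit a checked checkbox as "on" by default.

```python
def filter_flag(raw):
    """Transform a browser-style checkbox value into a real boolean."""
    return str(raw).strip().lower() in ("1", "true", "yes", "on")


def validate(values, properties, optional):
    """Return a list of error messages; an empty list means valid.

    `optional` is the set of property names the model marks as optional.
    Everything else is treated as required.
    """
    errors = []
    for name in properties:
        if name == "id":
            continue  # assumed to be assigned by the database
        value = values.get(name)
        if name not in optional and (value is None or value == ""):
            errors.append(f"{name} is required")
    return errors


print(filter_flag("on"))
print(validate({"name": "Write docs"}, ["id", "name", "dueDate"], {"complete"}))
```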

The model is also used by the controller layer to determine which objects have to be loaded in from the database. As an example, the Task entity has a to-one relationship with 'StaffMember', so the controller automatically loads available StaffMember objects so that they can be displayed in a dropdown box in a form. This doesn't happen if the relationship key is disabled for the given page, though.
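The decision the controller makes can be sketched in a few lines of Python. The shape of `definitions` and the `entities_to_preload` name are made up for this example; the real controller is PHP and surely does more.

```python
def entities_to_preload(definitions, disabled=()):
    """Return the related entity names a form page would need to load.

    `definitions` maps a property name to (descriptive type, target entity).
    Relationship keys that are disabled for the page are skipped.
    """
    targets = []
    for name, (kind, target) in definitions.items():
        if kind == "object" and name not in disabled and target not in targets:
            targets.append(target)
    return targets


task_definitions = {
    "name": ("string", None),
    "assignedTo": ("object", "StaffMember"),
}
print(entities_to_preload(task_definitions))
print(entities_to_preload(task_definitions, disabled={"assignedTo"}))
```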


A Walkthrough

This bit, as most people have probably guessed, names the entity.


@entity Task



Core Data and WebObjects folks are already familiar with the terminology, but the practical application of the entity name is that it's what the database table will be called and what a custom class would use.

It's worth noting that as of now, the entity definition is the only place you'll see any sort of special characters used. Everything else is based on alphanumerics, commas, and newlines.

Next are the properties:
                                                                            

properties are id, name, description, priority, assignedTo, dueDate, complete



These are the names that will be used for validation, form processing, column names, and so on. The line just has to start with a "properties are" clause, with the properties separated by commas. Similar to Core Data, the term "property" covers both simple value attributes (numbers, dates, strings) as well as relationships.

The "properties are" line is the ultimate authority on what properties are associated with an entity. Although the lines that follow describe the types and validation for each property, they are technically optional. In fact, the "name is a string" line is probably redundant. By default, each property is assumed to be a required string.

Adding a line like "name is a string" will not create a property. The idea here is to allow the developer to easily disable properties by removing them from the "properties are" line. That said, there is some room for confusion here, so I may tweak the behavior.
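The rule is easier to see in code. This Python sketch (the real parser is PHP, and these function names and dictionary shapes are invented for the example) treats the "properties are" line as the authority: definitions for unlisted properties are dropped, and listed properties without a definition become required strings.

```python
def parse_properties_line(line):
    """Parse a 'properties are a, b, c' line into a list of names."""
    prefix = "properties are"
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError("expected a 'properties are' clause")
    return [name.strip() for name in line[len(prefix):].split(",") if name.strip()]


def active_definitions(properties, definitions):
    """Keep only definitions whose property is on the 'properties are' line.

    Listed properties with no definition get the default: a required string.
    """
    default = {"type": "string", "required": True}
    return {name: definitions.get(name, dict(default)) for name in properties}


properties = parse_properties_line("properties are id, name, dueDate")
definitions = {
    "dueDate": {"type": "date", "required": True},
    "priority": {"type": "item from a static list", "required": True},
}
print(active_definitions(properties, definitions))
```

In the example, `priority` has a definition but is not listed, so it never becomes a property.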


Next up, we have several lines of property definitions.


name         is a string
description  is a freeform text block, optional
priority     is an item from a static list, list name is 'priorities'
assignedTo   is an object, entity is 'StaffMember'
dueDate      is a date, uses day/month/year
complete     is a flag, optional



The formula is pretty simple. It's basically a property name, followed by "is a/an", followed by a descriptive type, and a few following options. There's quite a long list of types, and part of the reason for that is that a descriptive type determines not only what kind of value will go in the database, but how it will be filtered, validated and/or transformed. In other words, there are a lot of possible combinations.
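As a sketch of that formula, here is a small Python parser for a single definition line. It is an assumption about how such a line could be split, not the framework's actual PHP parser, and it leaves the options as raw strings.

```python
import re

# name, then "is a" or "is an", then the descriptive type and any options.
DEFINITION = re.compile(r"^\s*(\w+)\s+is\s+an?\s+(.+)$")


def parse_definition(line):
    """Split one definition line into (name, descriptive type, options)."""
    match = DEFINITION.match(line)
    if match is None:
        raise ValueError(f"not a property definition: {line!r}")
    name, rest = match.groups()
    parts = [part.strip() for part in rest.split(",")]
    return name, parts[0], parts[1:]


print(parse_definition("description  is a freeform text block, optional"))
print(parse_definition("assignedTo   is an object, entity is 'StaffMember'"))
```

Because the pattern allows any run of whitespace after the property name, the column-aligned style parses the same way as the single-space style.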

For example, the definition for "name" is "is a string." This means that the value will be treated as a basic PHP string, and only very basic filtering will be done. Garbage characters will be removed and whitespace will be trimmed and collapsed.

The definition for "description" is slightly different -- "freeform text block, optional". This property will also be treated as a string, but whitespace will not be collapsed, which is appropriate for "notes" fields and the like, where you may want to preserve multiple line breaks or spaces. The line also ends with "optional," since properties are implicitly required. You can also say "not required," if that's more to your liking.
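The difference between the two filters, sketched in Python. I'm assuming here that "garbage characters" means ASCII control characters; the real PHP filters may well strip something else, and the function names are hypothetical.

```python
import re

# Assumed definition of "garbage": control characters other than
# tab, newline, and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def filter_string(value):
    """'string' filter: drop control characters, collapse and trim whitespace."""
    value = CONTROL_CHARS.sub("", value)
    return re.sub(r"\s+", " ", value).strip()


def filter_text_block(value):
    """'freeform text block' filter: drop control characters, keep spacing."""
    return CONTROL_CHARS.sub("", value)


print(repr(filter_string("  Buy \x00 milk \n\n now ")))
print(repr(filter_text_block("line one\n\n  line two\x07")))
```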

The definition for priority is "item from a static list", which essentially acts like an enum (multiple choice to the layperson :). If the value doesn't match anything in the predetermined list named "priorities," it's set to a default value. This is the case for both going into and coming out of the database, so cleanup is done at every possible opportunity. As of right now, the static list is a raw PHP structure, but I'm going to just add it to the model file.
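In Python terms the fallback looks roughly like this. The contents of the "priorities" list and the default value are invented for the example, since the real list isn't shown here.

```python
# Hypothetical static list; the actual values are not shown in this post.
STATIC_LISTS = {"priorities": ["low", "normal", "high"]}


def filter_static_item(value, list_name, default):
    """Return `value` if it is in the named static list, else the default."""
    return value if value in STATIC_LISTS[list_name] else default


print(filter_static_item("high", "priorities", "normal"))
print(filter_static_item("urgent!!", "priorities", "normal"))
```

Running the same filter on the way into and out of the database means a bad value never survives a round trip.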

The "assignedTo" property is a dynamic to-one relationship to another object. There are various other terms that have the same result, such as "to-one relationship." This definition also specifies that the target object is a 'StaffMember' instance, though DataCruxWeb also supports dynamically-typed relationships.

I think dueDate and complete are fairly self-explanatory. A date property can (obviously) also store times, so the definition says "uses day/month/year". The "complete" property is a flag, which is synonymous with "boolean" or "bool".


The last line specifies the default sort order for objects returned from the database:


sort by increasing complete, increasing dueDate, decreasing priority



I probably don't need to explain much here. This is analogous to a SQL 'order by' clause.
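For the curious, the translation could be as simple as the following Python sketch. It assumes "increasing" maps to ASC and "decreasing" to DESC, and the `order_by_clause` name is made up; the framework's own PHP code may build its query differently.

```python
def order_by_clause(sort_line):
    """Translate a 'sort by ...' model line into a SQL ORDER BY clause."""
    prefix = "sort by"
    line = sort_line.strip()
    if not line.startswith(prefix):
        raise ValueError("expected a 'sort by' clause")
    directions = {"increasing": "ASC", "decreasing": "DESC"}
    terms = []
    for term in line[len(prefix):].split(","):
        direction, column = term.split()
        terms.append(f"{column} {directions[direction]}")
    return "ORDER BY " + ", ".join(terms)


print(order_by_clause(
    "sort by increasing complete, increasing dueDate, decreasing priority"
))
```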


How the Model is Stored

The plain text model file is parsed once, then stored in a serialized format on disk. I need to clean up the code that handles all this a bit. I'm also looking for an intelligent way to handle refreshing the archive file. For now, it's brute force: you have to remove the .dcarchive file. The obvious answer is to compare the contents or a serial number of the archive to the plain text version, but I'd like to do as few fopen() calls as possible.
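One option that avoids opening either file is to compare modification times, which only needs a stat call (PHP has filemtime() for this). Here is a sketch of the idea in Python; it is a possibility I'm illustrating, not what the code currently does, and the function name is hypothetical.

```python
import os


def archive_is_stale(model_path, archive_path):
    """True if the archive is missing or older than the plain text model."""
    try:
        return os.path.getmtime(archive_path) < os.path.getmtime(model_path)
    except OSError:
        # One of the files could not be read: treat the archive as stale
        # so the caller attempts a rebuild.
        return True
```

The trade-off is that timestamps can lie (a restored backup or a copied file, for example), so removing the archive by hand would still be the fallback.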


There's more to talk about in terms of modeling, which I'll get to after talking about the actual API a bit.
Posted Sep 1, 2005 — 4 comments below




 

Ted — Sep 05, 05 372

Over at www.sateh.com there is quite a post on wicket.sf.net, a new 'component-based' Java web dev framework which may be interesting to compare with DataCruxWeb. Also interesting is the Project Revision Control system (prcs.sf.net) that can be used as a CVS alternative.

Ted — Sep 05, 05 373

Over at www.sateh.com there is a post on wicket.sf.net, a new 'component-based' Java web dev framework which may be interesting to compare with DataCruxWeb. Also interesting is the Project Revision Control system (prcs.sf.net) that can be used as a CVS alternative.

Ted — Sep 05, 05 374

Oops! One comment more than expected by me! So, if you like, take this for compensation: prcs.darwinports.org ! (BTW I've recently run across an interesting anti-spam measure article at ridiculousfish.com/blog/ ).

Elliot Anderson — Sep 11, 05 393

I want to see more :D

Looks very interesting!




 






Copyright © Scott Stevenson 2004-2015