DomainObject and DataMappers
Apr 7th, 2008
Update: Part 2 now available
(The first thing you should do is download this file: data_mapper.zip. It contains all the files for this article, and I will sometimes refer to files within the archive since some examples are too long to put in the article. This article also assumes that you’re fluent in SQL, know how to work with PDO (php.net/PDO) and have a decent understanding of object oriented concepts.)
Today we’re going to dive into the beautiful world of data mappers: what are they? How do they work? Why should you use them? Let’s start at the beginning. You know relational databases, those nifty tools you use to save information in a (hopefully) structured way so it’s easy to search, index and do various other things with. Now, databases are very good at what they do, there’s no arguing that, and querying them from PHP is as easy as it’s ever been, with at least two different APIs to choose from for each database driver that’s supported.
However, problems arise when we want to apply different business rules to our data that is stored in a relational database – one solution is to use stored procedures, but you can only fit so much logic into those before it gets too cumbersome. After stored procedures we have functions, yes – simple PHP functions that usually look something like this:
<?php
function update_user_name($id, $name) {
    // Do something to the users row in the database
}
However, these functions have several drawbacks. First, you need to either use a global variable to hold the database connection or pass the connection to every function every time, which quickly gets annoying. Second, these functions don’t fit very well with our object oriented applications, where we prefer to think of our domain as objects instead of sets and functions. So, after functions there’s the monolithic class that usually isn’t much more than a wrapper for the previously mentioned functions.
Enter Active Record, the pattern that Ruby on Rails made infamous. Active Record is half the solution to our problem; however, it couples our domain model tightly to the database logic, which is something we really want to avoid (why it’s preferable to avoid this is out of the scope of this article, as it would make for a small essay of its own). After Active Record we have what this article is all about: Domain Objects and Data Mappers.
Alright now, stop.
Let’s go back to the beginning, but a different beginning than the relational databases – let’s look at our object oriented application. Here in object utopia everything and everyone is an object: we have a User, a Post, a Comment and all these other objects, all fighting for our attention. As long as we stay inside our application all our objects are happy and life is good, but when our request is over and our application is shutting down, the objects have to be stored somewhere – that’s right, in the database. Now a problem arises for our objects: they don’t know how to store themselves in a database. In fact, they’re pretty ignorant and only care about their own realm. Let’s take a look at one of these objects, namely the User:
<?php
class User extends DomainObject {
    protected $name;

    function setName ($name) {
        $name = (string)$name;
        // Reject names shorter than two characters
        if (strlen($name) < 2) {
            throw new Exception('User::$name must be at least two characters long');
        }
        $this->name = $name;
    }

    function getName () {
        return $this->name;
    }
}
Some of you might have noticed that the “User” class extends the “DomainObject” class; we’ll get to that soon. Our User object is rather simple: it has one field, the protected $name, and we can change and retrieve that name with its two accessor methods, setName() and getName() respectively. As you can see, the User has no idea what a database is and especially not how to store itself in one. Now enter the UserMapper – the class responsible for knowing how to store User objects in the database and later retrieve their rows and restore them to objects (open the data_mapper/model/UserMapper.php file for reference, as the file is too big to put in the article).
This mapper isn’t complete in any way (it lacks methods for deleting and updating User objects, for example), but it will suffice for this article to explain the principle behind Data Mappers. As you can see, the UserMapper class extends the DataMapper class. The DataMapper class is empty except for one static variable named $dbConn that we set to a PDO connection (in bootstrap.php). Let’s take a look at the internals of the UserMapper class; first up is the insert() method.
Its one and only argument is a User object, and the first thing it does is check that the method getId() (which the User object has inherited from the DomainObject class; open up the file data_mapper/lib/DomainObject.php, it should be self-explanatory) returns null, which means that this User doesn’t have an id, which in turn means that it doesn’t have a row representing it in the database – so we’re free to insert it and create the row. Secondly, we create a prepared statement, feed the user’s name to it as the only parameter and execute it. The user now has a row representing it in the database. After that we retrieve the value of the last inserted id and give it to the user object. Last, we add the object to the $map class field with its id as key; more on this later.
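For readers without the archive at hand, here is a minimal sketch of what an insert() along these lines might look like. It is not the file from the archive: the setId() method, the users table with its user_id and name columns, and the in-memory SQLite connection are all assumptions made so that the snippet runs on its own.

```php
<?php
// Hypothetical stand-ins for the classes in the archive. setId(), the
// `users` table and its column names are assumptions for illustration.
abstract class DomainObject {
    protected $id = null;
    public function getId() { return $this->id; }
    public function setId($id) { $this->id = (int)$id; }
}

class User extends DomainObject {
    protected $name;
    public function setName($name) { $this->name = (string)$name; }
    public function getName() { return $this->name; }
}

abstract class DataMapper {
    public static $dbConn; // shared PDO connection, set once at bootstrap
}

class UserMapper extends DataMapper {
    protected static $map = array(); // identity map: id => User

    public static function insert(User $user) {
        // An object that already has an id already has a row.
        if ($user->getId() !== null) {
            throw new Exception('User already has a row in the database');
        }
        $stmt = self::$dbConn->prepare('INSERT INTO users (name) VALUES (?)');
        $stmt->execute(array($user->getName()));

        // Hand the generated key back to the object and remember the object.
        $user->setId(self::$dbConn->lastInsertId());
        self::$map[$user->getId()] = $user;
    }
}

// Assumed bootstrap: an in-memory SQLite database instead of the real one.
DataMapper::$dbConn = new PDO('sqlite::memory:');
DataMapper::$dbConn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
DataMapper::$dbConn->exec(
    'CREATE TABLE users (user_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)'
);

$user = new User;
$user->setName('Padme Amidala');
UserMapper::insert($user);
var_dump($user->getId()); // int(1)
```

Calling insert() a second time on the same object throws in this sketch, because getId() no longer returns null.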
And the runner up is… find()!
The next method up is find(). It takes one parameter, an integer $id, and tries to find a row in the database that matches the $id and create a User object from it. Let’s dissect it line by line. The first three lines should be self-explanatory: we execute a prepared statement and retrieve the only row that it returned. After that we check that $row isn’t false (if it is, we just return false since there is no row with that id). Last, we call the method getObjectForRow() and pass it the row we got from the database, so guess which method is next for dissection?
You guessed it - it’s getObjectForRow(), and this method is our most complex so far. On the first line we get the user_id column value and save it into $user_id – this is just for convenience and readability further down. Second, we check if an object already exists in $map under the key with the value of $user_id; if it does, there’s no reason to create a new object and we can just return it. However, if it doesn’t exist we create a new User, assign the name and id to it and put it in the map under its id, then return it.
So what do we need this $map array for? This “pattern” is called Identity Map (http://martinfowler.com/eaaCatalog/identityMap.html) and is what we use so we don’t get two objects representing the same database row. To quote Fowler:
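The lookup described above can be sketched roughly as follows. Again, this is an illustration rather than the code from the archive: the User class is trimmed down (it does not extend DomainObject here), and the setId() method, the column names and the in-memory SQLite database are assumptions.

```php
<?php
// Hypothetical, trimmed-down stand-ins; the column names user_id and name
// are assumptions about the schema.
class User {
    protected $id;
    protected $name;
    public function getId() { return $this->id; }
    public function setId($id) { $this->id = (int)$id; }
    public function getName() { return $this->name; }
    public function setName($name) { $this->name = (string)$name; }
}

class UserMapper {
    public static $dbConn;           // PDO connection
    protected static $map = array(); // identity map: id => User

    public static function find($id) {
        $stmt = self::$dbConn->prepare(
            'SELECT user_id, name FROM users WHERE user_id = ?'
        );
        $stmt->execute(array($id));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            return false; // no row with that id
        }
        return self::getObjectForRow($row);
    }

    public static function getObjectForRow(array $row) {
        $user_id = (int)$row['user_id'];
        // Reuse the object if this row has been loaded before.
        if (isset(self::$map[$user_id])) {
            return self::$map[$user_id];
        }
        $user = new User;
        $user->setId($user_id);
        $user->setName($row['name']);
        self::$map[$user_id] = $user;
        return $user;
    }
}

// Assumed setup: an in-memory SQLite database with a single user.
UserMapper::$dbConn = new PDO('sqlite::memory:');
UserMapper::$dbConn->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)');
UserMapper::$dbConn->exec("INSERT INTO users (name) VALUES ('Yoda')");

$first  = UserMapper::find(1);
$second = UserMapper::find(1);
var_dump($first === $second);   // bool(true): both lookups yield one object
var_dump(UserMapper::find(99)); // bool(false): no such row
```

The second find(1) still queries the database in this sketch; the map only guarantees that the same object comes back, not that the query is skipped.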
“An Identity Map keeps a record of all objects that have been read from the database in a single business transaction. Whenever you want an object, you check the Identity Map first to see if you already have it.”
So, now that we’ve gone through the User and UserMapper classes, let’s try them out in a real example (the tables used in these examples can be found in data_mapper/examples/sql/tables_ddl.sql):
<?php
require 'bootstrap.php';
$anakin = new User;
$anakin->setName("Anakin Skywalker");
var_dump($anakin->getId()); // NULL, Object does not have a row in the database
UserMapper::insert($anakin); // Saves Anakin Skywalker to the database
var_dump($anakin->getId()); // int(1), Object has a row in the database and an id
On the first line we just include bootstrap.php, which is located in the same folder. It connects to the database (remember to edit the database connection details in bootstrap.php so they fit your database) and includes all the files we need for our examples.
Next up is the creation of a Sith Lord, Anakin Skywalker. We create a new User object and set its name to “Anakin Skywalker”. After that we var_dump() its return value from getId(), which is NULL, meaning that we haven’t saved the object to the database yet, so let’s do that with UserMapper::insert($anakin). After that we check the getId() return value again, and sure enough it returns int(1).
Let’s try something a bit more advanced, have a look at this example, where we create and then retrieve a User from the database:
<?php
require 'bootstrap.php';
$obiwan = new User;
$obiwan->setName("Obi-Wan Kenobi");
UserMapper::insert($obiwan); // Saves Obi-Wan Kenobi to the database
var_dump($obiwan->getId()); // int(2)
$sameObiwan = UserMapper::find($obiwan->getId());
var_dump($sameObiwan->getId()); // int(2)
if ($obiwan === $sameObiwan) {
    echo '$obiwan and $sameObiwan point to the same object';
}
The first few lines are almost identical to the previous example, so I’ll skip those here. The first interesting line is the one where we call UserMapper::find() with the id value of the object we just created and inserted – now why would we do that? In real life we probably wouldn’t, but this is an example, so bear with me. The find() method returns an object to us, now what is this? We already have an object representing that row, preposterous! An impostor! Hold your horses: in the next line you can see that $sameObiwan’s getId() method also returns int(2), so do we have two objects pointing at the same row now?
No, not exactly – if you check the if() statement that follows you will see that $obiwan and $sameObiwan (as its name suggests) point to the very same object. Why, you may ask? Because in our UserMapper::getObjectForRow() method we check whether we already have an object that represents the row with the id we’re trying to create, and return that if we do. And since we inserted the $obiwan object and the insert() method saved the object into the $map (UserMapper.php, line 33), we get the same object back. Nifty, eh?
- Let’s hug it out bitch!
Who, what? Oh, Post is here for his fifteen minutes of fame. Sure – let’s take it up a notch. First of all, check out the data_mapper/model/Post.php file; it should be pretty similar to the User.php file in the same directory that we looked at earlier. Post extends DomainObject and has two fields ($text, $user) with two accessor methods for each of them. Next, bring up data_mapper/model/PostMapper.php and let’s have a look at it. Like the UserMapper class, PostMapper extends DataMapper; however, the findAll() method is new, and the insert() method works a bit differently.
The first three lines of PostMapper::findAll() should be self-explanatory: we select all posts, left join in the user for each post and retrieve them all with fetchAll(). Next we start looping through the rows we got. First we call PostMapper’s getObjectForRow() method (which is identical to the UserMapper’s method with the same name, except that we create a Post object instead of a User object) to create our Post object. Secondly, since we left-joined the users to the posts and each returned row contains both the Post and the User columns, we call UserMapper::getObjectForRow() on the same row to create our user. We assign the User to the Post by calling setUser() and then put our creation in the $posts array. This process is done for each Post/User pair, and then we return all of it on the last line.
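To make the walkthrough concrete, here is a rough sketch of a findAll() built along those lines. The table layout (posts.post_id, posts.text, posts.user_id, users.user_id, users.name), the setId() method and the in-memory SQLite database are assumptions, and the sketch assumes every post has a user, so the left join never yields an empty user.

```php
<?php
// Hypothetical sketch; class bodies and schema are assumptions, not the
// files from the archive.
abstract class DomainObject {
    protected $id = null;
    public function getId() { return $this->id; }
    public function setId($id) { $this->id = (int)$id; }
}
class User extends DomainObject {
    protected $name;
    public function setName($name) { $this->name = (string)$name; }
    public function getName() { return $this->name; }
}
class Post extends DomainObject {
    protected $text;
    protected $user;
    public function setText($text) { $this->text = (string)$text; }
    public function getText() { return $this->text; }
    public function setUser(User $user) { $this->user = $user; }
    public function getUser() { return $this->user; }
}
abstract class DataMapper {
    public static $dbConn; // shared PDO connection
}
class UserMapper extends DataMapper {
    protected static $map = array(); // identity map: id => User
    public static function getObjectForRow(array $row) {
        $id = (int)$row['user_id'];
        if (!isset(self::$map[$id])) {
            $user = new User;
            $user->setId($id);
            $user->setName($row['name']);
            self::$map[$id] = $user;
        }
        return self::$map[$id];
    }
}
class PostMapper extends DataMapper {
    protected static $map = array(); // identity map: id => Post
    public static function getObjectForRow(array $row) {
        $id = (int)$row['post_id'];
        if (!isset(self::$map[$id])) {
            $post = new Post;
            $post->setId($id);
            $post->setText($row['text']);
            self::$map[$id] = $post;
        }
        return self::$map[$id];
    }
    public static function findAll() {
        $stmt = self::$dbConn->query(
            'SELECT p.post_id, p.text, u.user_id, u.name
               FROM posts p LEFT JOIN users u ON u.user_id = p.user_id
              ORDER BY p.post_id'
        );
        $posts = array();
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            // Each joined row carries both the post and the user columns.
            $post = self::getObjectForRow($row);
            $post->setUser(UserMapper::getObjectForRow($row));
            $posts[] = $post;
        }
        return $posts;
    }
}

// Assumed setup: one user with two posts in an in-memory SQLite database.
$db = DataMapper::$dbConn = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)');
$db->exec('CREATE TABLE posts (post_id INTEGER PRIMARY KEY, text TEXT, user_id INTEGER)');
$db->exec("INSERT INTO users (name) VALUES ('Han Solo')");
$db->exec("INSERT INTO posts (text, user_id) VALUES ('I know.', 1)");
$db->exec("INSERT INTO posts (text, user_id) VALUES ('Never tell me the odds.', 1)");

$posts = PostMapper::findAll();
var_dump(count($posts));                                 // int(2)
var_dump($posts[0]->getUser() === $posts[1]->getUser()); // bool(true)
```

Because both mappers keep their own identity map, two posts written by the same user end up sharing a single User object.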
PostMapper::insert() is also a bit different from its UserMapper cousin, so let’s have a look, shall we? The first thing we do is retrieve the Post’s user and make sure it’s actually an instance of the User class (we can’t save a post that doesn’t have a user; it’s not a magical post that can post itself, now is it?). Then we have to make sure the user actually has an id; if it doesn’t (the getId() method returns null), we insert it into the database first so it gets its id, which we later insert into the posts table. The remaining lines in PostMapper::insert() are close to identical to UserMapper::insert(), so they should need no further explanation.
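A minimal sketch of that cascading insert might look like the following. The schema, the setId() method and the in-memory SQLite database are assumptions, and the identity maps are left out here to keep the listing short.

```php
<?php
// Hypothetical sketch; not the PostMapper from the archive.
abstract class DomainObject {
    protected $id = null;
    public function getId() { return $this->id; }
    public function setId($id) { $this->id = (int)$id; }
}
class User extends DomainObject {
    protected $name;
    public function setName($name) { $this->name = (string)$name; }
    public function getName() { return $this->name; }
}
class Post extends DomainObject {
    protected $text;
    protected $user;
    public function setText($text) { $this->text = (string)$text; }
    public function getText() { return $this->text; }
    public function setUser(User $user) { $this->user = $user; }
    public function getUser() { return $this->user; }
}
abstract class DataMapper {
    public static $dbConn; // shared PDO connection
}
class UserMapper extends DataMapper {
    public static function insert(User $user) {
        $stmt = self::$dbConn->prepare('INSERT INTO users (name) VALUES (?)');
        $stmt->execute(array($user->getName()));
        $user->setId(self::$dbConn->lastInsertId());
    }
}
class PostMapper extends DataMapper {
    public static function insert(Post $post) {
        if ($post->getId() !== null) {
            throw new Exception('Post already has a row in the database');
        }
        // A post without an author cannot be saved.
        $user = $post->getUser();
        if (!($user instanceof User)) {
            throw new Exception('A Post needs a User before it can be saved');
        }
        // Save the author first so its id can go into the posts table.
        if ($user->getId() === null) {
            UserMapper::insert($user);
        }
        $stmt = self::$dbConn->prepare(
            'INSERT INTO posts (text, user_id) VALUES (?, ?)'
        );
        $stmt->execute(array($post->getText(), $user->getId()));
        $post->setId(self::$dbConn->lastInsertId());
    }
}

// Assumed setup: empty tables in an in-memory SQLite database.
$db = DataMapper::$dbConn = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)');
$db->exec('CREATE TABLE posts (post_id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT, user_id INTEGER)');

$leia = new User;
$leia->setName('Leia Organa');   // not saved yet, so getId() is null

$first = new Post;
$first->setText('Somebody has to save our skins.');
$first->setUser($leia);
PostMapper::insert($first);      // saves Leia first, then the post

$second = new Post;
$second->setText('Into the garbage chute, flyboy.');
$second->setUser($leia);
PostMapper::insert($second);     // Leia already has an id; only the post is saved

var_dump($leia->getId(), $first->getId(), $second->getId()); // int(1), int(1), int(2)
```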
Let’s try this out in practice, shall we? Let’s have a look at this nugget of gold:
<?php
require 'bootstrap.php';
$anakin = UserMapper::find(1);
$leia = new User;
$leia->setName('Leia Organa');
$leias_post = new Post;
$leias_post->setText('You came in that thing? You\'re braver than I thought.');
$leias_post->setUser($leia);
$anakins_post = new Post;
$anakins_post->setText('I find your lack of faith disturbing.');
$anakins_post->setUser($anakin);
PostMapper::insert($leias_post);
PostMapper::insert($anakins_post);
var_dump($leia->getId(), $leias_post->getId(), $anakins_post->getId()); // int(3), int(1), int(2)
On the first line we retrieve our Anakin Skywalker object (which we know should have id 1 from our previous examples). Secondly, we create a new user under the name of Leia Organa; this is to demonstrate that the PostMapper::insert() method works with both a saved and an unsaved User object as a Post’s User. Next we create two posts and assign the first to Leia and the second to Anakin. Last, we call PostMapper::insert() on the two posts. What happens now is that with the first post ($leias_post) the PostMapper will see that Leia isn’t saved in the database and will call UserMapper::insert() on Leia, so she gets put in the database, and then save her post afterwards. Anakin, however, is already in the database, and his post is inserted without having to do anything with him.
Last we call var_dump() on $leia and the two posts to show that they were put in the database; we should get back something like this: int(3) int(1) int(2).
We’re closing in on the end, but let’s do one more example before we call it for the evening. It’s time to retrieve the posts we inserted previously, which is easily done with the PostMapper::findAll() method. Let’s have a look at this listing:
<?php
require 'bootstrap.php';
$posts = PostMapper::findAll();
foreach ($posts as $post) {
    // One line of output per post
    printf("%s said: %s\n", $post->getUser()->getName(), $post->getText());
}
We’ve already gone through most of the logic in this example, as most of the heavy lifting is done in the findAll() method, which we dissected earlier. As you can see, everything works as expected: we get an array of our posts back and loop through it. Calling getUser() on the post returns our user object, so we can chain getName() on that and get the user’s name, and calling getText() returns the text of the post.
So, why on Hoth would we ever want to do this? As I said in the paragraph about Active Record: it’s all about decoupling our database from our object oriented application (domain model), which allows us to develop our application and our database independently of each other. A change in the database schema doesn’t require more than a fix to the insert/find methods in our mappers. A lot more can be (and has been) said on this subject, so I advise you to go and read up on it; it’s a very interesting problem.
There’s one last thing I want to touch on before I close for today, namely whether to write your mappers and domain objects by hand (as in this article) or use a tool such as Propel or phpDoctrine. Personally I stand on the writing-by-hand side of things, but both approaches have their respective advantages and disadvantages, so I urge you to go and check out these two projects (even though phpDoctrine leans more towards Active Record than Data Mapper).
Over and out, Fredrik.
Fantastic article! Clear, well thought out code and enough ‘back-story’ to put the article in context without straying from the point. I had not long started on a sort of poor-man’s ActiveRecord clone for a ‘fun’ project I’ve been toying with, I think now I will go back and rewrite the DB side of things using this approach - not only will it be less DB intensive thanks to the map, but this way also seems to ‘fit’ a little easier with my own personal way of thinking. Thanks greatly, and I look forward to exploring your past articles!
This sounds perfect.
I am a PHPer too, and I learned a lot from your post, thanks.
It’s nice to see a sophisticated article about PHP and persistence layers - thanks for this excellent work.
I’ve been working with a similar approach for a while now, and the only drawback of your solution is that, for larger tables, resolving associated classes immediately (like in Post->User) might be a performance issue - especially when using your implementation of PostMapper::findAll(). Using lazy loading for associated classes solves such problems in most cases.
So, one would have an internal variable $_userId (or whatever) and the implementation of Post->getUser() would be like this:
class Post
{
    // …
    public function getUser() { return UserMapper::find( $this->_userId ); }
    // …
}
This also solves the problem of association chains: imagine an enhanced version of your model, where you have (multiple) addresses associated with users. In other words, you have a foreign key from the address table to the user table and from the post table to the user table. You’d have to change your implementation of PostMapper::findAll(), which in the first place has nothing to do with the address table (and IMHO shouldn’t have to).
Furthermore, PostMapper::getObjectForRow() doesn’t return a valid object when it isn’t called from within findAll(). Public interfaces shouldn’t depend on the way they are used.
And to finally spin the wheel a bit further: Think of a solution that hides the concrete persistence layer behind the DomainObject (the persistence layer might be changed from database to XML or a concrete db to another (MySQL to Oracle)). The DomainObject would then function as a bridge to the Mapper like this:
$yoda = new User( "Yoda" );
$yoda->save();
var_dump( $yoda );
$yoda2 = User::find( $yoda->getId() );
Regards, Kyle
Chris C and Heresy.Mc - thanks for your comments.
Kyle:
Yes, in fact the original article and examples had a LazyLoad class that worked like a proxy for the User object in the Post, but I felt that the examples became too complicated and the article was getting too long, so I decided to cut it, as I thought of this as more of an introduction to DO/DM:ing than a “complete” guide to all the techniques you can use (locking, unit-of-work, value fields, etc.). Maybe for a follow-up article. The thing about writing articles is to know where to limit yourself; I could’ve written a small doctoral thesis on this subject ;p
About the getObjectForRow() method, I think I just missed putting protected in front of it and creating an abstract base method in DataMapper, or I could’ve left it out on purpose to keep things clear - I don’t remember.
Also I wouldn’t tangle the Domain Objects with knowledge of the mappers, as one of the main benefits of using a Data Mapper is that your DO’s are completely ignorant of any type of database / persistence and can act as pure in memory objects.
- Fredrik
I don’t mean to flame you or anything, but if you are trying to decouple your database from your application, this pattern is doing the exact opposite. By hardwiring the relationship between the field names and the method names, and by adding application-specific methods, even ones as simple as findAll(), you are in fact binding your application to your storage method. This is why serious ORMs such as Doctrine have a notion of abstract “domain objects” and a superior query language for their manipulation; in order to truly make your code database-independent, you need a high-level (and fat) indirection layer that provides reliable interfaces both to your application and to the database.
Also, I spotted that you use implicit caching in your example. This is a great idea, but it is also a minefield. Imagine if I clone one of your object instances, change a field and save it. Worse yet, imagine that another concurrent script execution modifies and saves it. This inevitably means that all subsequent code in the current execution will break, because any subsequent request for the object would return an inaccurate copy of an object, which will be modified and saved back eventually, resulting in mysterious data loss that is nearly impossible to debug.
Mike Seth (I don’t mean to flame you either): maybe it wasn’t clear in my example, but when talking about decoupling the application from the database I was talking about decoupling the actual domain objects from the way they are stored, which I do (although it’s very simple and I still hard-code the mapper and method names, but again, these are only examples of ~50 lines of code and not a complete ORM). I never mention database independence (which is one of the great lies of our programming generation, or well, at least mine). And yes, to achieve even this separation I would need a good high-level layer that abstracts it all away, but again, these are examples meant for people who have no or almost no idea of how to work with the very basics of a data mapper – the concept, if you will.
The implicit caching with the identity map is indeed a minefield, but you shouldn’t really clone a domain object, now should you? And the race condition will always exist in one way or another no matter how much you try to work around it (at least in PHP) due to the nature of PHP and its share-nothing architecture, no?
This is both quite good and timely; its treatment of the DataMapper pattern is elegant and clear. However, I am just starting application development with patterns and had to go through the Apress title ‘PHP Objects, Patterns and Practice’, which also dealt with this pattern and evolved it into something quite complex but more thorough. My issue is how you use all these patterns with, say, the Zend Framework’s MVC in an Ajax application; putting them all together just eludes me. Great post, thanks.
Simply outstanding ^_^! I like posts like that. Your blog is added to my favorites ;-). Continue writing.
I’m curious about this DomainObject fellow and his children: how is error handling taken care of? Let’s say I have a Post object, and it has a $message field. The message cannot be empty, it cannot be longer than 200 characters and let’s say (for the fun of it) it cannot contain the word “sausage”. Is all this checked inside or outside the Post object? For example:
http://pastebin.se/195238
Or how exactly is it handled in a smooth’n’efficient way?