It’s hard to imagine pushing the limits of object-oriented PHP so far that your web servers choke, but the truth is those limits are reached faster than you’d think. We’ve run some tests over at Wufoo, and it turns out that any sort of mass object creation is pretty much not going to work at scale. The problem is that this limit on object creation forces developers to balance code consistency, which is desirable (especially for the old-schoolers), against performance. While replacing objects with arrays where possible makes things a little better, the most performance-friendly approach involves appending strings. For your convenience, we’ve run some tests that measure page execution times and memory usage, and used the results to create the following guideline to help you plan out which areas of your code may have to break away from an object-oriented nature.
The Benchmarks
Basically, we set up a simple PHP page to iterate over a loop and create 1) a giant concatenated string, 2) an array of arrays containing the word ‘test’, and 3) an array of objects with one variable set to ‘test’.
| | Load Time | Memory Used |
| --- | --- | --- |
| Control | 12.6ms | 1.42mb |
| 11,000 strings | 15.7ms | 1.45mb |
| 11,000 arrays | 26.6ms | 3.99mb |
| 11,000 objects | 148.8ms | 7.70mb |
| 25,000 arrays | 44.1ms | 7.25mb |
| 1,500,000 strings | 253.2ms | 7.14mb |
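The test page itself isn’t shown, but a minimal sketch of the three loops might look like the following. The `Record` class and the loop structure are our stand-ins for whatever the original page actually used:

```php
<?php
// Hypothetical reconstruction of the three benchmark loops;
// the original test page is not shown in the article.
$n = 11000;

// 1) One giant concatenated string.
$str = '';
for ($i = 0; $i < $n; $i++) {
    $str .= 'test';
}

// 2) An array of arrays containing the word 'test'.
$arrays = array();
for ($i = 0; $i < $n; $i++) {
    $arrays[] = array('test');
}

// 3) An array of objects with one variable set to 'test'.
class Record {
    public $value;
    public function __construct($value) { $this->value = $value; }
}
$objects = array();
for ($i = 0; $i < $n; $i++) {
    $objects[] = new Record('test');
}

echo memory_get_usage() . "\n"; // compare against a control run with no loops
```

Comparing `memory_get_usage()` after each loop against a control run is one way to approximate the numbers in the table above.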
Doesn’t the array in the array-of-objects test itself consume (at least some) load time and memory?
It does, but it’s a small amount. I ran some tests, and it seems that if you were to make 11,000 variables and assign an object to each variable instead of keeping them in an array, the memory used in the object test would drop from 7.7mb to 7.36mb. That’s small enough that the data is still relevant, and by returning an array of the objects, the developer avoids working directly with the SQL query and the construction of objects. It’s a good point, though; I wasn’t aware of the overhead on each array position when storing an object.
When I need something like $users = UserAccessor->loadAllUsers(), I don’t return an array of objects or anything like that. I return an Iterator object, which saves memory because it is only one object. 90% of the time I need a foreach over the records, so an iterator is very handy here because I pull only the object I currently need. For the other 10% I have a toArray() method to convert the Iterator to a normal array.
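A minimal sketch of this approach, assuming hypothetical `User` and `UserIterator` classes and a hard-coded row source standing in for the query results (the commenter’s `UserAccessor` internals aren’t shown):

```php
<?php
// Sketch of the iterator approach: one Iterator object that builds
// each User on demand instead of materializing thousands of objects.
// User, UserIterator, and the row data are hypothetical stand-ins.
class User {
    public $name;
    public function __construct($name) { $this->name = $name; }
}

class UserIterator implements Iterator {
    private $rows;          // raw rows, e.g. fetched from a query
    private $position = 0;

    public function __construct(array $rows) { $this->rows = $rows; }

    public function current(): User {
        // Only the object currently needed is constructed.
        return new User($this->rows[$this->position]['name']);
    }
    public function key(): int     { return $this->position; }
    public function next(): void   { $this->position++; }
    public function rewind(): void { $this->position = 0; }
    public function valid(): bool  { return isset($this->rows[$this->position]); }

    // The "other 10%" case: materialize everything at once.
    public function toArray(): array {
        return iterator_to_array($this);
    }
}

$users = new UserIterator(array(array('name' => 'alice'), array('name' => 'bob')));
foreach ($users as $user) {
    echo $user->name . "\n";
}
```

The raw rows still occupy memory, but only one `User` object exists at any moment during the foreach.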
I assume by “a giant concatenated string” you mean an array of strings. It wouldn’t be fair to compare 11K objects vs. 11K arrays vs. 1 string. Among your solutions you could also add usage of the Flyweight pattern (when applicable).
FWIW, I generally find that the memory usage of a structure is roughly proportional to strlen(serialize($structure)). Realizing this makes you think twice about using long string keys in an array when you’re going to need thousands of them. And instead of 1000 arrays with 5 keys each, it uses less memory to create 5 arrays with 1000 values each.
At some point your optimization starts to affect maintainability and you’re better off throwing more memory at the problem (or reducing your paging limit).
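A quick way to see both points above at once, using serialized length as the rough proxy for memory (the key names and counts here are just illustrative):

```php
<?php
// Row layout (1000 arrays x 5 keys) repeats every string key 1000
// times when serialized; column layout (5 arrays x 1000 values)
// serializes each string key only once.
$keys = array('alpha', 'bravo', 'charlie', 'delta', 'echo');

$rows = array();
for ($i = 0; $i < 1000; $i++) {
    $row = array();
    foreach ($keys as $key) {
        $row[$key] = $i;
    }
    $rows[] = $row;
}

$columns = array();
foreach ($keys as $key) {
    $columns[$key] = array();
}
for ($i = 0; $i < 1000; $i++) {
    foreach ($keys as $key) {
        $columns[$key][] = $i;
    }
}

// Same 5000 values in both layouts; only the structure differs.
echo strlen(serialize($rows)) . ' vs ' . strlen(serialize($columns)) . "\n";
```

With the same 5,000 values in both structures, the row layout serializes noticeably larger purely because of the repeated string keys.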
I don’t think this test makes much sense. Of course it’s good practice to know precisely how things work behind the scenes, but why is it useful to compare an object with a string or an array? Their purposes are so different that they aren’t comparable. And usually one will use only a few objects, so this overhead won’t be so big. For most projects, even the overhead isn’t what’s important — development time is.
Use of Iterators can minimize memory consumption. If we’re talking about database access and PDO, we don’t fill our server with all the data retrieved. Working this way, we can iterate over the User Collection one record at a time.
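For example, a PDOStatement is itself traversable, so a foreach over the statement pulls one row per iteration rather than loading the full result set up front. The in-memory SQLite table below is just a stand-in for a real database:

```php
<?php
// Iterating the PDOStatement fetches rows one at a time, so only the
// current row is held in PHP memory (vs. fetchAll(), which buffers
// everything). The users table is a stand-in built in SQLite.
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice'), ('bob')");

$stmt = $pdo->query('SELECT id, name FROM users');

$names = array();
foreach ($stmt as $row) {       // row-at-a-time, not fetchAll()
    $names[] = $row['name'];
}
echo implode("\n", $names) . "\n";
```

This requires the pdo_sqlite extension; with MySQL, note that unbuffered queries are needed to get the same memory behavior on the driver side.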
Steve, good tip on the strlen(serialize). That will make things a bit easier to run a quick check on in the future.
Iterators are definitely an option also — similar to the paging mention above. I guess the difference being that iterators still only require one query. The main downside is the amount of work up front, and the flexibility needed if you don’t want the developer to have to deal with the construction of the object. But you’re right, when memory is a concern and full code reusability is wanted, an iterator may be the best approach.
The whole point of including the string in the test is that it helps achieve the desired output. For example, if you need a CSV file, you can loop over a recordset and append to a giant string to get your desired output. Obviously not the best approach, but worth noting just how fast it is.
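A sketch of that CSV case, with the recordset faked as a small array:

```php
<?php
// Building a CSV by appending to one growing string, as described
// above. The recordset here is a hypothetical stand-in for query rows.
$records = array(
    array('id' => 1, 'name' => 'alice'),
    array('id' => 2, 'name' => 'bob'),
);

$csv = "id,name\n";
foreach ($records as $record) {
    $csv .= $record['id'] . ',' . $record['name'] . "\n";
}

echo $csv;
```

No per-row arrays or objects survive the loop; only the one string grows, which is why this approach lands near the top of the benchmark table.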
I haven’t run any benchmarks, but I would quote one of Marcus Börger’s articles:
“The big difference:

Arrays
- require memory for all elements
- allow direct access to any element

Iterators
- only know one element at a time
- only require memory for the current element
- forward access only
- access done by method calls”
It would be interesting to see benchmarks with ArrayObjects/ArrayIterators used.
I recently needed to loop over a large array of arrays and create a new ArrayObject from the current array element on each pass. Instead of doing something like $foo = new ArrayObject($largeArray->current()) inside each loop, I created an empty ArrayObject $foo before entering the loop and then called $foo->exchangeArray($largeArray->current()) on each iteration.
I haven’t done any serious benchmarking, but I quickly compared memory usage and execution time using KCachegrind, and the results were in favor of the exchangeArray() approach.
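For reference, the pattern described above looks roughly like this, with a tiny stand-in for the large array:

```php
<?php
// Reusing one ArrayObject via exchangeArray() instead of constructing
// a fresh ArrayObject on every iteration. $largeArray is a small
// stand-in for the commenter's real data.
$largeArray = array(
    array('a' => 1),
    array('a' => 2),
);

$foo = new ArrayObject();           // created once, before the loop
$seen = array();
foreach ($largeArray as $element) {
    $foo->exchangeArray($element);  // swap contents; no new object
    $seen[] = $foo['a'];
}
echo implode("\n", $seen) . "\n";
```

exchangeArray() replaces the wrapped array in place, so the loop avoids one object allocation and destruction per element.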
What order am I supposed to read these comments in?
Please do not perpetuate this bad myth that one should not use OO because it would be slow.
Good article! We have run into similar issues, particularly because we rely completely on OO at some points and don’t use a DB at all until data comes back for reporting. Even then, we have some studies with over 22,000 data points (dynamic fields) in a single record (crazy I know — we do all sorts of mysql acrobatics to allow it). Before we optimized we had to bump the memory limit up to 300M just to allow larger studies to run. Servers are cheap — but not that cheap.
Anyway, Thomas, I don’t think anyone is saying that you should avoid using OO. I think Ryan’s point is that, as developers, we need to be aware of resource usage, especially as you scale up.
Great article, thanks. I’m just starting up on what may be a fairly intensive PHP OO application, and these concerns are at the forefront of my design and planning.