Continuing on with the development of our database class, the next problem to tackle is SQL injection/spoofing. This security concern refers to users who try to make a query do something other than what it was intended to do by typing SQL into a form field. PHP’s coverage of injection will get you caught up to speed. The problem is easy to identify, but hard to completely stop. Unfortunately, there is no preventInjection()
function to save the day. We need to look at the different options available and see how to work them into the database class.
Considerations
Escaping characters: By ensuring that characters are escaped, you can be comfortable knowing that all user input is treated as a string, so that words like
OR
are not treated as SQL operators.Every query is different: When checking the validity of a query, there is no general rule for what should be accepted. If you are performing a
DELETE
, you will not want users to be able to insert aLIKE
or anOR
. If you are performing anINSERT
, a common word such as “where” needs to be allowed from a textarea. Based on the operation, you may wish to filter out different words. While escaping characters will handle most of this for us, we may need more flexibility in certain situations, so being able to specify illegal terms is important.SafeSQL: This file handles SQL injection for you as best as it can. I do not have a ton of experience with it, so I am opting to implement my own solution, but it is another option out there for those of you who wish to use it.
Different databases: Originally, I had thought to make the class work regardless of the database. I am beginning to think it may be better to create different files depending on the database. The primary reason for this is because of functions like
mysql_real_escape_string()
andmysql_insert_id()
.
Implementation
The new function, clean()
, takes two parameters: a string and an array of strings. The first is the text to modify, and the second is an optional parameter that contains a list of all words to remove. You can use it like this:
$cDB = new cDatabase("","","","");
$cleanvar = $cDB->clean($var, array(" like", " or"));
The returned string will be properly escaped, ignored if it is numeric, and will have the illegal phrases you specified removed.
How It Works
The function loops through the array of illegal words, and removes them. It then checks for non numerical strings and escapes them properly.
function clean($text, $removals = array()) {
$conn = mysql_connect($this->host, $this->user, $this->password)
or die(mysql_error()); for($i = 0; $i
I’m intrigued to as to why you wouldn’t just escape single quotes for MySQL? Why go through the nuisance of connecting to the database just to use the mysql-specific function?
Good point Jonathan.
And why should your database class have to worry about injection? Surely you’d be better off trapping this with your form data filtering and validation routines so that your database class never HAS to deal with this kind of hacking…
I’m afraid your clean function is a terrible solution - it will corrupt data like this comment, which contains both the keywords or and like.
SQL injection is an easily solved problem in practically every other language with a DB library - PHP’s addslashes() and building SQL queries by gluing strings together is an unfortunate exception. Consider this example from Python:
cursor.execute(""" UPDATE animal SET name = %s WHERE name = %s """, (new_name, cur_name))
The DB API handles all escaping for you, adding quotes around strings and escaping nasty characters as necessary. It looks like SafeSQL takes the same approach. In fact, so does PEAR.
Simon - in the function the keyword array is optional, so you have control over maintaining data like this comment. It is easy to preserve words like “or” or “like”. I still agree with you about other limitations though.
Tim - I could easily see this function being in my form validation class. Right now, it is in my database class because it is called from
sqlInsert()
andsqlUpdate()
as the query is being built. Out of curiosity, what are your reasons for keeping this type of operation away from a database class.Jonathan - Once I change the way connections are handled with the class, this function will not be opening a new connection or adding any overhead. But I do agree with you still, and I should just go ahead and escape them myself.
Ryan - Good question. Sorry. Didn’t mean that post to sound so damning! You’ve made me think about it though so I’ll try and explain why my perspective on this is slightly different:
Having created a somewhat different database class, I’ve been quite interested in what you’ve been doing with these articles as you’ve approached the problem from a completely different angle. Thanks to your excellent articles, I’m probably going to adapt some of my methods!
My own database class only deals with the execution of a query, dealing with errors and returning relevant information to the executed query.
There is no validation on the SQL before it is run simply because most of the time the SQL is created within another function or object. The only way that the SQL could therefore be broken, hacked or injected then is via the user input (whether that be POST, GET, SESSION, file input etc).
In my humble opinion validation routines should be run on inputted data before ANYTHING else is done with it. For this reason I have several specific filtering and validation routines to be run on ALL input variables as soon as they are called into use. This basically adds security to your PHP no matter what you’re doing - regardless of using the data for SQL generation, file manipulation, SESSION creation etc.
I guess basically what I’m saying is that making sure that user-inputted data is secure should always occur well before even considering using it in a query.
However, I also see that SPECIFIC validation for SQL operations could be handled by the database class (as you have here).
It’s down to whether you have a seperate validation and data security class or whether each of your classes has it’s own validation and data security functions built in.
Having thought about it some more - perhaps the optimum solution is to do both.
In any case I’ll shut my big mouth and let you get on with it…
Tim, I can agree with that. I do all input validation before a query is even considered except for this one function, but I can definitely see your point of view.