Introduction: SQL injection refers to the vulnerability where hackers can manipulate user input to a Web application and gain unintended access to a database. For example, banks want their users to be able to make transactions online, provided they supply their correct password. A common architecture for such a system is to have the user enter strings into a Web form, and then to have those strings form part of a database query written in the SQL language. If systems developers are not careful, the strings provided by the user can alter the meaning of the SQL statement in unexpected ways.
Example: Suppose a bank offers its customers access to a relation
AcctData(name, password, balance)
That is, this relation is a table of triples, each consisting of the name of a customer, the password, and the balance of the account. The intent is that customers can see their account balance only if they provide both their name and their correct password. Having a hacker see an account balance is not the worst thing that could occur, but this simple example is typical of more complicated situations where the hacker could execute payments from the account.
The system might implement a balance inquiry as follows:
- Users invoke a Web form where they enter their name and password.
- The name is copied to a variable n and the password to a variable p.
- Later, perhaps in some other procedure, the following SQL query is executed:
SELECT balance FROM AcctData
WHERE name = ':n' and password = ':p'
For readers not familiar with SQL, this query says: "Find in the table AcctData a row with the first component (name) equal to the string currently in variable n and the second component (password) equal to the string currently in variable p; print the third component (balance) of that row." Note that SQL uses single quotes, not double quotes, to delimit strings, and the colons in front of n and p indicate that they are variables of the surrounding language. Suppose the hacker, who wants to find Charles Dickens' account balance, supplies the following values for the strings n and p:
n = Charles Dickens' --    p = who cares
The effect of these strange strings is to convert the query into
SELECT balance FROM AcctData
WHERE name = 'Charles Dickens' --' and password = 'who cares'
In many database systems, -- is a comment-introducing token and has the effect of making whatever follows on that line a comment. As a result, the query now asks the database system to print the balance for every person whose name is 'Charles Dickens', regardless of the password that appears with that name in a name-password-balance triple. That is, with comments eliminated, the query is:
SELECT balance FROM AcctData
WHERE name = 'Charles Dickens'
In the example above, the "bad" strings were kept in two variables, which might be passed between procedures. In more realistic cases, these strings might be copied several times or combined with others to form the full query. We cannot hope to detect coding errors that create SQL-injection vulnerabilities without a full interprocedural analysis of the entire program.
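The substitution in the example can be reproduced with a short C sketch. It is only an illustration: the helper names build_query and strip_comment are hypothetical, the query is assumed to be assembled by plain string formatting, and the comment stripping is a naive stand-in for what a database system does with the -- token.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the balance query by pasting the user's
   strings n and p between the single quotes, as in the example.
   Returns 0 on success, -1 if the result does not fit in out. */
int build_query(char *out, size_t out_size, const char *n, const char *p)
{
    int len = snprintf(out, out_size,
                       "SELECT balance FROM AcctData "
                       "WHERE name = '%s' and password = '%s'", n, p);
    return (len >= 0 && (size_t)len < out_size) ? 0 : -1;
}

/* Hypothetical helper: cuts the query at the first "--", mimicking a
   database system that treats the rest of the line as a comment.
   This naive version does not understand SQL quoting. */
void strip_comment(char *query)
{
    char *comment = strstr(query, "--");
    if (comment != NULL) {
        *comment = '\0';
        /* Drop the spaces left in front of the comment token. */
        size_t len = strlen(query);
        while (len > 0 && query[len - 1] == ' ') {
            query[--len] = '\0';
        }
    }
}
```

Calling build_query with n set to Charles Dickens' -- and p set to who cares yields the altered query shown in the example, and strip_comment then leaves only the test on the name. Nothing in build_query distinguishes a harmless name from a malicious one, which is why the flow of user strings into the query has to be tracked.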
Buffer Overflow: A buffer overflow attack occurs when carefully crafted data supplied by the user writes beyond the intended buffer and manipulates the program execution. For example, a C program may read a string s from the user, and then copy it into a buffer b using the function call strcpy(b, s).
- If the string s is actually longer than the buffer b, then locations that are not part of b will have their values changed. That in itself will probably cause the program to malfunction or at least to produce the wrong answer, since some data used by the program will have been changed. But worse, the hacker who chose the string s can pick a value that will do more than cause an error. For example, if the buffer is on the run-time stack, then it is near the return address for its function. An insidiously chosen value of s may overwrite the return address, and when the function returns, it goes to a place chosen by the hacker. If hackers have detailed knowledge of the surrounding operating system and hardware, they may be able to execute a command that will give them control of the machine itself. In some situations, they may even have the ability to have the false return address transfer control to code that is part of the string s, thus allowing any sort of program to be inserted into the executing code.
- To prevent buffer overflows, every array-write operation must be statically proven to be within bounds, or a proper array-bounds check must be performed dynamically. Because these bounds checks need to be inserted by hand in C and C++ programs, it is easy to forget to insert the test or to get the test wrong. Heuristic tools have been developed that will check if at least some test, though not necessarily a correct test, has been performed before a strcpy is called. Dynamic bounds checking is unavoidable because it is impossible to determine statically the size of users' input. All a static analysis can do is assure that the dynamic checks have been inserted properly. Thus, a reasonable strategy is to have the compiler insert dynamic bounds checking on every write, and use static analysis as a means to optimize away as many bounds checks as possible.
- When the compiler inserts dynamic checks on every write, the static analysis no longer has to catch every potential violation; moreover, we need to optimize only those code regions that execute frequently. Inserting bounds checking into C programs is nontrivial, even if we do not mind the cost. A pointer may point into the middle of some array, and we do not know the extent of that array. Techniques have been developed to keep track of the extent of the buffer pointed to by each pointer dynamically. This information allows the compiler to insert array bounds checks for all accesses. Interestingly enough, it is not advisable to halt a program whenever a buffer overflow is detected. In fact, buffer overflows do occur in practice, and a program would likely fail if every overflow were treated as fatal. The solution is to extend the size of the array dynamically to accommodate the buffer overruns.
- Interprocedural analysis can be used to reduce the cost of dynamic array bounds checks. For example, if we are interested only in catching buffer overflows involving user-input strings, we can use static analysis to determine which variables may hold contents provided by the user. As with SQL injection, being able to track an input as it is copied across procedures is useful in eliminating unnecessary bounds checks.
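
The ideas in this section can be made concrete with a small C sketch. Everything here is an assumption made for illustration: struct fake_frame only imitates a stack frame (real layouts depend on the compiler and platform), and the names unchecked_copy, checked_copy, tracked_buf and growing_copy are hypothetical. The sketch shows an unchecked copy spilling into neighboring bytes, a hand-written dynamic bounds check, and a buffer whose recorded extent lets a write grow the array instead of halting.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a buffer b followed by the
   bytes that would hold the function's return address. */
struct fake_frame {
    char b[8];
    char return_slot[8];
};
_Static_assert(offsetof(struct fake_frame, return_slot) == 8,
               "demo assumes no padding after b");

/* Copies s the way an unchecked strcpy(b, s) would, but addresses the
   whole struct as bytes so that the demonstration itself stays well
   defined. Returns the number of bytes written past the end of b. */
size_t unchecked_copy(struct fake_frame *f, const char *s)
{
    size_t n = strlen(s) + 1;            /* include the terminating NUL */
    if (n > sizeof *f) {
        n = sizeof *f;                   /* keep the demo inside the struct */
    }
    memcpy((char *)f, s, n);
    return n > sizeof f->b ? n - sizeof f->b : 0;
}

/* Dynamic bounds check: copies s into b only if it fits.
   Returns 0 on success, -1 if s (with its NUL) exceeds b_size. */
int checked_copy(char *b, size_t b_size, const char *s)
{
    size_t n = strlen(s) + 1;
    if (n > b_size) {
        return -1;
    }
    memcpy(b, s, n);
    return 0;
}

/* A pointer paired with the extent of the buffer it points to. */
struct tracked_buf {
    char *data;     /* must come from malloc/realloc */
    size_t size;
};

/* Writes s into buf, growing the buffer instead of halting when s does
   not fit. Returns 0 on success, -1 if the allocation fails. */
int growing_copy(struct tracked_buf *buf, const char *s)
{
    size_t n = strlen(s) + 1;
    if (n > buf->size) {
        char *bigger = realloc(buf->data, n);
        if (bigger == NULL) {
            return -1;
        }
        buf->data = bigger;
        buf->size = n;
    }
    memcpy(buf->data, s, n);
    return 0;
}
```

With an 8-byte b, passing an 11-character string to unchecked_copy overwrites the first four bytes of return_slot, which is the simulated analogue of clobbering a return address. checked_copy rejects the same input, and growing_copy accepts it by enlarging the buffer, at the cost of carrying the size alongside every pointer.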