.oO Phrack 49 Oo.
Volume Seven, Issue Forty-Nine
File 08 of 16
CGI Security Holes
--------------------------------------------------------------------------
by Gregory Gilliss
This article will discuss the Common Gateway Interface, its
relationship to the World Wide Web and the Internet, and will endeavor to
point out vulnerabilities in system security exposed by its use. The UNIX
operating system will be the platform central to this discussion.
Programming techniques will be illustrated by examples using PERL.
1. Introduction
The Common Gateway Interface (CGI) is an interface specification that
allows communication between client programs and information servers which
understand the Hyper-Text Transfer Protocol (HTTP). TCP/IP is the
communications protocol used by the CGI script and the server during the
communications. The default port for communications is port 80 (privileged),
but other non-privileged ports may be specified.
CGI scripts can perform relatively simple processing on the client
side. A CGI script can be used to format Hyper-Text Markup Language (HTML)
documents, dynamically create HTML documents, and dynamically generate
graphical images. CGI can also perform transaction recording using standard
input and standard output. CGI stores information in system environment
variables that can be accessed through the CGI scripts. CGI scripts can also
accept command line arguments. CGI scripts operate in two basic modes:
- In the first mode, the CGI script performs rudimentary data
processing on the input passed to it. An example of data processing is the
popular web lint page that checks the syntax of HTML documents.
- The second mode is where the CGI script acts as a conduit for data
being passed from the client program to the server, and back from the
server to the client. For example, a CGI script can be used as a front end
to a database program running on the server.
CGI scripts can be written using compiled programming languages,
interpreted programming languages, and scripting languages. The only real
advantage that exists for one type of development tool over the other is that
compiled programs tend to execute more quickly than interpreted programs.
Interpreted languages such as AppleScript, TCL, PERL and UNIX shell scripts
afford the possibility of acquiring and modifying the source (discussed
later), and are generally faster to develop than compiled programs.
The set of common methods available to CGI programs is defined in
the HTTP 1.0 specification. The three methods pertinent to this discussion
are the `Get` method, the `Post` method, and the `Put` method. The `Get`
method retrieves information from the server to the client. The `Post`
method asks the server to accept information passed from the client as input
to the specified target. The `Put` method asks the server to accept
information passed from the client as a replacement for the specified target.
2. Vulnerabilities
The vulnerabilities caused by the use of CGI scripts are not
weaknesses in CGI itself, but are weaknesses inherent in the HTTP
specification and in various system programs. CGI simply allows access to
those vulnerabilities. There are other ways to exploit the system security.
For example, insecure file permissions can be exploited using FTP or telnet.
CGI simply provides more opportunities to exploit these and other security
flaws.
The CGI specification provides opportunities to read files, acquire
shell access, and corrupt file systems on server machines and their attached
hosts. Means of gaining access include: exploiting assumptions of the
script, exploiting weaknesses in the server environment, and exploiting
weaknesses in other programs and system calls. The primary weakness in
CGI scripts is insufficient input validation.
According to the HTTP 1.0 specification, data passed to a CGI script
must be encoded so that it can work on any hardware or software platform.
Data passed by a CGI script using the Get method is appended to the end of a
Universal Resource Locator (URL). This data can be accessed by the CGI
script as an environment variable named QUERY_STRING. Data is passed as
tokens of the form variable=value, with the tokens separated by ampersands
(&). Actual ampersands, and other non-alphanumeric characters, must be
escaped, meaning that they are encoded as two-digit hexadecimal values.
Escaped characters are preceded by a percent sign (%) in the encoded URL. It
is the responsibility of the CGI script to escape or remove characters in
user supplied input data. Characters such as '<' and '>', the delimiters for
HTML tags, are usually removed using a simple search and replace operation,
such as the following:
----------------8<----------------------------------------------------------
# Process input values
{$NAME, $VALUE) = split(/=/, $_); # split up each variable=value pair
$VALUE =~ s/\+/ /g; # Replace '+' with ' '
$VALUE =~ s/%([0-9|A-F]{2})/pack(C,hex,{$1}}/eg; # Replace %xx characters with ASCII
# Escape metacharacters
$VALUE =~ s/([;<>\*\|'&\$!#\(\)\[\]\{\}:"])/\\$1/g;# remove unwanted special characters
$MYDATA[$NAME} = $VALUE; # Assign the value to the associative array
----------------8<----------------------------------------------------------
This example removes special characters such as the semi-colon
character, which is interpreted by the shell as a command separator.
Inclusion of a semi-colon in the input data allows for the possibility
of appending an additional command to the input. Take note of the forward
slash characters that precede the characters being substituted. In PERL, a
backslash is required to tell the interpreter not to process the following
character.*
The above example is incomplete since it does not address the
possibility of the new line character '%0a', which can be used to execute
commands other than those provided by the script. Therefore it is possible to
append a string to a URL to perform functions outside of the script. For
example, the following URL requests a copy of /etc/passwd from the server
machine:
http://www.odci.gov/cgi-bin/query?%0a/bin/cat%20/etc/passwd
The strings '%0a" and '%20' are ASCII line feed and blank respectively.
The front end interface to a CGI program is an HTML document called a
form. Forms include the HTML tag . Each tag has a variable
name associated with it. This is the variable name that forms the left hand
side of the previously mentioned variable=value token. The contents of the
variable forms the value portion of the token. Actual CGI scripts may
perform input filtering on the contents of the field. However if the
CGI script does not filter special characters, then a situation analogous to
the above example exists. Interpreted CGI scripts that fail to validate the
data will pass the data directly to the interpreter. **
Another HTML tag sometime seen in forms is the