Using CGI Programs
The Common Gateway Interface (CGI) is a standard for interfacing external applications with web servers. It was originally developed as part of the NCSA HTTP server and, although it is an old standard, it still enjoys considerable use.
CGI was created so that a server can generate data dynamically in response to an HTTP request and return the result to the client's browser. Plain HTML documents are typically static, whereas a CGI program creates its response data at request time.
CGI programs can be written in any language that can read from standard input, write to standard output, and access environment variables. This means that virtually any programming language can be used, including C, Perl, or even Unix shell scripting.
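As a rough illustration of that contract, the following Python sketch reads one environment variable and writes a response to standard output. The function name build_response is made up for this example, and the sketch assumes a Python interpreter is available on the target, which may not be the case on an embedded device.

```python
#!/usr/bin/env python3
"""Minimal CGI program sketch: environment variables in, headers plus body out."""
import html
import os
import sys


def build_response(environ):
    """Return a complete CGI response for the given environment mapping.

    build_response is a hypothetical helper for this example only.
    """
    query = environ.get("QUERY_STRING", "")
    body = (
        "<html><body>\n"
        f"<p>Query string: {html.escape(query)}</p>\n"
        "</body></html>\n"
    )
    # A blank line separates the CGI headers from the body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server supplies the environment and collects standard output.
    sys.stdout.write(build_response(os.environ))
```

The same structure carries over to C, Perl, or a shell script: read the environment, optionally read standard input, then print headers, a blank line, and the body.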
However, since CGI was first developed, faster and more efficient ways of creating dynamic web pages have emerged. Read more about such replacements in Using GoActions.
Embedthis GoAhead supports CGI so that existing CGI applications can continue to be used. GoAhead has a high-performance, fully featured CGI handler that alleviates much of the pain of configuring CGI.
Configuring CGI Programs
Requests for CGI programs are identified by a unique URI prefix specified at build time. This is typically "cgi-bin". The CGI programs and scripts are stored in special CGI directories outside the document root.
When a browser requests a URI that includes the "/cgi-bin/" prefix, the program named immediately after "/cgi-bin/" is run. For example:
https://www.embedthis.com/cgi-bin/cgitest
Invoking CGI Programs
When a CGI program is run, the GoAhead CGI handler communicates request information to the CGI program via Environment Variables.
CGI Environment Variables
CGI uses environment variables to send your program additional parameters. The following environment variables are defined:
Variable | Description |
---|---|
AUTH_TYPE | Set to the value of the HTTP AUTHORIZATION header. Usually "basic", "digest" or "form". |
CONTENT_LENGTH | Set to the length of any associated posted content. |
CONTENT_TYPE | Set to the content mime type of any associated posted content. |
DOCUMENT_ROOT | Set to the path location of web documents. Defined by the DocumentRoot directive in the GoAhead configuration file. |
GATEWAY_INTERFACE | Set to "CGI/1.1" |
HTTP_ACCEPT | Set to the value of the HTTP ACCEPT header. This specifies what formats are acceptable and/or preferable for the client. |
HTTP_CONNECTION | Set to the value of the HTTP CONNECTION header. This specifies how the connection should be reused when the request completes. (Keep-alive) |
HTTP_HOST | Set to the value of the HTTP HOST header. This specifies the name of the server to process the request. When using Named virtual hosting, requests to different servers (hosts) may be processed by a single HTTP server on a single IP address. The HTTP_HOST field permits the server to determine which virtual host should process the request. |
HTTP_USER_AGENT | Set to the value of the HTTP USER-AGENT header. |
PATH_INFO | The PATH_INFO variable is set to extra path information after the script name. |
PATH_TRANSLATED | The physical on-disk path name corresponding to PATH_INFO. |
QUERY_STRING | The QUERY_STRING variable is set to the portion of the URI that follows the first "?". The QUERY_STRING is not decoded. |
REMOTE_ADDR | Set to the IP address of the requesting client. |
REMOTE_HOST | Set to the IP address of the requesting client (same as REMOTE_ADDR). |
REMOTE_USER | Set to the name of the authenticated user. |
REQUEST_METHOD | Set to the HTTP method used by the request. Typical values are: "DELETE", "GET", "HEAD", "OPTIONS", "POST", "PUT", or "TRACE". |
REQUEST_URI | The complete request URI after the host name portion. It always begins with a leading "/". |
SCRIPT_NAME | The name of the CGI script being executed, in a format suitable for a self-referencing URL. |
SERVER_ADDR | The IP address of the server or virtual host responding to the request. |
SERVER_HOST | Set to server hostname without port. |
SERVER_NAME | The server's hostname, alias or IP address as it would appear in self-referencing URLs. |
SERVER_PORT | The HTTP port of the server or virtual host serving the request. |
SERVER_PROTOCOL | Set to "HTTP/1.0" or "HTTP/1.1" depending on the protocol used by the client. |
SERVER_URL | Set to server hostname with port. Suitable for use in a URL. |
SERVER_SOFTWARE | Set to "Embedthis GoAhead/VERSION" |
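The sketch below shows one way a CGI program might consume these variables: it reads exactly CONTENT_LENGTH bytes of posted content and maps the HTTP_* variables back to header-style names. The helper names read_posted_content and request_headers are invented for this example, and the demo values are made up.

```python
import io


def read_posted_content(environ, stream):
    """Read exactly CONTENT_LENGTH bytes of posted content from a binary stream."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0  # treat a malformed length as "no content"
    return stream.read(length) if length > 0 else b""


def request_headers(environ):
    """Map HTTP_* variables to header-style names (HTTP_USER_AGENT -> User-Agent)."""
    headers = {}
    for name, value in environ.items():
        if name.startswith("HTTP_"):
            pretty = "-".join(part.capitalize() for part in name[5:].split("_"))
            headers[pretty] = value
    return headers


if __name__ == "__main__":
    # Demo with a made-up environment and an in-memory stream in place of stdin.
    demo_env = {"CONTENT_LENGTH": "7", "HTTP_USER_AGENT": "demo-client"}
    print(read_posted_content(demo_env, io.BytesIO(b"a=1&b=2")))
    print(request_headers(demo_env))
```

In a real CGI program the call would typically be read_posted_content(os.environ, sys.stdin.buffer). Reading only CONTENT_LENGTH bytes avoids blocking on a stream that the server may not close.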
Example
Consider the following URI, which runs the CGI program "myScript" and passes it extra path information and a query string.
http://hostname/cgi-bin/myScript/products/pricelists?id=23&payment=credit+Card
This URI will cause the following environment settings:
Variable | Value |
---|---|
PATH_INFO | /products/pricelists |
PATH_TRANSLATED | /var/goahead/web/products/pricelists (where /var/goahead/web is the DocumentRoot) |
QUERY_STRING | id=23&payment=credit+Card |
REQUEST_URI | /cgi-bin/myScript/products/pricelists?id=23&payment=credit+Card |
SCRIPT_NAME | myScript |
The URI below demonstrates some rather cryptic URI encoding. The hex encoding "%20" represents the space character, and in a query string "+" also stands for a space. By convention, the variables passed to a CGI program are delimited by "&".
http://hostname/cgi-bin/cgiProgram/extra/Path?var1=a+a&var2=b%20b&var3=c
This URI will cause the following environment settings:
Variable | Value |
---|---|
PATH_INFO | /extra/Path |
PATH_TRANSLATED | /var/goahead/web/extra/Path |
QUERY_STRING | var1=a+a&var2=b%20b&var3=c |
REQUEST_URI | /cgi-bin/cgiProgram/extra/Path?var1=a+a&var2=b%20b&var3=c |
SCRIPT_NAME | cgiProgram |
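The mapping in the tables above can be approximated with a few lines of Python. This is only a sketch of the splitting rule, not GoAhead's implementation: the function name split_cgi_uri is invented here, and the "/cgi-bin/" prefix and "/var/goahead/web" document root are taken from the examples as default assumptions.

```python
from urllib.parse import urlsplit


def split_cgi_uri(uri, prefix="/cgi-bin/", document_root="/var/goahead/web"):
    """Split a request URI into the CGI variables listed in the example tables."""
    parts = urlsplit(uri)
    if not parts.path.startswith(prefix):
        raise ValueError("not a CGI request: " + uri)
    # The first path segment after the prefix names the program;
    # anything after it is the extra path information.
    script, slash, extra = parts.path[len(prefix):].partition("/")
    path_info = slash + extra
    request_uri = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "SCRIPT_NAME": script,
        "PATH_INFO": path_info,
        "PATH_TRANSLATED": document_root + path_info if path_info else "",
        "QUERY_STRING": parts.query,  # left undecoded, as in the table
        "REQUEST_URI": request_uri,
    }


if __name__ == "__main__":
    uri = "http://hostname/cgi-bin/cgiProgram/extra/Path?var1=a+a&var2=b%20b&var3=c"
    for name, value in split_cgi_uri(uri).items():
        print(name, "=", value)
```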
URI Encoding
When a URI is sent via HTTP, certain special characters must be escaped so that the server can process the URI unambiguously. To escape a special character, the HTTP client converts it to "%" followed by its two-digit hexadecimal code. Form and query variables are separated by "&". For example, a=1&b=2 defines two form variables, "a" and "b", with the values "1" and "2" respectively.
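Because QUERY_STRING arrives undecoded, the CGI program has to do the decoding itself. The following sketch uses Python's standard urllib.parse module. The wrapper names decode_query and encode_query are made up for this example.

```python
from urllib.parse import parse_qsl, urlencode


def decode_query(query_string):
    """Decode an encoded query string into a dict of variable names and values.

    Repeated names keep only the last value in this simple sketch.
    """
    return dict(parse_qsl(query_string, keep_blank_values=True))


def encode_query(variables):
    """Encode a dict of variables as a query string, escaping special characters."""
    return urlencode(variables)


if __name__ == "__main__":
    print(decode_query("var1=a+a&var2=b%20b&var3=c"))
    print(encode_query({"a": "1", "b": "x y&z"}))
```

Note that parse_qsl treats both "+" and "%20" as a space, which matches how form data is conventionally encoded.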
CGI Programming
A CGI program can return almost any content type to the client's browser: plain HTML, audio, video, or any other format. CGI programs can also control the user's browser and redirect it to another URI. To do this, CGI programs return pseudo-HTTP headers that GoAhead interprets before passing the data on to the client.
GoAhead understands the following CGI headers that can be output by the CGI program. They are case-insensitive.
Header | Description |
---|---|
Content-type | Specifies the MIME type of the content. Typically "text/html". See mime.types for a list of possible MIME types. |
Status | Set to an HTTP response code. Success is 200. Server error is 500. |
Location | Set to the URI of a new document to which to redirect the client's browser. |
ANY | Pass any other header back to the client. |
For example:
Content-type: text/html

<HTML><HEAD><TITLE>Sample CGI Output</TITLE></HEAD>
<BODY>
<H1>Hello World</H1>
</BODY></HTML>
To redirect the browser to a new location:
Location: /newUrl.html
To signify an error in the server:
Status: 500
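The three kinds of response above can be produced by one small helper. The sketch below is illustrative only: the name cgi_response and its parameters are assumptions made for this example, and it simply formats the headers from the table followed by a blank line and the body.

```python
def cgi_response(body="", content_type="text/html", status=None, location=None, extra=None):
    """Format CGI output: header lines, a blank line, then the body.

    cgi_response is a hypothetical helper, not part of GoAhead.
    """
    lines = []
    if status is not None:
        lines.append(f"Status: {status}")
    if location is not None:
        lines.append(f"Location: {location}")
    if body:
        lines.append(f"Content-type: {content_type}")
    # Any other header is passed back to the client unchanged.
    for name, value in (extra or {}).items():
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n\n" + body


if __name__ == "__main__":
    print(cgi_response("<H1>Hello World</H1>\n"), end="")
```

For a redirect, a program would write cgi_response(location="/newUrl.html") to standard output. For a server error, it would write cgi_response(status=500).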
CGI for VxWorks
The standard CGI implementation requires that standalone processes be executed and their output returned to the browser via the web server. VxWorks implements tasks rather than processes. Developers of CGI programs for VxWorks must therefore understand both the mechanisms used to implement VxWorks CGI tasks and the differences between processes on other operating systems and tasks on VxWorks:
- VxWorks tasks can be spawned using code already loaded in memory. On VxWorks systems with no file system, the CGI task code can be included in the operating system image and is not necessarily contained in a file.
- If the CGI code is contained in a file, a browser request for it will cause it to be loaded into memory before it is executed. It will be unloaded and reloaded each time it is invoked, which allows upgrading to a new version between invocations.
- The VxWorks taskSpawn API is used to spawn the CGI task.
- An entry point symbol name must be used to spawn the task. The request for the CGI process can define this entry point name in the request by including the query string keyword=value pair "cgientry=symbolname", where symbolname is a function name in the CGI code that is to be executed. If cgientry is not defined in this way, a default entry name will be searched for in the loaded code. The default name is "basename_cgientry", where basename is the name of the requested CGI process minus any file extension or path information (e.g., if the request is for "cgi-bin/cgitest.out", the default entry point symbol name will be "cgitest_cgientry"). If the entry point symbol name is not found or if the requested module cannot be loaded, the CGI request will fail.
- The priority of the spawned task will be the same as the priority at which the web server is running.
- The stack size of the spawned task is 20K.
- The task name will be the same as the entry point name.
- The standard CGI environment variables are copied to the task environment. They can be retrieved/modified by the getenv/putenv APIs.
- Command line arguments (if any) are passed to the user's entry point via a (int argc, char **argv) standard convention, where argc is the number of arguments and argv is an array of strings.
- As in standard CGI processes, the VxWorks CGI task can retrieve additional POST data from the standard input device and must write any output to be returned to the client to the standard output device. These devices are actually temporary files where stdin and stdout have been redirected.
- User-defined CGI task code should always terminate by returning from its entry point rather than by calling exit. This allows the environment space and redirected I/O files used by the task to be cleaned up and released back to the operating system.
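The entry point naming rule in the list above can be expressed compactly. The following Python sketch only illustrates the rule as described. It is not the VxWorks or GoAhead code, and the function name cgi_entry_point is invented for this example.

```python
import posixpath
from urllib.parse import parse_qs


def cgi_entry_point(request_path, query_string=""):
    """Return the entry point symbol name for a VxWorks CGI request.

    An explicit "cgientry=symbolname" in the query string wins; otherwise the
    default is "basename_cgientry", where basename is the requested program
    name without path or file extension.
    """
    override = parse_qs(query_string).get("cgientry")
    if override:
        return override[0]
    base = posixpath.basename(request_path)
    stem, _extension = posixpath.splitext(base)
    return stem + "_cgientry"


if __name__ == "__main__":
    print(cgi_entry_point("cgi-bin/cgitest.out"))
    print(cgi_entry_point("cgi-bin/cgitest.out", "cgientry=myMain"))
```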
Hints and Tips
If you have special data or environment variables that must be passed to your CGI program, you can wrap the program in a script that sets up that environment before invoking the program.
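A wrapper of this kind might look like the sketch below. It is an assumption-laden example: the function name run_wrapped is invented, and the child program inherits the wrapper's standard input and output, so the posted content and the response pass through unchanged.

```python
import os
import subprocess
import sys


def run_wrapped(command, extra_env, environ=None):
    """Run the real CGI program with extra environment variables added.

    The CGI variables set by the server are kept; extra_env is layered on top.
    Returns the exit code of the wrapped program.
    """
    env = dict(os.environ if environ is None else environ)
    env.update(extra_env)
    sys.stdout.flush()  # keep any wrapper output ahead of the child's output
    return subprocess.run(command, env=env).returncode
```

A wrapper script installed in the CGI directory would then end with a call such as sys.exit(run_wrapped(["/opt/app/cgi/report"], {"APP_MODE": "production"})), where both the path and the variable are hypothetical placeholders for your own program and settings.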
Other Resources
The following URIs may be helpful in further reading about CGI:
- For an introduction to CGI: http://en.wikipedia.org/wiki/Common_Gateway_Interface
- For the actual CGI specification: http://tools.ietf.org/html/draft-robinson-www-interface-00
- Other CGI resources: http://www.cgi-resources.com