Basic queries: SQL and RA

Retrieving data with SQL SELECT

To look at the data in our tables, we use the select (SQL) statement. The result of this statement is always a new table that we can view with our database client software or use with programming languages to build dynamic web pages or desktop applications. Although the result table is not stored in the database like the named tables are, we can also use it as part of other select statements. The basic syntax (which is not case sensitive) consists of four clauses:

        SELECT <attribute names>
        FROM <table names>
        WHERE <condition to pick rows>
        ORDER BY <attribute names>;

• Of the four clauses, only the first two are required. When you are learning to build queries, it is helpful to follow a specific step-by-step sequence, look at the data after each modification to the query, and be sure that you understand the results at each step. We’ll build a list of customers who live in a specific zip code area, showing their first and last names and phone numbers and listing them in alphabetical order by last name.

1. Look at all of the relevant data—this is called the result set of the query, and it is specified in the FROM clause. We have only one table, so the result set should consist of all the columns (* means all attributes) and rows of this table.

        SELECT *
        FROM customers;

Customers
cfirstnameclastnamecphonecstreetczipcode
TomJewett714-555-121210200 Slater92708
AlvaroMonge562-333-41412145 Main90840
WayneDick562-777-30301250 Bellflower90840

2. Pick the specific rows you want from the result set (for example here, all customers who live in zip code 90840). Notice the single quotes around the string you’re looking for—search strings are case sensitive!

        SELECT *
        FROM customers
        WHERE cZipCode = '90840';

Customers in zip code 90840
cfirstnameclastnamecphonecstreetczipcode
AlvaroMonge562-333-41412145 Main90840
WayneDick562-777-30301250 Bellflower90840

3. Pick the attributes (columns) you want. Notice that changing the order of the columns (like showing the last name first) does not change the meaning of the data.

        SELECT cLastName, cFirstName, cPhone
        FROM customers
        WHERE cZipCode = '90840';

Columns from SELECT
cLastNamecFirstNamecPhone
MongeAlvaro562-333-4141
DickWayne562-777-3030

4. In SQL, you can also specify the order in which to list the results. Once again, the order in which rows are listed does not change the meaning of the data in them.

        SELECT cLastName, cFirstName, cPhone
        FROM customers
        WHERE cZipCode = '90840'
        ORDER BY cLastName, cFirstName;

Rows in order
cLastNamecFirstNamecPhone
DickWayne562-777-3030
MongeAlvaro562-333-4141

Why SQL works: the RA select and project

Like all algebras, RA applies operators to operands to produce results. RA operands are relations; results are new relations that can be used as operands in building more complex expressions. We’ll introduce two of the RA operators following the example and sequence above.

1. To represent a single relation in RA, we only need to use its name. We can also represent relations and schemes symbolically with small and capital letters, for example relation r over scheme R. In this case, r = customers and R = the Customers scheme.

2. The select (RA) operator (written σ) picks tuples, like the SQL WHERE clause picks rows. It is a unary operator that takes a single relation or expression as its operand. It also takes a predicate, θ, to specify which tuples are required. Its syntax is σθr, or in our example:

σcZipCode='90840'customers.

The scheme of the result of σθr is R—the same scheme we started with—since we haven’t done anything to change the attribute list. The result of this operation includes all tuples of r for which the predicate θ evaluates to true.

3. The project (RA) operator (written π) picks attributes, confusingly like the SQL SELECT clause. It is also a unary operator that takes a single relation or expression as its operand. Instead of a predicate, it takes a subscheme, X (of R), to specify which attributes are required. Its syntax is πXr, or in our example:

πcLastName, cFirstName, cPhonecustomers.

The scheme of the result of πXr is X. The tuples resulting from this operation are tuples of the original relation, r, cut down to the attributes contained in X.

• For X to be a subscheme of R, it must be a subset of the attributes in R, and preserve the assignment rule from R (that is, each attribute of X must have the same domain as its corresponding attribute in R).

• If X is a super key of r, then there will be the same number of tuples in the result as there were to begin with in r. If X is not a super key of r, then any duplicate (non-distinct) tuples are eliminated from the result.

Just as in the SQL statement, we can apply the project operator to the output of the select operation to produce the results that we want: πXσθr or

πcLastName, cFirstName, cPhone σcZipCode='90840'customers.

Since RA considers relations strictly as sets of tuples, there is no way to specify the order of tuples in a result relation.