Stage tables
For many applications with a database back end, stage tables are the first point of entry (or first line of defense, depending on your perspective) for data entering a database. In their simplest form, stage tables are images of the contents of a data file. In terms of purpose, stage tables serve as intake, upload, or collector mechanisms. A very common method of populating stage tables is with SQL*Loader. Stage table loading can also be used to perform data hygiene: create a unique index on a particular column, and the upload process takes care of removing duplicates. Other bad or discarded line items can be collected and reported back to the data provider. Depending on your needs, you can also use stage tables to begin transformation processes.
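To make the dedupe idea concrete, here is a minimal sketch; the CUSTOMER_STAGE table, index, and file names are hypothetical, not from any particular application:

CREATE TABLE customer_stage (
   cust_id    NUMBER,
   cust_name  VARCHAR2(100),
   cust_email VARCHAR2(100)
);

-- During a conventional path load, rows that violate this index
-- are rejected into the bad file instead of being inserted.
CREATE UNIQUE INDEX customer_stage_uq ON customer_stage (cust_email);

A matching control file (customers.ctl) and SQL*Loader call might look like this:

LOAD DATA
INFILE 'customers.dat'
APPEND
INTO TABLE customer_stage
FIELDS TERMINATED BY ','
(cust_id, cust_name, cust_email)

sqlldr scott/tiger control=customers.ctl

The rejected duplicates end up in customers.bad, ready to be reported back to the data provider.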
From a use perspective, you will probably want to reuse stage tables. Whether the application or the database performs it does not matter, as long as a clean-up or preparation step takes place before data is uploaded. Is it done with REPLACE in SQL*Loader, or with a prepared statement in Java that issues TRUNCATE TABLE? From a design perspective, does the table need to have logging enabled? If so, why? Are you ever going to roll back or recover anything in a stage table? Probably not, so keep those kinds of DDL features and options to a minimum (a minimal sketch follows below). Now that the data is inside the database, where does it go next?
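Here is that clean-up sketch, reusing the hypothetical CUSTOMER_STAGE table from above:

-- Stage data is disposable, so skip redo generation where possible
-- (NOLOGGING affects direct path operations).
ALTER TABLE customer_stage NOLOGGING;

-- The preparation step before each upload
TRUNCATE TABLE customer_stage;

Changing APPEND to REPLACE in the control file accomplishes the same clean-up from the SQL*Loader side, since REPLACE deletes the table's existing rows before loading.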
Operational tables
One way to define tables of this type is to consider what they contain. Live data, such as customer or account information, is stored in operational tables. These tables are the guts of the database, so to speak. In one form or another, everything in an application revolves around the contents of these tables. Operational tables should come first with respect to security, backup, and recovery, and should therefore employ whatever means or features are needed to meet those requirements. This includes logging, enabled row movement for flashback, explicit permissions, auditing, fine-grained access control, virtual private database, and whatever else your situation or application may require.
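A few of those features in DDL form, using a hypothetical ACCOUNTS table and role name:

CREATE TABLE accounts (
   account_id NUMBER        PRIMARY KEY,
   cust_name  VARCHAR2(100) NOT NULL,
   balance    NUMBER(12,2)
) LOGGING;

-- Required before FLASHBACK TABLE can be used
ALTER TABLE accounts ENABLE ROW MOVEMENT;

-- Explicit permissions and auditing
GRANT SELECT, INSERT, UPDATE ON accounts TO app_user_role;
AUDIT INSERT, UPDATE, DELETE ON accounts BY ACCESS;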
These tables are further characterized by how they are designed. Normalization is a word or process that should immediately come to mind, and along with it comes the consideration of related objects such as indexes, views, sequences, and triggers, to name a few. These tables are what I referred to as "normal" tables in the introduction, and normal seems to be a good fit for two reasons. First, these tables are normally seen throughout a schema; second, they typically need to be normalized.
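As one small example of those related objects working together, here is the classic sequence-plus-trigger pattern for populating a surrogate key (the table and object names are hypothetical):

CREATE SEQUENCE customers_seq;

CREATE OR REPLACE TRIGGER customers_bi
BEFORE INSERT ON customers
FOR EACH ROW
BEGIN
   -- Assign the surrogate key from the sequence
   SELECT customers_seq.NEXTVAL INTO :NEW.cust_id FROM dual;
END;
/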
Another type of operational table is one related to how an application or schema operates. An example would be a table storing information about disposition tables (see below). Another example will be familiar to those who design security (roles and permissions) into a schema, as opposed to having Oracle manage this function.
Functional tables
Lookup or intersection tables fall into this category, as their main purpose is to perform a function (for example, resolving a many-to-many relationship). Another type of functional table is a shopping cart/order entry table (containing an order number and one or more line items). These tables are relatively easy to identify, and their purpose is discussed in virtually all database design books. Functional tables bear mentioning because of their close association with operational tables; however, not all tables are created equal.
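A minimal intersection table sketch, assuming hypothetical STUDENTS and COURSES parent tables:

CREATE TABLE student_courses (
   student_id NUMBER NOT NULL REFERENCES students (student_id),
   course_id  NUMBER NOT NULL REFERENCES courses (course_id),
   CONSTRAINT student_courses_pk PRIMARY KEY (student_id, course_id)
);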
Identify the crucial functional tables and treat them with the same importance you place on operational tables. How can you distinguish between them? One way is to consider how the contents change, that is, whether the data is static or not. Static tables (e.g., job code versus job description) can be repopulated from a script (see the sketch below). Dynamic tables require recovery. Do you directly populate these tables, or is it done indirectly (via a trigger)?
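For a static table, the repopulation script can be as simple as this sketch (a hypothetical JOB_CODES lookup):

TRUNCATE TABLE job_codes;

INSERT INTO job_codes (job_code, job_description)
   VALUES ('DBA', 'Database Administrator');
INSERT INTO job_codes (job_code, job_description)
   VALUES ('DEV', 'Application Developer');

COMMIT;

A dynamic table cannot be rebuilt this way, which is exactly why it belongs in the backup and recovery plan alongside the operational tables.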
Disposition tables
Tables of this type are frequently created by performing a join between a stage table and a combination of one or more operational and functional tables. Disposition tables are commonly used for reporting or for staging data to be spooled out into disposition files. You may find many tables of this type in a schema, all with the same definition. What differentiates them is the date they were created or the number of records (a weekly report or newsletter recipient list consists of fewer rows than an end-of-month report or recipient list).
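In practice this is often a simple CREATE TABLE AS SELECT. A sketch, reusing the hypothetical CUSTOMERS and CUSTOMER_STAGE tables from earlier:

CREATE TABLE newsletter_run AS
SELECT c.cust_id, c.cust_name, s.cust_email
FROM   customers c
JOIN   customer_stage s ON s.cust_email = c.cust_email;

-- The table name (or a run date column) is what differentiates
-- one disposition run from the next.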
A disposition table can also be similar to a stage table in that its contents are routinely flushed and repopulated for a pending disposition or report process. An alternative means of keeping disposition data is to borrow from the data warehouse side of things and use a flight history type of table. You will not need to retain 100% of a disposition table's contents, but you will need to capture what is essential. Who sent what, and when, and can details be reconstructed if necessary? If details are lost due to data changes elsewhere in the database, then the record keeping requirements become more complicated. It then becomes a trade-off between keeping scores of disposition tables and maintaining a huge flight history type of table.
Archive tables
A close cousin on the production side to a warehouse's flight history table is the archive table. By its nature, an archive table's content is permanent, so one way to delineate the life span of what is in a disposition table is to consider the degree of permanency of the data. Forever (or at least as long as the service level agreement's retention policy requires) equals archive; anything more short term is subject to being disposed of in a disposition table.
Other tables
Audit tables are discussed in detail throughout much of Oracle's documentation and more generally in most textbooks. They capture not the data itself, but who did what to the data and when. One type of table being seen more often in applications is the metadata table (data about data). Not sure what an example of this type of table looks like? Several tables in Oracle's data dictionary are excellent examples. The DBA/ALL/USER_TABLES family contains data about a table's data (number of rows, last analyzed, etc.).
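You can see that metadata firsthand with a simple dictionary query:

SELECT table_name, num_rows, last_analyzed
FROM   user_tables
ORDER BY table_name;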
Figure 2-1 HR Schema
In Closing
Production database tables, for the most part, seem to lack the naming clarity seen in data warehouses. Fact and dimension tables have clearly defined names because of their purpose, but what type of table is your customer account table? Whether or not you agree that the type name should be operational, what does matter is that everyone involved in the management of this type of table has the same understanding of its purpose. If someone asks you to name table types in Oracle (and this applies to other database systems), ask that person whether he or she means physical or logical tables, how they are designed (DDL) or how they are used, production or warehouse, and so on. You can help expand that person's frame of reference by going beyond what is in the Database Concepts documentation.
Oracle Tables and Statistics
The guys who wrote Oracle are pretty smart. One of the things they built into the database is a program called the optimizer. The optimizer's job is to take a SQL statement and decide how to get the data being asked for, in the quickest way possible.
When a SQL statement is executed, the database must convert the query into an execution plan and choose the best way to retrieve the data. For Oracle, each SQL query has many possible execution plans, including which index to use to retrieve table rows, the order in which to join multiple tables together, and which internal join methods to use (Oracle has nested loop joins, hash joins, star joins, and sort merge join methods). These execution plans are computed by the Oracle cost-based SQL optimizer, commonly known as the CBO.
The choice of execution plans made by the Oracle SQL optimizer is only as good as the Oracle statistics. To choose the best execution plan for a SQL query, Oracle relies on information about the tables and indexes in the query.
Once the optimizer has done its job, it provides an execution plan to Oracle. An execution plan is like a set of instructions that tells Oracle how to go and get the data.
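Here is a representative example: a simple query against a 100-row EMP table, explained and displayed with dbms_xplan (the byte and cost figures are illustrative only, and the output is abridged):

SQL> EXPLAIN PLAN FOR SELECT * FROM emp ORDER BY ename;

Explained.

SQL> SELECT * FROM TABLE(dbms_xplan.display);

---------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost     |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   100 |  3700 |     4    |
|   1 |  SORT ORDER BY      |      |   100 |  3700 |     4    |
|   2 |   TABLE ACCESS FULL | EMP  |   100 |  3700 |     3    |
---------------------------------------------------------------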
This is a pretty simple plan. In it, the optimizer tells Oracle to first get all the rows of the EMP table, and then sort those rows. (Reading an execution plan is somewhat of an art, so trust us on this one: the full scan of the EMP table comes first.)
Did you notice that the plan has a column called ROWS? This is the number of rows the query is expected to process. How did Oracle know that it was going to process 100 rows? This Oracle thing is smart, isn't it?
Well, Oracle isn't quite that smart. In this case, Oracle knew (or rather, made a good guess) that we would process 100 rows because we generated statistics on the EMP table after we created it. The optimizer uses these statistics to generate execution plans.
The optimizer uses statistics on tables and on the indexes on those tables, so it's important to have statistics on both. In the next section, we will show you how to generate statistics on the tables and indexes in your database. Starting with the introduction of the dbms_stats package, Oracle provides a simple way for the Oracle professional to collect statistics for the CBO.
The old-fashioned analyze table and dbms_utility methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance because they don't always capture high-quality information about tables and indexes. The CBO uses object statistics to choose the best execution plan for all SQL statements.
The dbms_stats utility does a far better job in estimating statistics, especially for large partitioned tables, and the better stats result in faster SQL execution plans. Here is a sample execution of dbms_stats with the options clause.
exec dbms_stats.gather_schema_stats( -
   ownname          => 'SCOTT', -
   options          => 'GATHER AUTO', -
   estimate_percent => dbms_stats.auto_sample_size, -
   method_opt       => 'for all columns size repeat', -
   degree           => 15 -
)

How to Generate Oracle Statistics
Oracle provides a stored procedure (or program) for you to run that will generate the statistics it needs. Oracle requires statistics on both tables and any associated indexes (we will talk about indexes next), and most of the time you will generate both with just one command.
To generate statistics we use the dbms_stats stored package. There are two procedures within the dbms_stats package that you will mostly be interested in: dbms_stats.gather_schema_stats and dbms_stats.gather_table_stats. Also, in Oracle Database 10g you will want to gather system statistics and fixed view statistics. Let's look at each of these operations in a bit more detail next.
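The system and fixed view statistics just mentioned each have their own dbms_stats call. A quick sketch (NOWORKLOAD is one gathering mode; you can instead bracket a representative workload with 'START' and 'STOP' calls):

-- Gather system statistics (CPU speed and I/O performance)
EXEC dbms_stats.gather_system_stats('NOWORKLOAD');

-- Gather statistics on the fixed objects behind the V$ views
EXEC dbms_stats.gather_fixed_objects_stats;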
There is also an ANALYZE command that you can use to generate statistics. It has been deprecated in Oracle Database 10g, which means Oracle no longer supports it for gathering optimizer statistics (the command still runs, but only for other purposes such as validating structure). So we don't cover it in this book.
Using dbms_stats.gather_schema_stats
The dbms_stats.gather_schema_stats procedure allows you to gather statistics for all objects in a given schema. This is the easiest way to generate statistics for a large number of objects. Here is an example of using the dbms_stats.gather_schema_stats procedure to gather statistics on the SCOTT schema of a database:
EXEC dbms_stats.gather_schema_stats('SCOTT', cascade=>TRUE);
This command will generate statistics on all tables in the SCOTT schema. Since we included the cascade parameter, statistics will also be generated on the indexes. This is important: you need statistics on indexes as well as on tables in Oracle!
Of course, this is just the basic way to run this command. Several options are available, but for now, as a new DBA, this will do. In fact, Oracle Database 10g automatically collects database statistics every night out of the box. Later you will want to investigate some of the Oracle Database 10g statistics gathering options, such as histograms and granularity (one combination is sketched below).
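As a taste of those options, here is one possible combination (a sketch, not a tuning recommendation; SIZE AUTO lets Oracle decide which columns get histograms, and granularity matters mostly for partitioned tables):

exec dbms_stats.gather_schema_stats( -
   ownname     => 'SCOTT', -
   method_opt  => 'FOR ALL COLUMNS SIZE AUTO', -
   granularity => 'ALL', -
   cascade     => TRUE -
)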
If you create a new table, it may not be practical or desirable to regenerate statistics on the entire schema, particularly if the schema is quite large and the database is very busy. Instead, you will use the dbms_stats.gather_table_stats command to generate statistics for a single table and, optionally, for related table indexes. Here is an example:
EXEC dbms_stats.gather_table_stats('SCOTT', 'EMP', cascade=>TRUE);
In this case we are generating statistics for the EMP table in the SCOTT schema. Again we use the cascade parameter to ensure all of the indexes get analyzed.