Here are a few observations that I made, some are based on books & online articles. These are mostly applicable to Oracle:
- Make the most often used column in a where clause as the leading column in a concatenated index. Some times, we may need to use a dummy condition such as dept_id>0 and other conditions.
- Avoid using upper or lower functions on columns which are indexed. Oracle ignores indexes and does a full table scan. Also, avoid substr function on indexed columns, instead use SQL like ‘%string%’.
- EXISTS is a faster and a cheaper alternative because the optimizer realizes that when the sub-query has been satisfied once, there is no need to proceed any further, and the next driving row can be fetched.
- UNION ALL returns all rows (no duplicates eliminated) and thus faster, compared to UNION (fetches unique rows).
- HAVING filter “after” rows are returned by query, try to put this in WHERE clause so that condition is applied without returning rows.
- Use cluster columns, if they go together as one, for instance emp id and emp name.
- Use dummy where clause to force use of indexes on character columns, see example here. SELECT ename FROM emp ORDER BY ename; // This uses full table access// Use this instead. SELECT ename FROM emp WHERE ename > TO_CHAR(0); // This uses fast full index scan.
- Use function based indexes on columns which are often used with a function, such as nvl(comm,o) function in queries.
- Avoid using functions on indexed columns, instead try using between or some other where clause, if possible.
- Use BIND variable where possible. We could also set ‘FORCE’ for cursor sharing, see below.
- Create indexes on columns, which are often used for sorting purpose.
- Use decode instead of sub query joins, etc.
- Use partitions for large tables.
- Use index organized tables for queries involving exact match and range searches especially on the primary key.
- Set optimizer_index_cost_adj to a lower OR higher number depending on our desire to increase or decrease the affinity for indices.
- Force cursor sharing if possible.
Hash Joins: Hash joins are used for joining large data sets. The optimizer uses the smaller of the two tables/data sources to build a hash table on the join key in memory.It then scans the larger table, probing the hash table to find the joined rows.This operation involves joining two sets of rows and returning the result. Apart from this the result needs to be SORTED for unique values i.e. an operation involving sorting a set of rows to eliminate duplicates.If the table is small enough to fit into the memory,then the cost is limited to a single read pass over the data for the two tables.But if,the hash table is too big to fit in to the memory,optimizer chooses partitioning.
When the Optimizer Uses Hash Joins:
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following are true:
- A large amount of data needs to be joined.
- A large fraction of the table needs to be joined.
Clusters
You can create a CLUSTER as follows,
Suppose you want to create a cluster named EMP_DEPT for EMP and DEPT tables.
- First create a cluster named EMP_DEPT.
CREATE CLUSTER emp_dept(deptno number(2));
- Now create the cluster tables as follows,
CREATE TABLE dept (
deptno NUMBER(2) PRIMARY KEY, . . . )
CLUSTER emp_dept (deptno);
CREATE TABLE emp (
empno NUMBER(5) PRIMARY KEY,
ename VARCHAR2(15) NOT NULL,
. . .
deptno NUMBER(2) REFERENCES dept)
CLUSTER emp_dept (deptno);
- In this cluster DEPTNO is the cluster key. Be sure to choose your cluster key carefully and it should represent the join condition between the cluster tables.
Some Points to remember about Clusters:
- Clusters should be used for tables predominantly used in join queries, they help in increasing the performance. They should be preferred in Data Warehouse Environments where the data in the cluster tables is not likely to change often.
- Clusters can degrade the performance of your database when used in OLTP (Online Transaction Processing) environments where the data in the cluster tables is likely to be modified often.
Function based Indexes
Function Based Index can be created as shown below:
CREATE INDEX sal_comm_finx ON EMP ( sal * nvl(comm,0));
Function-based indexes facilitate queries that qualify a value returned by a function or expression. The value of the function or expression is pre-computed and stored in the index. Specific features of function-based indexing include:
- Indexes can be created where the search key is an expression.
- They provide an efficient mechanism for evaluating predicates involving functions.
- Case-insensitive sorts can be Performed.
- Descending order indexes can be created. They are treated as a special case of function-based indexes.
Index Organized Tables (IOT)
You can create index-organized tables in the following way.
CREATE TABLE docindex( token CHAR(20),
doc_id NUMBER,
token_frequency NUMBER,
token_offsets VARCHAR2(512),
CONSTRAINT pk_docindex PRIMARY KEY (token, doc_id))
ORGANIZATION INDEX TABLESPACE ind_tbs;
Index-organized tables are like regular tables with a primary key index on one or more of its columns. However, instead of maintaining two separate storage spaces for the table and B*tree index, an index-organized table only maintains a single B*tree index containing the primary key of the table and other column values.
Index-organized tables provide fast key-based access to table data for queries involving exact match and range searches especially on the primary key as seen above. Changes to the table data (such as adding new rows, updating rows, or deleting rows) result only in updating the index structure (because there is no separate table storage area). Index-organized tables are suitable for accessing data by way of primary key or any key that is a valid prefix of the primary key. There is no duplication of key values and storage requirements are reduced because a separate index structure containing the key values and ROWID is not created.
* Oracle evaluates AND condition from right to left, other databases do from left to right.
* While coding SQL, use consistent spacing, capitalization, etc. This way, Oracle uses existing cached queries (soft parsing versus hard parsing).
* Change SQL condition from NOT to greater sign as in this ‘select * from emp where grade not 1’ to ‘select * from emp where grade > 1’;
* Change OR to IN condition, this is typically faster.
* Like ‘A’ and = ‘A’ are different as like allows trailing spaces in where condition.
* Use LIKE, instead of substring or partial string comparison.
* Use UNION ALL, instead of UNION as UNION uses distinct.
* Use OR condition instead of UNION, this is faster as UNION scans table for every sub select statement. If we have a query “sub query A UNION sub query B”, UNION condition scans table twice. Going by the same logic, “select * from emp where emp_name = ‘Mike’ and not emp_no = ‘007’;” is better than “select * from emp where emp_name = ‘Mike’ MINUS select * from emp where emp_no = ‘007’;
* Some times negative conditions are confusing, resolve them by rewriting. For instance, ‘NOT (field_A and field_B)’ is same as ‘(NOT A) OR (NOT B)’.
Java code through JDBC does an auto commit for each insert, instead we can do “connection.setAutoCommit(False);”
Avoid creation of index if queries or application code can be developed differently.
When use ‘Explain plan’ waits can be identified this way: Dbfile sequential read usually refers to index access; db file scattered read refers to table scans.
Two things that we could quickly check for Oracle performance problems: are statistics up to date and are indexes valid.
For unique index, statistics info is not useful. For non unique index, selectivity is an important aspect. If we are trying to retrieve a small number of rows, it is a good idea to have index on that column. If we are trying to get lots of rows, indexing can have negative performance effect. Since it is difficult to predict selectivity for distributed data, histograms can be used.
Code optimization: Lots of times it is better to use “exists”, compared to join. Having is probably the worst as it filters returned results in the last.
Nested loop: Used when a small number of rows to get joined to a large number of rows. Nested loop is usually preferred by Oracle. First table having a small number of rows, second table having a large number of sorted rows. We need to have an index on larger table (for sorting), smaller table gets (typically also sorted) gets joined.
Hash join: When both tables are about the same. Hash join typically requires one table in a sorted order. Hash join creates a temporary table in memory while constructing a hash join, so hash join is less preferred compared to nested loops. Netezza and other data appliances use Hash joins extensively.
Sort Merge: Most inefficient as it involves sorting both tables first.
Semi join: Uses EXISTS or IN, which are not really joins as such.
Typically, EXISTS is faster than IN. Try avoiding != or not in. Cartesian join is same as ANSI SQL cross join.
Avoid using functions in a where clause (better to use function based indexes).
IOT packs less number of rows in a block (this is a disadvantage), however, it allows better concurrency. For normal indexes, locks and latches are noticed if many users use same index to access data.
Dropping a partition invalidates a global index, whereas local partition index has no such problem.
B tree indexes mostly in OLTP, automatically restructured and space is not recovered. Bitmap indexes mostly in DW, we need to rebuild them frequently.
IOT mostly in DW, some OLTP, use it in the order in which indexes are arranged. Clustered indexes are also used mostly in DW. Partitions are useful in both DW and OLTP.
Cluster may result in row may be else where placed when inserts are encountered. Materialized views may be created on OLTP database so that DW queries can use materialized views (instead of base tables).
Reverse index option when there is lots of contention for sequential insertion going to the same block. If data goes to same block, there is contention and results in waits. Reverse index eliminates this issue. Useful in a RAC environment. We can also “parallel” option to get index creation faster in a multi CPU environment.
For IOT tables, we need to specify IOT option at the time of table creation.
How to create a Cluster: Create a cluster, create a clustered index, add one or more tables to cluster.
To create a materialized view, have privileges for create any table and create any materialized view (explicit privileges are required on underlying table)
Fast Full Index scan: This does not guarantee getting values in any order, so do not expect an order while scanning an indexed table. Fast full index scan just reads all leaf nodes, instead of parsing up and down the tree.
If statistics are not available, Oracle uses dynamic sampling of data.
DB_FILE_MULTIBLOCK_READ_COUNT: Keep this a large number for DW databases.
SORT_AREA_SIZE: Determines maximum sort area for each session.
HASH_JOINED_ENABLED: For hash joins, typically used in OLTP databases
OPTIMIZER_INDEX_CACHING favors nested loops instead of hash joins or sort merge. Nested loops are efficient for one small table, another table having a large number of rows.
OPTIMIZER_INDEX_COST_ADJ: Parameter in favor or opposing index. If this value is set (to high?), it is likely to do full tables scans, instead of using an index.
OPTIMIZER_MAX_PERMUTATIONS: To have in a limited permutations computations.
Performance tuning: start with wait events (from OEM) or spotlight. Statspack allows snapshots at different times for comparison; statspack stores all information in its own schema and its own tablespace (perfstat), it has to be run on the database server, can not be executed from client;
Histograms give a better statistics for values that are spread out; To generate histograms, use ‘analyze table table_name compute statistics for columns col_name’;
DBMS statistics can be used to collect statistics for table, NOT for indexes (meaning, in parallel?) or clusters. DBMS stats runs in parallel. Since DBMS generates for CBO, it includes rows, blocks, etc., but not chained rows. We can gather statistics for Tables, indexes, schema, database or system.
Set first rows in Oracle for CRM applications or interactive applications. Cursor sharing to force and set timed_statistics to true. Time statistics measures wait events and has minimum impact on database performance.
Most database implementations use dedicated servers (as opposed to shared servers, which support connection pooling). This is not a concern as most application servers support connection pooling any way. Only thing that affects in a major way is that if user is away and a session is inactive, it still uses resources.