This is an extension of my previous post (“Netezza Questions and Answers”).
What data type is most suited for zone maps
Zone maps are typically useful for integers, date and time, variations of this data type. Zone maps are useful for ordered that that are usually built into data that is loaded; for example, phone call logs. [Also check more on zonemaps in my original post below].
How are materialized views (MV) used in Netezza.
Similar to other databases, materialized views are defined against base tables. Just like in views, we can not insert, update or delete from MV. MV is automatically used by optimizer when appropriate. When base table data changes, MV gets automatically updated by Netezza. MV is based on a single base table, thus practically not as useful.
What are typical join types in Netezza
Mostly hash joins (note that Netezza may do hash join in memory or on the SPUs if memory is not sufficient for doing a hash join); in some cases sort merge or even some cases nested loops are performed (did not really come across much; In Netezza lingo it is called either function based join or something similar, not sure about the correct term). Of course, cross or product joins are possible like in any databases.
What is equivalent of replication of a table in Netezza on all SPUs.
Imagine a situation in which a small table (us_states) is required for joining against a large table (creditcard_txn). Netezza may decide to read this table from all SPUs (remember, table data is spread on all SPUs) and “assemble” this table on host. This data is then broadcasted to all SPUs, resulting in a copy on each SPU. Note that an important db parameter “factrel_size_threshold” holds that triggering number; any table beyond these many number of rows is considered as a fact table. That is, if this parameter is set to 1 million, any table holding more than 1 million rows is considered by Netezza as a fact table and will not result in this replication or broadcast.
Primary goal of a table design is to distribute data evenly on all tables. Is it a good idea to choose multiple columns in Netezza so that data gets distributed evenly.
NO, unless all columns are used during a join process. Most likely, this results in a large amount redistribution of data during query execution.
From design point of view, do you forsee performance problem using order ID as an integer for one table and Order num as varchar for another table as distribution keys. Assuming that these two tables are often used for join, do you see any performance problem.
YES, we will encounter performance problem as Netezza redistributes integer to varchar type and recreates the first table. This can be avoided using both tables distributed on order number as either integer or varchar.
What is a snippet
Snippet is a small block of database operation, typically three to four operations, that are carried out on all SPUs where data is location. If a query results in these snippets: Snippet A, Snippet B, Snippet C, ….Snippet X; they are carried out in a sequential manner.
List which options are prioritized when join operation is required.
Netezza evaluates joins in this order of preference:
1) Colocated joins: All data for joins are located on the same SPU.
2) Redistribute: All required data is not located in the same SPU, send data to corresponding SPU where driving table data is located.
3) Broadcast: Mentioned as replication above; Send all data from SPU to host which collates all that data, sends it to all SPUs. Each SPU has entire table data. That is, if Netezza machine has 32 SPU, there will be 32 physical tables, one one each SPU.
Coming to joins, typically Netezza prefers in this order:
A) Hash join in memory
B) Hash join on disk
C) Sort Merge join
D) Nested loops
E) Cross join
Oracle preference closely resembles the same or similar order.
How can I look into some system parameters
nzsystem showRegistry: Command for looking into system specific information
/nz/data/postgresql.conf: To check NZ db parameters; we can change this file OR use set command at SQL.
Why integer data type is preferred in Netezza.
A couple of reasons:
1) Better joins, thus effecient.
2) Netezza compress works only for integer type of data, not with varchar or date.
3) Zonemaps are based on integer data types.
How can we find log (SQL) activity for a day.
We can find this under /nz/kit.x.y/log/postgres/pg.log file. Older files are named as pg.log.N (Where N starts from 1, after pg.log file this is the latest file). Assuming that we are looking for a week day within pg.log, we may run
$ cat pg.log | sed -n “/2010-02-01 00/,/2010-02-05 23:59/p” > pg.firstweekFeb2010.log
If this produces no data, look for corresponding log file based on the last update timestamp (ls -ltr sorts them in reverse time stemp order).
What are the ways to get data into Netezza. What happens if inserts are interrupted, how Netezza handles commits.
Please see my previous posting first. Here is a short list: load using nzload, SQL inserts (very slow), create table as command or inserts from other tables, and external tables. When insert is interrupted, all rows that are inserted are already committed unless we use transaction command.
How do you nzsystem command
Most Netezza commands come with a variety of sub commands. For nzsystem, we can list a number of parameters using “nzsystem showregistry”. To find quickly if Netezza is up and running, use “nzsystem showstate”. However, using these commands require you to set “NZ_USER” and “NZ_PASSWORD” at unix level. You can do this with ‘export NZ_USER=username’ and ‘export NZ_PASSWORD=userpassword’, and then run this command. If not, you can specify login and password combo at command level. I do not prefer this way, as any users can do “ps -ef” to check what commands you are using, which also gives login and password you entered at shell command level.
Where are Netezza binaries stored
Two important bin locations for Netezza are: /nz/kit.x.y/sbin and /nz/kit.x.y/bin.
Some configurations are located in /nz/data directory.
How can we remove formatting with NZSQL
NZSQL by default shows text formatted. In cases in which we do not need white spaces, use “nzsql -A” option. For example, “nzsql -A -c “select count(*) from hertz.daily_bookings;” allows us to run SQL command direct without logging into nzsql interactively.
Is there a way to stop NZSQL command, if one of the SQL commands fail.
Yes. A similar feature exists in other databases too. In this case, ON_ERROR_STOP=true on nzsql command so that other commands do not get executed. For instance, we like to create a new table (CTAS) and later drop old table. If a new table creation fails, we certainly do not want to drop old table. In this case, this option is very useful.
Can we access data in other databases with the same NZSQL.
This depends on the version. Version 5 and upwards support selects against database.owner.table, provided that user has same login and password access. Inserts are allowed in the current database with data from other databases, not the other way (meaning, we cannot insert into another database from the database where we logged in).
How can we plan and corresponding CPP files.
We can just to ‘explain’ on a query to see how plan looks. At run time, plans are created under /nz/data/plans we will see corresponding plans generated during run time. Corresponding CPP code is located under /nz/data/cache/