Netezza: some internals

These commands are only for reference purpose. Their use may result in unstable system or even you may mess up. Please use extreme caution.

1) How to get to SPU or nodes: From host issue command similar to this one.

ssh root@spu0101
fruitloop (password)

You will see that Netezza nodes use a barebone UNIX implementation; for instance, no ‘more’ command, etc.

spu0101# mpstat -P ALL
Linux 2.6.31.6-nznps_spu10 (spu0101)    03/10/09

22:27:36     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
22:27:36     all    0.04    0.09    0.75    0.02    0.01    0.01    0.00   99.09   2185.10
22:27:36       0    0.05    0.08    0.65    0.02    0.01    0.01    0.00   99.19     13.18
22:27:36       1    0.06    0.08    0.81    0.02    0.01    0.01    0.00   99.02     13.18
22:27:36       2    0.06    0.09    0.87    0.02    0.01    0.01    0.00   98.94     13.18
22:27:36       3    0.04    0.08    0.81    0.02    0.01    0.01    0.00   99.04     13.18
22:27:36       4    0.04    0.09    0.58    0.02    0.01    0.01    0.00   99.25     13.18
22:27:36       5    0.07    0.09    0.81    0.02    0.01    0.01    0.00   98.99     13.18
22:27:36       6    0.03    0.08    0.68    0.02    0.01    0.01    0.00   99.18     13.18
22:27:36       7    0.02    0.09    0.77    0.02    0.01    0.01    0.00   99.08     13.18

2) Netezza relies heavily on IBM hardware. watch iostat (to check if anything is going on)

IBM blade server (http://www.redbooks.ibm.com/redpapers/pdfs/redp3582.pdf)
telnet 10.0.129.4
Login: XXXXXX
Password: ********
XXXXXX logged on

[Login and password are given in the IBM book]

Run ‘help’ from command prompt; note that this is not a shell. Statistics is probably the most useful command. Blade server is an interface to storage. You cannot reach from outside Netezza host setup.

Oracle to Netezza migration

Major steps involved in Oracle to Netezza migration are these:
1) Capture DDL, you may use “dbms_metadata.get_ddl” to capture all table definitions. Change varchar2 to varchar, does not really matter if char is more than 16 characters; stored the same way if more than 16 characters. Number is either numeric or float type. Timestamp can be either data or time. This is the most time consuming part and mostly done manually though tools are available, such as ErWin reverse engineering.

2) Most important for Netezza is distribution of data on SPU or disk. Choose distribution for each table, you may like to pick up only a handful fields for distribution of data for tables across the entire database. Recommended to use only one distribution key (field) per table, not multiple fields. For small tables, Netezza broadcasts to host where data gets sent to all SPU or nodes; this is similar to replicate in other appliances.

3) With steps (1) and (2) above, DDL are created on Netezza. Create a user and / or group and assign permissions.

4) You may like to NFS mount Netezza binaries on Oracle host (OR) Oracle file system on Netezza. I prefer the first one, if Oracle system folks do not object.
Mouting NFS from NZ host to Oracle:
Oracle to Netezza migration typically involves spooling data out as a flat text file and using nzload to seed data. 
A) These are contents of /etc/export after making changes for allowing Oracle system to access NFS mount. Last line is added for this purpose. IP address for illustration purpose: Oracle (10.11.12.13) & Netezza (10.10.11.11).
[root@NZ_host1 ~]# cat /etc/exports
# NPS setup — do not edit below this line — NFS
/nz/export      10.0.0.2/16(rw,async,insecure,anonuid=500,anongid=500)
# NPS setup — do not edit above this line — NFS
# For exporting as NFS mount on to Oracle
/nz/kit         10.11.12.13/255.255.255.255(sync,no_root_squash,rw)

B) Issue this command at Netezza host
[root@NZ_host1 ~]# exportfs -ra

C) Make a new directory on Oracle, for example: /home/oracle/NZ_bins

D) On Oracle system, do a mount command
[root@ora_dw1 /]# mount -t nfs 10.10.11.11:/nz/kit /home/oracle/NZ_bins

Once mounted, you should see binaries under bin directory (/home/oracle/NZ_bins). If you are trying to load using nzload, you may encounter a problem with password. You need to cache password on the local host first.

[oracle@ora_dw1 bin]$ nzpassword add -u admin -pw password -host 10.10.11.11
[oracle@ora_dw1 bin]$ nzpassword

Host User
————- —–
10.10.11.11 admin
[oracle@ora_dw1 bin]$ ./nzload -host 10.10.11.11 -u nzadmin1 -pw zzadmin1 -db dw_db1 -t day_time_t -delim ‘|’ -df /tmp/1

Above should load data into remote netezza database. If done, go to Netezza host and use nzsql to find number of rows inserted. Check on Oracle row count
5) NZ load can be streamed using UNIX named pipes. That is, first do ‘mkfifo /oraExport/fifo.tmp’ for example. Then, spool Oracle data and write to this file (spool /oraExport/fifo.tmp). Assuming Netezza binaries are available (see above steps), load directly on to Netezza host. Though these steps can be put in a batch file or shell command, due to nature of loads that they may not go perfectly if scripted.

More Netezza Questions and Answers (Part II)

This is an extension of my previous post (“Netezza Questions and Answers”).

What data type is most suited for zone maps
Zone maps are typically useful for integers, date and time, variations of this data type. Zone maps are useful for ordered that that are usually built into data that is loaded; for example, phone call logs. [Also check more on zonemaps in my original post below].

How are materialized views (MV) used in Netezza.
Similar to other databases, materialized views are defined against base tables. Just like in views, we can not insert, update or delete from MV. MV is automatically used by optimizer when appropriate. When base table data changes, MV gets automatically updated by Netezza. MV is based on a single base table, thus practically not as useful.
What are typical join types in Netezza
Mostly hash joins (note that Netezza may do hash join in memory or on the SPUs if memory is not sufficient for doing a hash join); in some cases sort merge or even some cases nested loops  are performed (did not really come across much; In Netezza lingo it is called either function based join or something similar, not sure about the correct term). Of course, cross or product joins are possible like in any databases.
 
What is equivalent of replication of a table in Netezza on all SPUs.
Imagine a situation in which a small table (us_states) is required for joining against a large table (creditcard_txn). Netezza may decide to read this table from all SPUs (remember, table data is spread on all SPUs) and “assemble” this table on host. This data is then broadcasted to all SPUs, resulting in a copy on each SPU. Note that an important db parameter “factrel_size_threshold” holds that triggering number; any table beyond these many number of rows is considered as a fact table. That is, if this parameter is set to 1 million, any table holding more than 1 million rows is considered by Netezza as a fact table and will not result in this replication or broadcast.
 
Primary goal of a table design is to distribute data evenly on all tables. Is it a good idea to choose multiple columns in Netezza so that data gets distributed evenly.
NO, unless all columns are used during a join process. Most likely, this results in a large amount redistribution of data during query execution.
 
From design point of view, do you forsee performance problem using order ID as an integer for one table and Order num as varchar for another table as distribution keys. Assuming that these two tables are often used for join, do you see any performance problem.
YES, we will encounter performance problem as Netezza redistributes integer to varchar type and recreates the first table. This can be avoided using both tables distributed on order number as either integer or varchar.
 
What is a snippet
Snippet is a small block of database operation, typically three to four operations, that are carried out on all SPUs where data is location. If a query results in these snippets: Snippet A, Snippet B, Snippet C, ….Snippet X; they are carried out in a sequential manner.
 
List which options are prioritized when join operation is required.
Netezza evaluates joins in this order of preference:
1) Colocated joins: All data for joins are located on the same SPU.
2) Redistribute: All required data is not located in the same SPU, send data to corresponding SPU where driving table data is located.
3) Broadcast: Mentioned as replication above; Send all data from SPU to host which collates all that data, sends it to all SPUs. Each SPU has entire table data. That is, if Netezza machine has 32 SPU, there will be 32 physical tables, one one each SPU.
Coming to joins, typically Netezza prefers in this order:
A) Hash join in memory
B) Hash join on disk
C) Sort Merge join
D) Nested loops
E) Cross join
Oracle preference closely resembles the same or similar order.
 
How can I look into some system parameters
 nzsystem showRegistry: Command for looking into system specific information
/nz/data/postgresql.conf: To check NZ db parameters; we can change this file OR use set command at SQL.
 
Why integer data type is preferred in Netezza.
A couple of reasons:
1) Better joins, thus effecient.
2) Netezza compress works only for integer type of data, not with varchar or date.
3) Zonemaps are based on integer data types.
 

How can we find log (SQL) activity for a day.
We can find this under /nz/kit.x.y/log/postgres/pg.log file. Older files are named as pg.log.N (Where N starts from 1, after pg.log file this is the latest file). Assuming that we are looking for a week day within pg.log, we may run
$ cat pg.log | sed -n “/2010-02-01 00/,/2010-02-05 23:59/p” > pg.firstweekFeb2010.log
If this produces no data, look for corresponding log file based on the last update timestamp (ls -ltr sorts them in reverse time stemp order).

What are the ways to get data into Netezza. What happens if inserts are interrupted, how Netezza handles commits.
Please see my previous posting first. Here is a short list: load using nzload, SQL inserts (very slow), create table as command or inserts from other tables, and external tables. When insert is interrupted, all rows that are inserted are already committed unless we use transaction command.
How do you nzsystem command
Most Netezza commands come with a variety of sub commands. For nzsystem, we can list a number of parameters using “nzsystem showregistry”. To find quickly if Netezza is up and running, use “nzsystem showstate”. However, using these commands require you to set “NZ_USER” and “NZ_PASSWORD” at unix level. You can do this with ‘export NZ_USER=username’ and ‘export NZ_PASSWORD=userpassword’, and then run this command. If not, you can specify login and password combo at command level. I do not prefer this way, as any users can do “ps -ef” to check what commands you are using, which also gives login and password you entered at shell command level.
 
Where are Netezza binaries stored

Two important bin locations for Netezza are: /nz/kit.x.y/sbin and /nz/kit.x.y/bin.
Some configurations are located in /nz/data directory.

How can we remove formatting with NZSQL
NZSQL by default shows text formatted. In cases in which we do not need white spaces, use “nzsql -A” option. For example, “nzsql -A -c “select count(*) from hertz.daily_bookings;” allows us to run SQL command direct without logging into nzsql interactively.

Is there a way to stop NZSQL command, if one of the SQL commands fail.
Yes. A similar feature exists in other databases too. In this case, ON_ERROR_STOP=true on nzsql command so that other commands do not get executed. For instance, we like to create a new table (CTAS) and later drop old table. If a new table creation fails, we certainly do not want to drop old table. In this case, this option is very useful.
Can we access data in other databases with the same NZSQL.
This depends on the version. Version 5 and upwards support selects against database.owner.table, provided that user has same login and password access. Inserts are allowed in the current database with data from other databases, not the other way (meaning, we cannot insert into another database from the database where we logged in).

How can we plan and corresponding CPP files.
We can just to ‘explain’ on a query to see how plan looks. At run time, plans are created under /nz/data/plans we will see corresponding plans generated during run time. Corresponding CPP code is located under /nz/data/cache/

Netezza Architecture: Part 1 (Process)

Here is a simple view of Netezza process from client connection perspective. I am attaching a PDF that explains these processes briefly.

Netezza Process (Only selected=

Please click link to open a PDF doc; this doc shows Netezza process architecture and process hierarchy: Netezza process interaction

Follow

Get every new post delivered to your Inbox.