Netezza architecture

IBM published Netezza architecture on their red book site. Nothing different, IBM talks about “FAST”, possibly improved architecture and / or algorithms.

Netezza architecture from IBM

Netezza models: New branding

Looks like Netezza will be branded as

  • Skimmer
  • TwinFin
  • Cruiser

Look at the attached picture, I do not know complete technical details yet.

New Netezza branding by IBM

Netezza Query Monitoring

Netezza query monitoring Netezza twinfin uses a little different architecture compared to old Mustang.

Each node maps to eight disks with 1 TB on each disk on TwinFin 12 system. That is, if we have a TwinFin 12, total capacity is 12 x 8 = 96TB. Here are steps that I used for monitoring purpose; I may update these findings as I find more info:

1) Once a SQL is submitted, corresponding plan is generated at /nz/data/plans/nnnn. Last four digits correspond to the plan that Netezza generates. Check how many steps that plan has by just opening the plan file. As mentioned in my previous posting, each snippet consists of a number and ending with 5xx number. If a snippet is 3, this is a sub section within that plan starting with 3 [01] and ending with 503. As you can recall, entire snippet work is done in parallel. Once this snippet is done, Netezza moves to the next snippet and so on, till completion.

2) Just like in other data appliances, Netezza typically creates a temp table involving two tables that are joined. This table is again joined with the next one, till all of them are done. This mostly involves hash joins.

3) Smaller tables involve replication or broadcast across all SPU. Larger tables involve use of sampling of data. Note that when Netezza data is loaded, it involves a quick statistics update known as express stats.

4) Assuming that we have only one query is running, you can easily evaluate SPU usage with this script at Netezza host:

/nz/kit/bin/nzstats -type spuPartition | grep -i swap

If more than one query is running, you should still get some idea on what is happening. If a SPA is getting filled up quickly compared to others indicate either data distribution problem or data that you are selecting is not evenly spread out on all SPU / SPA.

5) Check GRA and monitor, you can do this through Netezza windows tool.

Checking Netezza SPU swap partition error

You may come across this error due to data distribution. This error is displayed as:

SPU swap partition : Disk temporary work space is full

This is displayed both to the client and in the logs (pg.log or pg.log.x). A couple of simple steps are given below:

1)      If your query consists of sub queries, run each of them. I prefer that you create a table for each sub query section. That is, if your sub query has “select name, make ….from table A order by 1, 2”, create a table using the same SQL with ‘create table t_1 as ‘ followed by that SQL. If this table is created, check distribution.

2)      You can check data distribution is a couple of ways; easy way is to use Netezza administrator. Right click on that table from database tab and choose ‘record distribution’. If data is distributed evenly across all SPU, you should see vertical bars correspondingly.

3)      If step (1) creates a SPU swap partition error, this requires further investigation. This query most likely available under ‘Query history’ under session branch in Netezza administrator. Double click that query and it shows summary.

 

Likely, this points to a problem. In my case, no rows returned and only 5 snippets completed out of expected 7.

(4) If all sub queries and SQL code blocks are completed without a problem when used with ‘create table as’, probably the final SQL block is where you need to check.

Netezza Explain Plan

 
Each query that is submitted to Netezza results in a corresponding query plan. These plans are kept under /nz/data/plans and the archived ones are located under /nz/kit/log/planshist on the Netezza host. Archived files are kept up to a certain number and are purged; something like 5000 files. To save them on a permanent basis, we need to run a scheduler like a cron to move to a directory.

Query plan consists of snippets, beginning with a number ‘n’ (such as 1, 2 or 3, based on how many snippets a code consists of) and ends with 500+n.
 
Illustration (“conf” on the first line is artificially introduced by me):
         1[00]: spu ScanNode table 10001 memoryMode=yes flags=0×4 index=0 cost=24 conf=85
         1[01]: spu ProjectNode, 6 cols, projectFlags=0×0   
                    0: 1: 2: 3: 4: 5:       
         1[02]: spu HashNode table 0 flags=1 numFields=6: 0 0 4 5 6 0        
         1[03]: spu HashHistNode table 10001    
         1[04]: spu TableSizeNode table 10001    
         1[05]: spu ReturnNode     
       501[00]: dbs TableSizeNode table 10001    

 
As can be seen above, this code snippet starts with “1[00]” and ends with “501[00]“. Note that all steps in between are done in parallel; meaning, 1[00], 1[01], 1[02], etc., is all executed in parallel. “spu” refers to code executed at SPU and “dbs” is done at node or host level.
 
All other snippet blocks are executed one after another until end; basically top to bottom, within a snippet code is executed in parallel and thus node does not wait for data as such, it gets streamed as soon as available. Note that snippet block is starts with “5XY”, assuming that we have less than one hundred blocks for snippets!
 
A couple of quick points:
1) All table scans are full table scans; amount of data that is returned to SPU memory is dependent on where filter and other conditions. This results in less amount of data.
2) conf in explain plan refers to confidence level. If we are choosing a year worth of data from a four year sales table, Netezza may assume that it is choosing 25% of data. In this case, confidence level can be anything between 100 to 25%, or even lower, based on data distribution. Basically, Netezza does some kind of sampling based on our where clause. If Netezza does not have required statistics, it may do a quick statistics sampling. This can be seen as a query on its own in Netezza administrator.
3) Look for “DownloadTableNode”, this means that data is getting redistributed on SPU. This is usually a bad thing to happen. Note that double redistribution of two tables is also possible. This means that both tables are relocated based on a common column on another SPU.
4) Netezza sorts data on SPU and brings back this data to host. It does a sort merge, rather than a hash join on returned data OR a nested loop. There is no point trying to do anyting other than sort merge. See my previous posting on “hidden” Netezza parameter which you can change at session level.
5) Just like in any database, time expected, cost and other factors are estimates.
6) Update of statistics is probably one of the most important tasks for dba. A table is recognized as a fact table based on Netezza settings; at nzsql, use ‘show FACTREL_SIZE_THRESHOLD;’ to see this value (OR) you can set it using ‘set FACTREL_SIZE_THRESHOLD = xxxxxx’, you can set it to different numbers.
7) Dimension tables, if small, are broadcasted first, before starting of the query. For fact tables, all statistics details are not generated as it is too expensive for Netezza to accomplish this; instead, Netezza does a dynamic sampling like in Oracle.
8) Each query with some complexity, results in this type of activities:
 a) Create hash for selected first table
 b) Create another hash for next table
 c) hash join both tables into another table on node
 d) hash another table data and hash join with (c) above
 e) repeat above steps until done
This creates a number of temporary tables and are dropped once they are no longer required.
 
Check my earlier posting on hidden Netezza parameters, you can set them at session level to check their impact.

Netezza data types and functions

Here is a list of data types along with associated info. Note that this is generated by “SQuirrel” tool using Netezza JDBC. These data types are likely applicable to both 5.x and 6.x.
TYPE_NAME DATA_TYPE PRECISION NULLABLE CASE_SENSITIVE MAXIMUM_SCALE
NVARCHAR -9 [NVARCHAR] 16000 true true 0
NCHAR -8 [ROWID] 16000 true true 0
BOOLEAN -7 [BIT] 1 true false 0
BYTEINT -6 [TINYINT] 4 true false 0
BIGINT -5 [BIGINT] 20 true false 0
CHAR 1 [CHAR] 64000 true true 0
NUMERIC 2 [NUMERIC] 38 true false 37
DECIMAL 3 [DECIMAL] 38 true false 37
INTEGER 4 [INTEGER] 11 true false 0
SMALLINT 5 [SMALLINT] 6 true false 0
FLOAT 6 [FLOAT] 7 true false 6
REAL 7 [REAL] 7 true false 14
DOUBLE 8 [DOUBLE] 15 true false 14
VARCHAR 12 [VARCHAR] 64000 true true 0
DATE 91 [DATE] 10 true false 0
TIME 92 [TIME] 15 true false 6
TIMESTAMP 93 [TIMESTAMP] 26 true false 6
INTERVAL YEAR 101 [UNKNOWN] 50 true false 0
INTERVAL MONTH 102 [UNKNOWN] 50 true false 0
INTERVAL DAY 103 [UNKNOWN] 50 true false 0
INTERVAL HOUR 104 [UNKNOWN] 50 true false 0
INTERVAL MINUTE 105 [UNKNOWN] 50 true false 0
INTERVAL SECOND 106 [UNKNOWN] 50 true false 0
INTERVAL YEAR TO MONTH 107 [UNKNOWN] 50 true false 0
INTERVAL DAY TO HOUR 108 [UNKNOWN] 50 true false 0
INTERVAL DAY TO MINUTE 109 [UNKNOWN] 50 true false 0
INTERVAL DAY TO SECOND 110 [UNKNOWN] 50 true false 0
INTERVAL HOUR TO MINUTE 111 [UNKNOWN] 50 true false 0
INTERVAL HOUR TO SECOND 112 [UNKNOWN] 50 true false 0
INTERVAL MINUTE TO SECOND 113 [UNKNOWN] 50 true false 0
TIMETZ 1266 [UNKNOWN] 21 true false 6
           
Supported refactoring: If Netezza allows or disallows database commands. For instance, we can not create an index.
           
Key Value        
supportsAddColumn FALSE        
supportsAlterColumnNull FALSE        
supportsAlterColumnType FALSE        
supportsCreateIndex FALSE        
supportsDropColumn FALSE        
supportsDropIndex FALSE        
supportsAddForeignKeyConstraint TRUE        
supportsAddPrimaryKey TRUE        
supportsAddUniqueConstraint TRUE        
supportsAlterColumnDefault TRUE        
supportsAlterSequence TRUE        
supportsCreateSequence TRUE        
supportsCreateTable TRUE        
supportsCreateView TRUE        
supportsDropConstraint TRUE        
supportsDropPrimaryKey TRUE        
supportsDropSequence TRUE        
supportsDropView TRUE        
supportsRenameColumn TRUE        
supportsRenameTable TRUE        
supportsRenameView TRUE        
           
           
Numeric Functions String Functions        
 acos  lcase        
 asin  length        
 atan  locate        
 atan2  ltrim        
 ceiling  rtrim        
 cos  substring        
 cot  ucase        
 degrees concat        
 exp          
 floor          
 log          
 log10          
 mod          
 pi          
 power          
 radians          
 rand          
 round          
 sign          
 sin          
 sqrt          
 tan          
 truncate          
abs          

 

PART 1: Install & configure SQL client for Netezza in 3 minutes

Squirrel is a popular freeware that you can download from http://squirrel-sql.sourceforge.net/#installation. I chose “squirrel-sql-3.1.2-install.jar” as the latest one has incorrect manifest file and thus does not install correctly. I used ‘Overview of all available downloads’ link from Squirrel source forge site and selected this 3.1.2 version.

1) Save it in a new folder (C:\Sqrl312). Double click this file (or use ‘java -jar ‘ option from command line) to go through dialog box driven installation process. You may select all drivers, though this has no effect on Netezza configuration.

2) Download Netezza JDBC driver and install. This also goes through dialog box (C:\nz1).

3) We need to configure Squirrel for Netezza. This involves two major steps:

a) Fire up Squirrel batch file, located under C:\Sqr1312. Name of the file: squirrel-sql.bat (OR) use windows short cut to launch it. Click on ‘Drivers’ tab on the left most frame.

Click on “+” icon to open a dialog box. We need to manually load Netezza JDBC driver. To do that part, click on “Extra Class Path” tab and click ‘Add’

Choose Netezza directory where JDBC is located and choose that jar file for loading in; in this case, it is under C:\nz1 directory.

I filled dialog box; details obscure for obvious reasons. URL follow syntax: jdbc:netezza://<host>:<port>/<dbname>. Netezza default port:5480, same as default Postgres.

Make sure to click ‘List Drivers’ for the selected Netezza jar file.

Dialog box looks this way after completion. Click ‘OK’ after all details are populated and that closes this dialog box.

b) Second step is easy as this involves creation of an alias. Click on the tab just above ‘Drivers’, see the first picture for easy reference. Press “+” sign from this frame and that gives another dialog box. Give a name, such as ‘NZ_prod_1’ and choose Netezza driver just configured (“NZ_db_1”), give user name and password to test this connection.

If all went well, you will see:

If this is successful, you can connect direct to database and start querying.

This is the first part in three part series. Rest of them include a few details on features and Netezza related topic.

Netezza: some internals

These commands are only for reference purpose. Their use may result in unstable system or even you may mess up. Please use extreme caution.

1) How to get to SPU or nodes: From host issue command similar to this one.

ssh root@spu0101
fruitloop (password)

You will see that Netezza nodes use a barebone UNIX implementation; for instance, no ‘more’ command, etc.

spu0101# mpstat -P ALL
Linux 2.6.31.6-nznps_spu10 (spu0101)    03/10/09

22:27:36     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
22:27:36     all    0.04    0.09    0.75    0.02    0.01    0.01    0.00   99.09   2185.10
22:27:36       0    0.05    0.08    0.65    0.02    0.01    0.01    0.00   99.19     13.18
22:27:36       1    0.06    0.08    0.81    0.02    0.01    0.01    0.00   99.02     13.18
22:27:36       2    0.06    0.09    0.87    0.02    0.01    0.01    0.00   98.94     13.18
22:27:36       3    0.04    0.08    0.81    0.02    0.01    0.01    0.00   99.04     13.18
22:27:36       4    0.04    0.09    0.58    0.02    0.01    0.01    0.00   99.25     13.18
22:27:36       5    0.07    0.09    0.81    0.02    0.01    0.01    0.00   98.99     13.18
22:27:36       6    0.03    0.08    0.68    0.02    0.01    0.01    0.00   99.18     13.18
22:27:36       7    0.02    0.09    0.77    0.02    0.01    0.01    0.00   99.08     13.18

2) Netezza relies heavily on IBM hardware. watch iostat (to check if anything is going on)

IBM blade server (http://www.redbooks.ibm.com/redpapers/pdfs/redp3582.pdf)
telnet 10.0.129.4
Login: XXXXXX
Password: ********
XXXXXX logged on

[Login and password are given in the IBM book]

Run ‘help’ from command prompt; note that this is not a shell. Statistics is probably the most useful command. Blade server is an interface to storage. You cannot reach from outside Netezza host setup.

Oracle to Netezza migration

Major steps involved in Oracle to Netezza migration are these:
1) Capture DDL, you may use “dbms_metadata.get_ddl” to capture all table definitions. Change varchar2 to varchar, does not really matter if char is more than 16 characters; stored the same way if more than 16 characters. Number is either numeric or float type. Timestamp can be either data or time. This is the most time consuming part and mostly done manually though tools are available, such as ErWin reverse engineering.

2) Most important for Netezza is distribution of data on SPU or disk. Choose distribution for each table, you may like to pick up only a handful fields for distribution of data for tables across the entire database. Recommended to use only one distribution key (field) per table, not multiple fields. For small tables, Netezza broadcasts to host where data gets sent to all SPU or nodes; this is similar to replicate in other appliances.

3) With steps (1) and (2) above, DDL are created on Netezza. Create a user and / or group and assign permissions.

4) You may like to NFS mount Netezza binaries on Oracle host (OR) Oracle file system on Netezza. I prefer the first one, if Oracle system folks do not object.
Mouting NFS from NZ host to Oracle:
Oracle to Netezza migration typically involves spooling data out as a flat text file and using nzload to seed data. 
A) These are contents of /etc/export after making changes for allowing Oracle system to access NFS mount. Last line is added for this purpose. IP address for illustration purpose: Oracle (10.11.12.13) & Netezza (10.10.11.11).
[root@NZ_host1 ~]# cat /etc/exports
# NPS setup — do not edit below this line — NFS
/nz/export      10.0.0.2/16(rw,async,insecure,anonuid=500,anongid=500)
# NPS setup — do not edit above this line — NFS
# For exporting as NFS mount on to Oracle
/nz/kit         10.11.12.13/255.255.255.255(sync,no_root_squash,rw)

B) Issue this command at Netezza host
[root@NZ_host1 ~]# exportfs -ra

C) Make a new directory on Oracle, for example: /home/oracle/NZ_bins

D) On Oracle system, do a mount command
[root@ora_dw1 /]# mount -t nfs 10.10.11.11:/nz/kit /home/oracle/NZ_bins

Once mounted, you should see binaries under bin directory (/home/oracle/NZ_bins). If you are trying to load using nzload, you may encounter a problem with password. You need to cache password on the local host first.

[oracle@ora_dw1 bin]$ nzpassword add -u admin -pw password -host 10.10.11.11
[oracle@ora_dw1 bin]$ nzpassword

Host User
————- —–
10.10.11.11 admin
[oracle@ora_dw1 bin]$ ./nzload -host 10.10.11.11 -u nzadmin1 -pw zzadmin1 -db dw_db1 -t day_time_t -delim ‘|’ -df /tmp/1

Above should load data into remote netezza database. If done, go to Netezza host and use nzsql to find number of rows inserted. Check on Oracle row count
5) NZ load can be streamed using UNIX named pipes. That is, first do ‘mkfifo /oraExport/fifo.tmp’ for example. Then, spool Oracle data and write to this file (spool /oraExport/fifo.tmp). Assuming Netezza binaries are available (see above steps), load directly on to Netezza host. Though these steps can be put in a batch file or shell command, due to nature of loads that they may not go perfectly if scripted.

Netezza TwinFin Architecture

Netezza TwinFin Hardware Configuration

Netezza TwinFin comes in 3 models: TwinFin 3, TwinFin 6 and TwinFin 12; this number indicates number of S-blades or processors. All models come with

  • Storage: 1 TB hard drives
  • Processor: two CPU, quad core (that is, TwinFin 3 is 3 x 2 CPU with quad core processors).
  • Disk storage: Processor-core factor in TB. For instance, for TwinFin 3 it is 24 TB (take above number, 3 x 2 x 4 core, take this in TB; in other words, each core gets a disk).
  • Netezza version: 5.0 onwards; 6.0 beta is already out as of beginning for 2010 Q1.

  

TwinFin is radically different from earlier models. Netezza implemented CPU, memory and storage along with an FPGA on previous models. TwinFin uses S-blades that includes CPU, memory and FPGA (a new term coined by Netezza: database accelerator card = FPGA + memory + IO interface); storage is separated and is located in a storage array.

S-blades and Storage

TwinFin 12 is a full rack containing all components. Top portion is occupied by storage array, followed by two hosts (Linux based hosts, configured for high availability so that if one host fails another takes over), KVM and an array of S-blades. Data between two hosts are replicated and are always synched up so that if one host fails, another can assume that role without any problem.

Disk space for TwinFin 12 is a total of 96 TB; one third of it is only used for storing data, another one third is for disk mirror and the other one third is for spool or temporary space for processing queries. Roughly, 32 TB is available on TwinFin 12 for storage of data.

 

 TwinFin layout and corresponding hardware on the right.

Follow

Get every new post delivered to your Inbox.