PART 1: Install & configure SQL client for Netezza in 3 minutes

Squirrel is a popular freeware that you can download from http://squirrel-sql.sourceforge.net/#installation. I chose “squirrel-sql-3.1.2-install.jar” as the latest one has incorrect manifest file and thus does not install correctly. I used ‘Overview of all available downloads’ link from Squirrel source forge site and selected this 3.1.2 version.

1) Save it in a new folder (C:\Sqrl312). Double click this file (or use ‘java -jar ‘ option from command line) to go through dialog box driven installation process. You may select all drivers, though this has no effect on Netezza configuration.

2) Download Netezza JDBC driver and install. This also goes through dialog box (C:\nz1).

3) We need to configure Squirrel for Netezza. This involves two major steps:

a) Fire up Squirrel batch file, located under C:\Sqr1312. Name of the file: squirrel-sql.bat (OR) use windows short cut to launch it. Click on ‘Drivers’ tab on the left most frame.

Click on “+” icon to open a dialog box. We need to manually load Netezza JDBC driver. To do that part, click on “Extra Class Path” tab and click ‘Add’

Choose Netezza directory where JDBC is located and choose that jar file for loading in; in this case, it is under C:\nz1 directory.

I filled dialog box; details obscure for obvious reasons. URL follow syntax: jdbc:netezza://<host>:<port>/<dbname>. Netezza default port:5480, same as default Postgres.

Make sure to click ‘List Drivers’ for the selected Netezza jar file.

Dialog box looks this way after completion. Click ‘OK’ after all details are populated and that closes this dialog box.

b) Second step is easy as this involves creation of an alias. Click on the tab just above ‘Drivers’, see the first picture for easy reference. Press “+” sign from this frame and that gives another dialog box. Give a name, such as ‘NZ_prod_1’ and choose Netezza driver just configured (“NZ_db_1”), give user name and password to test this connection.

If all went well, you will see:

If this is successful, you can connect direct to database and start querying.

This is the first part in three part series. Rest of them include a few details on features and Netezza related topic.

Shortcut to phenomenal Oracle 11g performance (?)

As you know, Oracle provides way too many parameters, pretty much infinite combinations if you consider all possibilities of normal parameters and hidden ones. No one probably considers more than a handful to see how Oracle delivers results. After grappling with various settings and tuning tricks, I tried the following parameters. Each of them, based on the query, reduced response from days (yes, days) to seconds of response time. My environment is a mid sized DW database close to 5TB data. Good thing about them is that we can change these parameters and check results at session level:   

 
 
 
 

  • alter session set “_gby_hash_aggregation_enabled” = FALSE;

(Hash group by aggregation is not chosen in this case). 

  • alter session set optimizer_features_enable=’10.2.0.4′;

(To see acceptable values, give a wrong version like ’1.1′ and Oracle will suggest version numbers). 

  • alter session set “_query_rewrite_fudge” = 75;

(This favors materialized view query rewrite; of course, we can also give a hint to use materialized view, that is manually choosing which MV to specify. Whereas a general hint will favor appropriate MV, if available). 

  • alter session set optimizer_cost_index_cost_adj=50;

(Favors use of indexes as opposed to full table scanning. In this case, use of indexes is preferred twice the usual cost of index usage). 

  • alter session set “_optimizer_cost_based_transformation”=off;

(Disables subquery view, in all cases that I came across resulted in severe degradation of performance). 

Switching with these parameters is just a matter of seconds and we can quickly assess impact by either running the query itself or by pulling explain plan. I prefer this way of getting explain plan (run these commands at SQL prompt): 

  • explain plan for your_sql_query;
  • select * from table(dbms_xplan.display); 

 

Make sure to switch back to previous parameter settings. Most tools also allow before and after switching of parameters at a session level. I never used more than one parameter switching for any troubling query. As you guessed, there is no one parameter that worked for all queries; it all depends on the plan. 

Another cool feature that I experimented with in Oracle 11g is SPM (SQL Plan Management). More in on this in my next blog item.

Exadata2: Continued

I was going through key note presentation given by Larry on Exadata2 at Open world 2009. It looks to me that this product will not be as successful, not because of technology rather the way Oracle is suggesting. Larry recommends Exadata2 as a complete database solution for mission critical applications for both OLTP and DW. Assuming both these databases are located on the same rack, here are my observations:

* Companies ranging from a couple of hundred employees to very large organizations, OLTP and DW departments are separate. Even if they are located on same rack, sometimes it is difficult to get source data from OLTP systems by DW groups immediately.

* OLTP source systems are in constant change due to changing requirements, these results in mismatches in DW loads.

* Typically DW systems purge data and that will cause problems in OLTP, if both are implemented on same rack.

* DW systems sometimes acquire an exclusive lock at table level and that certainly introduces problem with OLTP.

* Personnel report to different senior management for OLTP and DW, and their stakes are different.

* Even if two databases are implemented on the same hardware, there are still issues such as one group claiming the other group is causing problems.

* Netezza recommends less than a dba to manage appliance. Exadata2 certainly requires at least two dba to manage. Netezza makes use of FPGA and zone maps, not sure how this is comparable to Oracle technology. Larry mentioned about random I/O, since exadata may hold all data in memory, how useful is this for DW reads or analytical processing.

* No mention of learning curve (if any) for new dba.

* Larry claimed that Exadata2 is 16 times faster than IBM’s fastest and IBM protested by saying that Exadata2 is only 6 times faster. Where are stats & graphs.

Exadata2

Recently Oracle announced a number database releases, including Oracle Exadata2. Apparently, it can compress up to 10 times DW data. This means, up to 50 tera bytes can fit in flash memory at storage side. OLTP compression is about 3 times. I am not sure how extensively their result sets were, I thought compression beyond 5 is difficult to achieve. Here is an architecture of a single rack exadata.

Exadata2 architecture

Exadata2 architecture

As I pointed out earlier, there are no big customers using Exadata. I wonder why this was not sold (or given for free) to a high profile customer and get them present their findings.

Oracle is trying to sell Exadata as *the* database, supporting both OLTP and DW. This will not fly. In many medium to large companies, OLTP and OLAP departments live in different worlds. We can not even get OLTP folks standardize on data definitions and source data. Try to get them use a single database for all their needs?

Data backup: Use paper

Wondering if we will ever use this type of backup.

Just imagine a customer goes to a store or an office, she is given a printout (of something like an invoice) and that paper also includes a “paper backup” of her record. Her data is no longer available on store or office computers any more. She is requested to bring back same invoice to scan data back, in case, she likes to resolve or dispute a certain entry. Since this paper back up is customized to their environment, it is naturally ‘encrypted’. This paper backup is available free.

SQL generation: Interpreted vs Compiled

In many database systems, SQL code is turned into an intermediary code (call it ‘p code’), this gets fed into run time engine. That is, running of SQL on these DBMS systems is a two (major) step process: Turn code  into p code, run at run time engine. This is similar to Java virtual machine, concept wise. This way, DBMS code is more portable. Think of Oracle, all (in theory) we need to change is run time engine for each platform. Same p code generation.

Netezza has a different approach, it produces executable code for SQL. Netezza runs on a specific platform, a propertiary architecture. There is a huge performance advantage if complied (executable) code is generated directly. On the other hand, this relative time may be insignificant on traditional databases such as Oracle, where lots of time is spent on I/O or even database waits.

I particularly like study done with MonetDB. There is quite an exhaustive study done by these folks on compiled versus interpreted SQL. There is about 10 times improvement for compiled code and thus preferred way. Looks like Netezza and Vertica (based on research results from MonetDB) is following this path.

http://monetdb.cwi.nl/projects/monetdb//SQL/Benchmark/TPCH/

Follow

Get every new post delivered to your Inbox.