filter(NULL IS NOT NULL)

“filter(NULL IS NOT NULL)” in an explain plan usually means that the CBO found a conflict between your WHERE-clause predicates and declarative constraints on the tables involved.

example –


SQL> alter session set statistics_level=all; 

drop table t1a; 
drop table t1b; 

create table t1a as select a.*, 'A' ind from dba_tables a where rownum <= 10; 
create table t1b as select a.*, 'B' ind from dba_tables a where rownum <= 10; 

alter table t1a add constraint t1a_ck check (ind='A') validate; 
alter table t1b add constraint t1b_ck check (ind='B') validate; 

create or replace view v1 as
select * from T1A 
union all 
select * from T1B 
;

select * from v1 where ind='A';

...

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last cost note')); 

PLAN_TABLE_OUTPUT
------------------------------------------------
SQL_ID	2vxaadnbduq8m, child number 0
-------------------------------------
select * from v1 where ind='A'

Plan hash value: 2156607360

----------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |  A-Time    | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |	    |	   1 |	      |     4 (100)|	 10 |00:00:00.01 |       4 |
|   1 |  VIEW                | V1   |	   1 |	    1 |     4   (0)|	 10 |00:00:00.01 |       4 |
|   2 |   UNION-ALL          |	    |	   1 |	      |            |	 10 |00:00:00.01 |       4 |
|*  3 |    TABLE ACCESS FULL | T1A  |	   1 |	   10 |     3   (0)|	 10 |00:00:00.01 |       4 |
|*  4 |    FILTER            |	    |	   1 |	      |            |	  0 |00:00:00.01 |       0 |
|*  5 |     TABLE ACCESS FULL| T1B  |	   0 |	    1 |     3   (0)|	  0 |00:00:00.01 |       0 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("IND"='A')
   4 - filter(NULL IS NOT NULL)
   5 - filter("IND"='A')

With the constraints validated, Oracle does not physically touch the object at all (Starts=0 for T1B above).
It is a good example of constraints actually helping performance rather than hurting it.
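
To confirm which branches the optimizer is allowed to prune this way, check that the check constraints are both enabled and validated (a quick dictionary query; column names as in USER_CONSTRAINTS):

SQL> select table_name, constraint_name, status, validated
       from user_constraints
      where constraint_name in ('T1A_CK', 'T1B_CK');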

In contrast, this is the plan if constraints are not validated:

SQL> alter table t1b modify constraint t1b_ck novalidate; 

SQL> select * from v1 where ind='A';
...

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last cost note')); 

PLAN_TABLE_OUTPUT
------------------------------------------------
SQL_ID	2vxaadnbduq8m, child number 0
-------------------------------------
select * from v1 where ind='A'

Plan hash value: 1278722009

---------------------------------------------------------------------------------------------------
| Id  | Operation	    | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time	| Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |	   |	  1 |	     |	   4 (100)|	10 |00:00:00.01 |       7 |
|   1 |  VIEW               | V1   |	  1 |	   1 |	   4   (0)|	10 |00:00:00.01 |       7 |
|   2 |   UNION-ALL         |	   |	  1 |	     |	          |	10 |00:00:00.01 |       7 |
|*  3 |    TABLE ACCESS FULL| T1A  |	  1 |	  10 |	   3   (0)|	10 |00:00:00.01 |       4 |
|*  4 |    TABLE ACCESS FULL| T1B  |	  1 |	   1 |	   3   (0)|	 0 |00:00:00.01 |       3 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("IND"='A')
   4 - filter("IND"='A')

Now Starts=1 for T1B, so Oracle accesses both tables, doing unnecessary I/O.
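
Re-validating the constraint should bring back the earlier plan, with the filter(NULL IS NOT NULL) branch pruned again:

SQL> alter table t1b modify constraint t1b_ck validate;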


Roll forward oracle database restored from storage snapshot

Most people use storage snapshots with Oracle DB these days.

It is easy to restore the DB to a discrete point in time, i.e. to the snapshot's time.

Here is a method of rolling the DB forward to any needed point in time using archive logs.

1. Present the snapshot to the machine. The snapshot should contain the DB files, control files, and redo log files.

2. Mount the database:

SQL> startup mount;

3. Do instance recovery explicitly (apply the redo in the online logs, so the datafiles are no longer fuzzy):

SQL> select status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy 
from v$datafile_header group by status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, fuzzy;

STATUS  CHECKPOINT_CHANGE# CHECKPOINT_TIME      RESETLOGS_CHANGE# RESETLOGS_TIME         COUNT(*) FUZ
------- ------------------ -------------------- ----------------- -------------------- ---------- ---
ONLINE          1.2835E+13 06-jul-2016 05:30:30                 1 09-mar-2016 19:22:11         34 YES

SQL> recover database;
Media recovery complete.

SQL> select status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, count(*), fuzzy 
from v$datafile_header group by status,checkpoint_change#,checkpoint_time, resetlogs_change#, resetlogs_time, fuzzy;

STATUS  CHECKPOINT_CHANGE# CHECKPOINT_TIME      RESETLOGS_CHANGE# RESETLOGS_TIME         COUNT(*) FUZ
------- ------------------ -------------------- ----------------- -------------------- ---------- ---
ONLINE          1.2835E+13 06-jul-2016 05:59:59                 1 09-mar-2016 19:22:11         34 NO

If you do not do step 3 (the explicit recovery), you have a good chance of getting the error below, no matter how many archive logs you apply:

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/opt/oracle/oradata/testdb/system01.dbf'

4. Convert the CURRENT controlfile into a BACKUP controlfile.


SQL> select CONTROLFILE_TYPE from V$DATABASE;

CONTROL
-------
CURRENT

SQL> alter database backup controlfile to '/tmp/controlfile.bkup' reuse;

Database altered.

SQL> shutdown immediate;
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> 
SQL> exit  

oracle@testbb1:~$ cp /tmp/controlfile.bkup /opt/oracle/oradata/testdb/control01.ctl
oracle@testbb1:~$ cp /tmp/controlfile.bkup /opt/oracle/oradata/testdb/control02.ctl

oracle@testbb1:~$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Jul 7 17:54:03 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 2.1475E+10 bytes
Fixed Size                  5284640 bytes
Variable Size            3154124000 bytes
Database Buffers         1.8254E+10 bytes
Redo Buffers               61816832 bytes
Database mounted.
SQL> 
SQL> 
SQL> 
SQL> select CONTROLFILE_TYPE from V$DATABASE;

CONTROL
-------
BACKUP

5. Roll forward: recover the database, applying the needed archive logs (which are presumably available).

SQL> recover database using backup controlfile until time '2016-07-06-08:00:00';

ORA-00279: change 12835325017832 generated at 07/06/2016 05:59:59 needed for thread 1
ORA-00289: suggestion : /opt/oracle/product/12.1.0.2/dbs/arch1_10330_906060131.dbf
ORA-00280: change 12835325017832 for thread 1 is in sequence #10330

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/dump/testdb_arch/1_10330_906060131.arc
ORA-00279: change 12835326223879 generated at 07/06/2016 06:15:09 needed for thread 1
ORA-00289: suggestion : /opt/oracle/product/12.1.0.2/dbs/arch1_10331_906060131.dbf
ORA-00280: change 12835326223879 for thread 1 is in sequence #10331
ORA-00278: log file '/dump/testdb_arch/1_10330_906060131.arc' no longer needed for this recovery

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/dump/testdb_arch/1_10331_906060131.arc
....

cancel
Media recovery cancelled.
SQL> 

SQL> select TYPE,TIMESTAMP,COMMENTS from v$recovery_progress;

TYPE               TIMESTAMP              COMMENTS            
---------------    --------------------   --------------------
Media Recovery     06-jul-2016 06:18:11   SCN: 12835326229999 

SQL> alter database open resetlogs;

Database altered.

That should do it!

p.s. There are several potential problems related to step 4. If it does not work out, here is an alternative to step 4: dump the control file to trace:

SQL> alter database backup controlfile to trace as  '/tmp/cnt_trace2.ora';

Then re-create the control file from cnt_trace2.ora using the “NORESETLOGS case” section of the trace.
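
A hedged sketch of pulling the NORESETLOGS script out of the trace (the marker comments such as “Set #1. NORESETLOGS case” may vary slightly by version, so inspect the file first):

oracle@testbb1:~$ sed -n '/NORESETLOGS case/,/RESETLOGS case/p' /tmp/cnt_trace2.ora > /tmp/recreate_ctl.sql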

Another alternative to step 4: re-create the control file as a standby control file; that will also let you roll forward.


Junos Pulse VPN client on Linux. Two-phase auth. 64-bit. How to make it all work.

There are several common problems with Juniper SSL VPN support for Linux clients:

  1. Most 64-bit platforms are not supported.
  2. Junos Pulse SSL VPN does not support Linux at all.

The best way to address problem “1” is: http://ubuntuforums.org/showthread.php?p=11189826#post11189826 There is also the well-known mad-scientist way, but it is more complex.

Problem “2” is a bit more involved.

In my case, when I log in to the Juniper SSL web interface, it runs host-checker, and since Linux is not supported, the web form has no way to launch the Junos Pulse SSL VPN client. In fact, that client does not exist for Linux.
The good news is that the old Juniper “Network Connect” client is compatible with Junos Pulse and can be used on a Linux client.

So you can still launch the “Network Connect” client separately from your web session.
If on login you are only asked for a user name and password (either AD password or SecurID), you are lucky and can easily launch “Network Connect” as described in several sources, for example: http://serverfault.com/questions/363061/how-to-connect-to-a-juniper-vpn-from-linux

But if your VPN server uses a two-phase auth scheme (SecurID + AD password), it can get more involved. Scott has a GREAT blog on how to make it work: http://makefile.com/.plan/2009/10/juniper-vpn-64-bit-linux-an-unsolved-mystery
The only catch: you have to copy the DSID cookie manually and paste it as an argument in the ncui call. There are several projects that automate that, for example: https://code.google.com/p/juniper-vpn/ or http://techblog.constantcontact.com/software-development/juniper-networks-network-connect-securid-and-64-bit-linux/

Both are Python scripts that simulate browser behaviour. Great stuff, but it did not work for me. I believe that is because, before I can get to the login page, I have one more “Legal” page where “Agree” has to be clicked. So I decided to actually use the browser for login, and just have an automated way to pass the DSID cookie to the script that launches ncui. Below are the steps for that.

  1. Get the Network Connect client + ncui executable (mostly from Scott’s blog):
    • Login to your VPN https://some.company.com/dana-na/auth/url_default/welcome.cgi
    • Get Network Connect client at: https://some.company.com/dana-cached/nc/ncLinuxApp.jar
    • Install the Network Connect client, build the ncui executable with gcc, and get the cert file:
      sudo apt-get install gcc-multilib
      mkdir ~/.juniper_networks/network_connect
      cd ~/.juniper_networks/network_connect
      mv ~/Downloads/ncLinuxApp.jar .
      jar -xf ncLinuxApp.jar
      sudo gcc -m32 -Wl,-rpath,`pwd` -o ncui libncui.so
      sudo chown root:root ncui
      sudo chmod 4755 ncui
      echo | openssl s_client -connect some.company.com:443 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -outform der > ssl.crt
      
  2. Get the DSID cookie in persistent form and plug it into ncui.
    The problem with the DSID cookie is that it is session-level, so it does not persist on disk where you could grep for it. So I had to use the Greasemonkey plugin for Firefox to capture the DSID and save it to HTML5 storage.
    Install the Greasemonkey plugin https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/ and make sure it is enabled.

    • Create a Greasemonkey JS script (replace some.company.com) to dump the DSID to the webappsstore.sqlite file:
      mkdir ~/Greasemonkey/
      cat >> ~/Greasemonkey/jsessionid.user.js <<EOF
      // ==UserScript==
      // @name           DSID to localStorage
      // @namespace      name@company.com
      // @description    Saves the DSID cookie in localStorage
      // @include        https://some.company.com/dana-na/*
      // ==/UserScript==
      (function() {
          var sessid = document.cookie.match(/DSID=([^;]+)/);
          if (sessid) {
              var oldSID = localStorage.DSID;
              if (sessid[1] != oldSID) localStorage.DSID = sessid[1];
          }
      })();
      EOF
      
    • In Firefox: Menu -> File -> Open File -> Choose “jsessionid.user.js”, and enable the script in the pop up window.
    • Install sqlite:
      mkdir ~/bin/
      cd ~/bin/
      wget http://www.sqlite.org/2014/sqlite-shell-linux-x86-3080500.zip
      unzip sqlite-shell-linux-x86-*.zip
      
    • Create a script to get the DSID value from webappsstore.sqlite and launch ncui (replace some.company.com):
      cat > ~/bin/juniper_vpn.sh <<"EOF1"
      
      sudo touch  /etc/jnpr-nc-resolv.conf
      sudo touch  /etc/jnpr-nc-hosts.bak
      
      dsid1=`~/bin/sqlite3 ~/.mozilla/firefox/*.default/webappsstore.sqlite <<EOF | grep "https:443|DSID" | awk -F"|" '{print $3}'
      select * from webappsstore2;
      EOF`
      echo dsid1=$dsid1
      
      cd ~/.juniper_networks/network_connect/
      ./ncui -h some.company.com -c DSID=$dsid1 -f ssl.crt
      
      EOF1
      
      chmod 700 ~/bin/juniper_vpn.sh
      
    • Run the script; it will prompt for the AD password. (Ctrl+C ends the VPN connection.)
    • In another window, check whether the VPN interface is active:
      ifconfig | grep tun
      

    And if so, you are on the VPN! Time to get to real work. :)

    p.s. Use Sun Java: OpenJDK has old, known problems with the Juniper VPN.


Lost control file, So what?

I recently got involved in a couple of conversations regarding the common “RULE OF THUMB” about backing up the control file, double backups of the control file backup, etc.

I guess it is a good thing to do, and it does not take much space anyway, but if you lose the control file or it gets corrupted and you do not have a backup, how big a deal is it?

If you remember your database character set and have access to the filesystem/ASM to find where the datafiles and redo logs are, it is not really a big deal.
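
It is worth capturing those details while the database is still healthy; a minimal sketch of the dictionary queries that collect them (keep the output somewhere outside the database):

SQL> select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
SQL> select name from v$datafile;
SQL> select group#, member from v$logfile;

With the character set and file locations in hand, the control file can be re-created along these lines: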

STARTUP NOMOUNT;

CREATE CONTROLFILE REUSE DATABASE "DBNAME" NORESETLOGS ARCHIVELOG
 MAXLOGFILES 16
 MAXLOGMEMBERS 3
 MAXDATAFILES 1000
 MAXINSTANCES 8
 MAXLOGHISTORY 292
LOGFILE
 GROUP 1 '/path/to/redo/file' SIZE 512M BLOCKSIZE 512, ...
DATAFILE
 '/path/to/data/file',...
CHARACTER SET <DB character set>
;

RECOVER DATABASE;

ALTER DATABASE OPEN;

No need to even RESETLOGS if the redo logs are still available, so: ZERO data loss!

Some RMAN metadata may be lost, but most of it can be registered back with:

RMAN> CATALOG START WITH '/directory/where/backups/are/';

Same thing for registering back archive logs:

RMAN> CATALOG START WITH '/directory/where/archlogs/are';

I have used it a couple of times; it worked like a charm.

The only restriction: all files mentioned in the “create controlfile” statement have to exist, as Oracle probes their headers to get the SCN and other metadata. If that condition is not met, you are better off restoring the control file from a binary backup.
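
For that case, a hedged sketch of restoring the control file from an RMAN autobackup (the DBID below is just a placeholder; it is only needed when no recovery catalog is used, and an autobackup obviously has to exist):

RMAN> startup nomount;
RMAN> set dbid 1234567890;
RMAN> restore controlfile from autobackup;
RMAN> alter database mount;
RMAN> recover database;
RMAN> alter database open resetlogs;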


ORA-01555 on standby

If you see “ORA-01555: snapshot too old” from a select that is being run on a physical standby, while the same select runs just fine on the primary, and all relevant init.ora parameters and the undo tablespace are exactly identical between primary and standby, it looks odd. But it is easy to explain.

First check the version: if the DB is below 11.2.0.3, it could be bug 10320455.

If not, it is most probably just an undo retention problem.

On the primary instance, increase undo_retention to a value (in seconds) higher than the time the query runs, and it should fix the problem. (You can increase the parameter on the standby too, but it really takes effect on the primary only.)

Why does the problem not show up on the primary with the same undo_retention?

There is an underscore parameter, _undo_autotune. With it set to TRUE (the default in 11g), Oracle tracks the oldest running query and tries to keep undo from the start time of that oldest query, even if it is older than undo_retention, as long as undo space allows. So it effectively overrides undo_retention and can be much higher than undo_retention. The actual (tuned) undo retention can be seen in the tuned_undoretention column of v$undostat.
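
A quick way to compare the two values on the primary, and then raise undo_retention (7200 below is just an example value; pick something longer than the standby query's runtime):

SQL> show parameter undo_retention
SQL> select max(tuned_undoretention) from v$undostat;
SQL> alter system set undo_retention = 7200;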

But the primary DB has no visibility into what queries are running on the standby, so even if a long-running query is running on the standby, undo autotune does not kick in, and the real undo retention on the primary (and hence on the standby) equals what is set in the undo_retention parameter.

So setting undo_retention on the primary to a higher value fixes the problem. Note that it has to be set on the primary: for the standby, the undo tablespace is read-only, so the standby has no control over it.

Setting undo_retention higher should not hurt much. It can grow your undo tablespace up to what autoextend allows, but if it hits the limit and more undo is being generated than is really needed by uncommitted transactions, Oracle will purge old undo that is only needed by selects. It is a soft limit.


VIP is not reachable after failover

If a VIP was failed over from one host in a subnet to another by manually running “ifconfig down” on one host and “ifconfig up” on the other, you can hit a case where the VIP becomes unreachable from outside the subnet. The problem should clear on its own, but that can take up to several hours, depending on the platform.
If you have similar symptoms, run this on the machine the VIP failed over to:

[root@linuxtest ~]# arping -U your_vip_ip

It sends an ARP response: a Gratuitous ARP, aka Unsolicited ARP. That broadcasts the new MAC for the VIP and so updates the ARP caches on machines in the same subnet, including the router that routes traffic to the subnet from the outside world.
If this is not done, the ARP caches will still be updated eventually, but it takes time:
– for some cisco devices – 4 hours,
– for brocade – default is 10 minutes,
– for Linux – default is 60 seconds,
– for Windows – up to 10 minutes.

So refreshing the ARP caches helps new connections avoid waiting for the ARP timeout.
It also helps clients that had an existing TCP connection before the failover: they immediately get a reset packet sent back. This results in clients getting errors right away rather than waiting for the TCP timeout (up to 15 minutes, depending on the platform).
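
A slightly more explicit form of the same call, plus a quick check from another host in the subnet (the interface name, host names, and IP below are placeholders):

[root@linuxtest ~]# arping -U -I eth0 -c 3 10.0.0.50
[root@otherhost ~]# ip neigh show 10.0.0.50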

Btw, the ability to send Unsolicited ARP is the only reason Oracle RAC needs VIPs.


High CPU usage on DB host?

It is a Unix 101 statement, but I have heard it stated wrong so many times that I decided to write a post about it.

So:
If, in an OLTP environment, ‘top’ shows %wa (IOWAIT) as the major contributor to CPU busy, adding CPU would not help. There is no need for more CPU. Period!
%wa IS A PERCENTAGE THAT COUNTS TOWARDS CPU IDLE.

Simple test.

Push some IO:

user1@myhost:~$ dd if=/dev/zero of=/tmp/file1 conv=notrunc  bs=1000 count=3000000 &
[1] 31240
user1@myhost:~$ dd if=/dev/zero of=/tmp/file2 conv=notrunc  bs=1000 count=3000000 &
[2] 31241
user1@myhost:~$ dd if=/dev/zero of=/tmp/file3 conv=notrunc  bs=1000 count=3000000 &
[3] 31242
user1@myhost:~$ dd if=/dev/zero of=/tmp/file4 conv=notrunc  bs=1000 count=3000000 &
[4] 31243

top output looks like this:

user1@myhost:~$ top -b -i

top - 23:05:42 up  8:37, 12 users,  load average: 4.36, 3.91, 6.28
Tasks: 239 total,   5 running, 230 sleeping,   0 stopped,   4 zombie
Cpu(s):  3.1%us, 20.5%sy,  0.0%ni, 12.9%id, 63.3%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   4080460k total,  3809420k used,   271040k free,     1580k buffers
Swap:  4145148k total,   104240k used,  4040908k free,  1824928k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31240 amoseyev  20   0  4376  588  496 D    2  0.0   0:12.29 dd
31241 amoseyev  20   0  4376  588  500 D    2  0.0   0:12.32 dd
31242 amoseyev  20   0  4376  592  500 D    2  0.0   0:12.38 dd
31243 amoseyev  20   0  4376  592  500 D    1  0.0   0:11.50 dd

%wa is high.

iostat consistently shows write throughput of about 44 MB/s:

user1@myhost:~$ iostat 1 1000

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.76    0.00   12.09   50.13    0.00   36.02

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda            2111.00      8352.00     45668.00       8352      45668

As shown above, the dd processes that cause the I/O load are almost always in the “D” state, which is “uninterruptible sleep”.
“Uninterruptible” comes from the fact that they cannot be killed, as the process is in kernel mode (the I/O call has to be done in kernel mode).
It is uninterruptible, but still a SLEEP: it is an idle process and does not block the CPU. If any other thread needs the CPU (either for number crunching or for another I/O call), the scheduler will put it on the CPU while dd sleeps.
But if no other CPU load is available, top counts those otherwise-idle CPU cycles as %wa.
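
A quick way to list the processes currently sitting in the D state:

user1@myhost:~$ ps -eo state,pid,comm | awk '$1 == "D"'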

Now push some real CPU load:

user1@myhost:~$ cat /dev/urandom  > /dev/null &
[1] 31224
user1@myhost:~$ cat /dev/urandom  > /dev/null &
[2] 31225
user1@myhost:~$ cat /dev/urandom  > /dev/null &
[3] 31229
user1@myhost:~$ cat /dev/urandom  > /dev/null &
[4] 31231

user1@myhost:~$
user1@myhost:~$ top -b -i

top - 23:19:16 up  8:50, 12 users,  load average: 7.84, 7.15, 7.10
Tasks: 239 total,   6 running, 229 sleeping,   0 stopped,   4 zombie
Cpu(s):  0.8%us, 98.1%sy,  0.0%ni,  0.0%id,  0.5%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:   4080460k total,  3838860k used,   241600k free,     2168k buffers
Swap:  4145148k total,   104240k used,  4040908k free,  2264144k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31279 amoseyev  20   0  4220  544  456 R   96  0.0   0:16.22 cat
31281 amoseyev  20   0  4220  544  456 R   87  0.0   0:13.96 cat
31280 amoseyev  20   0  4220  544  456 R   83  0.0   0:15.20 cat
31278 amoseyev  20   0  4220  540  456 R   80  0.0   0:16.01 cat
31241 amoseyev  20   0  4376  588  500 D    2  0.0   0:14.29 dd
31242 amoseyev  20   0  4376  592  500 D    2  0.0   0:14.26 dd
31240 amoseyev  20   0  4376  588  496 D    1  0.0   0:14.17 dd
31243 amoseyev  20   0  4376  592  500 D    1  0.0   0:13.34 dd

%wa went to almost 0, and %sy is close to 100%.
So when the CPU spends its cycles on real load, they do not add up to %wa. At the same time, I/O throughput did not change even with the CPU 100% busy:

user1@myhost:~$ iostat sda 1 1000

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.50    0.00   98.50    0.00    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda            1415.00      5376.00     44500.00       5376      44500

Which again shows that OLTP I/O does not need that much CPU.

The “D” processes also add up to the “load average”, so it is also not the best value for judging how busy the CPU is.
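
vmstat separates the two contributions nicely: the r column is threads actually runnable (waiting for CPU), while the b column is threads blocked in uninterruptible sleep (waiting for I/O):

user1@myhost:~$ vmstat 1 5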

All of the above applies to OLTP: lots of small I/Os, whether on NAS, SAN, raw/block devices, or a file system. All of them.

If we are talking about some crazy 1+ GB/s full table scans in the OLAP/DW world, CPU probably would be affected, especially over NFS (and not direct NFS). But it will show up mostly in %sy and %si (not %wa), as Ethernet traffic is handled through soft interrupts, which at high throughput is CPU intensive. Context switches may also add CPU load on some platforms where they are still used for switching between user and kernel modes.
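
To see that split per CPU, mpstat from the sysstat package is handy; the %soft column is the soft-interrupt time and %sys the kernel time:

user1@myhost:~$ mpstat -P ALL 1 5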
