In this blog entry, we will explore the wonderful world of SCNs and how Oracle database uses SCN internally. We will also explore few new bugs and clarify few misconceptions about SCN itself.
What is SCN?
SCN (System Change Number) is a primary mechanism to maintain data consistency in Oracle database. SCN is used primarily in the following areas, of course, this is not a complete list:
- Every redo record has an SCN version of the redo record in the redo header (and redo records can have non-unique SCN). Given redo records from two threads (as in the case of RAC), Recovery will order them in SCN order, essentially maintaining a strict sequential order. As explained in my paper, every redo record has multiple change vectors too.
- Every data block also has block SCN (aka block version). In addition to that, a change vector in a redo record also has expected block SCN. This means that a change vector can be applied to one and only version of the block. Code checks if the target SCN in a change vector is matching with the block SCN before applying the redo record. If there is a mismatch, corruption errors are thrown.
- Read consistency also uses SCN. Every query has query environment which includes an SCN at the start of the query. A session can see the transactional changes only if that transaction commit SCN is lower then the query environment SCN.
- Commit. Every commit will generate SCN, aka commit SCN, that marks a transaction boundary. Group commits are possible too.
SCN is a huge number with two components to it: Base and wrap. Wrap is a 16 bit number and base is a 32 bit number. It is of the format wrap.base. When the base exceeds 4 billion, then the wrap is incremented by 1. Essentially, wrap counts the number of times base wrapped around 4 billion. Few simple SQL script will enumerate this better:
In the SQL statement below, we use dbms_flashback package call to get the current system change number, we also convert that number to hex format to breakdown the SCN.
col curscn format 99999999999999999999999 select to_char(dbms_flashback.get_system_change_number,'xxxxxxxxxxxxxxxxxxxxxx'), dbms_flashback.get_system_change_number curscn from dual; TO_CHAR(DBMS_FLASHBACK. CURSCN ----------------------- ------------------------ 280000371 10737419121
Here, hex value of the SCN is 0×280000371 and decimal format is 10737419121. Let’s review the hex value 0×280000371, this value can be split in to two components, better written as 0×2.80000371, where 0×2 is the wrap and 0×80000371 is the hex representation of base. To verify the base and wrap, we can put them back together to get the SCN value. Essentially, multiply wrap by 4 billion and add base to get the SCN in number format. Script shows the output and see that these two numbers are matching.
col n2 format 99999999999999999999999 select to_number(2,'xxxxxxx') * 4 * power(2,30) + to_number(80000371,'xxxxxxxxxxxxxxxxxxxxxx') n2 from dual N2 ------------------- 10737419121
If you continue the discussion logically, then maximum value of the wrap defines the maximum value of SCN i.e. maximum value of wrap*4 billion = 65536* 4* power(2,30) = 281,474,976,710,656 = 281 trillion values.
Does each change increment SCN?
Not necessarily. The SCN increment is not for every change. For example, in the script below, we will change the table 1000 times, but the generated SCN will be very few.
create table rs.dropme (n1 number , n2 number); test_case_scn.sql: --------------cut -------------- col curscn format 99999999999999999999999 select dbms_flashback.get_system_change_number curscn from dual; begin for i in 1 .. 1000 loop insert into rs.dropme values(i, i); end loop; end; / select dbms_flashback.get_system_change_number curscn from dual; ------------cut ----------------- alter system switch log file; SQL> @test_case_scn CURSCN ------------------------ 10737428262 PL/SQL procedure successfully completed. CURSCN ------------------------ 10737428271 SQL> alter system switch logfile; System altered.
Even though there were 1000 changes to the table, just 9 SCNs increased. If we dump the redo record using the script dump_last_log.sql then we can see redo records have both SCN and SUBSCN below too. Many REDO records are having same SCN and SUBSCN combo.
REDO RECORD - Thread:1 RBA: 0x000010.0000001c.018c LEN: 0x00fc VLD: 0x01 SCN: 0x0002.8000fb87 SUBSCN: 1 01/19/2012 09:14:27 REDO RECORD - Thread:1 RBA: 0x000010.0000001d.0098 LEN: 0x00fc VLD: 0x01 SCN: 0x0002.8000fb87 SUBSCN: 1 01/19/2012 09:14:27 REDO RECORD - Thread:1 RBA: 0x000010.0000001d.0194 LEN: 0x00fc VLD: 0x01 SCN: 0x0002.8000fb87 SUBSCN: 1 01/19/2012 09:14:27 REDO RECORD - Thread:1 RBA: 0x000010.0000001e.00a0 LEN: 0x00fc VLD: 0x01 SCN: 0x0002.8000fb87 SUBSCN: 1 01/19/2012 09:14:27
Database link and SCNS
Database link based transactions can cause SCN increases too. For example, let’s say that, three databases db1, db2, and db3 participate in a distributed transaction and let’s say that their current SCN is 1000, 2000, 5000 respectively in these databases. At commit time, a co-ordinated SCN is needed for the distributed transaction and maximum SCN value from all participating databases is chosen; SCN value of these three databases will be increased to 5000.
Can you run out of SCN?
As you saw earlier, maximum SCN hard limit is 281 trillion. In addition to that, there is also a soft limit imposed by Oracle code as a protection mechanism. If the next SCN is more than the soft limit, ORA-600 is emitted and the operation cancelled. For example, in the case of database link based distributed transaction, if the co-ordinated SCN is greater than the soft limit ORA-600 emitted.
This soft limit is calculated using the formula (number of seconds from 1/1/1988) * 16384. As the number of seconds from 1/1/1988 is continuously increasing, soft limit is increasing at the rate of 16K per second continuously. Unless, your database is running full steam generating over 16K SCNs, you won’t run in to that soft limit that easily. [ But, you could create ORA-600 by resetting your server clock to 1/1/1988].
Problem comes if many interconnected databases each generating at higher rate in kind of round-robin fashion.DB1 generates 20K SCNs per second in the first 5 minutes, DB2 generates 20K SCNs per second in the next 5 minutes, DB3 generates 20K SCNs per second in the next 5 minutes etc. In this case, all three Databases will have a sustained 20K SCNs per second rate. Database is slowly catching up to soft limit (1 second per every 4 second exactly) and again, it will take many years for them to catch up to the soft limit assuming the databases are active, continuously. But, there is that infamous, hated by my client, hot backup bug.
(BTW, To reach hard limit, it will take 544 years to run out of SCN at 16K rate normally (65536*4*1024*1024*1024 / 16384 / 60/60/24/365)).
Here is an example of ORA-600  error. In this example lines printed below, 2838 is the SCN wrap and 395527372 is the SCN base. If we convert this to decimal SCN it is in the 12 Trillion range. Database link based connection was trying to increase the SCN over 12 Trillion value, but it was rejected by the database as the SCN was exceeding the soft limit.
ORA-00600: internal error code, arguments: , , , , , , , , , , , 
BTW, in 10g, this 16K per second was hard coded. But, 11gR2, this limit is controlled by an underscore parameter _max_reasonable_scn_rate defaulting to 32K.
Hot backup bug
Most DBAs use RMAN to do backup. But, still, there are few databases that use hot backup mode, primarily because of disk mirror based backups. It is a common behavior to see higher SCN rate if the database is altered to hot backup mode. A SGA variable array keeps track of the backup mode at file level. When you alter the database out of backup mode, SGA variables are reset and the higher SCN rate goes back to normal. Due to a bug (12371955), that SGA variable is not reset leaving the database to think that it is still in hot backup mode. Database generates SCN at higher rate. (if you recycle the database later, of course, the variable is reset to normal rate). There is way to dump the SGA variable to check if the database currently thinks if it is in hot backup mode or not.
Due to this bug, an highly active database can create increased SCN rate over 16K. Over a long period of time (in fact, it probably will take many years) the SCN catches up to the soft limit. Once soft limit is reached, next SCN update will throw ORA-660 errors. Of course, this SCN growth is propagated to other databases over database link. As the soft limit calculation is time based, time zone of the server is also important. For example, if the values are close enough to soft limit, then the databases running in US Eastern time zone will have an higher soft limit by (4*60*60*16384 =235 million ) then the databases running in Pacific Time Zone.
Salient points of the bug are:
- There is no corruption danger, sessions might die or the databases might throw ORA-600 errors. In rare cases, databases have to be kept down for few hours or distributed transaction removed from the database so that the head room between the soft limit and the current SCN is widen.
- This bug affects only if you use ‘ALTER DATABASE’ command. If you use, ‘ALTER TABLESPACE’ command for backup, you are not affected by this bug.
- SCN rate is also directly relevant to activity. If the database has lower activity, SCN rate is also lower, even when the database is altered to backup mode with this bug.
There is a script released by Oracle that can tell you how close your database is to the soft limit,aka SCN headroom. So, first check if your database is having any SCN issue or not, that script is available as bug 13498243 and tells you how many days of SCN headroom you have.
How to check SCN rate?
There are multiple ways to check SCN rate in your database.
smon_scn_time keeps track of the mapping between time and SCN at approximately 5 minutes granularity. That can be used to measure SCN rate, see code below. Although, this is easier to check, remember that there is no easy way to identify if the SCN increase is due to intrinsic activity in the database or is it due to an external database increasing the SCN by a distributed transaction activity. We will discuss this differentiation later.
with t1 as( select time_dp , 24*60*60*(time_dp - lag(time_dp) over (order by time_dp)) timediff, scn - lag(scn) over(order by time_dp) scndiff from smon_scn_time ) select time_dp , timediff, scndiff, trunc(scndiff/timediff) rate_per_sec from t1 order by 1 / TIME_DP TIMEDIFF SCNDIFF RATE_PER_SEC -------------------- ---------- ---------- ------------ 19-JAN-2012 15:23:21 315 2931 9 19-JAN-2012 15:25:46 145 708 4 19-JAN-2012 15:28:00 134 1268 9 19-JAN-2012 15:30:48 168 597 3 19-JAN-2012 15:35:51 303 4148 13 19-JAN-2012 15:36:47 56 103 1 19-JAN-2012 15:42:14 327 671 2
v$log_history also can be used to check the SCN rate of the database. In this code below, you can see the SCN rate per second queried from v$log_history. Even if you are running in RAC, query against v$log_history is sufficient as it holds the archive logs from all threads. If there is a SCN spike, say from a remote database, then you will see a SCN spike in the output of this query below.
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS'; col first_change# format 99999999999999999999 col next_change# format 99999999999999999999 select thread#, first_time, next_time, first_change# ,next_change#, sequence#, next_change#-first_change# diff, round ((next_change#-first_change#)/(next_time-first_time)/24/60/60) rt from ( select thread#, first_time, first_change#,next_time, next_change#, sequence#,dest_id from v$archived_log where next_time > sysdate-30 and dest_id=1 order by next_time ) order by first_time, thread# / THREAD# FIRST_TIME FIRST_CHANGE# NEXT_CHANGE# SEQUENCE# DIFF RT ---------- -------------------- --------------------- --------------------- ---------- ---------- ---------- 2 12-JAN-2012 16:10:30 25995867 26026647 308 30780 0 1 17-JAN-2012 14:05:00 26026649 26028427 555 1778 1 1 17-JAN-2012 14:05:00 26026649 26028427 555 1778 1 2 17-JAN-2012 14:05:00 26026647 26028432 309 1785 1 2 17-JAN-2012 14:05:00 26026647 26028432 309 1785 1 1 17-JAN-2012 14:27:21 26028427 1073743815 556 1047715388 814076 2 17-JAN-2012 14:48:48 26028157 26028230 1 73 3 2 18-JAN-2012 14:22:23 26076103 10737418303 3 1.0711E+10 7448778 1 18-JAN-2012 14:22:24 26076106 10737427850 5 1.0711E+10 1458319 1 18-JAN-2012 16:24:49 10737427850 10737427884 6 34 2 1 18-JAN-2012 16:25:03 10737427884 10737428252 7 368 1
In the output above, there was a SCN jump by 10 Billion between 14:27 and 14:05. You can’t differentiate if that increase came from external systems or is it due to intrinsic activity easily. In this specific case, because this is an extreme SCN increase, and I would guess that it came from external systems. ( But usually this level of SCN increase will not happen in your production site and my example is to just explain the concept).
What happens in RAC?
In RAC, instance that receive the update from external system will increase the SCN of the database SCN to the new higher SCN. When other instances query for next SCN, immediately that SCN increase will be propagated to other instances too.
Can two threads get same SCN?
Obvious answer is No. Correct answer is yes. For example, redo records from two threads shows that they have exact same SCN and subSCN. This is not a problem or concern, as the buffer changes are protected by GCS layer code, and the row changes are protected by locking mechanism.
node 1: REDO RECORD - Thread:1 RBA: 0x000010.0000007f.0114 LEN: 0x0138 VLD: 0x01 SCN: 0x0002.8000fb91 SUBSCN: 1 01/19/2012 09:14:27 node 2: REDO RECORD - Thread:2 RBA: 0x000007.00000003.0010 LEN: 0x0068 VLD: 0x05 SCN: 0x0002.8000fb91 SUBSCN: 1 01/19/2012 09:14:27
Intrinsic vs Extrinsic SCN growth
There is a statistic that can also guide us to determine if the SCN increase is intrinsic or extrinsic or not. Statistics ‘calls to kcmgas’ gives an approximate number of calls to allocate SCNs. This statistics is an estimate only, not an absolute count of generated SCNs. We will understand this stats with a script and an helper function.
create or replace function get_my_statistics (l_stat_name varchar2)
return number as
select ses.value into l_value
from v$sesstat ses , v$statname stat
where stat.statistic#=ses.statistic# and
ses.sid=(select sid from v$mystat where rownum and stat.name = l_stat_name;
alter system switch logfile;
host sleep 5
create table rs.dropme (n1 number , n2 number);
col curscn format 99999999999999999999999
select dbms_flashback.get_system_change_number curscn , get_my_statistics('calls to kcmgas') kcmgas from dual;
for i in 1 .. 100000
insert into rs.dropme values(i, i);
select dbms_flashback.get_system_change_number curscn , get_my_statistics('calls to kcmgas') kcmgas from dual;
alter system switch logfile;
Output of the above script is:
CURSCN KCMGAS ------------------------ ---------- 10737522265 0 PL/SQL procedure successfully completed. CURSCN KCMGAS ------------------------ ---------- 10737523122 826
From the output, we can see that 857 SCN differences vs 826 kcmgas calls form this session. There could be other background processes generating SCN which would explain this difference. Even at instance level, it doesn’t match exactly, but multiplying ‘kcmgas calls’ statistics by 1.1 gives you better estimate. This method can be used to identify if the SCN growth is intrinsic or extrinsic in a database. It can be also used to identify the instance generating more SCNs in a RAC cluster or the database generating more SCNs in a complex interconnected environment.
SCN Vulnerability issue
I am not going to discuss details about this vulnerability issue at all. But, this vulnerability require access to production database. DBAs with security in mind, don’t allow production access that easily anyway. So, In my opinion, it is a problem that must be addressed, but you would need a malicious DBA with expert level knowledge to misuse this vulnerability. Follow Oracle support direction on this one as I usually stay away from talking about security vulnerability issues. Check here for details
I have been holding on publishing this blog entry for many months now. Since this issue is in the public knowledge domain, I can share the knowledge without any repercussions. In a nutshell, understanding SCN generation and intrinsic details about it is important. Armed with scripts, you can review your environment.
Update 1: Correcting some formatting issues, sorry
Update 2: Correcting verbatim to read as “Essentially, multiply base by 4 billion and add wrap to get the SCN in number format”