Monday, December 17, 2012

How To Check Dead Processes in Linux

Use the lsof command to check whether a running process is still holding deleted files open:

    * lsof -p PID
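As a self-contained illustration of what lsof flags, here is a sketch (Linux-only, using /proc directly; the file name is made up) of a deleted file still held open by a process:

```shell
# Hold a file open, unlink it, and observe the "(deleted)" marker --
# the same state lsof reports; the space is reclaimed only on close.
tmpdir=$(mktemp -d)
echo "some log data" > "$tmpdir/app.log"
exec 3< "$tmpdir/app.log"     # open fd 3 on the file
rm "$tmpdir/app.log"          # unlink it while it is still open
fdinfo=$(ls -l /proc/$$/fd/3) # symlink target now ends in "(deleted)"
echo "$fdinfo"
exec 3<&-                     # close the fd; the inode is freed
rm -rf "$tmpdir"
```

This is why a full filesystem can stay full after deleting a busy log file: the space is only released when the holding process closes the fd or exits.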

Use strace to check whether a PID is actually doing anything:

    * time strace -p PID

Use top, then press H, to see all the threads.

Or use the -m flag with ps:

ps -efwm|grep httpd

strace -o /tmp/strace.output  -p  PID

How to check which process is occupying a port (for example, when Apache can't start on port 80):

lsof -i TCP:80

Sunday, December 02, 2012

How to Recover Datafile From Standby DB

Goal

Problem Statement

How to recover the primary database's datafile using a copy of a standby database's datafile.

and

How to recover the standby database's datafile using a copy of a primary database's datafile.
Solution
Recovering the Primary's Datafile

How to recover the primary database's datafile using a copy of a standby database's datafile.

1) copy the standby database's datafile

2) ftp the file to the primary site

3) catalog the datafile copy on the primary site

4) on the primary site, restore and recover the datafile copy

This procedure will work for all file systems - cooked, raw or ASM.
Example

Throughout this example we will be using datafile 25.

1) On standby database, copy datafile from ASM to a cooked file system:

       9i:  RMAN> copy datafile 25 to '/tmp/df25.cpy';

       10g: RMAN> backup as copy datafile 25 format '/tmp/df25.cpy';


2) FTP the file to the primary server's cooked file system

On primary database
~~~~~~~~~~~~~~~~~~~
3) catalog this datafile copy:
       SQL> alter database datafile 25 offline;
       RMAN> catalog datafilecopy '/tmp/df25.cpy';

4) Confirm that datafile exists:
       RMAN> list copy of datafile 25;

5) Restore the datafile:
       RMAN> restore datafile 25;

6) Recover the datafile:
       RMAN> recover datafile 25;


To recover only up to a point in time instead of completely, use one of:

recover datafile 7 until time '09-AUG-2010 14:00:00';

recover datafile 7 until time to_date('09-aug-2010 14:00:00','dd-mon-rrrr hh24:mi:ss');

7) Place the datafile online:
       SQL> alter database datafile 25 online;


Recovering the Standby's Datafile

If recovering the standby, reverse the steps. That is:

1) copy the primary database's datafile

2) ftp the file to the standby site

3) catalog the datafile copy on the standby site

4) stop Redo Apply on the Physical Standby Database

    SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

5) on the standby site, restore and recover the datafile copy

Thursday, November 22, 2012

A Useful Sql To Check Tablespace Growth

To generate a tablespace growth report for the last 30 days:

set heading on
set linesize 5500
set pages 999

SELECT TO_CHAR(sp.begin_interval_time,'DD-MM-YYYY') days
     , ts.tsname
     , MAX(ROUND((tsu.tablespace_size * dt.block_size)/(1024*1024),2)) cur_size_MB
     , MAX(ROUND((tsu.tablespace_usedsize * dt.block_size)/(1024*1024),2)) usedsize_MB
  FROM DBA_HIST_TBSPC_SPACE_USAGE tsu
     , DBA_HIST_TABLESPACE_STAT ts
     , DBA_HIST_SNAPSHOT sp
     , DBA_TABLESPACES dt
 WHERE tsu.tablespace_id = ts.ts#
   AND tsu.snap_id = sp.snap_id
   AND ts.tsname = dt.tablespace_name
   AND ts.tsname IN ('TEST1','TEST2')
   AND sp.begin_interval_time BETWEEN sysdate-30 AND sysdate
 GROUP BY TO_CHAR(sp.begin_interval_time,'DD-MM-YYYY'), ts.tsname
 ORDER BY ts.tsname, days;

To report on a specific date, e.g. 01-09-2012 or 11-11-2012:

SELECT TO_CHAR(sp.begin_interval_time,'DD-MM-YYYY') days
     , ts.tsname
     , MAX(ROUND((tsu.tablespace_size * dt.block_size)/(1024*1024),2)) cur_size_MB
     , MAX(ROUND((tsu.tablespace_usedsize * dt.block_size)/(1024*1024),2)) usedsize_MB
  FROM DBA_HIST_TBSPC_SPACE_USAGE tsu
     , DBA_HIST_TABLESPACE_STAT ts
     , DBA_HIST_SNAPSHOT sp
     , DBA_TABLESPACES dt
 WHERE tsu.tablespace_id = ts.ts#
   AND tsu.snap_id = sp.snap_id
   AND ts.tsname = dt.tablespace_name
   AND ts.tsname IN ('TEST1','TEST2')
   AND TO_CHAR(sp.begin_interval_time,'DD-MM-YYYY') = '01-09-2012'
 GROUP BY TO_CHAR(sp.begin_interval_time,'DD-MM-YYYY'), ts.tsname
 ORDER BY ts.tsname, days;

Then you can easily figure out how much the tablespaces grew between 01-09-2012 and 11-11-2012.

Sunday, November 18, 2012

How To View Oracle Text Query Exec Plan

An Oracle Text query is not like normal SQL, where we can get the plan via "explain plan for ..." followed by "select * from table(dbms_xplan.display());".

Oracle Text has a specific API for this:

Creating the Explain Table
To create an explain table called test_explain for example, use the following SQL
statement:
create table test_explain(
  explain_id varchar2(30),
  id number,
  parent_id number,
  operation varchar2(30),
  options varchar2(30),
  object_name varchar2(64),
  position number,
  cardinality number);

Running CTX_QUERY.EXPLAIN
To obtain the expansion of a query expression such as comp% OR ?smith, use CTX_QUERY.EXPLAIN as follows:
ctx_query.explain(
  index_name => 'TEST_IDCTEXT2',
  text_query => 'comp% OR ?smith',
  explain_table => 'test_explain',
  sharelevel => 0,
  explain_id => 'Test');

exec ctx_query.explain(index_name => 'TEST_IDCTEXT2',text_query => '((((DEFINESCORE((test), RELEVANCE * .1)) )))  and  (((((EMPL%) WITHIN dDocAccount)) or (((CATEST1%) WITHIN dDocAccount)) or (((GIT/GLOBAL#0023IT#0023ONLY%) WITHIN dDocAccount)) or (((GIT/ALL%) WITHIN dDocAccount)) or (((idcnull) WITHIN dDocAccount))))',explain_table => 'test_explain',sharelevel => 0,explain_id => 'Test');

Retrieving Data from Explain Table
To read the explain table, you can select the columns as follows:

select lpad('   ', level-1) || operation || ' ' || options || ' ' || object_name "Plan"
  from test_explain
connect by prior id = parent_id
 start with id = 1
 order siblings by position;

Wednesday, November 14, 2012

How Exadata Offloads Some RMAN Incremental Backup Load To The Cells

It's well known that since 10g, Oracle Database has had a feature called block change tracking. It does a great job of keeping RMAN from scanning whole datafiles by tracking only the data that has changed since the last backup.

Exadata was not out yet at that time. Exadata, with its storage cells, enhances block change tracking by offloading some of the incremental backup work to the cells. Here is how it works.

The key is _bct_chunk_size, a hidden parameter that defaults to 0. In practice, though, CTWR (the Change Tracking Writer) uses a 32 KB chunk, so the effective size is never 0.
If the DB block size is 8 KB, the minimum tracking unit of CTWR is therefore 4 database blocks: if any one of the 4 is changed, all 4 are recorded in the tracking file.

Before Exadata, RMAN needed to do some work to pick the changed blocks out of each chunk. With Exadata, that work can be offloaded to the cells. On Exadata, Oracle recommends an AU size of 4 MB, which holds quite a few 32 KB chunks. That gives the cell enough scope to filter and inspect the blocks within an AU and return only the changed blocks to the database, offloading work from RMAN.
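The chunk arithmetic above can be sketched as (assuming the 32 KB chunk, 8 KB block and 4 MB AU sizes mentioned in the text):

```shell
chunk_kb=32   # CTWR change-tracking chunk
block_kb=8    # DB block size
au_mb=4       # recommended ASM AU size on Exadata
blocks_per_chunk=$(( chunk_kb / block_kb ))   # minimum tracking unit
chunks_per_au=$(( au_mb * 1024 / chunk_kb ))  # chunks the cell can inspect per AU
echo "$blocks_per_chunk blocks per chunk, $chunks_per_au chunks per AU"
```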

Checking the ASM AU size is simple:
select ALLOCATION_UNIT_SIZE,NAME from v$asm_diskgroup;

How To Delete Backupset Via BP Key

Sometimes we have two copies of one backupset, both with the same tag.
If we want to delete one copy and keep the other, we can't delete it via the tag (delete backupset tag '***' would remove both).

In this case we need to use: change backuppiece <BP_KEY> delete;

Monday, November 12, 2012

How To Use NETSTAT To Find Interrupted Connections

This note uses an LDAP connection as the example; the same method works for other types of connections.
The example below shows LDAP connections being silently interrupted by a firewall: the application side has no idea what happened and so spins on the LDAP search.


 apps%  netstat -ap | grep ldap|  grep 24512
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp        0      0 test.us.test.com:24512 bigip-testldap-stage.us.:ldap ESTABLISHED 27714/java

And I worked with LDAP engineer to get the corresponding netstat output on the ldap server:

ldap$ netstat -ap | grep 24512
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp        0      0 ldap.us.test.com:ldap  test.us.test.com:24512 ESTABLISHED 1442/oidldapd

So the connection was established on both sides and thus login to WebLogic Admin had no issue.

After around 30 minutes, the same netstat test showed output only on the apps server; the connection on the LDAP server was already gone/closed.

apps%  netstat -ap | grep ldap|  grep 24512
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
tcp        0      0 test.us.test.com:24512 bigip-testldap-stage.us.:ldap ESTABLISHED 27714/java

ldap$ netstat -ap | grep 24512
    (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)

...nothing

So the connections were interrupted by the firewall.
In this case, the symptom was a spinning (LDAP stuck) login attempt on the WebLogic Admin console.
The solution is simple: generate regular traffic to keep the connections alive.
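One common way to implement that "regular test" at the OS level is TCP keepalive. A sketch of a sysctl fragment (the values are illustrative, and the application must also enable SO_KEEPALIVE on its sockets for these to take effect):

```
# /etc/sysctl.conf -- probe idle TCP connections before a typical
# 30-minute firewall idle timeout silently drops them
net.ipv4.tcp_keepalive_time = 600     # start probing after 10 minutes idle
net.ipv4.tcp_keepalive_intvl = 60     # then probe every 60 seconds
net.ipv4.tcp_keepalive_probes = 5     # declare the peer dead after 5 failed probes
```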




Sunday, November 11, 2012

Error When Enabling Block Change Tracking For RMAN

Here is the common error you see when you enable it:

SQL> alter database enable block change tracking;
alter database enable block change tracking

ERROR at line 1:
ORA-19773: must specify change tracking file name

The error occurs because db_create_file_dest is not set. Either set it:
alter system set db_create_file_dest='+DATA' scope=both sid='*';

or specify the location manually:

alter database enable block change tracking using file 'location';

All these parameters are dynamic; there is no need to bounce the DB.

Thursday, November 01, 2012

A Few Tips To Save Database Statistics


For baselines and reference: export/import AWR
        $ORACLE_HOME/rdbms/admin/awrextr.sql
        $ORACLE_HOME/rdbms/admin/awrload.sql

For diagnostics: always get AWR and ASH reports
        $ORACLE_HOME/rdbms/admin/ashrpt.sql
        $ORACLE_HOME/rdbms/admin/awrrpt.sql

For emergencies and "last gasp" data: get an ASH dump
        oradebug ashdump

Wednesday, October 31, 2012

A Few CHM/OS Commands For CRS

CHM/OS is installed by default with CRS 11.2.0.2 and later.
By default its location is $GI_HOME/crf/.

To check CHM/OS status:
crsctl stat res -t -init

To gather the last 1 hour of CHM/OS statistics:

oclumon dumpnodeview -allnodes -v -last "01:00:00"  > chmos_data.txt

To gather statistics for a specific window, for example starting at "2012-10-28 21:05:00" and ending at "2012-10-28 21:15:00":
oclumon dumpnodeview -allnodes -v -s  "2012-10-28 21:05:00" -e  "2012-10-28 21:15:00"

More hints are available via:
oclumon showobjects -h
oclumon dumpnodeview -h
oclumon manage -h
oclumon version -h
oclumon debug -h

Sunday, October 28, 2012

Steps To Change DBNAME/DBID


To reset the DBID and DBNAME, here are the top-level steps:
  • alter system set cluster_database=false scope=spfile;
  • shutdown immediate;
  • startup mount;
  • exit;
  • nid TARGET=SYS DBNAME="your DB Unique Name"
  • sqlplus / as sysdba
  • create pfile='dbid_pfile.ora' from spfile='spfile_dbname.ora';
  • exit;
  • Edit dbid_pfile.ora:
  • set the *.db_name parameter to '<DB_UNIQUE_NAME>'
  • set *.cluster_database=true
  • Save your changes.
  • sqlplus / as sysdba
  • create spfile='spfile_dbname_new.ora' from pfile='dbid_pfile.ora';
  • startup mount;
  • alter database open resetlogs;
Refer: http://docs.oracle.com/cd/B28359_01/server.111/b28319/dbnewid.htm

Thursday, October 25, 2012

Linux HugePages With SGA

Normally, when your Linux OS (32-bit or 64-bit) has more than 16 GB of RAM and the DB SGA is more than 8 GB, we recommend configuring HugePages for the SGA.
Just remember that Automatic Memory Management (AMM) and HugePages cannot be used together.
The main benefit is that HugePages are pinned in RAM and never swapped out, so overall memory performance improves.

The calculation is simple:

# Hugepages/bigpages values are strongly tied to SGA size.
# vm.nr_hugepages = ((1 + 3%) * SGA_SIZE) / 2MB

# vm.hugetlb_shm_group = <oracle process group id>
# e.g. vm.hugetlb_shm_group = 1001
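The formula can be sketched in shell with integer arithmetic (assuming the standard 2 MB hugepage size; the 8 GB SGA is just an example value):

```shell
# vm.nr_hugepages = ((1 + 3%) * SGA_SIZE) / 2MB
sga_mb=8192        # example: 8 GB SGA, expressed in MB
hugepage_mb=2      # default hugepage size on x86_64 Linux
nr_hugepages=$(( sga_mb * 103 / 100 / hugepage_mb ))
echo "vm.nr_hugepages = $nr_hugepages"
```

After the setting takes effect, you can compare against grep Huge /proc/meminfo.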

Main Parameters When Tuning Oracle Net


DEFAULT_SDU_SIZE
RECV_BUF_SIZE
SEND_BUF_SIZE
TCP.NODELAY

The whole list for 11gR1 is here:
http://docs.oracle.com/cd/B28359_01/network.111/b28317/sqlnet.htm

Wednesday, October 24, 2012

A Few DataGuard SQLs

Here are the SQLs I often use to monitor Data Guard.

SQL to monitor log transport (run on the primary):
SELECT DB_NAME, HOSTNAME, LOG_ARCHIVED, LOG_APPLIED,APPLIED_TIME,
LOG_ARCHIVED-LOG_APPLIED LOG_GAP
FROM
(
SELECT NAME DB_NAME
FROM V$DATABASE
),
(
SELECT UPPER(SUBSTR(HOST_NAME,1,(DECODE(INSTR(HOST_NAME,'.'),0,LENGTH(HOST_NAME),
(INSTR(HOST_NAME,'.')-1))))) HOSTNAME
FROM V$INSTANCE
),
(
SELECT MAX(SEQUENCE#) LOG_ARCHIVED
FROM V$ARCHIVED_LOG WHERE DEST_ID=1 AND ARCHIVED='YES'
),
(
SELECT MAX(SEQUENCE#) LOG_APPLIED
FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES'
),
(
SELECT TO_CHAR(MAX(COMPLETION_TIME),'DD-MON/HH24:MI') APPLIED_TIME
FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES'
);

Monitor recovery (run on the standby):
select inst_id, process, status, count(*)
from gv$managed_standby
group by  inst_id, process, status
order by inst_id, process, status;

Gap SQL (run on the standby):
set linesize 200
column max_redo_seq format a30
column max_recovery_seq format a30
column gap_seq format a30

SELECT thread,
       max(redo) || ' : ' || max(next_scn_redo) as max_redo_seq,
       max(recovery) || ' : ' ||   max(next_scn_recovery) as max_recovery_seq,
       (max(redo) - max(recovery)) || ' : ' || (max(next_scn_redo) - max(next_scn_recovery)) as gap_seq
FROM (
select thread# as thread,
       max(sequence#) as redo,
       max(next_change#) as next_scn_redo,
       0 as next_scn_recovery,
       0 as recovery
from v$archived_log
group by thread#
union all
select thread# as thread,
       0 as redo,
       0  as next_scn_redo,
       max(next_change#) as next_scn_recovery,
       max(sequence#) as recovery
from v$log_history
group by  thread# )
group by thread
order by thread
/

SELECT PROTECTION_MODE, PROTECTION_LEVEL, DATABASE_ROLE FROM V$DATABASE;

ALTER DATABASE REGISTER logfile '/oraarch/testdb/testdb_628319983_1_43377.arc';

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;




Tips on BigIP iRule to Filter Certain Urls

We had a requirement to redirect certain URLs to another website; here are examples of how to handle it with F5 BigIP iRules.

Case 1:
Any URL containing one of these 3 strings will be redirected:
 "/content/idcplg/webdav*"     "/_dav/content/idcplg/webdav*"   "/*sscontributor=true*"

.....
else {
         switch -glob [string tolower [HTTP::uri]] {
            "/content/idcplg/webdav*" -
            "/_dav/content/idcplg/webdav*" -
            "/*sscontributor=true*"  {
               HTTP::redirect "http://our-stage.testadc.com/"
               event disable all
            }
...


Case 2:
If the cookie wcm.contributor.mode exists with the value true, redirect the URL:
.......
elseif { [HTTP::cookie value "wcm.contributor.mode"] eq "true"  } {
         HTTP::redirect "http://our-stage.testadc.com/"
         event disable all
      }
......

Case 3:
If the cookie wcm.contributor.mode exists with the value true, remove the cookie:
.....
elseif { [HTTP::cookie value "wcm.contributor.mode"] eq "true"  } {
          HTTP::cookie remove wcm.contributor.mode
                      }

Case 4
Any URL ending in "?SSContributor=true" will have that suffix trimmed.
Since "?" is a special character, we need to trim twice:
.....
"/*sscontributor=true*" {
               HTTP::redirect [string trimright [string trimright [HTTP::uri] "SSContributor=true"]  \?]
.....

Wednesday, October 17, 2012

A SQL To Get The SID Of A Blocking Session

There are many SQLs to get this information, and they are all good; I'll just share mine.

SELECT blocking_sid, num_blocked
  FROM (SELECT blocking_sid, SUM(num_blocked) num_blocked
          FROM (SELECT l.id1, l.id2,
                       MAX(DECODE(l.block, 1, i.instance_name||'-'||l.sid,
                                           2, i.instance_name||'-'||l.sid, 0)) blocking_sid,
                       SUM(DECODE(l.request, 0, 0, 1)) num_blocked
                  FROM gv$lock l, gv$instance i
                 WHERE (l.block != 0 OR l.request > 0)
                   AND l.inst_id = i.inst_id
                 GROUP BY l.id1, l.id2)
         GROUP BY blocking_sid
         ORDER BY num_blocked DESC)
 WHERE num_blocked != 0;

Tuesday, October 16, 2012

How To Get Oracle DB Object Definition

Normally we can use dbms_metadata.get_ddl to get most object definitions.

examples:

SELECT dbms_metadata.get_ddl('FUNCTION', 'WHOAMI') FROM dual;

SELECT dbms_metadata.get_ddl('TABLESPACE', 'DATA') FROM dual;

select  dbms_metadata.get_ddl('INDEX','REVISIONS_CLASSIDSTATE') from dual;

select  dbms_metadata.get_ddl('DB_LINK','TEST.US.ORACLE.COM') from dual;
select  dbms_metadata.get_ddl('USER','MONITOR') from dual;

select  dbms_metadata.get_ddl('PROCEDURE','COMBINE_FOLDER_STRUCTURE') from dual;

select  dbms_metadata.get_ddl('FUNCTION','TEST') from dual;

select  dbms_metadata.get_ddl('TABLE','MYTEST' ) from dual;

However, for Oracle Text we need to use specific APIs. Examples:

set long 500000
set pages 999
select ctx_report.create_index_script('FT_MYTEXT2') from dual;

set long 50000
set pages 9999
select dbms_metadata.get_ddl('TABLE', 'DR$FT_MYTEXT2MI') from dual;
select dbms_metadata.get_ddl('TABLE', 'DR$FT_MYTEXT2$I') from dual;

Friday, October 12, 2012

How To View Exadata Cell Offload On Enterprise Manager Grid Control

Sometimes we would like to review a SQL's performance gain on Exadata. There are many ways to compare; here I'll point out a view in Enterprise Manager Grid Control.

Log in to Enterprise Manager Grid Control --> find your DB --> click the Performance tab
--> click SQL Monitoring --> click the SQL ID you are interested in

If your DB is running on Exadata, you will see "Cell Offload Efficiency" at the upper right.
Hover over the percentage and you will see the numbers for "Bytes from disk" and "Bytes returned by Exadata".

"Bytes from disk" is how many bytes the DB expects to read from disk, Exadata or not.
"Bytes returned by Exadata" is how much data the cells return to the DB over the interconnect.
Through smart scan, filtering and projection, a cell returns only a subset of the data, thus offloading much of the traffic from the DB layer.

Keep in mind that the metric can sometimes be negative because of EHCC, ASM mirroring, etc.
Don't be surprised if you see something like -32%.
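As a rough sketch of how the percentage relates to those two numbers (an assumption on my part: efficiency here is the share of expected disk bytes that never crossed the interconnect; EM's exact formula may differ):

```shell
bytes_from_disk=$(( 100 * 1024 * 1024 ))  # DB expects 100 MB from disk
bytes_returned=$((  25 * 1024 * 1024 ))   # cells return only 25 MB
efficiency=$(( 100 - bytes_returned * 100 / bytes_from_disk ))
echo "Cell Offload Efficiency: ${efficiency}%"
# If the cells return more than was read (EHCC, ASM mirroring), the
# same arithmetic goes negative, e.g. 132 MB returned would give -32%.
```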



How To Drill Down ZFS Latency By Client



In some cases, UCM cluster nodes compete for a shared file on a ZFS shared mount point, so sometimes we need to know the latency when UCM nodes do I/O on the ZFS shares.

Here is how to do it from the ZFS UI. There are lots of metrics in the ZFS UI; you can play with them to suit your needs.

Log in to the ZFS UI --> click Analytics --> add statistic
--> choose "Protocol: NFSv3 operations per second broken down by latency"
--> right-click the top latency --> drill down by client, share, filename...

You can see which client and which share have the top latency, to help you dig down to the root cause.



Wednesday, October 10, 2012

How To Back Up A Table/Index When It Has Block Corruption

You will typically see an ORA-600 error in your alert log.
There are many notes on how to deal with ORA-600; see MOS notes 1088018.1, 68013.1 and 556733.1.
My note here covers how to back up the table/index when there is block corruption.
That way you can claim back the blocks which are not corrupted and keep them safe.

Step 1: get the corrupted ROWID from the block number.
You can get the block number from the alert log or trace files.
Use the SQL below to get the ROWID:
select rowid from ucm_search.idctext2 where DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) = 527989;

Step 2: after we find the corrupted ROWIDs, we can select the remaining rows into a backup table.
The SQL we use is like:

create table ucm_search.idctext2_bak as select * from ucm_search.idctext2 where rowid not in ('AAAuMEAAjAACA51AAB','AAAuMEAAjAACA51AAA');


Step 3: rename the backup table to the original name to keep the system running.
SQL like:
ALTER TABLE  idctext2_bak rename to idctext2;

How To Handle An Error When Removing A DB From CRS

This is a common error when you have multiple DB homes: you did some DB upgrades, and the CRS version is higher than the DB version.

Error like:
CRS % srvctl remove database -d TESTPRD
PRCD-1027 : Failed to retrieve database TESTPRD
PRCD-1027 : Failed to retrieve database TESTPRD
PRKP-1088 : Failed to retrieve configuration of cluster database TESTPRD
PRKR-1078 : Database TESTPRD of version 11.0.0.0.0 cannot be administered using current version of srvctl. Instead run srvctl from /u01/app/oracle/product/10.2.0.3/TESTPRD

srvctl does not work from the TESTPRD home because of compatibility issues.

The solution: use srvctl from another DB home (e.g. the MYTESTDB home) to remove the TESTPRD targets.

Java Installation "No Space" Error

When you run the command below:
 java -d64 -jar wls1036_upgrade_generic.jar

You may hit an error like:

Fatal error encountered during self-extraction. [No space left on device]

The reason is that /tmp is too small.


You can't easily enlarge /tmp on some systems, as it was predefined.
The workaround I use is -Djava.io.tmpdir:
java -d64 -Djava.io.tmpdir=/u02/installs/11.1.1.6_setup/WebLogic/ -jar wls1036_upgrade_generic.jar

Tuesday, October 09, 2012

Tips For Handling Oracle Text Optimization Issues

Example SQL to run optimization:

begin
ctx_output.start_log('opt_rebuild_TEST_IDCTEXT1');
ctx_output.add_event(ctx_output.event_opt_print_token);
ctx_ddl.optimize_index(idx_name=>'TEST_IDCTEXT1',optlevel=>'REBUILD');
ctx_output.end_log;
end;
/

If the procedure errors out straight away, and the logs show it started and ended with no rows in between, it may have a shadow index that needs to be dropped.
To drop the shadow index:    exec ctx_ddl.drop_shadow_index('TEST_IDCTEXT2');

Check idx_option (this assumes TEST_IDCTEXT1 is the current index):
select idx_name, idx_status, idx_option from ctxsys.dr$index where idx_name='TEST_IDCTEXT1';

If idx_option = 'C...', then:
conn ctxsys
update dr$index set idx_option='CY' where idx_name='TEST_IDCTEXT1';



How to add debug information during optimization:
begin
ctx_output.start_log('opt_reb_TEST_IDCTEXT1_'||to_char(sysdate,'DD-MON-YY_HH24MISS')||'.log');
ctx_output.add_event(ctx_output.event_opt_print_token);
ctx_output.add_event(ctx_output.EVENT_DRG_DUMP_ERRORSTACK,ctx_output.DRG_DUMP_ALL_ERRORS);
ctx_ddl.optimize_index(idx_name=>'TEST_IDCTEXT1',optlevel=>'FULL');
ctx_output.end_log;
end;
/

How to use SQL to rebuild the index online:
exec ctx_output.start_log('TEST_IDCTEXT1_Rebuild14july2012.log');
alter index UCM_SEARCH.TEST_IDCTEXT1 rebuild online parameters('replace memory 200M');
exec ctx_output.end_log;




Examples of Dumping Redo Blocks From Archivelogs

For more details see MOS note 960780.1.
Here are bare examples for reference:

ALTER SYSTEM DUMP LOGFILE '/archive/<dbname>/<instancename>/ArchiveOnLine/<dbname>_709323090_1_8601.arc' DBA MIN 28 21649 DBA MAX 28 21649;
ALTER SYSTEM DUMP LOGFILE '/archive/<dbname>/<instancename>/ArchiveOnLine/<dbname>_709323090_1_8601.arc' DBA MIN 28 144259 DBA MAX 28 144259;
ALTER SYSTEM DUMP LOGFILE '/archive/<dbname>/<instancename>/ArchiveOnLine/<dbname>_709323090_1_8601.arc' DBA MIN 28 145056 DBA MAX 28 145056;

dump file#28  block: 21649
dump file#28  block: 144259
dump file#28  block: 145056

How To Check Hidden Init DB Parameters

To check implicit hidden init parameters such as "_disable_cell_optimized", use the SQL below:

select a.ksppinm "Parameter", b.ksppstvl "Session Value",
c.ksppstvl "Instance Value" from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx and a.ksppinm like '%_disable_cell_optimized%';

Examples of Finding Then Deleting Files in Linux

Delete files in Linux older than 1 year:

find  /u01 -type f -mtime +365 -ls -exec rm -f -- {} \;

Delete files in Linux modified in the last 3 days:

find  /u01 -type f -mtime -3 -ls -exec rm -f -- {} \;

Delete folders in Linux older than 1 year:

find  /u01 -type d -mtime +365 -ls -exec rm -rf -- {} \;

Delete trace files in Linux older than 2 days:

find  .  -type f   -name "*.tr*"  -mtime +2 -ls -exec rm -f -- {} \;
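Before running any of the rm variants above for real, it can pay to dry-run the -mtime filters against files with known timestamps (a sketch assuming GNU find and touch; it builds its own throwaway directory instead of touching /u01):

```shell
tmpdir=$(mktemp -d)
touch -d "400 days ago" "$tmpdir/old.trc"   # stand-in for a year-old trace file
touch "$tmpdir/new.trc"                     # freshly created file
# List instead of -exec rm to verify what each filter would match
old_count=$(find "$tmpdir" -type f -name "*.tr*" -mtime +365 | wc -l)
new_count=$(find "$tmpdir" -type f -mtime -3 | wc -l)
echo "older than a year: $old_count, modified in last 3 days: $new_count"
rm -rf "$tmpdir"
```

Once the counts look right, swap the listing for the -exec rm form.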

Oracle DB 11gR2 AWR Global Report Generation

Before 11gR2, the awrrpt.sql script under rdbms/admin only generated an AWR report for the local instance; you had to collect an AWR report from each RAC node.

In 11gR2 we have two new scripts, awrgrpt.sql and awrgdrpt.sql, for RAC:


awrgrpt.sql -- AWR Global Report (RAC)
awrgdrpt.sql -- AWR Global Diff Report (RAC)



Plus the following, for reference:
spawrrac.sql  -- Server Performance RAC report
awrsqrpt.sql -- Standard SQL statement Report
awrddrpt.sql -- Period diff on current instance
awrrpti.sql -- Workload Repository Report Instance (RAC)

Monday, October 08, 2012

How to ssh into Exadata Cell host

The Exadata cell host runs Oracle Linux.
Normally we cannot ssh into it over the Ethernet network.
Instead we set up trusted hosts on the Exadata DB hosts: ssh into a DB host first, then ssh into the cell hosts.

Exadata Cell init and trace files location version 11.2.3.1.1

Exadata Cell init file: version 11.2.3.1.1
This is just an example:
/opt/oracle/cell11.2.3.1.1_LINUX.X64_120426/cellsrv/deploy/config

Exadata Cell trace files: version 11.2.3.1.1
This is just an example:
/opt/oracle/cell11.2.3.1.1_LINUX.X64_120426/log/diag/asm/cell/<cell hostname>/trace

How to update init files and gather trace:
0. offline disks
1. update cellinit.ora with event trace
2. offline cells
3. stop/start celld/check trace enable
    service celld stop
    service celld start
or
CellCLI> alter cell shutdown services all;
CellCLI> alter cell startup services all;

    cd $CELLTRACE
    view alert*log
4. online disks

A Few Commands To Manage OCR & Voting Disks

Use ocrcheck to see information about the OCR locations.

To replace or move the OCR (no need to bounce CRS):
   touch the new OCR file first
   i.e.  touch /oracommon1_ocr/oracrs/ocrmirror/ocr1
   then as root:  ./ocrconfig -replace /oracommon2_ocr/oracrs/ocr -replacement /oracommon1_ocr/oracrs/ocrmirror/ocr1


Manage voting disks (no need to bounce CRS), as root:
 ./crsctl add css votedisk /oracommon1_vote/oracrs/vdsk1
./crsctl delete css votedisk /oracommon2_vote/oracrs/vdsk
./crsctl query css votedisk

CRS Alert From Oracle Enterprise Manager

We have the agent installed on our RAC DB, and Oracle Enterprise Manager Grid Control is monitoring CRS through it.
The alert looks like:

Cluseterware has problems on all hosts of this cluster. Verifying CRS integrity ,,Checking CRS integrity...,,ERROR: ,PRVF-4037 : CRS is not installed on any of the nodes,Verification cannot proceed,,,CRS integrity check failed,,Verification of CRS integrity was unsuccessful on all the specified nodes. ,NODE_STATUS:
 
We can run ocrcheck from the CRS home to verify manually; the manual test was fine.
So we suspected something was preventing the agent from running the integrity script.
It turned out we had changed the oraInventory location in /etc/oraInst.loc, which confused the agent. We rolled back the change and the alert was gone. Good to know that agent monitoring is tied to the oraInventory.
It turns out we change oraInventory of  /etc/oraInst.loc. It confused agent, we rollback the changes, the alert is gone. Good to know agent monitoring is tied with oraInventory.