Henry Xie 's blog: February 2018

Wednesday, February 28, 2018

Ansible Pbrun Issues

Symtom:
     Permission denied (publickey,password,keyboard-interactive).

Solution:
    Use flag -vvv to get detailed debug info
   -k to get into interactive password input

Symtom:
     [WARNING]: Could not match supplied host pattern, ignoring:

Solution:
    Need to add your hostname into /etc/ansible/hosts
    see github link

Symtom:
     "pbrun9.3.0-07[12461]: Invalid command\r\n", ???

Solution:
-vvv to get details ssh command
ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/home/hxie/.ansible/cp/1369acbeec -tt acmsdbv3084.us.oracle.com '/bin/sh -c '"'"'pbrun content -u oracle '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-gfjkhmsxjbzyupwkwnusqvetgeoinkwf; /usr/bin/python /tmp/ansible-tmp-1519878480.59-177960971664277/setup.py'"'"'"'"'"'"'"'"' && sleep 0'"'"''

Test run and dig deeper on the target host for exec command
   debug1: Sending command: /bin/sh -c 'pbrun content -u oracle '"'"'echo BECOME-SUCCESS-gfjkhmsxjbzyupwkwnusqvetgeoinkwf; /usr/bin/python /tmp/ansible-tmp-1519878480.59-177960971664277/setup.py'"'"' && sleep 0'
    Got the same error

So it is related how the pbrun server policy is setup. Some pbrun policies do not allow to run ssh command like above. Please contact your pbrun admin

GitHub addressed other similar issue

Monday, February 19, 2018

Archiver Process cache TNS Entries for Data Guard Broker

When you setup Standby DB or Data Guard Broker, you may see this error
ORA-16737: the redo transport service for standby

Dataguard Broker trace logs shows
ORA-12154: TNS:could not resolve the connect identifier specified
ORA-16737: the redo transport service for standby database "testdg" has an error
2018-02-20 01:05:59.926 00001000 56049214 Operation HEALTH_CHECK canceled during phase 1, error = ORA-16778

However tnsping testdg is fine
sqlplus sys/***@testdg as sysdba ---- is also fine

Actions:

Set dgmgrl tracelevel to support
Remove and Add DB into DG broker configuration with different connect identifiers(no luck)
Modify tnsnames.ora entry for scan and vip(no luck)
Stop and Start DG broker (no luck)
Modify log_archive_dest_3 not to use TNS. (no luck)

alter system set log_archive_dest_3='SERVICE="(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = test-adc-scan)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = WWWABCUDG.US.ORACLE.COM)))" ASYNC VALID_FOR=(ONLINE_LOGFILES,ALL_ROLES ) DB_UNIQUE_NAME=WWWABCDG';

Reason:

After trying all kinds of ways, we find it's related to archiver process which will not read the NEW entries from tnsnames.ora file. The new DB entries were unknown to archiver. That's why it keeps reporting ORA-12154

Solution:

Bounce the DB to refresh archiver processes which needs outages

Kill the archiver processes one by one. Use "alter system switch logfile" to spawn them back

Above 2 methods to force archiver processes to read tnsnames.ora and refresh entries

Sunday, February 18, 2018

About "SQL*Net break/reset to client" Event of Oracle DB

Users reported that his application was very slow when it exec a "insert" sql again oracle DB in the cloud

User has limited skills to turn on trace from apps

Actions:

Get AWR report from Cloud EM express
Use sqlt to get more detailed sql report
Ask user to set sql trace or turn on 10046 event
Tkprof the trace file

Reason:

The top event from the insert sql is"SQL*Net break/reset to client". CPU IO are OK, no contention found. It appears to be network related. However other sql were running well. When you check oracle doc of this event, it could be application related. Specially when pl/sql loop is involved. It would generate quite many such events on db side.

Solution:

Ask user to involve developers to tune the sql ,check how they handle error exception and check apps logs

Saturday, February 17, 2018

How To Use Incremental Backup to Fix DataGuard Gap

Top Steps

Get SCN of Primary and Standby
Backup incremental from scn ******** database;
alter database create standby controlfile as '/u02/backup/***_standby.ctl';
Copy backup and controlfiles to Standby Host
Startup nomount
RESTORE STANDBY CONTROLFILE FROM '/u03/backup/****_standby.ctl';
Alter database mount
Alter system set standby_file_management=manual;
Change datafile location (example commands)
alter database rename file '/u10/oradata/test/users01.dbf' to '+DATA/test/DATAFILE/USERS.623.831847799';
Change temp file location(example commands)
alter database rename file '/u10/oradata/test/temp01.dbf' to '+DATA/test/tempfile/TEMP.849.824255731';
Change temp file location(example commands)
alter database rename file '/u10/oradata/test/temp01.dbf' to '+DATA_test/lglholdu/tempfile/TEMP.849.824255731';
Change redo log location (example commands)

alter database clear logfile group 6;
alter database drop logfile group 6;
alter database add logfile thread 1 group 6;
select group#,member from v$logfile;
select group#, thread#, sequence#, status from v$log;

Catalog start with '/u03/backup';
Recover database;
Start MRP
Switch logfile in Primary to make current redo log group of standby switch as well
Drop and recreate last current redo logs