— symptoms: Data Guard replication broke (DB 11.2.0.4), no errors (it had been hanging for several days); fix: apply an incremental backup (because the "corrupted" archivelog is no longer available - but that only surfaces later)
— the DG replication broke:
db_primary – Primary database
Error: ORA-16724: cannot resolve gap for one or more standby databases
db_standby – Physical standby database
— in the alert log on the standby everything looks fine, it is reading the archivelog:
Tue Mar 16 10:39:37 2021
Recovery of Online Redo Log: Thread 1 Group 1 Seq 135334 Reading mem 0
Mem# 0: /u02/dbdata/db_standby/onlinelog/group_1.393.982755799
Mem# 1: /u02/dbdata/db_standby/onlinelog/group_1.448.982755803
Media Recovery Log /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc
— the file is there, of course
[oracle@nod_standby ~]$ ls -la /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc
-r--r-----+ 1 oracle asmadmin 3196416 Mar 12 13:20 /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc
— and MRP is applying the archivelog
SQL> select process,status,thread#,sequence# from v$managed_standby;
PROCESS STATUS THREAD# SEQUENCE#
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
ARCH CLOSING 2 83790
ARCH CLOSING 2 83792
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
ARCH CONNECTED 0 0
MRP0 APPLYING_LOG 2 83498
RFS IDLE 0 0
— let's check the lag
SQL> set linesize 200
SQL> select name, value, unit, time_computed from v$dataguard_stats where name in ('transport lag','apply lag');
NAME VALUE UNIT TIME_COMPUTED
transport lag +03 21:44:17 day(2) to second(0) interval 03/16/2021 10:44:51
apply lag +03 21:44:28 day(2) to second(0) interval 03/16/2021 10:44:51
— and when the last checkpoint was (this can also be read from the output above)
alter session set nls_date_format='dd-mm-yy hh24:mi:ss';
select max(checkpoint_time) from v$datafile_header;
MAX(CHECKPOINT_TIME)
12-03-21 13:00:22
— stop the apply and start it again (this often helps on 11g)
edit database db_standby set state=apply-off;
edit database db_standby set state=apply-on;
— or:
alter database recover managed standby database cancel;
-- alter database recover managed standby database using current logfile disconnect;
— so, looking at the above: archivelog transport works, the alert log reports no error, only that the archivelog is being read, and the database says the same (MRP is applying); so we check the archivelog being applied on the standby against the primary's alert log (Seq: 135334):
Fri Mar 12 12:59:34 2021
LGWR: Standby redo logfile selected for thread 1 sequence 135334 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 135334 (LGWR switch)
Current log# 6 seq# 135334 mem# 0: +DATAC1/db_primary/onlinelog/group_6.422.982755755
Current log# 6 seq# 135334 mem# 1: +RECOC1/db_primary/onlinelog/group_6.1005.982755759
…
Fri Mar 12 13:01:03 2021
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 16198 for archive log file 6 to 'db_standby'
ORA-16198: LGWR received timedout error from KSR
LGWR: Error 16198 disconnecting from destination LOG_ARCHIVE_DEST_2 standby host 'db_standby'
LGWR: Failed to archive log 6 thread 1 sequence 135334 (16198)
Thread 1 advanced to log sequence 135335 (LGWR switch)
Current log# 9 seq# 135335 mem# 0: +DATAC1/db_primary/onlinelog/group_9.592.1013471479
Current log# 9 seq# 135335 mem# 1: +RECOC1/db_primary/onlinelog/group_9.21516.1013471483
— and as you can see above we got unlucky: LGWR hit a timeout, so part of the archivelog was copied over, but not all of it (as a result the standby kept trying to apply it but could not, and interestingly it threw no error at all beyond the gap information in dgmgrl)
— stop the apply
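A truncated transfer like this can sometimes be spotted in the archivelog metadata itself; a minimal sketch, run on the standby (the view and columns are standard, the exact rows depend on your controlfile retention):

```sql
-- On the standby: is sequence 83498 registered and marked as applied?
-- An entry that exists on disk but never flips APPLIED to YES is a hint
-- that the file arrived incomplete.
select thread#, sequence#, applied, status, blocks
from v$archived_log
where thread# = 2 and sequence# = 83498;
```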
alter database recover managed standby database cancel;
Database altered.
— from the alert log:
Tue Mar 16 11:50:24 2021
Errors with log /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc
MRP0: Background Media Recovery cancelled with status 16037
Errors in file /u01/app/oracle/diag/rdbms/db_standby/db_standby1/trace/db_standby1_pr00_59425.trc:
ORA-16037: user requested cancel of managed recovery operation
— rename the file so it finally throws an error (for sport, it may ask for the archivelog one more time)
[oracle@nod_standby 2021_03_12]$ mv /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc_
— start the log apply:
alter database recover managed standby database using current logfile disconnect;
Database altered.
— nicely, the error appeared:
Tue Mar 16 11:53:12 2021
Recovery of Online Redo Log: Thread 1 Group 1 Seq 135334 Reading mem 0
Mem# 0: /u02/dbdata/db_standby/onlinelog/group_1.393.982755799
Mem# 1: /u02/dbdata/db_standby/onlinelog/group_1.448.982755803
Media Recovery Log /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc
Error opening /u02/dbdata/db_standby/archivelog/2021_03_12/o1_mf_2_83498__1615550507481734_.arc
Attempting refetch
Completed: alter database recover managed standby database using current logfile disconnect
Media Recovery Waiting for thread 2 sequence 83498
Fetching gap sequence in thread 2, gap sequence 83498-83498
— try to restore the missing archivelog (the one that was not transferred completely)
restore archivelog FROM SEQUENCE 83498 UNTIL SEQUENCE 83498;
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 03/16/2021 11:54:52
RMAN-20242: specification does not match any archived log in the repository
Tue Mar 16 11:55:07 2021
FAL[client]: Failed to request gap sequence
GAP – thread 2 sequence 83498-83498
DBID 3080213620 branch 982755744
FAL[client]: All defined FAL servers have been attempted.
— THE FIX
— even though primary and standby are at the same sequence, we cannot restore the archivelog because it is no longer in the backup (zfs: "incremental / continuous backup")
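Before giving up on a straight restore, you can confirm RMAN really has no backup of that sequence; a hedged sketch (11.2-style syntax, adjust thread and sequence to your case):

```sql
-- In RMAN, connected to where the backups are catalogued:
-- an empty result here confirms the archivelog can no longer be restored
list backup of archivelog sequence between 83498 and 83498 thread 2;
```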
SQL> select max(sequence#) from v$archived_log where archived='YES';
MAX(SEQUENCE#)
135870
— stop the log transport
DGMGRL> show database db_primary
Database – db_primary
Role: PRIMARY
Intended State: TRANSPORT-ON
…
DGMGRL> edit database db_primary set state='transport-off';
Succeeded.
DGMGRL> show database db_primary
Database – db_primary
Role: PRIMARY
Intended State: TRANSPORT-OFF
— stop shipping logs from the primary to the standby
alter system set log_archive_dest_state_2='DEFER' scope=both;
— check the SCN on the standby (select current_scn from v$database;)
CURRENT_SCN
6932123446457
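For the FROM SCN it is safer not to rely on v$database.current_scn alone; a common practice (e.g. in the MOS roll-forward notes) is to take the lowest checkpoint SCN across the standby's datafile headers, so the incremental is guaranteed to cover every datafile:

```sql
-- On the standby: the minimum datafile checkpoint SCN; use this value
-- (or the lower of the two) as the FROM SCN for the incremental backup
select min(checkpoint_change#) from v$datafile_header;
```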
— on the primary, run the incremental:
backup incremental from SCN 6932123446457 database tag='standby_odswiezenie' format '/export/warsztat/database_%d_%u_%s';
— standby controlfile from the primary
SQL> select name,open_mode,database_role from v$database;
SQL> alter database create standby controlfile as '/export/warsztat/stby_control.ctl';
alter database create standby controlfile as '/export/warsztat/stby_control.ctl'
*
ERROR at line 1:
ORA-00245: control file backup failed; target is likely on a local file system
— create the controlfile on ASM and then copy it to the local filesystem
alter database create standby controlfile as '+RECOC1/stby_control.ctl';
ASMCMD> cp +recoc1/stby_control.ctl /tmp/
mv /tmp/stby_control.ctl /export/warsztat/
chown oracle:asmadmin /export/warsztat/stby_control.ctl
— the steps below are needed when the database file locations differ (primary on ASM -> standby on a local filesystem)
— check where the files live, because once we drop in the new controlfile the locations will be off
— datafiles
select 'alter database rename file ''' || p.file_name || ''' to ''' || s.file_name || ''';' from dba_data_files p left join dba_data_files@wiech_db_standby s on p.file_id = s.file_id;
— redo logs (better to recreate them so no leftovers remain)
select 'alter database rename file ''' || p.member || ''' to ''' || s.member || ''';' from v$logfile p left join v$logfile@wiech_db_standby s on p.group# = s.group#;
— find the controlfile locations on the standby, shut the database down and copy the new controlfile into each location
show parameter control
NAME TYPE VALUE
control_file_record_keep_time integer 20
control_files string /u02/dbdata/db_standby/CONTROLFILE/control01.ctl, /u02/dbdata/db_standby/CONTROLFILE/control02.ctl
mv /u02/dbdata/db_standby/CONTROLFILE/control01.ctl /u02/dbdata/db_standby/CONTROLFILE/control01.ctl_usunac
mv /u02/dbdata/db_standby/CONTROLFILE/control02.ctl /u02/dbdata/db_standby/CONTROLFILE/control02.ctl_usunac
cp /export/warsztat/stby_control.ctl /u02/dbdata/db_standby/CONTROLFILE/control01.ctl
cp /export/warsztat/stby_control.ctl /u02/dbdata/db_standby/CONTROLFILE/control02.ctl
— if anything is missing, we can roll back:
# cp /u02/dbdata/db_standby/CONTROLFILE/control01.ctl_usunac /u02/dbdata/db_standby/CONTROLFILE/control01.ctl
# cp /u02/dbdata/db_standby/CONTROLFILE/control02.ctl_usunac /u02/dbdata/db_standby/CONTROLFILE/control02.ctl
— mount the DB
srvctl start database -d db_standby -o mount
— stop MRP (double-check it is stopped, since the DB is in mount...)
recover managed standby database cancel;
— switch file management to manual
alter system set standby_file_management=manual;
alter database rename file… (run the statements generated by the SQL above)
— then switch it back on:
alter system set standby_file_management=auto;
— restore the incremental
rman target /
catalog backuppiece '/export/warsztat/database_db_primary_j2vpsn2m_305762';
catalog backuppiece '/export/warsztat/database_db_primary_j3vpsn2n_305763';
catalog backuppiece '/export/warsztat/database_db_primary_j4vpsn2o_305764';
catalog backuppiece '/export/warsztat/database_db_primary_j5vpsn2o_305765';
catalog backuppiece '/export/warsztat/database_db_primary_j6vpsn2o_305766';
catalog backuppiece '/export/warsztat/database_db_primary_j7vpsn2o_305767';
catalog backuppiece '/export/warsztat/database_db_primary_j8vpsn2o_305768';
catalog backuppiece '/export/warsztat/database_db_primary_j9vpsn2o_305769';
catalog backuppiece '/export/warsztat/database_db_primary_javpsn2o_305770';
catalog backuppiece '/export/warsztat/database_db_primary_jbvpsn2o_305771';
catalog backuppiece '/export/warsztat/database_db_primary_jcvpsn2p_305772';
catalog backuppiece '/export/warsztat/database_db_primary_jdvpsn2p_305773';
catalog backuppiece '/export/warsztat/database_db_primary_jevpsn2p_305774';
catalog backuppiece '/export/warsztat/database_db_primary_jfvpsn2p_305775';
catalog backuppiece '/export/warsztat/database_db_primary_jgvpsn2p_305776';
catalog backuppiece '/export/warsztat/database_db_primary_jhvpsn2p_305777';
catalog backuppiece '/export/warsztat/database_db_primary_jivpsn2p_305778';
catalog backuppiece '/export/warsztat/database_db_primary_jjvpsn2p_305779';
catalog backuppiece '/export/warsztat/database_db_primary_jkvpsn2p_305780';
catalog backuppiece '/export/warsztat/database_db_primary_jlvpsn2p_305781';
catalog backuppiece '/export/warsztat/database_db_primary_jmvpso77_305782';
list backup tag standby_odswiezenie;
— RMAN recover:
recover database from tag standby_odswiezenie noredo;
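After the noredo recovery, it is worth confirming the datafile checkpoints really moved past the gap before restarting MRP; a quick check on the standby:

```sql
-- All datafile headers should now show a checkpoint time/SCN well after
-- Mar 12 (i.e. past SCN 6932123446457 used for the incremental backup)
select min(checkpoint_time), min(checkpoint_change#)
from v$datafile_header;
```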
— start MRP (there will be an error, but we don't worry about it)
alter database recover managed standby database disconnect from session;
— on the primary we may see:
Tue Mar 16 17:16:14 2021
FAL[server, ARC0]: FAL archive failed, see trace file.
ARCH: FAL archive failed. Archiver continuing
ORACLE Instance db_primary1 – Archival Error. Archiver continuing.
— which means log transport to dest_2 has not been re-enabled yet
— on the primary:
-- alter system set log_archive_dest_state_2='DEFER' scope=both; -- if we forgot to defer it earlier, defer it now and then enable it
alter system set log_archive_dest_state_2='ENABLE' scope=both;
— in dgmgrl, enable the log transport:
EDIT DATABASE db_primary SET STATE='TRANSPORT-ON';
— switch a logfile on the primary (not needed if there is traffic)
alter system switch logfile;
— clean up on the standby:
RMAN> delete backup tag = 'standby_odswiezenie';