Tuesday, November 3, 2015

Steps to Analyze AWR Report in Oracle

Recommendations before generating an AWR Report.


1. Collect Multiple AWR Reports: It's always good to have two AWR Reports: one for a good period (when the database was performing well) and one for the period when performance was poor. This way the DBA can easily compare the good and bad reports to find the culprit.

2. Stick to a Particular Time: "The database is performing slow" will not help in resolving performance issues. We need a specific time window, e.g., the database was slow yesterday from 1 PM to 4 PM. The DBA can then generate a report covering those three hours.

3. Split a Large AWR Report into Smaller Reports: Instead of one report covering a long period, such as four hours, it is better to have four reports of one hour each. This helps to isolate the problem.

In the case of a RAC environment, generate one report for each instance (see the sketch below). Once you have generated the AWR report, it's time to analyze it. The AWR report is a huge document, and which areas to look into depends on the problem at hand. Here I list the most common areas a DBA should look into to get a clear picture of the issue.
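A minimal sketch of generating the report from SQL*Plus (the script prompts for report type, number of days to list, begin/end snapshot IDs, and the report name):

-- single-instance (or current instance) report
@?/rdbms/admin/awrrpt.sql

-- on RAC, run awrrpti.sql for a single instance (it prompts for the
-- instance), or awrgrpt.sql for a global cluster-wide report
@?/rdbms/admin/awrrpti.sql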

Steps to Analyze AWR Report

1. Database Details:

This is the first and top part of the report. Here, cross-check the database name, instance name, and database version against the database having the performance issue. The report also shows RAC=YES if it is a RAC database.

 

2. Host Configuration:

This gives the host name, platform, CPU count, sockets, cores, and RAM. The important thing to note is the number of CPU cores in the system. In this example there are 12 cores.



3. Snapshot Detail:

These are the details about the snapshots taken: the snap start and end times; the difference between them is shown as "Elapsed". Here is a new term, "DB Time":

DB Time = session time spent in the database.

DB Time = CPU time + non-idle wait time.

You may find that DB Time is very large compared to the Elapsed time; that by itself is not a concern. Do check that you have taken the report for the time period having the performance problem; if yes, fine; otherwise take a report for the problem period.
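As a worked example (the numbers here are assumed, not from this report): for a 60-minute snapshot, Elapsed = 60 minutes. If DB Time = 300 minutes, then on average 300/60 = 5 sessions were active in the database at any moment. With 12 cores, the available CPU capacity for the interval is 60 x 12 = 720 minutes, so 300 minutes of DB Time alone does not point to a CPU shortage.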

Next is Cache Sizes, which is just the detail of the SGA components.

4. Load Profile:

Here are a few important stats for a DBA to look into. First is "DB CPU(s)" per second. Before that, let's understand how DB CPU works: suppose you have 12 cores in the system; then for every wall-clock second you have 12 seconds of CPU time available.


So, if "DB CPU(s)" per second in this report is greater than the number of cores (from Host Configuration, #2), the environment is CPU bound: it either needs more CPUs, or you need to check further whether this happens all the time or only for a fraction of the time. In my experience there are very few cases where a system is truly CPU bound.

In this case, the machine has 12 cores and DB CPU(s) per second is 6.8, so this is not a CPU-bound case.

The next stats to look at are Parses and Hard Parses. If the ratio of hard parses to parses is high, the database is performing excessive hard parsing, so look at parameters like cursor_sharing and at the application level for the use of bind variables, etc. (a quick check is sketched below).
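A minimal sketch of a quick cross-check from SQL*Plus; note these v$sysstat counters are cumulative since instance startup, while the AWR Load Profile shows per-second rates for the snapshot interval:

select (select value from v$sysstat where name = 'parse count (hard)')  hard_parses,
       (select value from v$sysstat where name = 'parse count (total)') total_parses
  from dual;

show parameter cursor_sharing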

5. Instance Efficiency Percentages:


In these statistics, look at "% Non-Parse CPU". If this value is near 100%, it means most of the CPU resources are used for operations other than parsing, which is good for database health.
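The same ratio can be approximated outside AWR from v$sysstat (a sketch; these counters are instance-wide since startup, whereas AWR computes the percentage for the snapshot interval):

select round(100 * (1 - p.value / c.value), 2) "% Non-Parse CPU"
  from (select value from v$sysstat where name = 'parse time cpu') p,
       (select value from v$sysstat where name = 'CPU used by this session') c;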


6. Top 5 Timed Foreground Events:

This is another of the most important statistics to consider when looking at an AWR report for any database performance issue. It lists the top 5 foreground wait events.


First of all, check the wait class. If the wait class is User I/O, System I/O, Other, etc., this could be fine, but if the wait class is "Concurrency" there could be a serious problem. Next, look at Time (s), which shows the total time the database spent waiting in this class, and then at Avg Wait (ms). If Time (s) is high but Avg Wait (ms) is low, you can usually ignore it. If both are high, or Avg Wait (ms) is high, this needs further investigation.

In the above screenshot, most of the resources are taken by DB CPU = 64% of DB time. DB CPU taking most of the time is a normal situation.

Let's take an example in which the event is "log file switch (checkpoint incomplete)", with high waits, huge Time (s), large Avg Wait (ms), and wait class Configuration. Here you have to investigate and resolve the log file switch (checkpoint incomplete) waits.
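Checkpoint-incomplete waits usually mean the online redo logs are too small or too few, so DBWR cannot finish the checkpoint before LGWR needs to reuse a log. A sketch of the first checks (the ADD LOGFILE size and group number below are assumptions; adjust to your environment):

-- how big are the logs, and how often do they switch per hour?
select group#, thread#, bytes/1024/1024 size_mb, status from v$log;

select to_char(first_time, 'YYYY-MM-DD HH24') hour, count(*) switches
  from v$log_history
 group by to_char(first_time, 'YYYY-MM-DD HH24')
 order by 1;

-- a common fix is adding larger redo log groups, e.g.:
-- alter database add logfile group 5 size 1g;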

Host CPU, Instance CPU, and Memory Statistics are self-explanatory. Next is RAC Statistics; most of the time I have not found any issue in these stats.

7. Time Model Statistics:

This is a detailed breakdown of where the database time is consumed. The stats are ordered by Time (s) and % of DB Time.
 
A noticeable result is that the sum of all % of DB Time values is > 100%. Why is this?

Because these timings are cumulative: in this case, "sql execute elapsed time" takes 89% of DB time, and that includes sub-components like "parse time elapsed", "hard parse elapsed time", etc. So if you find "hard parse elapsed time" taking a large percentage, investigate it further, and so on.

A DBA has to look for any stat that is taking an abnormal % of DB time (the same data can be queried live, as sketched below).
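A minimal sketch of querying the same time model data live (v$sys_time_model reports values in microseconds):

select stat_name, round(value / 1e6, 2) seconds
  from v$sys_time_model
 order by value desc;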

8. Operating System Statistics - Detail:

This is information related to the OS: the load status of the system is shown here.


This report shows the system was 62 to 70% idle at the time the report was taken, so there is no resource crunch at the system level. But if you find a very high busy, user, or sys % (which will, in turn, lead to a low idle %), investigate what is causing it. OS Watcher is a tool that can help in this direction.
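The same host-level counters can be checked live from v$osstat (a sketch; BUSY_TIME, IDLE_TIME, USER_TIME, and SYS_TIME are cumulative centiseconds since instance startup):

select stat_name, value
  from v$osstat
 where stat_name in ('NUM_CPUS', 'NUM_CPU_CORES', 'BUSY_TIME',
                     'IDLE_TIME', 'USER_TIME', 'SYS_TIME');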

Next, a very crucial part of the AWR report for a DBA is SQL Statistics, which has details of all SQL queries executed during the report's time interval.

We will explore a few of them to understand how to analyze these reports. Let's start with:

9. SQL Ordered by Elapsed Time:

As the name suggests, this lists SQL queries ordered by elapsed time within the reported time interval.


In this report, look for a query that has few executions but a high Elapsed Time per Exec (s); such a query is a candidate for troubleshooting or optimization. In the above report, you can see the first query has the maximum elapsed time but no executions, so you have to investigate it.

An important point: if Executions is 0, it doesn't mean the query was not executing. It may be that the query was still running when you took the AWR report, which is why its completion was not captured in the report (a sketch for confirming this follows).
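A minimal sketch for confirming whether such a statement is still running at the moment you check:

select s.sid, s.serial#, s.sql_id, s.last_call_et, q.sql_text
  from v$session s
  join v$sql q
    on q.sql_id = s.sql_id
   and q.child_number = s.sql_child_number
 where s.status = 'ACTIVE'
   and s.type = 'USER';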

10. SQL Ordered by CPU Time:

In this report, SQL queries are listed on the basis of CPU consumed by the query, i.e., queries causing high load on the system. The top few queries are candidates for optimization.

From the above stats, look for the queries using the highest CPU time. If a query shows 0 executions, it doesn't mean the query is not executing; it may be the same case as in SQL Ordered by Elapsed Time: the query was still executing when the snapshot was taken.
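A minimal sketch of pulling the same ranking live from the shared pool (cpu_time and elapsed_time in v$sql are in microseconds):

select *
  from (select sql_id, executions,
               round(cpu_time / 1e6, 2)     cpu_secs,
               round(elapsed_time / 1e6, 2) elapsed_secs
          from v$sql
         order by cpu_time desc)
 where rownum <= 10;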

However, there are many other stats in an AWR report that a DBA needs to consider. I have listed only ten of them, but these are the most commonly used for any performance-related investigation.

Please share your views about this article. Did it help you understand how to analyze an AWR report?

Oracle RAC basics

Own SGA: Each instance has its own SGA (shared pool, buffer cache, log buffer, large pool, etc.).

Background processes: Each instance has its own main background processes (dbwr, lgwr, ckpt, smon, etc.). RAC has additional background processes like lms, lmon, diag, etc.

Data and control files are shared: Control files are shared between all instances of the same database. Datafiles are shared as well; UNDO is a special case in that each instance writes to its own UNDO tablespace (see below), although it still resides on shared storage.

Own redo logs: Each instance has its own online redo log thread to record its own changes, but the logs reside on shared storage so they are readable by all instances (needed for instance recovery); see the query after this list.

Own set of rollback/undo tablespaces: Each instance has its own UNDO tablespace, but they reside on shared storage visible to all instances.

The TEMP tablespace is shared by all instances; I would recommend using TEMP tablespace groups.
Each instance has its own password file.
I would recommend that each RAC node has its own ORACLE_HOME for Clusterware and for the Oracle Database (including ASM).
The parameter file can be shared and stored in ASM.
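A minimal sketch illustrating the redo threads; on a two-node RAC you would typically see thread# 1 and 2 here, and gv$log can be queried from any instance:

select inst_id, thread#, group#, status
  from gv$log
 order by thread#, group#;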


What is the meaning of the following?

Cache fusion: RAC needs to be aware of the Oracle blocks used by each instance.

Guarantees cache coherency: RAC needs to guarantee cache coherency between instances; if not, we could have data loss or corruption because multiple instances access the same database.

No disk I/O costs: RAC will try to access Oracle blocks from memory whenever possible. For example, if instance 2 wants to read some Oracle blocks and RAC detects that they are cached on instance 1, instance 2 will ask instance 1 for those blocks (memory and interconnect I/O instead of disk I/O).
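A minimal sketch showing how much block shipping is happening over the interconnect (these gv$sysstat counters are cumulative since instance startup):

select inst_id, name, value
  from gv$sysstat
 where name in ('gc cr blocks received', 'gc current blocks received')
 order by inst_id, name;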

Thursday, September 17, 2015

Being a DBA

Now that I have been a DBA myself and am looking for new opportunities, I have some thoughts about what makes a good DBA and a good DBA manager.
This is purely what I think, and all suggestions are welcome.
You are accountable for the company's databases, and that includes:
Database Availability
Backup/Recovery
Database Performance
Database Security
Database Maintainability
Database Scalability / Capacity planning.
And you can always add more...
That said, as a manager you have additional responsibilities:
1. Keep everyone busy. Each person in a team should always have all fingers dirty, meaning at least 5 things to do at one time.
They can multitask, even handle more than 5 easily, and learn to prioritize their work based on its importance.
I have seen times when there was no work for days, and that is really poor management.
2. Always have good communication, and tools to support it too.
For example, we have Lync in our office, and we can chat with and call anyone anytime.
3. Use yourself as the minimum acceptable standard for your team.
For example try to hire people better than you.
4. Keep the team current with technology changes.
Encourage the sharing of new knowledge: we can always train each other, send emails about anything new achieved,
and, taking it forward, maintain an updated wiki page.
5. Always give feedback. Everyone loves a pat on the back now and then, and taking a little feedback yourself will always help.
6. Establish a productivity baseline, so that you will know when you make it better or worse.
At the end of the day, service is king: keep the system running and available.
Will be updating this as I go along.
Vijay Tarun

Oracle Trace Utility (10053 Trace and 10046 Trace).

conn hr/hr

-- Set these initialization parameters for your trace session to guarantee the integrity of the trace file

alter session set max_dump_file_size = unlimited;
alter session set timed_statistics = true;
alter session set statistics_level = all;
alter session set "_rowsource_execution_statistics" = true;

-- in order to separate your trace file easily from the others in Oracle's user_dump_dest folder
alter session set tracefile_identifier = 'SQL_Trace';

-- also if you are after 10g you can produce session level ASH reports by the help of CLIENT_ID filter
-- call dbms_session.set_identifier('an_identifier') and use this identifier name as a filter

-- start tracing from this session
Alter session set SQL_Trace = true ;
-- ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
-- ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

-- also these can be used for tracing
-- execute DBMS_SESSION.SET_SQL_TRACE ({true|false});
-- execute DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION (session_id, serial_id, {true|false});
-- execute DBMS_SYSTEM.SET_EV(session_id, serial_id, 10046, level, '');
-- for more information; http://www.ubtools.com/ubtools/products/itrprof/itrprof_user_manual.html

-- Run the application that you want to trace, any SQL(s) or any PL/SQL block(s)
select sysdate, user from dual;

-- stop tracing
Alter session set SQL_Trace = false ;
-- ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
-- ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';

-- Then go to USER_DUMP_DEST to pick up the trace file. A query that gives the trace file name generated by your session
-- in windows without the tracefile_identifier would be as follows
--
-- select c.value || '\' || d.instance_name || '_ora_' || a.spid || '.trc' trace_file_is_here
-- from v$process a, v$session b, v$parameter c, v$instance d
-- where a.addr = b.paddr
-- and b.audsid = userenv('sessionid')
-- and c.name = 'user_dump_dest' ;
--
-- Format the trace file with TKPROF and interpret the output.
-- $ tkprof tracefile.trc output.txt [set the options as you like]
-- tkprof D:\oraclexee\app\oracle\admin\XE\udump\xe_ora_2220_bind.trc D:\temp\xe_ora_2220_bind.txt explain=hr/hr sys=yes waits=yes sort=prscpu

To find the TOP PGA USER

REM  Locate the top PGA user

set lines 75
set pages 999
set serveroutput on

spool topuser.out

declare
  a1 number;          -- pid
  a2 number;          -- spid
  a3 varchar2(30);    -- username
  a4 varchar2(30);    -- program
  a5 number;          -- pga_used_mem
  a6 number;          -- pga_alloc_mem
  a7 number;          -- pga_freeable_mem
  a8 number;          -- pga_max_mem
  blankline varchar2(70);

  -- the process with the highest allocated PGA (excluding LGWR)
  cursor code is
    select pid, spid, substr(username,1,20) "USER", substr(program,1,30) "Program",
           pga_used_mem, pga_alloc_mem, pga_freeable_mem, pga_max_mem
      from v$process
     where pga_alloc_mem = (select max(pga_alloc_mem)
                              from v$process
                             where program not like '%LGWR%');

begin
  blankline := chr(13);
  open code;
  fetch code into a1, a2, a3, a4, a5, a6, a7, a8;
  close code;   -- close the cursor once the single row is fetched

  dbms_output.put_line(blankline);
  dbms_output.put_line('               Top PGA User');
  dbms_output.put_line(blankline);

  dbms_output.put_line('PID:   '||a1||'             '||'SPID:   '||a2);
  dbms_output.put_line('User Info:           '||a3);
  dbms_output.put_line('Program:            '||a4);
  dbms_output.put_line('PGA Used:            '||a5);
  dbms_output.put_line('PGA Allocated:        '||a6);
  dbms_output.put_line('PGA Freeable:             '||a7);
  dbms_output.put_line('Maximum PGA:            '||a8);
end;
/

set lines 132
col value format 999,999,999,999,999

select * from v$pgastat;

spool off

Helpful daily use commands Part 1.

1. Killing Oracle database user sessions manually.

    Log in to the database.

     select 'alter system kill session ''' || sid || ',' || serial# || ''';' from v$session where username = '<your_schema>';

     Run the generated SQL.

     In a multi-instance RAC, execute it on all instances, or use gv$session.

2. SQL To find if your database is using pfile for spfile.

    select decode(count(*), 1, 'spfile', 'pfile') from v$spparameter where rownum = 1 and isspecified = 'TRUE';

3. SQL to find size of a schema.

    select owner, sum(bytes)/1024/1024/1024 "SIZE_IN_GB" from dba_segments where owner like 'schema_name' group by owner order by owner;

4. SQL to check if supplemental logging is enabled and enable if not.

    SQL> SELECT SUPPLEMENTAL_LOG_DATA_MIN, SUPPLEMENTAL_LOG_DATA_PK, SUPPLEMENTAL_LOG_DATA_UI FROM V$DATABASE;

SUPPLEME SUP SUP
-------- --- ---
NO  NO  NO

SUPPLEME (i.e., SUPPLEMENTAL_LOG_DATA_MIN) must be YES.

To Update:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

SQL> SELECT SUPPLEMENTAL_LOG_DATA_MIN, SUPPLEMENTAL_LOG_DATA_PK, SUPPLEMENTAL_LOG_DATA_UI FROM V$DATABASE;
SUPPLEME SUP SUP
-------- --- ---
YES NO  NO

5. SQL to get the SCN number in database.

  SQL > set num 24
  SQL > select dbms_flashback.get_system_change_number from dual;

    GET_SYSTEM_CHANGE_NUMBER
    ------------------------
             17987351734

6. Sequence details

   select SEQUENCE_OWNER, SEQUENCE_NAME, CACHE_SIZE, LAST_NUMBER from dba_sequences where SEQUENCE_OWNER in ('Schema_name');

7.To generate the DDL

set pagesize 0
set long 90000
set feedback off
set echo off
spool ddl_object.sql
SELECT DBMS_METADATA.GET_DDL('SEQUENCE', a.OBJECT_NAME) from user_objects a where a.OBJECT_TYPE = 'SEQUENCE' order by a.OBJECT_NAME;
spool off;

exit

8. Find the long running operations (like backup or export).

  select opname, sofar,totalwork from v$session_longops where  sofar/totalwork <> 1 and totalwork<>0;

9. Active database sessions/connections in a RAC database.

  select inst_id, username, status, service_name, count(*) from gv$session
group by inst_id, username, status, service_name order by inst_id, service_name, status;

select username, status, machine, service_name, count(*)
      from v$session
group by username, status, machine, service_name
order by  service_name, status;

10.  Query for Encryption Wallet open and close.

    select * from gv$encryption_wallet;

    Connect to the specific instance and then execute:
   ALTER SYSTEM SET ENCRYPTION WALLET CLOSE ;
   ALTER SYSTEM SET ENCRYPTION WALLET OPEN IDENTIFIED BY "password";

Some useful RAC 11g commands are documented below.

 STOP SEQUENCE
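A sketch of a typical 11gR2 stop order (the database name ORCL is an assumption; adjust names to your environment):

# as the oracle software owner: stop all instances of the database cleanly
srvctl stop database -d ORCL

# as root on each node (or "crsctl stop cluster -all" from one node):
# stops the clusterware stack, ASM, and remaining resources on that node
crsctl stop crs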

START SEQUENCE
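And the corresponding start order (again a sketch with an assumed database name):

# as root on each node (the clusterware stack usually autostarts at boot)
crsctl start crs

# verify the stack is up, then start the database as the oracle owner
crsctl check crs
srvctl start database -d ORCL
crsctl stat res -t      # confirm resource status across the cluster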


OTHER USEFUL COMMANDS
11gR2 Clusterware and Grid Home – What you need to know [ID 1053147.1]