Thursday, May 29, 2014

Limitations (internal)

How to monitor the memory usage of the Performance Analyzer (PA) agent against its limits on an AIX server.

Some useful commands:

prtconf
reports the memory size of the server (see "Memory Size" in the output below).


System Model: IBM,9117-MMA
Machine Serial Number: 105BD0D
Processor Type: PowerPC_POWER6
Processor Implementation Mode: POWER 6
Processor Version: PV_6_Compat
Number Of Processors: 4
Processor Clock Speed: 4208 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 7 va10tuvtdw001
Memory Size: 8192 MB
Good Memory Size: 8192 MB
Platform Firmware level: EM350_063
Firmware Version: IBM,EM350_063
Console Login: enable
Auto Restart: true
Full Core: false




PA can handle up to 10,000 agents on a 32-bit or a 64-bit platform.
(62+803+392+1814+7265) + 201 [vmware] = 10537, so PA reaches the threshold. (Note: this is far more than the 2K, and just above the 10K agents used in testing.)

On the other hand:
(62+803+392+1814+7265) + 0 [vmware] = 10336, so PA can handle the load but is almost at the limit.
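
The arithmetic above can be repeated whenever the agent counts change. Here is a minimal sketch that uses the counts quoted in this post; the function name agent_total and the LIMIT variable are my own, not part of ITM.

```shell
#!/bin/sh
# agent_total: add up any number of per-group agent counts.
# (Hypothetical helper name; the counts below are the ones from the text.)
agent_total() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

LIMIT=10000   # the "up to 10,000 agents" figure from the text

with_vmware=$(agent_total 62 803 392 1814 7265 201)
without_vmware=$(agent_total 62 803 392 1814 7265)

echo "with vmware:    $with_vmware (limit $LIMIT)"
echo "without vmware: $without_vmware (limit $LIMIT)"
```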


tacmd listsystems

lists all configured agents. Log in first:

tacmd login -s `hostname` -u sysadmin

tacmd listSystems (I believe you have to run it with the -v option)



bootinfo -k    

# bootinfo -k
3            


LDR_CNTRL=MAXDATA=0x80000000  This allows up to 2 GB of heap space.
KPA_JAVA_ARGS=-Xms16m -Xmx500m                                          
                 


$ grep -i ldr_cntrl /opt/IBM/ITM/config/pa.ini
LDR_CNTRL=MAXDATA=0x80000000
$ grep -i kpa_java  /opt/IBM/ITM/config/pa.ini
KPA_JAVA_ARGS=-Xmx512m                                                      
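
To see where the 2 GB figure comes from, the MAXDATA value can be converted from hex. In the AIX 32-bit process model a data segment is 256 MB (0x10000000 bytes), so 0x80000000 corresponds to 8 segments. The sketch below only does that conversion; the function names are my own, and it assumes kpacma runs as a 32-bit process for the segment count to be meaningful.

```shell
#!/bin/sh
# maxdata_mb: convert a MAXDATA hex value (e.g. 0x80000000) to megabytes.
maxdata_mb() {
    echo $(( $1 / 1048576 ))
}

# maxdata_segments: number of 256 MB (0x10000000-byte) segments in the value.
maxdata_segments() {
    echo $(( $1 / 0x10000000 ))
}

echo "MAXDATA=0x80000000 -> $(maxdata_mb 0x80000000) MB in $(maxdata_segments 0x80000000) segments"
```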
                         


ulimit -a  

# ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         131072
stack(kbytes)        unlimited
memory(kbytes)       unlimited
coredump(blocks)     unlimited
nofiles(descriptors) unlimited
threads(per process) unlimited
processes(per user)  unlimited


ps -ef | grep kpacma    <== get the PID of the process

svmon -P <pid of kpacma process> -O summary=basic,unit=GB

(This shows how much of the server memory above is used by kpacma.)


# ps -ef |grep kpacma
    root 14745648        1   0   Jun 30      - 105:36 /opt/IBM/ITM/aix533/pa/bin/kpacma -d -f /opt/IBM/ITM/aix533/pa/config
    root 27262998 26542232   0 16:16:43  pts/2  0:00 grep kpacma
 
root@va10puvtdw001 [/root]
# svmon -P 14745648 -O summary=basic,unit=GB
Unit: GB

-------------------------------------------------------------------------------
     Pid Command          Inuse      Pin     Pgsp  Virtual
14745648 kpacma            1.85     0.03     0.02     1.88
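
If you want to check the Inuse figure from a script instead of reading it by eye, something like the following could work. It is only a sketch: it assumes the summary layout shown above (Pid, Command, Inuse, ...), and the function names and the 1.8 GB warning level in the usage comment are my own choices, not anything from ITM.

```shell
#!/bin/sh
# pa_inuse_gb: read the output of
#   svmon -P <pid> -O summary=basic,unit=GB
# on stdin and print the Inuse column of the kpacma row.
pa_inuse_gb() {
    awk '$2 == "kpacma" { print $3; exit }'
}

# at_or_above USED LIMIT: succeed (exit 0) when USED >= LIMIT, both in GB.
at_or_above() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used >= limit) }'
}

# Example use on the AIX box (1.8 is a hypothetical warning level):
#   inuse=$(svmon -P "$pid" -O summary=basic,unit=GB | pa_inuse_gb)
#   at_or_above "$inuse" 1.8 && echo "kpacma is close to the 2 GB limit"
```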



->lsconf | grep Memory
Memory Size: 65536 MB
Good Memory Size: 65536 MB
+ mem0                                                            Memory <========

df
The column of interest is "%Used".


bootinfo -y  (reports 32 or 64 bit)
64  <===============


swap -l
device               maj,min        total       free
/dev/hd6              10,  2      1024MB      1019MB  <================



Production ITM Environment

1 Hub server
10 RTEMS where WPA is installed
1 Administrative TEPS
1 R/O TEPS
1 TDW server where DB2, SPA and TPA are installed
1 TCR/Cognos server

Test ITM Environment

1 Hub server where Administrative TEPS is installed as well
1 RTEMS where WPA is installed
1 TDW server where DB2, SPA, TPA and TCR/Cognos are installed


2. Number of Oracle and DB2 agents in ITPA.

We are collecting data for 98 Oracle agents out of 391

We are collecting DB2 data for all 61 DB2 agents


--

pa_id=$(ps -ef | grep kpacma | grep -v grep | awk '{print $2}')
svmon -P "$pa_id" -O unit=auto

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /tmp/getsvmon.sh >> /tmp/getsvmon.out    <= crontab entry; samples every 5 minutes


Here's what I found in my environment: the "Inuse" memory starts to build up from 0 ... 500 MB ... 1 GB ... all the way up to about 2 GB, and when all the available memory is used up, the ITPA dies.
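
The contents of /tmp/getsvmon.sh are not shown here, so the following is only a guess at what such a script could look like. It wraps the two commands above in functions and adds a timestamp to each sample so the growth can be followed in /tmp/getsvmon.out. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of /tmp/getsvmon.sh (the real script is not shown).

# get_pa_pid: print the PID of the first kpacma process, if any.
get_pa_pid() {
    ps -ef | grep kpacma | grep -v grep | awk '{ print $2; exit }'
}

# sample_pa_memory PID: print a timestamp line followed by the svmon report.
# Returns 1 and prints a note when no PID was passed in.
sample_pa_memory() {
    pid=$1
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    if [ -z "$pid" ]; then
        echo "=== $stamp kpacma not running"
        return 1
    fi
    echo "=== $stamp pid=$pid"
    svmon -P "$pid" -O unit=auto
}

# Take a sample only when this file is run under the name used in crontab.
case "${0##*/}" in
    getsvmon.sh) sample_pa_memory "$(get_pa_pid)" ;;
esac
```

A sample that says "kpacma not running" right after a run of samples near 2 GB is a good hint that the agent died from memory exhaustion.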





Thursday, May 8, 2014

Explanation of Confidence field being shown on Tivoli Performance Analyzer panel

Among other things, the Tivoli Performance Analyzer calculates a value called Confidence in the course of its computation.

Data for the TPA (Tivoli Performance Analyzer) is gathered from the Tivoli Data Warehouse. Here I will describe how the Confidence value shows up on the TEPS GUI when looking at CPU Utilization for the Linux OS Agent.

Confidence is a measure of how close the data points being calculated are to each other.

To begin with, the definition:

The status output attributes are:
Confidence
Calculates the correlation co-efficient and multiplies it by 100 (R^2 * 100) to give
an indication of how accurate the approximated trended value is. This
calculation is a product of the Least Squares Regression method and creates a
number between 0 and 100 where 0 is no confidence and 100 is a perfectly
approximated function. The number gives you a level of confidence in the
trended value calculated, and can help reduce the number of false positives in
situations.

More information can be found at:
http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/index.jsp?topic=%2Fcom.ibm.itm.doc_6.2.3fp1%2Fitpa%2Fitm_pauser.htm
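
For readers who want the formula behind "R^2 * 100", here is the textbook definition of R^2 for a least-squares straight-line fit. The symbols (x_i for the sample index or time, y_i for the measured value, a and b for the fitted line) are my own notation, and I am assuming TPA follows the standard definition; the product documentation quoted above does not spell it out.

```latex
\[
\hat{y}_i = a + b\,x_i, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
               {\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}, \qquad
\text{Confidence} = 100 \cdot R^2
\]
```

For a straight-line fit, R^2 equals the square of the Pearson correlation coefficient between x and y, which matches the "correlation co-efficient" wording in the definition.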

Let's look at an example using "CPU Utilization" on the Linux OS Agent.

If the data (i.e. the AVG_CPU_Usage_Moving_Average column) in the itmuser."Linux_CPU_Averages" table is consistent, then we can expect a good confidence level,

i.e. 100%.

Here I have data for 21 days and the CPU consistently shows 100% usage, so the confidence level is high: I'm 100% confident that the CPU usage was pegged high all the time.





When I peeked into the TDW database, this is what I found.

AVG_CPU_Usage_Moving_Average
---------------------------------
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00
                           100.00

  21 record(s) selected.


--

Second scenario:

Next, let's say the CPU was pegged at 35% all the time.

Here too the confidence will be high, since the data is just as consistent.






AVG_CPU_Usage_Moving_Average
---------------------------------
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00
                            35.00

  21 record(s) selected.

--

Third scenario:

Let's say the CPU usage varies. Then the confidence drops, as the values are not consistent (or close to each other).

(Same sample count, i.e. 21 days.)







And the backend TDW shows that the data is varying (the values are not close to each other).

AVG_CPU_Usage_Moving_Average
---------------------------------
                            91.83
                            88.25
                            93.95
                            95.57
                            93.18
                            94.44
                            94.07
                            86.73
                            94.92
                            84.38
                            91.34
                            89.53
                            86.69
                            88.48
                            85.30
                            87.13
                            91.86
                            93.69
                            89.88
                            91.66
                            92.33

  21 record(s) selected.
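
To experiment with the three scenarios outside the GUI, the column output can be piped through a small awk filter that fits a straight line against the sample index and prints R^2 * 100. This is a rough sketch of the textbook calculation, not TPA's actual code. The function name confidence is my own, and reporting a perfectly flat series as 100 is an assumption I made to match what the panel shows in the first two scenarios (strictly speaking R^2 is undefined there).

```shell
#!/bin/sh
# confidence: read one value per line on stdin (header, dashes and the
# "record(s) selected" line are skipped), fit value against sample index
# by least squares, and print R^2 * 100 with two decimals.
confidence() {
    awk '
        NF == 1 && $1 ~ /^-?[0-9]+(\.[0-9]+)?$/ {
            n++; x = n; y = $1
            sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        }
        END {
            if (n < 2) { print "n/a"; exit }
            cxx = sxx - sx * sx / n   # spread of the sample index
            cyy = syy - sy * sy / n   # spread of the values
            cxy = sxy - sx * sy / n   # co-variation of index and value
            # Assumption: a flat series counts as a perfect fit.
            if (cyy < 1e-9) { print "100.00"; exit }
            printf "%.2f\n", 100 * (cxy * cxy) / (cxx * cyy)
        }'
}

# Example: save the query output to a file and run
#   confidence < cpu_averages.txt
```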


Hope this small tutorial helps when you are looking at the Confidence value rendered on the TEPS GUI.