Performance Testing, LoadRunner Tips&Tricks


General: Planning for Load Testing - Monitor Setup

Setting up monitors poses some challenges, and I advise you to set aside one or two days to implement them. LoadRunner features agentless monitoring (as other products do too). However, to achieve this, the SUT (System Under Test) needs to be configured properly.

This “proper configuration” is effectively “out-sourced” to the system admin team, and if a problem arises, the blame placed on the product is reduced. Depending on the type of monitors you are implementing, you will need time and effort to clear ports and justify your case to the security or network team before the actual implementation is possible.

This is where the initial gathering of information in Soliciting Requirements is important. With the architecture diagram, you can engage the security or network team on the installation of agents or the opening of ports. Of course, this will come under scrutiny if you do not provide them with sufficiently convincing information.

This is also the part of the load test where you start to work with even more people from other teams: system, network, database, security, or whatever titles they are given.

Also, take note that for different applications or servers, configuration has to be done before the tool can collect monitoring information from them: for example, perfmon for Windows, rstatd for UNIX, and JMX or JMS for WebLogic.
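As a quick sanity check before the test window, you can verify from a UNIX prompt that rstatd is actually registered on the target host. A minimal sketch, assuming rpcinfo is available and using a placeholder hostname:

# Check whether rstatd is registered with the portmapper on the target
# host ("unixhost" is a placeholder - substitute your server name).
HOST=unixhost
if rpcinfo -p "$HOST" | grep -q rstatd; then
    echo "rstatd registered on $HOST - the Unix Resources monitor should connect"
else
    echo "rstatd not found on $HOST - ask the admin team to enable it"
fi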

A point to note at this stage: every organization has different requirements in its infrastructure, and you need to respect that and be sensitive to it.



General: Planning for Load Testing - Protocols

Load testing is about capturing the traffic sent by the application to the server. Using the captured traffic, the tool generates a script of API calls that reproduces the traffic, which is emulated during replay and sent back to the server.

Through understanding the Application Design, you should be able to determine the type of protocol being communicated. As I pointed out in previous articles, LoadRunner focuses a lot on protocols, and therefore knowing this information is critical.

To be able to read the logs and find meaningful information out of it, I would recommend you to acquire as much knowledge as possible. For example, HTTP, the replay log merely displays the transmission of the HTTP requests in text. From there, you will know information such as authentication headers or the body that was sent.
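Outside of LoadRunner, a quick way to get comfortable with raw HTTP is to watch an exchange directly from the command line. A small sketch, assuming curl is installed and using a placeholder URL:

# Show the full HTTP conversation for one request; lines marked ">" are
# the request headers your script must reproduce, "<" the response.
curl -v http://www.example.com/ 2>&1 | head -40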

I cannot emphasize enough: do not treat load testing as functional testing. If you come from a functional testing background, be sure to acquire more knowledge of the protocol you are working with before proceeding.

Understanding Memory: Virtual Addressing (Illustration)

The following illustration explains the process of virtual addressing when a program is loaded into memory. For detailed explanation, refer to "Understanding Memory: Virtual Addressing".





  1. Executable program image file loads into memory.
  2. Logical memory address range of the application is divided into fixed-size chunks called pages.
  3. Page mapping in Virtual Memory is dynamic.
  4. Frequently referenced logical addresses tend to reside in physical memory.
  5. Infrequently referenced pages are relegated to paging files on secondary disk storage.


General: Planning for Load Testing - Application Design

This is the point where the load tester must understand how the application works. By how it works, we don't mean just that clicking links and buttons makes a calendar pop up. It goes deeper, such as the following (but not limited to this list):

  1. What is the type of communication between the client and server?
  2. Is the application client-side?
  3. Are there client-side activities involved in the entire application?
  4. What is the authentication mechanism?
  5. Does the application allow multiple logins of a single user?
  6. How does the application maintain a session?
  7. How does the load balancer distribute the load?
  8. What are the parameters being sent to the server?

Basically, this is gathering the technical details of the application, unlike the initial Soliciting Requirements, which comprises the “big picture” information.

If you start clicking links and buttons without knowing how the application works, you are bound for a failing load test at the script level. Also, if you intend to seek assistance from paid support, chances are they will ask whether you know the application well enough: the protocol used, client-side activities, and so on.

This information should be sought from the application developer, as he/she is the one who developed it and will be the most knowledgeable resource available to the performance test team.

However, take note that you may bump into an inexperienced developer, who might end up providing wrong information and wasting everyone’s time and effort. My advice is to be sure of the technology you are working with before approaching them.

Another thing to note, from a consultant or vendor company perspective: I do recognize that the load tester may have trouble conveying such questions to the client’s development team. To them, time spent answering your questions may seem less fruitful than writing another two modules. But at a bare minimum, you must be able to convey, and convince them of, the importance of knowing the application design.



General: Planning for Load Testing - Soliciting Requirements

Prior to this stage, I always give a “sales talk” about what we do for a living, and also to get them excited (if they are not already). Gathering and soliciting accurate requirements will save you lots of unnecessary work.

Be sure to get hold of the requirements to monitor, such as acceptable Transaction Response Time or server utilization. Clients need to be clear about what they want before you can proceed (unless it is you who proposes the benchmark). For example, the CIO wanting the response time for a search to be under 10 seconds is a good metric to achieve. This also serves as the exit criteria for the load test project.

They may also want to know the utilization of the servers under load and not really be concerned with the response time. Usually, these are applications that are already problematic, and they are hiring you to dig out the cause of the bottleneck.

Always get an overview of the architecture, including all components that inhabit it. You can leave out the number of switches or routers if they do not serve much purpose beyond routing and separating networks. It is important to get information like the number of servers and what is housed in them, load balancers, firewalls and databases. This will aid you in discussions and in knowing which machines to monitor.

Assist the clients in understanding which business processes may raise performance issues, or which they see as critical to their competitive edge. These can then be translated into the scripts that will be used in the load test.



General: Planning for Load Testing

After writing a couple of posts over the past few months, I would like to touch on the basics of using LoadRunner in Performance Testing. This will provide an overview of how you should facilitate the entire load testing process and be aware of the requirements that surface. Of course, this is not limited to LoadRunner; the concepts are the same for other load testing tools.

There are a few areas to be aware of, or trained in, for the load test to proceed with minimal difficulty. I’ve broken them down into the following: soliciting requirements, application design, protocols, monitor setup, monitoring, analyzing and recommendations.


Understanding Memory: Virtual Addressing

Virtual memory is a feature supported by most advanced processors. Hardware support for virtual memory includes a mechanism to map the logical (virtual) memory addresses that application programs reference to physical (real) memory hardware addresses.

When an executable program’s image file is first loaded into memory, the logical memory address range of the application is divided into fixed-size chunks called pages. These logical pages are then mapped to similar-sized physical pages that are resident in real memory.

This mapping is dynamic in that frequently referenced logical addresses tend to reside in physical memory (also known as RAM, real memory, or main memory), while infrequently referenced pages are relegated to paging files on secondary disk storage. The active subset of virtual memory pages associated with a single process’s address space currently resident in RAM is known as the process’s working set, because they are the active pages referenced by the program as it executes.

Virtual addressing is designed to be transparent to application programs, allowing them to be written without regard to the specific real memory limitations of this or that computer. Virtual addressing even makes it possible for an executing program to reference an address space that is larger than the amount of physical memory installed on a particular computer. It allows a programmer to exploit what looks like a virtually infinite computer memory, where each individual process can address up to 4 GB of virtual addresses. Even so, performance issues still exist when the OS attempts to reference more memory locations than can actually fit inside real memory.
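The 4 GB figure is simply the arithmetic of the 32-bit address width assumed here:

$2^{32}\ \text{bytes} = 4{,}294{,}967{,}296\ \text{bytes} = 4\ \text{GB}$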

Virtual memory systems work well because executing programs seldom require all their allocated code and data areas to be resident in physical memory concurrently in order to run. With virtual memory, only the active pages associated with a program’s current working set remain resident in real memory. On the other hand, virtual memory systems can run very poorly when the working sets of active processes greatly exceed the amount of physical RAM that the computer contains. It is important to understand logical and physical memory usage to diagnose performance problems arising from real memory being over-committed.

The above was extracted from the book, Windows 2000 Performance Guide by Mark Friedman & Odysseas Pentakalos.



Understanding Memory: Page Fault Resolution (Illustration)

The following illustrates Page Fault Resolution. To better understand how a Page Fault occurs with respect to the illustration, refer to the article "General: Understanding Memory - Page Fault Resolution".


Fig 1: Page Fault Resolution

The sequence of Page Fault Resolution is as follows.

  1. Thread attempts to reference page in memory.
  2. Page is not resident in real memory and thus not found.
  3. Hardware interrupt occurs to resolve the page fault.
  4. ISR gains control to validate the referenced address.
  5. Locate page on secondary storage.
  6. Copy page into available free page in real memory.
  7. Resume thread execution cycle.

The time taken for Page Fault Resolution is thus equivalent to Steps [3] to [6]. That said, if memory is the performance bottleneck and the number of Page Faults increases over time, the application/system may experience delays due to the time needed to perform the Page Fault Resolution of Steps [3] to [6].
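To see whether this cost is actually being paid on a live system, watch the hard page fault rate over time. A minimal sketch, assuming a Linux host with the sysstat package installed:

# Sample paging counters every 5 seconds, 12 times (one minute).
# majflt/s counts faults that needed a disk read, i.e. the full
# resolution path of Steps [3] to [6]; a steady climb under load
# suggests memory is the bottleneck.
sar -B 5 12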



Basics: Network Bottlenecks

At the network level, many things can affect performance. The bandwidth (the amount of data that can be carried by the network) tends to be the first culprit checked. Assuming you have determined that bad performance is attributable to the network component of an application, there is a more likely cause of bad network performance than network bandwidth: the application itself and how it is handling distributed data and functionality.


The overall speed of a particular network connection is limited by the slowest link in the connection chain and the length of the chain. Identifying the slowest link is difficult, and it may not even be consistent: it can vary at different times of the day or for different communication paths. A network communication path leads from an application through a TCP/IP stack (which adds various layers of headers, possibly encrypting and compressing data as well), then through the hardware interface, through a modem, over a phone line, through another modem, over to a service provider’s router, through many heavily congested data lines of various carrying capacities and multiple routers with different maximum throughputs and configurations, to a machine at the other end with its own hardware interface, TCP/IP stack, and application. A typical web download route is just like this. In addition, there are dropped packets, acknowledgments, retries, bus contention, and so on.

Because so many possible causes of bad network performance are external to an application, one option you can consider including in an application is a network speed testing facility that reports to the user. This should test the speed of data transfer from the machine to various destinations: to itself, to another machine on the local network, to the Internet Service Provider, to the target server across the network, and to any other appropriate destinations. This type of diagnostic report can tell users that they are obtaining bad performance from something other than your application. If you feel that the performance of your application is limited by the actual network communication speed, and not by other (application) factors, this facility will report the maximum possible speeds to your user.

Latency

Latency is different from the load-carrying capacity (bandwidth) of a network. Bandwidth refers to how much data can be sent down the communication channel in a given period of time and is limited by the link in the communication chain that has the lowest bandwidth. Latency is the amount of time a particular data packet takes to get from one end of the communication channel to the other. Bandwidth tells you the limits within which your application can operate before performance becomes affected by the volume of data being transmitted. Latency often affects the user’s view of performance even when bandwidth isn’t a problem.

In most cases, especially Internet traffic, latency is an important concern. You can determine the basic round-trip time for a data packet between any two machines using the ping utility. This utility provides a measure of the time it takes a packet of data to reach another machine and be returned. However, the time measured is for the basic underlying protocol (an ICMP packet) to travel between the machines. If the communication channel is congested and the overlying protocol requires retransmissions (often the case for Internet traffic), one transmission at the application level can actually be equivalent to many round trips.
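A minimal sketch of such a measurement from a UNIX shell, using a placeholder hostname:

# Send 10 echo requests and read min/avg/max round-trip times from the
# summary line. app.example.com is a placeholder for your target server.
ping -c 10 app.example.com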

It is important to be aware of these limitations. It is often possible to tune the application to minimize the number of transfers by packing data together, caching, and redesigning the distributed application protocol to aim for a less conversational mode of operation. At the network level, you need to monitor the transmission statistics (using the ping and netstat utilities and packet sniffers) and consider tuning any network parameters you have access to in order to reduce retransmissions.

TCP/IP Stacks

The TCP/IP stack is the section of code responsible for translating each application-level network request (send, receive, connect, etc.) through the transport layers, down to the wire, and back up to the application at the other end of the connection. Because the stacks are usually delivered with the operating system and performance-tested before delivery (since a slow network connection on an otherwise fast machine and fast network is pretty obvious), it is unlikely that the TCP/IP stack itself is a performance problem.

In addition to the stack itself, stacks include several tunable parameters. Most of these parameters deal with transmission details beyond the scope of this article. One parameter worth mentioning is the maximum packet size. When your application sends data, the underlying protocol breaks the data into packets that are transmitted. There is an optimal size for packets transmitted over a particular communication channel, and the packet size actually used by the stack is a compromise. Smaller packets are less likely to be dropped, but they introduce more overhead, as the data probably has to be broken up into more packets with more header overhead.

If your communication takes place over a particular set of endpoints, you may want to alter the packet sizes. For a LAN segment with no router involved, the packets can be big (e.g. 8KB). For a LAN with routers, you probably want to set the maximum packet size to the size the routers allow to pass unbroken. (Routers can break up the packets into smaller ones; 1500 bytes is the typical maximum packet size and the standard for Ethernet. The maximum packet size is configurable by the router’s network administrator.) If your application is likely to be sending data over the Internet and you cannot guarantee the route and quality of routers it will pass through, 500 bytes per packet is likely to be optimal.
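As an illustration of inspecting and adjusting these settings on Linux (the interface name and sizes are assumptions; changing the MTU requires root and should be agreed with the network team):

# Current maximum packet size (MTU) of the interface.
ip link show eth0 | grep mtu

# Set the typical Ethernet maximum.
ip link set dev eth0 mtu 1500

# Probe the largest unfragmented packet a route will pass:
# 1472 bytes of payload + 28 bytes of ICMP/IP headers = 1500.
ping -M do -s 1472 app.example.com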

Network Bottlenecks

Other causes of slow network I/O can be attributed directly to the load or configuration of the network. For example, a LAN may become congested when many machines are simultaneously trying to communicate over the network. The potential throughput of the network could handle the load, but the algorithms that provide communication channels slow the network, resulting in a lower maximum throughput. A congested Ethernet network has an average throughput of approximately one third its potential maximum. Congested networks have other problems too, such as dropped network packets. If you are using TCP, the communication rate on a congested network is much slower as the protocol automatically resends the dropped packets. If you are using UDP, your application must resend multiple copies for each transfer. Dropping packets in this way is common for the Internet. For LANs, you need to coordinate closely with the network administrators to alert them to the problem. For single machines connected via a service provider, you may need to work with the provider on suggested improvements. The phone line to the service provider may be noisier than expected: if so, you also need to speak to the phone line provider. It is also worth checking with the service provider, who should have optimal configurations they can demonstrate.

Dropped packets and retransmissions are a good indication of network congestion problems, and you should be on constant lookout for them. Dropped packets often occur when routers are overloaded and find it necessary to drop some of the packets being transmitted as the routers’ buffers overflow. This means that the overlying protocol will request the packets to be resent. The netstat utility lists retransmission and other statistics that can identify these sorts of problems. Retransmissions may indicate that the maximum packet size is too large.
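A quick sketch of such a check on Linux/UNIX (the exact counter names in netstat -s output vary by platform, so the grep pattern is illustrative):

# Snapshot TCP retransmission counters; re-run during the load test and
# compare - a rising count under load points to congestion.
netstat -s | grep -i retrans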

DNS Lookup

Looking up network addresses is an often-overlooked cause of bad network performance. When your application tries to connect to a network address such as foo.bar.something.org (e.g. downloading a web page from http://foo.bar.something.org), it must first translate foo.bar.something.org into a four-byte network IP address such as 10.33.6.45. This is the actual address that the network understands and uses for routing network packets. The way this translation works is that your system is configured with some seldom-used files that can specify the translation, and with a more frequently used Domain Name System (DNS) server that can dynamically provide the address for a given string. DNS translation works as follows:

  1. The machine running the application sends the text string of the hostname (e.g. foo.bar.something.org) to the DNS server.
  2. The DNS server checks its cache to find an IP address corresponding to that hostname. If the server does not find an entry in the cache, it asks its own DNS server (usually further up the Internet domain-name hierarchy) until ultimately the name is resolved. (This may be by components of the name being resolved, e.g. first .org, then something.org, etc., each time asking another machine as the search request is successively resolved.) The resolved IP address is then added to the DNS server’s cache.
  3. The IP address is returned to the original machine running the application.
  4. The application uses the IP address to connect to the desired destination.

The address lookup does not need to be repeated once a connection is established, but any other connections (within the same session of the application, or in other sessions at the same time or later) need to repeat the lookup procedure to start another connection.

You can improve this situation by running a DNS server locally on the machine, or on a local server if the application uses a LAN. A DNS server can be run as a “caching-only” server that resets its cache each time the machine is rebooted. There would be little point in doing this if the machine used only one or two connections per hostname between successive reboots. For more frequent connections, a local DNS server can provide a noticeable speedup. Nslookup is useful for investigating how a particular system does translations.
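For example, the following sketch (using the placeholder hostname from above, and assuming dig is available) checks both the answer and the lookup time:

# Resolve the name via the configured DNS server.
nslookup foo.bar.something.org

# dig reports the query time directly; run it twice - a much faster
# second answer suggests a DNS cache is already helping.
dig foo.bar.something.org | grep "Query time"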




Tools: Monitoring Unix Resource Script ver 1.0

From past experience working in my organization, not being able to monitor Unix system resources for whatever reason (rstatd configuration, dynamic ports, security restrictions or lazy administrators) had always bothered us.


Our team came up with a simple initiative: monitor using the same concept as perfmon. We asked one of our interns to search for Unix monitoring scripts and modify one to collect data periodically. And here we are; special thanks to the intern, we finally have the first cut of the script.


Copy and paste the below script starting from here ==================================================================




#!/bin/bash
# (C) 2006 Mark Boddington, http://www.badpenguin.co.uk/
# (C) 2007 Modified by Hwee Seong, http://loadrunnertnt.blogspot.com
# Licensed under the GNU GPL Version 2.
#
# ***** Configuration *****
# Set LOG to the directory you want to write the performance data to.
# Set HDD to the number of hard disks in your machine.
# NOTE : Do not change the variable LIMIT.
LOG=/home/Admin/report
HDD=2
LIMIT=0

genStat()
{
    test=$1
    SLEEP=$2
    filename=$(date +%Y%m%d%H%M%S)
    # Collect one vmstat sample per pass, prefixed with the date and
    # time, until the requested number of samples has been taken.
    for (( num=0; num<test; num++ ))
    do
        echo "$(date '+%Y/%m/%d %H:%M:%S') $(vmstat 1 2 | tail -1)" >> ${LOG}/vm.${filename}.log
        sleep $SLEEP
    done
    if [[ $LIMIT -eq 0 ]]
    then
        # VM csv: write the header row, then turn each space-separated
        # log line (date + time + 16 vmstat fields) into comma-separated values.
        echo "Date,Time,Procs in Run Queue,Procs Blocked,Virtual Memory Used,Idle Memory,Buff Memory Used,Cache Memory Used,Swap In,Swap Out,Blocks Received,Blocks Sent,System Interrupt,Context Switches/sec,Cpu User Time,Cpu System Time,Cpu Idle Time,Cpu Waiting Time" > ${LOG}/vm.${filename}.csv
        awk '{ line=$1; for (l=2; l<=18; l++) line = line "," $l; print line }' ${LOG}/vm.${filename}.log >> ${LOG}/vm.${filename}.csv
    fi
}

case $1 in
run)
    genStat $2 $3
    ;;
*)
    echo -e ":::: Usage ::::"
    echo -e "$0 run 5 10 : Collect stats for 5 times with 10 seconds interval"
    echo -e "Note : Change the first parameter for the number of stats"
    echo -e "       Change the second parameter for the number of seconds between each stats"
    ;;
esac



==================================================================


The original script was developed by Mark Boddington and can be obtained from http://www.badpenguin.co.uk


The modified script lets you set the interval between collections and the number of collections made. However, it does not yet allow the output path to be passed as a parameter (it is fixed by the LOG variable), and it is based on vmstat only. Also, you will have to estimate how long the scenario takes to complete.


To use the script, copy the quoted content (above), name it MonitorPerformance.sh and make it executable. Then, in the directory where the script resides, type the following:



./MonitorPerformance.sh run 5 10



where 5 is the number of times the collection is made and 10 is the interval between each collection in seconds.


Once the collection is complete, import the CSV file into LoadRunner Analysis as external data. (The screenshot of our Analysis import settings is omitted here.)




This is a minimal start for us and we intend to further enhance the script to allow the following (provided we can squeeze some time out):


1. Changing of the output path
2. Including iostat and netstat


Please feel free to use the script, and we welcome suggestions, modifications and enhancements to make it a better tool.



(simple tools making lives easier)





Scripts: Starting a new transaction during iterations

Recently, one of my load testing projects required us to measure the time taken to upload a file into the application. This sounds simple and straightforward with lr_start_transaction and lr_end_transaction. However, the catch is that the Action block iterates 10 times, and in one of those 10 iterations a different file of a different size is uploaded (thus causing a different response time), which is what they want to measure.

What we did was implement an if-then-else to check the file, performing lr_start_transaction when the script is uploading the different file and ending the transaction with lr_end_transaction when the submission completes. The snippet of the code is as follows:


// Start a separately named transaction depending on which file this
// iteration uploads.
if (!strcmp(lr_eval_string("{file}"), lr_eval_string("{set_file}")))
{
    lr_start_transaction("tx_check_upload_different_file");
}
else
{
    lr_start_transaction("tx_check_same_upload_file");
}
...
...
// other code (the upload request itself)
...
...
// Close the matching transaction once the submission completes.
if (!strcmp(lr_eval_string("{file}"), lr_eval_string("{set_file}")))
{
    lr_end_transaction("tx_check_upload_different_file", LR_AUTO);
}
else
{
    lr_end_transaction("tx_check_same_upload_file", LR_AUTO);
}

The code detects whether the uploaded file is the different one and tracks the timing for that upload transaction under its own name. Hope it's useful to you.



Basics: Memory Bottlenecks

Keeping watch directly on the system memory (RAM) is not usually that helpful in identifying performance problems. A better indication that memory might be affecting performance can be gained by watching for paging of data from memory to the swap files. Most current OSs have virtual memory that is made up of the actual (real) system memory using RAM chips, and one or more swap files on the system disks. Processes that are currently running operate in real memory. The OS can take pages from any of the processes currently in real memory and swap them out to disk. This is known as paging. Paging leaves free space in real memory to allocate to other processes that need to bring in a page from disk.

Obviously, if all the processes currently running can fit into real memory, there is no need for the system to swap out any pages. However, if there are too many processes to fit into real memory, paging allows the system to free up system memory to run more processes. Paging affects system performance in many ways. One obvious way is that if a process has had some pages moved to disk and the process becomes runnable, the OS has to pull those pages back from disk before the process can run. This leads to delays in performance. In addition, both the CPU and the disk I/O spend time doing the paging, reducing available processing power and increasing the load on the disks. This cascading effect involving both the CPU and I/O can degrade the performance of the whole system in such a way that it may be difficult to even recognize that paging is the problem. The extreme version of too much paging is thrashing, in which the system spends so much time moving pages around that it fails to perform any other significant work. (The next step is likely to be a system crash.)

As with runnable queues (see the CPU section), a little paging does not affect performance enough to cause concern. In fact, some paging can be considered good: it indicates that the system’s memory resources are fully utilized. But at the point where paging becomes a significant overhead, the system is overloaded.

Monitoring paging is relatively easy. On UNIX, the utilities vmstat and iostat provide details on the level of paging, disk activity and memory levels. On Windows, the performance monitor has categories to show these details, and it can also monitor the system swap files.
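A minimal sketch of what to watch on UNIX/Linux (column names are from Linux vmstat; other platforms differ slightly):

# Print memory and paging counters every 5 seconds. Consistently
# non-zero si/so (swap-in/swap-out) columns under load mean the system
# is paging and RAM is the likely bottleneck.
vmstat 5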

If there is more paging than is optimal, the system’s RAM is insufficient or the processes are too big. To improve the situation, you need to reduce the memory being used by reducing the number of processes or the memory utilization of some processes. Alternatively, you can add RAM. Assuming that it is your application causing the paging (otherwise, either the system needs an upgrade, or someone else’s processes may also have to be tuned), you need to reduce the memory resources you are using.

When the problem is caused by a combination of your application and others, you can partially address the situation by using process priorities (see the CPU section). The equivalent of priority levels for memory usage is an all-or-nothing option that lets you lock a process in memory. This option is not available on all systems and is more often applied to shared memory than to processes, but it is nevertheless useful to know. If this option is applied, the process is locked into real memory and is not paged out at all. You need to be aware that this reduces the amount of RAM available to all other processes, which can make overall system performance worse. Any deterioration in system performance is likely to occur at heavy system load, so make sure you extrapolate the effect of reducing the system memory in this way.



Products: SilkPerformer (a comparison with LoadRunner)

SilkPerformer is a load testing tool from Borland, originally Segue. SilkPerformer has almost all the features that LoadRunner has, so you may like to read about LoadRunner here before proceeding further with this article. The SilkPerformer suite consists of the Workbench, Silk TrueLog and Silk Performance Explorer. The load test implementation is the same in that load generators have to be installed on the machines. For the sales talk by the vendor, click here.

What SilkPerformer offers is a project-based approach to load tests. Using this approach, each load test is stored as a project with various settings of profiles and workloads. This is advantageous when managing load tests from a customer or project perspective, where you can track the projects accordingly. Furthermore, within a project, a result directory is created for every load test run. This is useful for tracking the number of runs conducted. In this way, there is a structured approach to managing the tests. Neither feature is available in LoadRunner.

SilkPerformer has another cool feature: resource management of the load generators. By defining a pool of load generators, the load tester gets an overview of the resources participating in the load test. This is almost similar to the Resource Pool in Performance Center. This is not available in LoadRunner.

During execution, SilkPerformer allows monitoring of resources, in terms of CPU usage, on the Load Generator. This is beneficial when determining whether a resource problem lies with the Load Generator or the SUT itself, and the feature also supports capacity planning for load generators. SilkPerformer additionally shows the progress of the vusers and the load test as a percentage. This is beneficial for estimating the duration of the load test and forecasting its end time.

While performing the evaluation, I found the parameterization feature in SilkPerformer less user-friendly than LoadRunner’s. Manipulating parameters, such as defining a parameter and reusing it, is not as straightforward, unlike LoadRunner, where a replace function easily achieves the parameterization.

For iterations, SilkPerformer requires the iteration count to be defined in the script. Pacing, a LoadRunner feature, is not offered in SilkPerformer.

SilkPerformer provides a Rendezvous feature in terms of serialisation and synchronisation. This is useful if a certain load model needs to be emulated during the test (better than LoadRunner). However, the downside is that it requires inclusion of the API, which is not as convenient as implementing a Rendezvous point in LoadRunner.

For replay, Silk TrueLog is a powerful tool for analysing the replay, compared to the LoadRunner Runtime Viewer. It offers replay of the screens and provides views of the data flowing in and out between client and server.

SilkPerformer allows monitoring similar to LoadRunner. However, LoadRunner provides more features from the monitoring perspective. SilkPerformer requires additional logging for non-default monitors, unlike LoadRunner, which automatically logs the data once the counter has been added to the monitor list. Also, SilkPerformer is restricted to a defined set of custom monitors, namely JMX, SNMP, Perfmon and Rexec, unlike LoadRunner, which provides more monitors.

SilkPerformer utilises Silk Performance Explorer for analysis. However, its capability to merge, manipulate and handle external data is not as flexible and robust as LoadRunner Analysis. The analysis feature is an important component used by the Performance Team to determine problem causes.

SilkPerformer has a similar licensing mechanism to LoadRunner, bound to a single host. However, it features an additional mechanism allowing a license to be "checked out" to the installed application. This is useful in an organisational context for managing performance tests in different environments.

Basics: Disk Bottlenecks

In most cases, applications can be tuned so that disk I/O does not cause any serious performance problems. But if, after application tuning, you find that disk I/O is still causing a performance problem, your best bet may be to upgrade the system disks. Identifying whether the system has a problem with disk utilization is the first step. Each system provides its own tools to identify disk usage (Windows has a performance monitor, and UNIX has the sar, vmstat and iostat utilities). At minimum, you need to identify whether paging is an issue (look at disk-scan rates) and assess the overall utilization of your disks (e.g. the performance monitor on Windows, output from iostat -D on UNIX). It may be that the system has a problem independent of your application (e.g. unbalanced disks), and correcting it may resolve the performance issue.

If the disk analysis does not identify an obvious system problem that is causing the I/O overhead, you could try making a disk upgrade or a reconfiguration. This type of tuning can consist of any of the following:

  • Upgrading to faster disks
  • Adding more swap space to handle larger buffers
  • Changing the disks to be striped (where files are striped across several disks, thus providing parallel I/O, e.g. with a RAID system)
  • Running the data on raw partitions when this is shown to be faster
  • Distributing simultaneously accessed files across multiple disks to gain parallel I/O
  • Using memory-mapped disks or files


If you have applications that run on many systems and you do not know the specification of the target system, bear in mind that you can never be sure that any particular disk is local to the user. There is a significant possibility that the disk being used by the application is a network-mounted disk. This doubles the variability in response times and throughput. The weakest link will probably not even be constant. A network disk is a shared resource, as is the network itself, so performance is hugely and unpredictably affected by other users and network load.

Disk I/O

Do not underestimate the impact of disk writes on the system as a whole. For example, all database vendors strongly recommend that the system swap files be placed on a separate disk from their databases. The impact of not doing so can decrease database throughput (and system activity) by an order of magnitude. This performance decrease comes from not splitting the I/O of two disk-intensive applications (in this case, OS paging and database I/O).

Identifying an I/O problem is usually fairly easy. The most basic symptom is that things take longer than expected, while at the same time the CPU is not at all heavily worked. The disk-monitoring utilities will also tell you that there is a lot of work being done to the disks. At the system level, you should determine the average and peak requirements on the disks. Your disks will have some statistics supplied by the vendor, including:

The average and peak transfer rates, normally in megabytes (MB) per second, e.g. 5 MB/sec. From this, you can calculate how long an 8K page takes to be transferred from disk: for example, 5 MB/sec is about 5K/ms, so an 8K page takes just under 2 ms to transfer.

Average seek time, normally in milliseconds (ms). This is the time required for the disk head to move radially to the correct location on the disk.

Rotational speed, normally in revolutions per minute (rpm), e.g. 7200 rpm. From this, you can calculate the average rotational delay in moving the disk under the disk-head reader, i.e. the time taken for half a revolution. For example, at 7200 rpm, one revolution takes 60,000 ms (60 seconds) divided by 7200, which is about 8.3 ms. So half a revolution takes just over 4 ms, which is consequently the average rotational delay.

This list allows you to calculate the actual time it takes to load a random 8K page from the disk, this being seek time + rotational delay + transfer time. Using the examples given in the list, you have 10 + 4 + 2 = 16 ms to load a random 8K page (almost an order of magnitude slower than the raw disk throughput). This calculation gives you a worst-case scenario for the disk-transfer rates for your application, allowing you to determine if the system is up to the required performance. Note that if you are reading data stored sequentially on disk (as when reading a large file), the seek time and rotational delay are incurred less than once per 8K page loaded. Basically, these two times are incurred only at the beginning of opening the file and whenever the file is fragmented. But this calculation is confounded by other processes also executing I/O to the disk at the same time. This overhead is part of the reason why swap and other intensive I/O files should not be put on the same disk.
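Written as a single formula, with the example numbers from the list above:

$t_{\text{page}} = t_{\text{seek}} + t_{\text{rotation}} + t_{\text{transfer}} \approx 10\,\text{ms} + 4\,\text{ms} + 2\,\text{ms} = 16\,\text{ms}$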

One mechanism for speeding up disk I/O is to stripe disks. Disk striping allows data from a particular file to be spread over several disks. Striping allows reads and writes to be performed in parallel across the disks without requiring any application changes. This can speed up disk I/O quite effectively. However, be aware that the seek and rotational overhead previously listed still applies, and if you are making many small random reads, there may be no performance gain from striping disks.

Finally, note again that using remote disks adversely affects I/O performance. You should not be using remote disks mounted from the network with any I/O-intensive operations if you need good performance.


Clustering Files

Reading many files sequentially is faster if the files are clustered together on the disk, allowing the disk-head reader to flow from one file to the next. This clustering is best done in conjunction with defragmenting the disks. The overhead in finding the location of a file on the disk (detailed in the previous section) is also minimized for sequential reads if the files are clustered.

If you cannot specify clustering files at the disk level, you can still provide similar functionality by putting all the files together into one large file (as is done with ZIP file systems). This is fine if all the files are read-only, or if there is just one writable file (which you place at the end). However, when there is more than one writable file, you need to manage the location of the internal files as one or more of them grow. This becomes a problem and is not usually worth the effort. (If the files have a known bounded size, you can pad the files internally, thus regaining the single-file efficiency.)

Cached File Systems (RAM Disks, tmpfs, cachefs)

Most OSs provide the ability to map a file system into system memory. This can speed up reads and writes to certain files when you control your target environment. Typically, this technique has been used to speed up the reading and writing of temporary files. For example, some compilers (of languages in general, not specifically Java) generate many temporary files during compilation. If these files are created and written directly in system memory, the speed of compilation is greatly increased. Similarly, if you have a set of external files needed by your application, mapping these directly into system memory allows their reads and writes to be speeded up greatly.

But note that these types of file systems are not persistent. In the same way that the system memory of the machine gets cleared when it is rebooted, these file systems are removed on reboot. If the system crashes, anything in a memory-mapped file system is lost. For this reason, these file systems are usually suitable only for temporary files or read-only versions of disk-based files (such as mapping a CD-ROM into a memory-resident file system).

Remember that you do not have the same degree of fine control over these file systems that you have over your application. A memory-mapped file system does not use memory resources as efficiently as working directly from your application. If you have direct control over the files you are reading and writing, it is usually better to optimize this within your application rather than outside it. A memory-mapped file system takes space directly from system memory. You should consider whether it would be better to let your application grow in memory instead of letting the file system take up that system memory. For multi-user applications, it is usually more efficient for the system to map shared files directly into memory, as a particular file then takes up just one memory location rather than being duplicated in each process. Note that from SDK 1.4, memory-mapped files are directly supported by the java.nio package. Memory-mapped files are slightly different from memory-mapped file systems. A memory-mapped file uses system resources to read the file into system memory, and that data can then be accessed from Java through the appropriate java.nio buffer. A memory-mapped file system does not require the java.nio package and, as far as Java is concerned, files in that file system are simply files like any others. The OS transparently handles the memory mapping.

The creation of memory-mapped file systems is completely system-dependent, and there is no guarantee that it is available on any particular system (though most modern OSs do support this feature). On UNIX systems, the administrator needs to look at the documentation of the mount command and its subsections on cachefs and tmpfs. Under Windows, you should find details by looking at the documentation on how to set up a RAM disk, a portion of memory mapped to a logical disk drive.

In a similar way, there are products available that pre-cache shared libraries (DLLs) and even executables in memory. This usually means only that an application starts or loads quicker, and so may not be much help in speeding up a running system.

But you can apply the technique of memory-mapped file systems directly and quite usefully to applications in which processes are frequently started. Copy the Java distribution and all class files (all JDK, application, and third-party class files) onto a memory-mapped file system and ensure that all executions and class loads take place from that file system. Since everything (executables, DLLs, class files, resources, etc.) is already in memory, the startup time is much faster. Because only the startup (and class loading) time is affected, this technique gives only a small boost to applications that do not frequently start processes, but it can be usefully applied if startup time is a problem.
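A minimal sketch of this technique on Linux, using tmpfs (all paths, the size and the MyApp class name are assumptions; the contents vanish on reboot):

# Mount a memory-backed file system and copy the Java distribution and
# application classes into it (requires root).
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
cp -r /opt/jdk /mnt/ramdisk/jdk
cp -r /opt/myapp/classes /mnt/ramdisk/classes

# Launch from the memory-resident copy so executables, shared libraries
# and class files all load from RAM.
/mnt/ramdisk/jdk/bin/java -cp /mnt/ramdisk/classes MyApp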


Disk Fragmentation

When files are stored on disk, the bytes in the files are not necessarily stored contiguously: their storage depends on file size and the contiguous space available on the disk. This non-contiguous storage is called fragmentation. Any particular file may have some chunks in one place, and a pointer to the next chunk that may be quite a distance away on the disk.

Hard disks tend to get fragmented over time. This fragmentation delays both reads from files (including loading applications into computer memory on startup) and writes to files. This delay occurs because the disk head must wind on to the next chunk with each fragmentation, and this takes time.

For optimum performance on any system, it is a good idea to periodically defragment the disk. This reunites files that have been split up so that disk heads do not spend so much time searching for data once the file-header locations have been identified, thus speeding up data access. Defragmenting may not be effective on all systems, however.

Disk Sweet Spots

Most disks have a location from which data is transferred faster than from other locations. Usually, the closer the data is to the outside edge of the disk, the faster it can be read from the disk. Most hard disks rotate at a constant angular speed. This means that the linear speed of the disk under a point is faster the farther away the point is from the center of the disk. Thus, data at the edge of the disk can be read from (and written to) at the fastest possible rate commensurate with the maximum density of data storable on the disk.

This location with faster transfer rates is usually termed the disk sweet spot. Some (commercial) utilities provide mapped access to the underlying disk and allow you to reorganize files to optimize access. On most server systems, the administrator has control over how logical partitions of the disk apply to the physical layout, and how to position files at the disk sweet spots. Experts for high-performance database systems sometimes try to position the index tables of the database as close as possible to the disk sweet spot. These tables consist of relatively small amounts of data that affect the performance of the system in a disproportionately large way, so any speed improvement in manipulating these tables is significant.

Note that some of the latest OSs are beginning to include “awareness” of disk sweet spots and attempt to move executables to sweet spots when defragmenting the disk. You may need to ensure that the defragmentation procedure does not disrupt your own use of the disk sweet spot.

The above is taken from the publication "Java Performance Tuning" by Jack Shirazi. I recommend reading this book, as it provides tuning and bottleneck concepts that are not bounded by Java.


I would also recommend visiting the site authored by Jack himself and a couple of his mates. It is a very resourceful site for Java performance-related information. Click here to access it.


Basics: CPU Bottlenecks

Java provides a virtual machine runtime system that is just that: an abstraction of a CPU that runs in software. (Note that this chapter is taken from "Java Performance Tuning" by Jack Shirazi, and therefore a lot of the discussion revolves around Java technologies.) These virtual machines run on a real CPU, and in this section the book discusses the performance characteristics of those real CPUs.

CPU Load

The CPU and many other parts of the system can be monitored using system-level utilities. On Windows, the task manager and performance monitor can be used for monitoring. On UNIX, a performance monitor (such as perfmeter) is usually available, as well as utilities such as vmstat. Two aspects of the CPU are worth watching as primary performance points: the CPU utilization (usually expressed in percentage terms) and the runnable queue of processes and threads (often called the load or the task queue). The first indicator is simply the percentage of the CPU (or CPUs) being used by all the various threads. If this is up to 100% for significant periods of time, you may have a problem. On the other hand, if it isn’t, the CPU is under-utilized, but that is usually preferable. Low CPU usage can indicate that your application may be blocked for significant periods on disk or network I/O. High CPU usage can indicate thrashing (lack of RAM) or CPU contention (indicating that you need to tune the code and reduce the number of instructions being processed to reduce the impact on the CPU).

A reasonable target is 75% CPU utilization (which, from what I have read from different authors, varies from 75% to 85%). This means that the system is being worked toward its optimum, but you have left some slack for spikes due to other system or application requirements. However, note that if more than 50% of the CPU is used by system processes (i.e. administrative and IS processes), your CPU is probably under-powered. This can be identified by looking at the load of the system over some period when you are not running any applications.

The second performance indicator, the runnable queue, indicates the average number of processes or threads waiting to be scheduled for the CPU by the OS. They are runnable processes, but the CPU has no time to run them and keeps them waiting for some significant amount of time. As soon as the run queue goes above zero, the system may display contention for resources, but there is usually some value above zero that still gives acceptable performance for any particular system. You need to determine what that value is in order to use this statistic as a useful warning indicator. A simplistic way to do this is to create a short program that repeatedly does some simple activity. You can then time each run of that activity. You can run copies of this process one after the other so that more and more copies are simultaneously running. Keep increasing the number of copies being run until the run queue starts increasing. By watching the times recorded for the activity, you can graph that time against the run queue. This should give you some indication of when the runnable queue becomes too large for useful responses on your system, so that you can alert the administrator if the threshold is exceeded. A guideline by Adrian Cockcroft is that performance starts to degrade if the run queue grows bigger than four times the number of CPUs.
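A rough shell sketch of that experiment on Linux (the worker counts are arbitrary, and for brevity it watches the "r" column rather than timing a specific activity):

#!/bin/bash
# Start an increasing number of pure CPU spinners and watch the
# runnable queue ("r" column of vmstat) grow with each round.
for NPROC in 1 2 4 8 16
do
    echo "--- $NPROC busy workers ---"
    for ((i=0; i<NPROC; i++))
    do
        ( while :; do :; done ) &   # CPU-bound busy loop
    done
    vmstat 1 5                      # watch the r column for ~5 seconds
    kill $(jobs -p)                 # stop the workers before the next round
    wait 2>/dev/null
done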

If you can upgrade the CPU of the target environment, doubling the CPU speed is usually better than doubling the number of CPUs. And remember that parallelism in an application doesn’t necessarily need multiple CPUs. If I/O is significant, the CPU will have plenty of time for many threads.

Process Priorities

The OS also has the ability to prioritize processes in terms of providing CPU time by allocating process priority levels. CPU priorities provide a way to throttle high-demand CPU processes, thus giving other processes a greater share of the CPU. If there are other processes that need to run on the same machine but it doesn’t matter if they run slowly, you can give your application processes a (much) higher priority than those other processes, thus allowing your application the lion’s share of CPU time on a congested system. This is worth keeping in mind; if your application consists of multiple processes, you should also consider the possibility of giving your various processes different levels of priority.
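On UNIX, process priorities are set with nice and renice. A small sketch (the script name and PID are placeholders; raising priority above the default usually requires root):

# Start a housekeeping job at the lowest priority (nice 19).
nice -n 19 ./housekeeping.sh &

# Lower the nice value (raise the priority) of an already-running
# process; 12345 is a placeholder PID.
renice -n -5 -p 12345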

Being tempted to adjust the priority levels of processes, however, is often a sign that the CPU is underpowered for the tasks you have given it.



The above is taken from the publication "Java Performance Tuning" by Jack Shirazi. I recommend reading this book, as it provides tuning and bottleneck concepts that are not bounded by Java.


I would also recommend visiting the site authored by Jack himself and a couple of his mates. It is a very resourceful site for Java performance-related information. Click here to access it.




Basics: OS and Hardware Bottlenecks

If you control the OS and hardware where the application will be deployed, there are a number of changes you can make to improve performance. Some changes are generic and affect most applications, while some are application-specific. This article applies to most server systems running Java applications, including servlets, where you usually specify (or have specified to you) the underlying system and have some control over tuning it. Client and standalone Java programs are likely to benefit from this article only if you have some degree of control over the target system, but some of the tips apply to all Java programs.

It is usually best to target the OS and hardware as a last tuning choice. Tuning the application itself generally provides far more significant speedups than tuning the systems on which the application is running. Application tuning also tends to be easier (though buying more powerful hardware components is easier still, and a valid choice for tuning). However, application and system tuning are actually complementary activities, so you can get speedups from tuning both the system and the application if you have the skills and resources. Here are some general tips for tuning systems:


  • Constantly monitor the entire system with any monitoring tools available and keep records. This allows you to get a background usage pattern and also lets you compare the current situation with situations previously considered stable.
  • You should run offline work during off-hours only. This ensures that there is no extra load on the system when the users are executing online tasks, and enhances the performance of both online and offline activities.
  • If you need to run extra tasks during the day, try to slot them into times with low user activity. Office activity usually peaks at 9 a.m. and 2:30 p.m. and has a lull between noon and 1 p.m. and at shift changeovers. You should be able to determine the user-activity cycles appropriate to your system by examining the results of normal monitoring. The reduced conflict for system resources during periods of low activity improves performance.
  • You should specify timeouts for all processes under the control of your application (and others on the system, if possible) and terminate processes that have passed their timeout value (see the sketch after this list).
  • Apply any partitioning available from the system to allocate determinate resources to your application. For example, you can specify disk partitions, memory segments, and even CPUs to be allocated to particular processes (also illustrated below).
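Hedged sketches for the last two tips, on Linux (all script names are placeholders; timeout is from GNU coreutils and taskset from util-linux):

# Terminate a controlled process once it passes its timeout value.
timeout 300s ./offline_batch.sh || echo "batch stopped after 5 minutes"

# Partition resources: pin a process to CPUs 0 and 1 so it cannot crowd
# the CPUs serving online users.
taskset -c 0,1 ./report_generator.sh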


As the entire chapter is lengthy, I've split it into the sections linked below. Click on the links to access each article.

CPU

Disk

Memory


The above is taken from the publication "Java Performance Tuning" by Jack Shirazi. I recommend reading this book, as it provides tuning and bottleneck concepts that are not bounded by Java. A simplified version (a summary of the chapter) can be found here [coming soon].

I would also recommend visiting the site authored by Jack himself and a couple of his mates. It is a very resourceful site for Java performance-related information. Click here to access it.






