I really enjoy working in the performance engineering space because it is
challenging and anything but mundane.
Creating the load and emulating the production workload is a means to an end:
you obviously need to create the load before you can capacity plan or
understand the scalability of the deployment. But it is the skills in
performance analysis that are most valuable. The performance engineer who walks
into a project, takes the lead, wastes no time learning the environment,
creates and/or executes realistic tests, identifies current capacity, isolates
and alleviates bottlenecks, documents results, mentors the juniors, and
communicates clearly and effectively with everyone from developers up to the
CIOs/CTOs is truly a GOLD MINE.
Here are a few qualities that I think really help a performance engineer grow
in this space.
Patience and Great Communication:
Performance engineering is a complex task with many moving parts and open
items: so many components within the infrastructure, so many numbers to
analyze from so many sources, raw test results to turn into understandable
formats, so many people to keep in the loop, so much technical coordination. I
could go on and on. On top of the technical skill set, it is your professional
soft skills that will keep you on the right course. It requires a determined,
figure-it-out attitude to peel back the layers of the onion and investigate
each tier of the deployment. It requires the knowledge, as well as the drive,
to spot trends instead of chasing the tangents of anomalies. It requires the
dedication to keep an eye on many different metrics and isolate resource
saturation. And it requires the patience to reproduce scenarios so that your
conclusions rest on proof and evidence. You need to accomplish all of this
while staying on top of your communication skills.
Methodical Approach – The Constant
Having worked in this space for more than a decade, I think what helps most of
the time is being methodical. One abiding principle a performance engineer must
be very careful about is to always compare apples to apples. That means setting
up a realistic test and, once it is set up, setting it in stone: do not change
any of the runtime settings unless a business case dictates it. This is the
only way you as a performance engineer can reproduce and compare results across
different builds. Any deviation within the test scenario will result in
different throughputs, which in turn change the resource usage patterns.
Ignoring this tip will surely put you on a collision course with Analysis
Paralysis!
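One way to make "set in stone" enforceable is to treat the test configuration as an immutable record and fingerprint it, so results from two builds are compared only when the fingerprints match. The following is a minimal Python sketch; the `LoadTestConfig` fields, the `comparable` helper, and the run dictionaries are hypothetical, not taken from any particular load-testing tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class LoadTestConfig:
    """Runtime settings that stay fixed across builds (hypothetical fields)."""
    scenario: str
    virtual_users: int
    ramp_up_seconds: int
    think_time_ms: int
    duration_seconds: int

    def fingerprint(self) -> str:
        # Stable hash over every setting, so any change to the test is detectable.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def comparable(run_a: dict, run_b: dict) -> bool:
    """Apples to apples: the build may differ, the test configuration may not."""
    return run_a["config"].fingerprint() == run_b["config"].fingerprint()


baseline = LoadTestConfig("checkout_mix", 200, 300, 1000, 3600)
run_build_41 = {"build": "41", "config": baseline}
run_build_42 = {"build": "42", "config": baseline}
# Someone bumped the user count "just for this run":
tweaked = {"build": "42", "config": LoadTestConfig("checkout_mix", 250, 300, 1000, 3600)}

print(comparable(run_build_41, run_build_42))  # True
print(comparable(run_build_41, tweaked))       # False
```

Storing the fingerprint next to each result makes an accidental settings change visible before anyone starts comparing numbers.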
Architectural Diagram – Identify Potential Bottlenecks by Visualization
Make sure you ask for and receive an architectural diagram of the entire
deployment. Map business transactions to the resources they use within the
environment. Make sure you understand all the transaction flows, from the
front-end load balancers down to the shared database. Without an architectural
map, your journey can easily end in the frustration of getting lost in the
dark.
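A text version of the architectural map can be as simple as a dictionary from each business transaction to the tiers it touches. Counting how many transactions share each resource then gives a first list of candidates to watch. This Python sketch assumes a made-up deployment (`TRANSACTION_FLOWS` and every tier name are hypothetical), and the share count is only a heuristic for where to look, not a diagnosis.

```python
from collections import Counter

# Hypothetical deployment: each business transaction mapped to the tiers it
# touches, from the front-end load balancer down to the shared database.
TRANSACTION_FLOWS = {
    "login":    ["load_balancer", "web_01", "auth_service", "orders_db"],
    "search":   ["load_balancer", "web_01", "search_service", "search_index"],
    "checkout": ["load_balancer", "web_02", "payment_service", "orders_db"],
    "history":  ["load_balancer", "web_02", "orders_db"],
}


def shared_resources(flows: dict[str, list[str]],
                     min_transactions: int = 2) -> list[tuple[str, int]]:
    """Resources used by at least `min_transactions` flows, most shared first."""
    # set(path) counts a resource once per transaction, even if a flow revisits it.
    counts = Counter(res for path in flows.values() for res in set(path))
    shared = [(res, n) for res, n in counts.items() if n >= min_transactions]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


for resource, n in shared_resources(TRANSACTION_FLOWS):
    print(f"{resource}: used by {n} transactions")
```

Here the shared `orders_db` stands out right after the load balancer, which matches the intuition that a shared database is a natural place for contention.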
Looking for Hardware and Software Level Bottlenecks During Tests
You might have heard a couple of times that “Tuning is an Art”, or that
“Tuning is a Science”. Which is it? Hardware servers are restricted by their
physical resources (disk I/O, memory, CPU). Software servers are much more
configurable, and this is where tuning expertise is needed. Performance
engineers must understand the inner workings of a “server” in terms of thread
pools, caching policies, memory allocation, connection pooling, etc. Tuning is
a balancing act: you tune the software servers to take full advantage of the
hardware resources without flooding them. Tuning must be conservative, weighing
all the benefits as well as the consequences.
Produce Reproducible Results
Typically, a seasoned performance engineer will tune a layer of the environment
only when the results are reproducible. Always focus on trends rather than
points in time; mere spikes are not cause for architectural changes. As a rule
of thumb, reproduce a result three times before you make a change. Sometimes
this takes a while, so be prepared to be patient.
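The two habits above, reproducing a result before acting and judging trends rather than spikes, can be sketched in Python. Only the three-run rule of thumb comes from the text; the 5% tolerance, the five-sample rolling-median window, and both helper names are assumptions chosen for illustration.

```python
from statistics import mean, median


def is_reproducible(run_throughputs: list[float], min_runs: int = 3,
                    tolerance: float = 0.05) -> bool:
    """True if there are at least `min_runs` runs and each one is within
    `tolerance` (relative) of their mean."""
    if len(run_throughputs) < min_runs:
        return False
    avg = mean(run_throughputs)
    return all(abs(t - avg) / avg <= tolerance for t in run_throughputs)


def smooth(samples: list[float], window: int = 5) -> list[float]:
    """Centered rolling median: a lone spike barely moves it, a sustained shift does."""
    half = window // 2
    return [median(samples[max(0, i - half): i + half + 1])
            for i in range(len(samples))]


print(is_reproducible([1000, 1020, 990]))  # True: three runs within 5% of the mean
print(is_reproducible([1000, 1020]))       # False: only two runs so far

spiky = [100, 100, 100, 500, 100, 100, 100]    # one bad sample
shifted = [100, 100, 100, 300, 300, 300, 300]  # a real change in behavior
print(max(smooth(spiky)))    # the spike is smoothed away
print(smooth(shifted)[-1])   # the sustained shift survives smoothing
```

A median is used instead of a mean because a single outlier inside the window cannot drag it, which is exactly the "spike versus trend" distinction.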
Tune the First Occurring Bottleneck
When monitoring a large, complex system, there are many counters to keep in
your sights. Don’t jump the gun and tune a thread pool as soon as you see it
become saturated; this could actually be a symptom of the problem, not the root
cause. Work on getting to the root of a performance issue and tackle that
first. Correlate (graphing is easiest) the point in time when performance
degrades with the first saturation within the environment. Understandably,
there is a ton of information to look at, so keep it simpler by looking only at
free resources as percentages (free threads, free cache, and free file
descriptors); this will let you spot a bottleneck more quickly. When a free
resource runs low, there is a possible bottleneck. Understanding resource
utilization and free resources will let you understand a bottleneck before it
affects end-user response time. In other words, watch the resource as it
becomes utilized, and when its free share gets low, keep it on your radar as a
cause of performance degradation.
Iterative Tuning Process
Tuning is an iterative process. Know that once you have alleviated one
bottleneck, you will surely encounter another one. All aspects of servers are
limited, and since nothing is infinite, you will eventually reach the end.
Tuning manipulates the gates: requests that don’t get a resource are queued and
must wait to be serviced. Tuning becomes a process you repeat until the
workload reaches the target capacity with acceptable response times.
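A toy model shows why tuning is iterative: if end-to-end throughput is capped by the most constrained tier, raising that tier simply exposes the next one. The tier names, capacities, the 1.5x improvement per change, and the `tune_until` helper are hypothetical. The model deliberately ignores queuing delay and response time, which a real tuning loop must also check against its target.

```python
# Hypothetical per-tier capacities in requests/second. Simplifying assumption:
# end-to-end throughput equals the capacity of the most constrained tier.
def throughput(capacity: dict[str, float]) -> float:
    return min(capacity.values())


def tune_until(capacity: dict[str, float], target: float, step: float = 1.5,
               max_iterations: int = 10) -> tuple[dict[str, float], list[tuple[str, float]]]:
    """Raise the current bottleneck each pass until throughput meets `target`.
    Returns the tuned capacities and the (tier, capacity before change) history."""
    capacity = dict(capacity)  # leave the caller's dict untouched
    history = []
    for _ in range(max_iterations):  # nothing is infinite: cap the number of passes
        if throughput(capacity) >= target:
            break
        bottleneck = min(capacity, key=capacity.get)
        history.append((bottleneck, capacity[bottleneck]))
        capacity[bottleneck] *= step  # stands in for one validated tuning change
    return capacity, history


start = {"web_threads": 400, "app_pool": 300, "db_connections": 500}
final, history = tune_until(start, target=600)
print([name for name, _ in history])
# ['app_pool', 'web_threads', 'app_pool', 'db_connections']
print(throughput(final))  # 600.0
```

The same tier can show up more than once, since fixing a neighbor may push it back to being the limiting factor.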
Validation
Validate, validate, validate. Just as important as reproducing issues and
tuning based on proof is validating that the tuning change had the desired
effect. Did it indeed improve scalability? Performance engineers often test out
theories, and sometimes the validation stage will cause a change to be
reverted. It’s OK that not every change makes it to production. The key is to
take a scientific approach in which you prove the result as well as the
requirement.
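A simple keep-or-revert rule for validation compares repeated baseline runs against repeated runs with the change applied, and keeps the change only if the gain clearly exceeds run-to-run noise. The 5% minimum gain and the "twice the baseline's relative standard deviation" cutoff in this Python sketch are assumptions, not a statistical standard; a formal significance test would be a stricter alternative.

```python
from statistics import mean, stdev


def validate_change(baseline: list[float], candidate: list[float],
                    min_gain: float = 0.05) -> str:
    """Return "keep" if mean throughput improves by more than both `min_gain`
    and twice the baseline's relative run-to-run spread, else "revert".
    Each list needs at least two runs so that stdev() is defined."""
    base_avg, cand_avg = mean(baseline), mean(candidate)
    gain = (cand_avg - base_avg) / base_avg
    noise = stdev(baseline) / base_avg
    return "keep" if gain > max(min_gain, 2 * noise) else "revert"


print(validate_change([1000, 1010, 990], [1150, 1160, 1140]))  # keep: +15% vs ~1% noise
print(validate_change([1000, 1010, 990], [1010, 1020, 1000]))  # revert: +1% is too small
print(validate_change([800, 1000, 1200], [1150, 1160, 1140]))  # revert: baseline too noisy
```

The third case shows why a noisy baseline should itself be fixed before judging any tuning change.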
Soft Skills: Communication, as well as an eagerness to learn and expand your
horizons.