ABSTRACT: How do you collect the metrics to allow you to pick the most appropriate methodology to fit your context?
This paper outlines a study conducted to compare different methodologies using a controlled environment, the same test input documentation and similarly qualified professional testers. Whilst it was a fairly small project, the results are striking, and some conclusions are suggested within this paper. We also threw in a change of tool and analysed that impact as well!
The methodologies compared were WATERFALL, AGILE and BUG HUNT.
Starting Point
In order to put forward valid ideas, as clinical an initial environment as possible needed to be designed to serve as the starting point.
This consisted of the following.
A Business Requirements Document (BRD) which was:
- As close to real life as possible
- A mix of actual business requirements and some functional specifications
- Written as paragraphs of text, not individually identified requirements, which then needed to be extracted by the test team
- Not prioritised at the individual requirement level by the Business
- Supplemented with prototypes for some of the screens
The testing teams were all test professionals:
- Test managers were all ISTQB Certified Advanced Test Managers
- Test analysts were ISTQB certified at least at Foundation level, with the senior test analysts at Advanced Test Analyst level
- Those undertaking the Agile methods were Certified Agile Testers (CAT)
Application
- Each team had access to the same application on which they could complete their execution
- The application was in the Banking domain in which most staff had some previous experience
- The application was seeded with defects
Tools
- JIRA was used for the traditional methodologies with Planit customised templates to record requirements, test conditions, test cases and defects
- JIRA was used for one team using Agile with Planit customised templates for user stories, session sheets and defects
- HP Quality Centre Accelerator was used for one team using Agile with Planit customised templates for user stories, session sheets and defects (not out-of-the-box functionality)
- The Bug Hunts used Planit customised templates simply using Word and Excel
Team
- Team 1 – Agile using JIRA
- Team 2 – Bug Hunt 1
- Team 3 – Bug Hunt 2
- Team 4 – Traditional (Waterfall)
- Team 5 – Agile using HP Quality Centre Accelerator
It should be noted that no member of staff was included in more than one team as part of this study. Only team members were given access to the tools that they were using, and the project data was locked down. Staff were asked not to discuss outside of their team what happened within it, how many defects they found, or the amount of coverage they achieved.
Metrics
There were a number of metrics gathered:
- Defects split by total number and severity
- Coverage of requirements
Observation techniques were also used to gather data, which is drawn on throughout this paper.
Findings
There were a number of findings and these have been split into the following sections to highlight certain aspects.
Defects
Note: where defect counts are quoted in this paper, only high severity defects are included. All others have been ignored for the purposes of this paper, as the study parameters were to focus on high severity defects. A full analysis of all defects found may be the subject of a further paper.
Team | Total Defects | # High Severity | % High Severity
--- | --- | --- | ---
Team 1 | 105 | 29 | 28%
Team 2 | 52 | 11 | 21%
Team 3 | 65 | 15 | 23%
Team 4 | 99 | 25 | 25%
Team 5 | 45 | 15 | 33%
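The % High Severity column is simply each team's high severity count as a proportion of its total defect count. As a minimal sketch of that calculation, in Python, with the figures transcribed from the table above:

```python
# Defect counts transcribed from the table above.
defects = {
    "Team 1": {"total": 105, "high": 29},
    "Team 2": {"total": 52, "high": 11},
    "Team 3": {"total": 65, "high": 15},
    "Team 4": {"total": 99, "high": 25},
    "Team 5": {"total": 45, "high": 15},
}

for team, d in defects.items():
    pct_high = d["high"] / d["total"] * 100  # high severity share of that team's total
    print(f"{team}: {d['high']}/{d['total']} = {pct_high:.0f}% high severity")
```

Rounded to whole percentages, this reproduces the 28%, 21%, 23%, 25% and 33% figures shown above.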
All defects discovered were required to be logged, as the scenario was that the development team was remote from the testers. Whilst this was not the ideal environment for the Agile teams and could have impacted their productivity, we needed a way to further analyse the quality of the defects found.
The Agile team using JIRA (Team 1) found the most defects and the greatest number of high severity defects of all the teams. It was interesting to discover a divergence in the types of defects found: the Agile teams were more focused on usability defects, whilst the traditional team found more in the database.
Another discovery was that the two Agile teams achieved the highest percentages of high severity defects relative to the total number of defects each found, indicating that Agile surfaces a higher proportion of high severity defects. This could be taken as an indicator of the quality of testing when using an Agile methodology.
When comparing defect leakage, Team 4 came out best; however, Team 1 was only 6% lower, and Team 3 was next, differing by another 4%. When you compare the number of man-days spent to gain an additional 6% or so, you have to question whether this is value for money.
Test Coverage
The traditional team was told to perform system testing only, although it was not given any boundaries around functional or non-functional testing. The team did, however, define the scope of testing in their Test Plan to exclude any non-functional testing; to be fair, there were a few non-functional requirements contained within the BRD, and the team did not have the tools available to do this testing. They did, however, do testing around the database using SQL queries, and their tests were written to a much more technical level. An interesting further study would be to create teams that include non-functional testing and compare those.
The Agile teams, of course, tested to the acceptance criteria, as demonstrated to the Product Owner (PO) as part of the iteration review meetings, and were therefore much more customer focused; they did little database testing, and Team 1 was the only team to perform usability testing.
There were 12 major components required for full coverage, and most of the teams did recognise these, although they defined them slightly differently. There was a high variance in the number of tests that each team executed.
Of note, the two Agile teams looked at coverage slightly differently. Team 1 was much more collaborative and, through their numerous intra-team discussions, found that the system interaction for the same function differed slightly between team members, and therefore increased their usability testing. Team 5 focused only on the acceptance criteria, in which usability was not specifically defined as a criterion or discussed with the Product Owner.
Test Efficiency
Defects found per man-day showed that Team 1 was slightly more efficient than Team 3, with Team 2 a close third. The man-days were counted over the whole period of the project, as defects can be found in the preparation phases of the project as well.
Team | Defects Per Man-Day
--- | ---
Team 1 | 1.26
Team 2 | 0.91
Team 3 | 1.25
Team 4 | 0.47
Team 5 | 0.60
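The raw man-day totals are not reported in this table, but because the rate is defined as defects found per man-day over the whole project, an approximate effort figure can be back-calculated from each team's total defect count, assuming the rate was computed against total defects. A rough sketch follows; the implied man-days are derived estimates, not figures measured in the study:

```python
# Defects-per-man-day rates from the table above and total defect counts from
# the earlier defects table. The implied man-days are back-calculated estimates
# (assuming the rate uses total defects), not values measured in the study.
rates = {"Team 1": 1.26, "Team 2": 0.91, "Team 3": 1.25, "Team 4": 0.47, "Team 5": 0.60}
totals = {"Team 1": 105, "Team 2": 52, "Team 3": 65, "Team 4": 99, "Team 5": 45}

for team, rate in rates.items():
    implied_man_days = totals[team] / rate  # man-days ~= total defects / (defects per man-day)
    print(f"{team}: ~{implied_man_days:.0f} man-days over the whole project")
```

On that assumption, the traditional team's effort is several times that of the Bug Hunt teams, which is relevant to the value-for-money question raised earlier.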
The teams covered essentially the same areas of testing, although they split them up slightly differently. There was quite a large variation in the number of tests each team completed to represent this coverage. There were also differences across the teams in the split between planning, preparation, execution and closure activities. Certainly the teams with less ceremony (process, documentation etc.) did not sacrifice coverage, and in some cases spent more time testing and therefore finding defects.
Agile Experience
Our starting state was that staff working on the Agile projects were Certified Agile Testers (CAT). This meant that no time allowance was needed for learning a new methodology, and the teams were able to get straight into Release/Iteration planning. They followed the Agile process as defined within the CAT course (details can be found at www.planit.net.au). Team 1 was more experienced in Agile and had been involved in previous Agile projects. It was notable that they had also changed their mindset to a more collaborative way of working, and the personalities worked well together.
Team 5 still tended to revert back to some of the traditional principles, and there were fewer conversations within the team. The Agile principles had not yet been embedded into their way of working. This team would have benefited from some additional coaching, which was not given because of the study constraints. This indicated that newly formed teams may require some initial support in order to bed in the practices, which is often not factored into projects. It is all too easy when put under pressure, whether real or self-imposed, to revert back to what we know rather than what we have just been taught.
Product Owner Involvement
The Product Owner (PO) or Business representative was available to all teams, although throughout the study we simulated real life as much as possible: the teams had to pre-book meetings with their Business (PO), who was not always available to answer questions instantly. The PO contributed to and confirmed the acceptance criteria with both Agile teams, which then directed the scope of their testing.
The exception to this was the two Bug Hunt teams. The process for them was that the Business had an initial scoping meeting with the whole team, and then there was no contact again until the Test Summary / Recommendation report at the end of the process. These teams relied on past experience, domain knowledge and defect taxonomies for their testing, along with the BRD and the initial meeting. This is a short, timeboxed activity, so it relied much more on the quality of the staff than on external inputs.
The traditional team compiled a list of questions following a formal review of the requirements, which they then walked through with the Business. The Business also had the opportunity to review both the test conditions and the test cases and provide feedback before sign-off. There was therefore plenty of opportunity for the Business to confirm that the coverage was appropriate or to agree with the team to add more tests.
In the beginning, Team 1 had more opportunity to interact with the PO, which they took advantage of, meaning that the acceptance criteria were clearly defined. Team 5, however, did not get as much initial time with the PO because of his time pressures, and this contact did not really increase even though they had the opportunity. This had a significant impact on the quality of the deliverable, and in particular on the number of defects found by this team.
Collaboration
It was observed that the teams that collaborated well together produced the highest quality of testing, both in terms of coverage and the number of defects found. Interestingly, this seemed to be irrespective of the methodology. Some teams did this better than others, and it appeared to have more to do with the individuals than with the methodology they were using.
One interesting fact was that the two Agile teams, which one would expect to collaborate most closely, were sat in two differently configured rooms, each occupied only by that team's members. One team had a set of desks in the middle of the room facing each other, while the other team had their desks in a “U” shape pushed against the walls, so in effect they had their backs to each other while working. The second environment did not promote constant interaction, and it was apparent that the results that team achieved were not as good as the other team's. This is something we are going to explore further, as it was not one of the metrics we had thought to measure beforehand.
Type of person
One of the greatest findings from the study was that, if you have the right people on the project, there is a greater likelihood of a successful outcome. It is hard to define what the “right” person is, as it will depend on the context of the project and the environment in which they work.
However here are some suggestions to look for:
- Passion and enthusiasm for testing
- “Can do” attitude
- Team work mentality, working for the good of the team not the individual
- Excellent communication skills, high level of interaction
- Common-sense in applying the correct testing techniques based on project drivers
We certainly found that the teams with the greatest collaboration had the best success rate. This highlighted that communication skills and the ability to work well within a team have a significant impact on quality. This finding held true irrespective of the method used and is something that recruiting managers need to embrace and work out how to assess as part of job interviews.
Another conclusion we were able to draw was that team leadership seemed to play a large part in accomplishing the objectives. I want to make it clear that I am talking about leadership and not management, which are vastly different. In Agile, the leader comes from within the team, and this can vary from person to person as the project progresses. Team 1 certainly shared this responsibility, and it seemed to shift to whoever had the best knowledge of the area under test or discussion. This was less evident in Team 5, where leadership responsibilities seemed to stay with the more experienced member of the team.
Length of the project
Again, this was not one of the factors we set out to measure; however, it did seem to make an impact, albeit perhaps on the “feel good” factor of the people in the team and their enjoyment of the project experience as a whole, rather than directly on the overall project quality. This is important, as staff who are happy are more willing to commit to the project even in pressured times; of course, this is only one factor.
The Bug Hunts are a set number of days in which the key objective is to find as many high severity defects as possible. Since there is not a great deal of interaction with the business stakeholder, the premise is that the testers rely on their domain or general testing experience. It is likely that staff used defect taxonomies even though these were not specifically written down. These projects involve high levels of energy and concentration focused into the short timeframe involved.
The Agile teams, whilst not given set timescales, had an expectation among the staff that the project would be fast paced, with a relatively short timeframe. One of the teams let this drive them rather than the stated objective of quality, which was borne out by their results, as they found fewer defects.
The traditional team knew, based on their past experience, that they were going to be assigned to this project for a longer period of time. Whilst this was not reflected in their enthusiasm or productivity, there were some interesting reactions when they learnt that the Agile team, which had started at the same time, had completed their project while the traditional team was not even halfway through; in fact, they had not yet touched the system or started execution at that point.
Project drivers
What part does quality play? Is the driver simply to get the project finished, and is this now learnt behaviour?
One of the initial points stressed to all teams at kick-off was that for this study the key driver was quality. For this study, quality was defined as providing full coverage of the product under test and finding as many valid defects as possible, with the focus on the higher severity defects. All teams were given the same definition and understanding of the severity ratings.
Each team was given as long as they thought they would need to achieve the above stated quality goals. This did not apply to the Bug Hunt teams, as part of that process is that they are given only two days in which to complete the service. It should be noted, however, that both Bug Hunt teams overran by one day each, and this has been factored into the metrics.
Interestingly, we did see some unenforced competition between the two Agile teams. Team 1 completed their project before Team 5 had started, as we were waiting for staff who fitted the criteria we had set and for the tool to be configured as required. Team 5 self-imposed the same timebox as Team 1, even though they were told repeatedly that quality, not time, was the driver. As Team 5 had an increased learning curve with the unfamiliar tool, they did not complete as much testing, and therefore their coverage and number of defects were significantly lower. This highlights that Agile teams need to be given time to learn a new tool as well as to understand the process. It also raises an interesting question as to whether testers have now learnt the behaviour of finishing as quickly as possible, with quality no longer the key driver, because of the continual drive to squeeze testing. This is another area I would like to investigate further.
Tool Set
Bug Hunt
Both Bug Hunt teams used a Planit defined process for this service. They used template session sheets to record the testing and template Excel sheets for the defects. The methodology itself was a deliberately lightweight process.
JIRA
Planit has configured JIRA with its own test process workflows and pages to reflect the assets that need to be utilised depending on the methodology undertaken. Separate versions were set up for Agile and Traditional; both had been used on previous projects and were therefore proven. Each team that used JIRA had one member who had helped update the standard configurations and so was very familiar with the workflows. The rest of the team were given a two-hour training session on the templates, as they were already familiar with the basic JIRA functions.
The Agile version comprised user stories, tasks, session sheets, roadblocks and defects. It allowed the team to maintain a backlog and move stories into the iterations in which they would be actioned. The traditional version aligned to the Waterfall methodology and allowed the creation of requirements, test conditions, test cases, risks, issues and defects. The tool allowed each asset to be linked back to the requirements.
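As an illustration of how these asset types and the requirement traceability could be modelled, here is a minimal sketch; the asset names are taken from the descriptions above, but the structure, field names and example keys are hypothetical and are not the actual Planit JIRA configuration:

```python
from dataclasses import dataclass, field

# Asset types as described above; everything else in this sketch is hypothetical.
AGILE_ASSETS = ["User Story", "Task", "Session Sheet", "Roadblock", "Defect"]
TRADITIONAL_ASSETS = ["Requirement", "Test Condition", "Test Case", "Risk", "Issue", "Defect"]

@dataclass
class Asset:
    asset_type: str                                      # e.g. "Test Case" or "Session Sheet"
    key: str                                             # issue key, e.g. "PRJ-101" (illustrative)
    links_to: list[str] = field(default_factory=list)    # requirement keys it traces back to

def covered_requirements(requirement_keys: list[str], assets: list[Asset]) -> set[str]:
    """Requirements that have at least one asset linked back to them."""
    linked = {req for a in assets for req in a.links_to}
    return linked & set(requirement_keys)

# Example: two test cases tracing back to two of three requirements.
reqs = ["REQ-1", "REQ-2", "REQ-3"]
assets = [
    Asset("Test Case", "PRJ-101", links_to=["REQ-1"]),
    Asset("Test Case", "PRJ-102", links_to=["REQ-2"]),
]
print(sorted(covered_requirements(reqs, assets)))  # ['REQ-1', 'REQ-2'] -> REQ-3 has no coverage
```

Linking of this kind is what makes a requirements coverage metric, like the one gathered in this study, straightforward to derive from the tool.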
HP Quality Centre Accelerator
Although all of the team had used Quality Centre v10 extensively on previous projects, they found using the Accelerator confusing. They considered it a large overhead to enter data they regarded as unnecessary, and found the automatic background creation of artefacts (folders, for example) far too heavyweight for an Agile project. They had all been using JIRA and considered it a more appropriate tool for lightweight methodologies. It should be noted, however, that they did comment that had the project been longer and more complex, there may have been some justification for some of the overheads.
It was evident that the amount of training put in place for this tool should have been greater, and this would have benefitted the team; they had only the same two-hour training session on their tool as the other teams. Additional training would, however, have impacted their iteration zero timebox. Lessons have been learnt from this, and a training manual has been produced and updated with the key questions and answers to help going forward.
Conclusion
Whilst there were some clear findings, many more questions have been raised. The plan is to extend the study to focus on some of the anomalies found to date and to provide further data to confirm our conclusions.
The Agile team found the most defects, including the largest number of high severity defects. Their focus was much more on the user experience, whilst the traditional team found more at the database level.
The Bug Hunt teams certainly represent value for money, with much shorter timeframes and therefore lower cost. They give a window into the quality of the project as a whole, and their results can be used to target further testing.
People are the most important factor. The most overwhelming discovery from this study is that the quality of the staff seemed to make one of the biggest contributions to the success of the project, rather than the methodology used. If you have experienced professional specialists who can collaborate continuously in a team environment, they will find the most high severity defects. It is those who are not good communicators and do not interact well who will make the team less effective.
The team needs to take care when selecting tools to make sure they are complementary to the methodology and that sufficient training has been put in place. For this study, teams were not given a choice of the tool they thought would best support them; they were assigned a tool, and each team was given the same length of training on it. We found that some tools were better suited to the project we were working on; in this case, lightweight tools were better. The amount of time required for training should also be carefully considered to ensure it is factored into the project timescales.
Depending on the testers involved, there may need to be some initial support for a new process, in this case Agile. This needs to be considered to ensure that the process and practices are embedded into the team, or that the team receives some early coaching/mentoring for the first few iterations.
Surprisingly, seating arrangements can play a very important role in success. In this study, the team seating seemed to have a marked impact on the way each team worked together. Where the tables were grouped and the team all sat facing each other, conversations increased, which resulted in more significant defects being found.
Further data is needed to corroborate these suggestions, and over the coming months the project will be run again a number of times, setting the environment into selected states to try to prove or disprove hypotheses.
If you have any similar data, please send it to Leanne Howard to add to this study.