278 Int. J. Advanced Intelligence Paradigms, Vol. 9, Nos. 2/3, 2017
A novel approach to automate test data generation
for data flow testing based on hybrid adaptive
PSO-GA algorithm
Sumit Kumar*
KIET Group of Institutions,
Ghaziabad, Uttar Pradesh, India
Email: sumitkumarbsr19@gmail.com
*Corresponding author
Dilip Kumar Yadav and Danish Ali Khan
National Institute of Technology,
Jamshedpur, Jharkhand, India
Email: dkyadav.ca@nitjsr.ac.in
Email: dakhan.ca@nitjsr.ac.in
Abstract: Software testing is the most important and crucial activity in developing
good quality software. The most important activity in software testing is to find an
optimal test suite in the input domain that satisfies a given test adequacy criterion.
Developing an efficient approach to generate test data is therefore a prime
issue in software testing. This paper proposes a novel approach to generate test
data automatically for data flow testing based on a hybrid adaptive PSO-GA
algorithm. The hybrid APSO-GA is developed to overcome the weaknesses of the GA
and PSO algorithms, especially in data flow testing. A new fitness function is
also designed on the basis of dominance relations, branch weight
and branch distance to guide the search direction more efficiently. The
efficiency of the proposed approach is tested on ten benchmark programs and
four real world programs. The proposed approach is then compared with GA,
PSO, ACO, DE and hybrid GA-PSO on the basis of two performance
parameters: average number of generations and average coverage achieved.
The results show that hybrid adaptive PSO-GA gives better results than the
other algorithms used in the field of test data generation.
Keywords: software testing; particle swarm optimisation; PSO; genetic
algorithm; hybrid algorithms; data flow testing.
Reference to this paper should be made as follows: Kumar, S., Yadav, D.K.
and Khan, D.A. (2017) ‘A novel approach to automate test data generation for
data flow testing based on hybrid adaptive PSO-GA algorithm’, Int. J.
Advanced Intelligence Paradigms, Vol. 9, Nos. 2/3, pp.278–312.
Biographical notes: Sumit Kumar is working as an Associate Professor in the
Department of Information Technology, KIET Group of Institutions, Ghaziabad,
U.P., India. He is also a part-time Researcher at National Institute of
Technology, Jamshedpur, Jharkhand, India. He received his MTech degree in
Information Technology from GGSIPU, Delhi, India in 2009. He has
published many research papers in reputed international journals and
conferences. His current research interests include software testing and soft
computing. His research activity can be observed in Google Scholar Citations.
Copyright © 2017 Inderscience Enterprises Ltd.
Dilip Kumar Yadav is currently an Associate Professor in the Department of
Computer Applications at National Institute of Technology, Jamshedpur, India.
He received his PhD degree in Software Reliability Engineering from Indian
Institute of Technology, Kharagpur (India) in 2012. He received his BTech
degree in Mechanical Engineering from National Institute of Technology,
Jamshedpur, India, in 1991 and MTech degree in Computer Integrated Design
and Manufacturing from National Institute of Technology, Jamshedpur, India,
in 1994. He has more than 20 years of experience in teaching and research. His
current research interests include software reliability and quality modelling,
software security, software project management, software metrics, soft
computing, supply chain management and system optimisation.
Danish Ali Khan is Dean Academics and an Associate Professor in the
Department of Computer Applications at National Institute of Technology,
Jamshedpur, India. He received his PhD and Master’s degrees from the Department of
Computer Applications at National Institute of Technology, Jamshedpur, India.
His research interests include supply chain management, software quality,
economic cost and optimisation techniques.
1 Introduction
Software testing is one of the main activities of the software development life cycle (SDLC).
More than 50% of the total cost of a project is consumed in the software testing phase of
the SDLC (Beizer, 1990). The main aim of software testing is to produce better quality and
reliable software. Broadly, software testing is of two types: white box testing and black
box testing. Of the two, white box testing is
the preferred one owing to its ability to recognise and flag the bugs present in the
software system. White box testing is further divided into four subcategories:
statement testing, branch testing, path testing and data flow testing. Data flow testing is
the least explored among all testing techniques, leaving
researchers considerable scope for further study. As the size of software grows,
it becomes exponentially more difficult to generate an optimal test suite (Varshney and
Mehrotra, 2013).
In recent years, there has been an emphasis on generating an optimal test suite
automatically using meta-heuristic algorithms such as particle swarm optimisation (PSO)
and genetic algorithm (GA) (Harman and McMinn, 2010). GA is the algorithm most preferred
by researchers for generating test data (Pachauri and Srivastava, 2013).
Although GA generates optimal test data accurately, it has an extremely slow convergence
rate. Recently, PSO has been extensively preferred by researchers in the field of test data
generation due to its simplicity and speed, but its drawback is that it gets stuck in local
minima at later stages. PSO gives better results than GA for the test data
generation problem (Mao, 2014).
To overcome the weaknesses of GA and PSO, hybrid algorithms have been applied for test data
generation (Singla et al., 2011). A hybrid algorithm takes advantage of both algorithms,
such as fast convergence rate, ability to find global optima, population diversity and
stability. Although greatly effective, hybrid algorithms have seen relatively less attention
in the field of test data generation due to their complex nature and large computation
costs. So there is a need to minimise the computation cost.
In this research work, to overcome the weaknesses of GA and PSO, a hybrid algorithm with
a modified fitness function is proposed for test data generation. The hybrid combines the
strengths of both algorithms, such as fast convergence rate, ability to find global optima,
population diversity and stability (Chawla et al., 2015).
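To make the idea of hybridisation concrete, the following is a minimal Python sketch of one generic way to combine the two algorithms: each generation applies the standard PSO velocity and position update, then replaces the worse half of the swarm with offspring produced by GA-style crossover and mutation. It is an illustration under stated assumptions, not the adaptive algorithm proposed in this paper. The function name `hybrid_pso_ga`, the parameter values (`w`, `c1`, `c2`, `mutation_rate`), the minimisation convention and the choice of uniform crossover are all assumptions made for the example.

```python
import random


def hybrid_pso_ga(fitness, dim, lo, hi, swarm_size=20, generations=50,
                  w=0.7, c1=1.5, c2=1.5, mutation_rate=0.1, seed=0):
    """Minimise `fitness` over [lo, hi]^dim; return (best_position, best_value).

    Illustrative only: all parameter defaults are assumed values.
    Requires swarm_size >= 4 so that two distinct parents can be sampled.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(swarm_size), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(generations):
        # PSO phase: move every particle towards its own best and the global best.
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))

        # GA phase: the worse half is replaced by children of the better half.
        order = sorted(range(swarm_size), key=lambda i: fitness(pos[i]))
        half = swarm_size // 2
        elite = order[:half]
        for i in order[half:]:
            a, b = rng.sample(elite, 2)
            # Uniform crossover: each gene is taken from one of the two parents.
            child = [pos[a][d] if rng.random() < 0.5 else pos[b][d]
                     for d in range(dim)]
            # Mutation: occasionally resample a gene to restore diversity.
            child = [rng.uniform(lo, hi) if rng.random() < mutation_rate else x
                     for x in child]
            pos[i] = child
            vel[i] = [0.0] * dim

        # Update personal and global bests after both phases.
        for i in range(swarm_size):
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            if val < gbest_val:
                gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val
```

In this sketch the PSO phase supplies the fast movement towards good regions, while the GA phase reintroduces diversity among the weakest particles, which is the division of labour the paragraph above attributes to hybrid algorithms.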
1.1 Research gap
Recently, many researchers have applied evolutionary and swarm-based algorithms to optimise
the test suite. GA, PSO and ACO have been employed for the test data generation and
optimisation problem. Evolutionary algorithms have an advantage over traditional methods,
but several shortcomings are still associated with them, such as getting stuck in local
optima and loss of population diversity. Recently, the focus has been on the use of other highly
adaptive swarm intelligence techniques such as PSO and ant colony optimisation (ACO). In PSO
and ACO, the knowledge of previous good solutions is retained by all the members of the
current population by means of constructive cooperation among them. PSO has shown
better results for test data generation (Mao, 2014) in comparison to GA and random
search, but it suffers from premature convergence, whereas GA suffers
from slow convergence. Hence, another way to enhance the performance of
the above mentioned algorithms is hybridisation. In this study, an attempt is made to hybridise
the PSO and GA algorithms to overcome the shortcomings of both.
From the literature survey, it is also observed that only three fitness functions have been
applied in data flow testing. Girgis (2005) developed a fitness function for the test suite
optimisation problem, but it considers only the number of paths
covered by a test case. Its major drawback is that it does not
rank test cases which cover the same number of def-use paths. Nayak and Mohapatra (2010)
also used the same fitness function for generating test data using PSO. Ghiduk et al.
(2007) developed a fitness function based on the dominance relation and included a closeness
level. This fitness function suffers from the same problem: if two test cases cover
the same number of nodes in the dominance tree, both have the same rank. Singla
et al. (2011) also used this fitness function for the generation of test data using a
hybrid algorithm. To overcome these shortcomings, Varshney and
Mehrotra (2015) developed a new fitness function which includes the concepts of
dominance relation and branch distance for computing the closeness level. In that
work, however, the same weight is taken for all conditional operators. Hence, to overcome
this limitation, an efficient and effective fitness function is proposed
for test suite optimisation problems. The proposed fitness function is based on
dominance relation and branch distance and includes a branch weight parameter, which
provides a more robust fitness value. This has not been attempted before for test data
generation for data flow coverage.
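The following Python sketch illustrates how branch distance and a per-operator weight can break ties between test cases that cover the same number of dominator nodes. It uses the conventional branch distance rules from search-based testing and is not the fitness function defined later in this paper. The constant `K`, the `OPERATOR_WEIGHT` values and the way the terms are combined in `fitness` are hypothetical choices made purely for illustration.

```python
K = 1.0  # assumed penalty constant added when a strict comparison fails

# Hypothetical per-operator weights; the values are placeholders, not from the paper.
OPERATOR_WEIGHT = {"==": 0.9, "!=": 0.2, "<": 0.6, "<=": 0.5, ">": 0.6, ">=": 0.5}


def branch_distance(op, a, b):
    """Return 0 when `a op b` holds, else a positive measure of how far it is from holding."""
    if op == "==":
        return float(abs(a - b))
    if op == "!=":
        return 0.0 if a != b else K
    if op == "<":
        return 0.0 if a < b else (a - b) + K
    if op == "<=":
        return 0.0 if a <= b else float(a - b)
    if op == ">":
        return 0.0 if a > b else (b - a) + K
    if op == ">=":
        return 0.0 if a >= b else float(b - a)
    raise ValueError(f"unsupported operator: {op}")


def fitness(dominators_covered, dominators_total, op, a, b):
    """Higher is better: dominator coverage plus a weighted closeness bonus
    for the first uncovered branch predicate `a op b`."""
    d = branch_distance(op, a, b)
    closeness = 1.0 / (1.0 + d)  # 1 when the predicate holds, tends to 0 as d grows
    bonus = OPERATOR_WEIGHT[op] * closeness  # always below 1, so coverage dominates
    return (dominators_covered + bonus) / dominators_total
```

With these assumed values, two test cases that both cover 2 of 4 dominator nodes but miss the predicate `x == 10` with `x = 9` and `x = 5` receive different fitness values, so the search can prefer the one that is closer to taking the branch.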
1.2 Contribution of work
The main contributions of this work are as follows:
• To propose a hybridised PSO-GA algorithm for automatic generation of a test suite.
• To make the proposed algorithm more efficient and reliable, and to overcome the local
optima problem, by integrating adaptiveness into the PSO algorithm.
• To propose an improved fitness function for the test suite optimisation problem.
The results of the proposed approach are compared with GA, PSO, ACO, DE and hybrid
GA-PSO approaches for test data generation, using coverage and number of generations
as parameters. The experimental results show that the proposed approach
outperforms the other algorithms compared. It takes fewer
generations and covers all def-use paths for most of the programs.
The rest of the paper is organised as follows: Section 2 describes the automated software
test data generation process and related work. Section 3 describes data flow analysis,
PSO and GA. Section 4 describes the adaptive PSO algorithm. Section 5 describes the
proposed approach for test data generation based on the hybrid algorithm. Section 6
presents experimental results and discussion. Section 7 discusses threats to validity and limitations
of the proposed approach. Section 8 concludes the paper.
2 Related work
Although much work has been done on generating test data by applying meta-heuristic
techniques, its application in the software industry is still very limited. We have limited
ourselves to preliminary studies on test data generation using meta-heuristic
techniques.
Two decades ago, GA was applied in the field of test data generation: it was used to
generate test data for branch testing by Jones et al. (1996) and Pargas et al. (1999), but their
work was on smaller programs. McMinn (2004) applied GA on larger programs to
validate its efficiency over other meta-heuristics. Ahmed and Hermadi (2008) also
applied GA for test data generation using path testing, but it was inefficient for programs
having loops of infeasible paths. However, data flow coverage remained largely
unexplored.
GA is also preferred in data flow testing techniques, but it has a slow convergence
rate (Ghiduk et al., 2007; Mahajan et al., 2012). Other highly adaptive swarm intelligence
algorithms like PSO and ant colony optimisation are also used for the test data generation
problem because of their high exploration and simple implementation (Harman, 2007). PSO
and its variants have also been applied for test data generation. Windisch et al. (2007),
Mao (2014) and Jiang et al. (2015) applied PSO for test data generation according to the branch
adequacy criterion, and their research shows that PSO works efficiently compared to GA.
Nayak and Mohapatra (2010) applied PSO for data flow testing, but their fitness
function does not rank the test cases because it gives the same fitness value
to all the test cases which cover the same number of def-use paths. The fitness value is set to
0 for the test cases which cover only a partial path. Hence, their fitness function was not
efficient for guiding the search.
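A small Python sketch shows why a purely count-based fitness offers little guidance. It is written in the spirit of the fitness function described above rather than reproducing its exact formula. The name `count_fitness`, the path identifiers and the three test cases are hypothetical.

```python
def count_fitness(covered_paths, all_du_paths):
    """Fraction of def-use paths that a test case covers completely."""
    return len(covered_paths & all_du_paths) / len(all_du_paths)


all_du_paths = {"p1", "p2", "p3", "p4"}  # hypothetical def-use path identifiers

t1 = {"p1", "p2"}  # covers p1 and p2 completely
t2 = {"p3", "p4"}  # covers two different paths completely
t3 = set()         # reaches only part of p3, so no complete path is covered

scores = [count_fitness(t, all_du_paths) for t in (t1, t2, t3)]
```

Here `t1` and `t2` receive the same score even though they exercise different paths, and `t3` scores zero although it may be one input change away from covering `p3`. The search therefore sees a flat landscape and gains no information about which candidate to refine.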
Varshney and Mehrotra (2015) proposed a GA-based approach to automatically
generate test data. The authors chose data flow coverage as the test adequacy criterion,
and experimentation was carried out only on smaller programs. The
authors did not consider the impact of the complex branch which calculates the branch