Abstract-
Software fault localization, which identifies the locations of faults in a program, is tedious and costly, yet crucial for program debugging. As the scale and complexity of software increase, locating faults manually becomes infeasible, and demand grows for automatic fault localization techniques that make the process more effective. In this paper, we present an analysis of spectrum-based fault localization and the DStar (D*) technique. Both are effective fault localization techniques that indicate suspicious locations automatically, without requiring any prior information about the structure or semantics of the program.

Introduction-
Successful debugging is important for producing good-quality software. Since the number of bugs to be fixed often outruns the capacity of the development team, manual debugging becomes error-prone. Therefore, to make software more dependable, automatic debugging processes are in great demand. Among the wide variety of debugging activities, software fault localization is one of the most expensive and important.
Fault localization techniques include a large family of spectrum-based fault localization (SBFL) techniques. SBFL techniques examine program spectra, which can be defined as collections of program traces gathered during the execution of a program, to show an association between failures and the program elements that are liable for these failures. SBFL techniques assign suspiciousness scores to program elements, which are then ranked on the basis of these scores. Finally, the ranked list is handed over to the developers to guide them to the root cause of the failures.

Another effective fault localization technique is the DStar method [2], which has its origins in binary similarity coefficient-based analysis and is a modified version of the Kulczynski coefficient.

Program Spectrum-Based Techniques-
Various spectrum-based techniques are motivated by probabilistic and statistical causality models. The spectrum of a program captures the execution information of the program from various perspectives, which include execution information for conditional branches or loop-free intra-procedural paths. Such spectra can be used for tracking program behaviour and for software fault localization. When a program does not execute as expected, the related information can be used to locate suspicious code that is liable for the error.

Notation:
P    a program
NCF  number of failed test cases that cover a statement
NUF  number of failed test cases that do not cover a statement
NCS  number of successful test cases that cover a statement
NUS  number of successful test cases that do not cover a statement
NC   total number of test cases that cover a statement
NU   total number of test cases that do not cover a statement
NS   total number of successful test cases
NF   total number of failed test cases
ti   the i-th test case

Techniques:
1. Code coverage / Executable Statement Hit Spectrum (ESHS)- indicates the parts of a program covered during an execution; by using this information, the components involved in a failure can be identified.
This process narrows the search for the faulty component that led the program into a failed state. SBFL works with a set of test cases in which at least one test case fails. Better results can be achieved by using both the successful and the failed test cases and focusing on the contrast between them. The set union technique emphasizes the source code that is executed by the failed test but not by any of the successful tests; such code is more suspicious than the rest. The set intersection technique keeps the code executed by the failed test and ignores the code executed by all of the successful test cases.

Nearest neighbour is an ESHS-based technique that contrasts a failed test with the successful test that most resembles it, based on the distance between them. If the bug appears in the difference set, it is analysed; otherwise, the method proceeds by first constructing a program dependence graph (PDG). Adjacent unchecked nodes are then added and analysed step by step until all the nodes in the graph have been checked.

ESHS-based similarity coefficient-based measures are used to compute the closeness of the execution pattern of a statement to the failure pattern of all test cases, and this degree of closeness indicates the suspiciousness of the statement. Hence, the closer the execution pattern of a statement is to the failure pattern, the more suspicious the statement appears to be, and vice versa. A popular similarity coefficient-based technique is Tarantula, which computes the suspiciousness of each statement by using coverage and execution results.
Its formula is as follows:

$$\text{Suspiciousness(Tarantula)} = \frac{NCF / NF}{NCF / NF + NCS / NS}$$

On the basis of the suspiciousness computed by Tarantula, studies such as [5], [6] use colours to provide a visual mapping of the involvement of each program statement while the test suite executes. The more failed test cases execute a statement, the brighter the colour assigned to it. Debroy et al. [7] revised the Tarantula technique by first grouping together the statements executed by the same number of failed test cases and then ranking these groups in descending order of the number of failed test cases. Within each group, the statements are ranked by using Tarantula.

Compared with the previously discussed techniques (set union, set intersection and nearest neighbour), Tarantula is a more effective fault localization technique because less code has to be examined before the first faulty statement is identified.

The Ochiai similarity coefficient-based technique is even more effective than Tarantula. Its formula is as follows:

$$\text{Suspiciousness(Ochiai)} = \frac{NCF}{\sqrt{NF \times (NCF + NCS)}}$$

Ochiai differs from the nearest neighbour model in that Ochiai utilizes multiple failed test cases, whereas nearest neighbour uses a single failed test case. Also, Ochiai includes all successful test cases, whereas the nearest neighbour model focuses only on the successful test case that most closely resembles the failed one.
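The way a similarity coefficient turns coverage data into a ranked list can be illustrated with a minimal Python sketch. It uses the standard definitions of Tarantula and Ochiai in terms of NCF, NUF, NCS and NUS. The coverage matrix, the test outcomes, the function names and the choice to score a statement as 0 when a denominator is zero are all assumptions made for illustration; they do not come from any of the cited studies.

```python
from math import sqrt

def spectrum_counts(coverage, failed):
    """Return one (NCF, NUF, NCS, NUS) tuple per statement.

    coverage[i][j] is True when test i executes statement j;
    failed[i] is True when test i fails.
    """
    counts = []
    for j in range(len(coverage[0])):
        ncf = sum(1 for row, f in zip(coverage, failed) if f and row[j])
        nuf = sum(1 for row, f in zip(coverage, failed) if f and not row[j])
        ncs = sum(1 for row, f in zip(coverage, failed) if not f and row[j])
        nus = sum(1 for row, f in zip(coverage, failed) if not f and not row[j])
        counts.append((ncf, nuf, ncs, nus))
    return counts

def tarantula(ncf, nuf, ncs, nus):
    nf, ns = ncf + nuf, ncs + nus           # NF and NS
    fail_ratio = ncf / nf if nf else 0.0    # assumption: 0 when there are no failed tests
    pass_ratio = ncs / ns if ns else 0.0    # assumption: 0 when there are no successful tests
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0

def ochiai(ncf, nuf, ncs, nus):
    nf = ncf + nuf                          # NF
    denom = sqrt(nf * (ncf + ncs))
    return ncf / denom if denom else 0.0    # assumption: 0 for uncovered statements

def rank(counts, formula):
    """Return statement indices ordered from most to least suspicious."""
    scores = [formula(*c) for c in counts]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical data: 4 tests (rows) over 3 statements (columns).
coverage = [
    [True, True, False],   # t1, successful
    [True, False, False],  # t2, successful
    [True, False, True],   # t3, failed
    [True, True, True],    # t4, failed
]
failed = [False, False, True, True]

counts = spectrum_counts(coverage, failed)
tarantula_order, tarantula_scores = rank(counts, tarantula)
ochiai_order, ochiai_scores = rank(counts, ochiai)
```

In this made-up example the third statement is executed by both failed tests and by no successful test, so both formulas place it at the top of the ranking. The two formulas differ on the first statement, which every test executes: Tarantula scores it 0.5, while Ochiai scores it about 0.71.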
Ochiai was extended to Ochiai2, and its formula is as follows:

$$\text{Suspiciousness(Ochiai2)} = \frac{NCF \times NUS}{\sqrt{(NCF + NCS)(NUS + NUF)(NCF + NUF)(NCS + NUS)}}$$

2. Program Invariants Hit Spectrum (PIHS)- records the coverage of program invariants, which can be defined as properties of a program that do not change as the program executes. To locate bugs, these techniques focus on finding violations of program properties in failed program executions. Invariants can be classified as likely invariants and unlikely invariants: the former are properties that hold in some sets of successful executions, while the latter may not hold for all possible executions. Automatic identification of the program properties needed for fault localization is the major obstacle in using these techniques. To overcome this problem, the invariant spectra of successful executions are taken as the program properties.

3. Predicate Count Spectrum (PRCS)- records the execution of predicates and tracks program behaviours that are more likely to lead to failures. Since the PRCS information is analysed using statistical methods, these techniques are called statistical debugging techniques.

4. Method Calls Sequence Hit Spectrum (MCSHS)- collects information about the sequence of method calls (how an object is used) encountered during program execution. Equal consideration is given to incoming and outgoing method calls.

5. Time Spectrum- records the execution time of methods in successful or failed executions; the times collected from successful executions alone are used to create models of observed behaviour. Deviations from these models in failed executions are identified and rated as possible causes of failures.

6. Xie et al. [3] compared 30 different SBFL formulas and proved that two families of formulas among them outperform the others. They referred to these families as ER1 and ER5. The two members of ER1 are ER1a and ER1b; the three members of ER5 are ER5a, ER5b and ER5c. The formulas for these members are as follows:

$$ER1a(e) = \begin{cases} -1 & \text{if } NCF < NF \\ NS - NCS & \text{if } NCF = NF \end{cases}$$

$$ER1b(e) = NCF - \frac{NCS}{NCS + NUS + 1}$$

$$ER5a(e) = NCF$$

$$ER5b(e) = \frac{NCF}{NCF + NUF + NCS + NUS}$$

$$ER5c(e) = \begin{cases} 0 & \text{if } NCF < NF \\ 1 & \text{if } NCF = NF \end{cases}$$

7. The effectiveness of an SBFL formula is evaluated on the basis of its EXAM score.
A lower EXAM score denotes better performance. The formula for calculating this score is as follows:

$$\text{EXAM score} = \frac{\text{number of statements examined until the first faulty statement is reached}}{\text{total number of statements in the program}} \times 100\%$$

Tien-Duy B. Le et al. [1] compared the EXAM scores of Tarantula, Ochiai, and the ER1 and ER5 families of formulas and observed that Ochiai has the lowest EXAM score.

DStar method:
W. Eric Wong et al. [2] proposed the DStar method for effective fault localization, which realises the following intuitions about the suspiciousness of a statement. Firstly, the suspiciousness of a statement increases if it is covered by more failed tests. Secondly, the suspiciousness of a statement decreases if it is covered by more successful tests. Thirdly, a statement is considered less suspicious if test cases fail without covering it. Lastly, more importance is given to statements covered by failed tests than to those covered by successful tests or those not covered by failed tests.

As stated earlier, the DStar method is an extended version of the Kulczynski coefficient: the latter satisfies only the first three intuitions, while the former realises all four and is also capable of handling multiple bugs. The formulas for both methods are as follows:

$$\text{Suspiciousness(Kulczynski)} = \frac{NCF}{NUF + NCS}$$

$$\text{Suspiciousness(DStar)} = \frac{(NCF)^{*}}{NUF + NCS}$$

where * is greater than or equal to 1.
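The difference between the two formulas above can be seen in a short Python sketch. The function names, the counts for the two statements X and Y, and the handling of a zero denominator (infinite suspiciousness when NCF > 0, otherwise 0) are assumptions made for illustration; they are not taken from [2].

```python
def _ratio(numerator, denominator):
    # Assumption: a statement covered only by failed tests (zero denominator)
    # is treated as maximally suspicious; one covered by no test scores 0.
    if denominator == 0:
        return float("inf") if numerator > 0 else 0.0
    return numerator / denominator

def kulczynski(ncf, nuf, ncs):
    """Kulczynski coefficient: NCF / (NUF + NCS)."""
    return _ratio(ncf, nuf + ncs)

def dstar(ncf, nuf, ncs, star=2):
    """D*: NCF ** star / (NUF + NCS), with star >= 1."""
    if star < 1:
        raise ValueError("star must be greater than or equal to 1")
    return _ratio(ncf ** star, nuf + ncs)

# Hypothetical counts for two statements, with NF = 4 failed tests in total.
x = dict(ncf=2, nuf=2, ncs=1)   # covered by half of the failed tests
y = dict(ncf=4, nuf=0, ncs=8)   # covered by every failed test, and by many successful ones

kulczynski_prefers_x = kulczynski(**x) > kulczynski(**y)
dstar_prefers_y = dstar(**y, star=2) > dstar(**x, star=2)
```

With these made-up counts, Kulczynski ranks X above Y (about 0.67 against 0.5), whereas D* with * = 2 ranks Y above X (2.0 against about 1.33). Raising NCF to the power * gives coverage by failed tests more weight, which is the fourth intuition. With * = 1 the two formulas coincide.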
W. Eric Wong et al. [2] evaluated D* across nine different sets of programs (the Siemens suite, the Unix suite, gzip, grep, make, Ant, space, flex and sed), which correspond to 24 subject programs, each with multiple faulty versions. They further compared D* with 38 other techniques, including 31 similarity coefficient-based techniques and 7 other contemporary techniques. On the basis of the EXAM score, D* is superior to the other techniques. The effectiveness of D* increases with the value of * and then levels off on reaching a critical value.
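An EXAM-based comparison of this kind can be sketched in Python as follows. The sketch takes the EXAM score to be the percentage of statements examined, in ranked order, until the first faulty statement is reached. The function name, the scores and the tie-breaking rule (tied statements keep their original order) are assumptions made for illustration; published studies handle ties in different ways.

```python
def exam_score(scores, faulty):
    """Percentage of statements examined until the first faulty one is reached.

    scores[j] is the suspiciousness of statement j; faulty is the set of
    indices of faulty statements. Assumption: tied statements are examined
    in their original order.
    """
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    for examined, statement in enumerate(ranking, start=1):
        if statement in faulty:
            return 100.0 * examined / len(scores)
    raise ValueError("no faulty statement appears in the ranking")

# Hypothetical suspiciousness scores for a 4-statement program.
scores = [0.5, 0.5, 1.0, 0.2]
best_case = exam_score(scores, faulty={2})    # the fault is ranked first
later_case = exam_score(scores, faulty={0})   # the fault is ranked second
```

Here `best_case` is 25.0 and `later_case` is 50.0: a formula that ranks the faulty statement higher yields a lower EXAM score.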
However, no technique among the wide range of fault localization techniques claims to outperform all others under every circumstance. Hence, it can be stated that an optimal spectrum-based technique for locating faults does not currently exist.

Conclusion-

References-
1. Tien-Duy B. Le, Ferdian Thung, and David Lo, “Theory and Practice, Do They Match? A Case With Spectrum-Based Fault Localization,” in 2013 IEEE International Conference on Software Maintenance.
2. W. Eric Wong, Vidroha Debroy, Ruizhi Gao, and Yihao Li, “The DStar Method for Effective Fault Localization,” IEEE Transactions on Reliability, vol. 63, no. 1, Mar. 2014.
3. X. Xie, T. Chen, F.-C. Kuo, and B. Xu, “A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization,” TOSEM, 2013.
4. W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa, “A Survey on Software Fault Localization,” IEEE Transactions on Software Engineering, vol. 42, no. 8, Aug. 2016.
5. J. A. Jones, M. J. Harrold, and J. Stasko, “Visualization for fault localization,” in Proc. Workshop Softw. Vis., 23rd Int. Conf. Softw. Eng., Toronto, ON, Canada, May 2001, pp. 71-75.
6. J. A. Jones, M. J. Harrold, and J. Stasko, “Visualization of test information to assist fault localization,” in Proc. Int. Conf. Softw. Eng., Orlando, FL, USA, May 2002, pp. 467-477.
7. V. Debroy, W. E. Wong, X. Xu, and B. Choi, “A grouping-based strategy to improve the effectiveness of fault localization techniques,” in Proc. Int. Conf. Softw., Zhangjiajie, China, Jul.