Introduction:
In the software industry, engineers have developed numerous abstractions to build and maintain successful software. One of the most vital of these abstractions is the software, or system, architecture. It plays an important role in the make-up of a complete software system.

Overview:
In this essay, Section I introduces software architecture, architectural changes and the process of reverse engineering. Section II presents the research question addressed in this summary, and Section III discusses the different processes used to generate surrogate architectural views. Section IV covers the process of obtaining the ground truth architecture, and Section V deals with Bayesian learning for software architecture recovery, which is the basis for developing an automated architectural recovery tool. Section VI deals with the reverse engineering tools currently used in practice.

What is Software Architecture?
According to Bass et al., the software architecture of a program or computing system is the structure or structures of the system, which comprise the software elements, the externally visible properties of those elements and the relationships among them (Bass et al. 2003). The software architecture acts as a base from which software architects can build a solution to a problem without descending to lower levels of abstraction such as the source code. It also acts as the roadmap for implementation and maintenance activities. A good software architecture is built upon the principle of “separation of concerns”, where different responsibilities and functionalities are assigned to different architectural elements. A bad architecture accumulates architectural bad smells and increases the complexity of the software system, which can make the system practically impossible to change.
Architectural changes:
A software system evolves over time through the process of software maintenance. The more widely changes are scattered across the software components, the more likely they are to introduce bugs into the system. Changes made to the code are often not documented for future reference, even though the architectural documentation should be updated regularly to reflect the changing environment. In most cases the architectural documents are out of date, so it becomes very difficult to maintain the system and make changes to it.
Co Changes:
According to Kouroshfar et al., co-changes are multiple files that are changed and committed together to the same repository (Kouroshfar et al. 2015). Co-changes made across multiple architectural modules also induce more defects than changes made within a single module.

Reverse Engineering:
Most of the time, support engineers maintain a software system without knowledge of the underlying architecture. The process of reverse engineering was introduced into software engineering to address this. According to Eilam, reverse engineering, also called back engineering, is the process of extracting knowledge or design information from a product and reproducing it, or reproducing anything based on the extracted information (Eilam 2005). One of the main goals of software reverse engineering is to derive the software architecture from the source code.

Research Question:
Architecture plays a vital role in the maintenance of a software system. Because of the continuous and rapid evolution of the system, documentation is often missing or, in some cases, inconsistent.
So, the research question addressed in this summary is “whether it is possible to generate the exact architecture of the underlying system from its implementation details, such as the source code, using the process of reverse engineering”.

The different types of architectural views generated with the help of reverse engineering techniques are (Kouroshfar et al. 2015):
1. Module View: This view provides information about the units of implementation.
2. Component and Connector View: This view provides information about the run-time behaviour of the system and the interactions between its components.
3. Allocation View: This view provides the relationship between the software entities and the non-software elements of their execution environment.

Surrogate Model:
Because architectural documentation is often inconsistent, surrogate models are generated using reverse engineering techniques to obtain an approximation of the software architecture. Some commonly used reverse engineering methods for generating surrogate models are as follows:
1. Package View
2. Bunch View
3. ArchDH View
4. LDA View
5. ACDC View

1. Package View: In this method, the packages represent the modules of the system architecture.
For example, in a Java project each package is taken to represent one architectural module.

2. Bunch View: The Bunch view is generated by a reverse engineering tool that produces clusters based on the dependencies between classes. It relies on source code analysis to convert the source code into a directed graph that represents the source code artefacts and their relationships (Wu et al. 2005).

3. ArchDH View: Cai et al. proposed an architecture recovery algorithm known as the Architecture Design Rule Hierarchy (ArchDH) (Cai et al. 2013). The steps involved in the ArchDH algorithm are:
   1. First, the algorithm identifies the design rules and allocates them a special position in the architecture.
   2. Some parts of the program may depend on controllers or dispatchers. The algorithm identifies these controllers or dispatchers in the source code and also gives them special positions in the architecture.
   3. The algorithm then separates the rest of the code into modules.
   4. The rest of the system forms a dependency graph.
   5. If a subgraph is still large, the design rules or controllers within it are used to separate it further, recursively.
   6. In this way the algorithm generates a hierarchy, which is called the design rule hierarchy.

4. LDA View: The LDA view is generated with the help of information retrieval and data mining techniques such as Latent Dirichlet Allocation (LDA). LDA analyses the textual similarities between classes and clusters them into different modules.

5. ACDC View: The Algorithm for Comprehension-Driven Clustering (ACDC) groups program entities based on the principle of easing comprehension (Tzerpos and Holt 2000).
This algorithm clusters program entities according to a list of system design patterns, such as the source file pattern and the naming pattern. After constructing this skeleton, the algorithm clusters the left-over elements using the orphan adoption method.

Obtaining Ground Truth Architecture:
Since the architectural views described above are only surrogate views, support engineers find it difficult to make changes to the system without knowing the complete picture of the system architecture. It is also difficult to maintain the software architecture because of architectural drift and erosion. To deal with architectural drift and erosion, the ground truth architecture is developed. According to Garcia et al., the ground truth architecture is the architecture of the system that has been verified as accurate by the system’s architects or developers, who have intimate knowledge of the underlying application and problem domain.
Garcia et al. proposed a framework to recover the ground truth software architecture. The principles of the framework are known as the mapping principles (Garcia et al. 2013).
The mapping principles are subdivided into four types (Fig 1):
1. Generic Principles: These consist of long-standing software engineering principles such as separation of concerns, isolation of change, coupling and cohesion.
2. Domain Principles: These are mapping principles based on domain information, that is, data related to the domain of the system in question, for example retail, telecom or banking. Domain principles are obtained from the research literature, industry standards or the experience of engineers working in that domain.
3. Application Principles: These are principles related to the application whose architecture is undergoing changes. Application principles may be obtained from the documentation or from comments in the code.
4. System Context: The system context, as shown in Fig 1, is a grey area that contains principles related to the mapping principles and to the infrastructure on which the application is built.
Process involved in developing the ground truth architecture (Garcia et al. 2013):
Step 1: Use the available documentation to gather any domain-specific or application-specific information that can be used to produce the domain or application principles.
Step 2: The recoverers can select any existing recovery technique to aid the architecture recovery process. The use of a recovery technique introduces the generic principles into the recovery process.
Step 3: Extract the implementation-level information required by the selected technique.
Step 4: The recoverers apply their chosen technique to obtain an initial architecture of the system.
Step 5: Any of the mapping principles obtained in Step 1 can be used to modify the architecture obtained in Step 4.
Step 6: The recoverer must identify any utility components, such as libraries, middleware components and application frameworks, that are being used. This is done because these components affect the quality of the recovered architecture.
Step 7: By this step the recoverer has produced a recovered authoritative architecture that has been enriched and modified with the help of the different mapping principles. The certifier of the system architecture then looks through the proposed groupings and may suggest adding a new group, splitting an existing group into multiple subgroups, or transferring source code components from one group to another.
Step 8: The recoverer changes the groupings based on the input provided by the certifier. Steps 7 and 8 are repeated by the certifier and the recoverer until both are satisfied with the results.
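The review loop in Steps 7 and 8 can be pictured with a small sketch. This is only an illustration under stated assumptions, not part of the framework by Garcia et al.: the `Suggestion` type, the `apply_suggestion` and `refine` functions, the file names and the `max_rounds` cut-off are all hypothetical, and the certifier is modelled as a plain function that returns a list of requested changes (an empty list meaning that the certifier is satisfied).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# A grouping maps a group name to the source files placed in that group.
Grouping = Dict[str, Set[str]]


@dataclass
class Suggestion:
    """One change requested by the certifier (hypothetical structure)."""
    kind: str                     # "add", "split" or "move"
    group: str                    # group the request refers to
    files: Set[str]               # files affected by the request
    target: Optional[str] = None  # destination group for "split" and "move"


def apply_suggestion(grouping: Grouping, s: Suggestion) -> None:
    """Step 8: the recoverer updates the grouping in place."""
    if s.kind == "add":
        grouping.setdefault(s.group, set()).update(s.files)
    elif s.kind in ("split", "move"):
        if s.target is None:
            raise ValueError("split and move need a target group")
        # Both operations take files out of one group and place them in
        # another; a split creates the target group if it does not exist.
        grouping[s.group] -= s.files
        grouping.setdefault(s.target, set()).update(s.files)
        if not grouping[s.group]:
            del grouping[s.group]  # drop a group that has become empty
    else:
        raise ValueError(f"unknown suggestion kind: {s.kind}")


def refine(grouping: Grouping,
           certifier: Callable[[Grouping], List[Suggestion]],
           max_rounds: int = 10) -> Grouping:
    """Repeat Steps 7 and 8 until the certifier has no further requests."""
    for _ in range(max_rounds):
        suggestions = certifier(grouping)  # Step 7: certifier reviews
        if not suggestions:
            break
        for s in suggestions:
            apply_suggestion(grouping, s)  # Step 8: recoverer revises
    return grouping


# Usage with a toy certifier that wants logging code in its own group.
def toy_certifier(g: Grouping) -> List[Suggestion]:
    if "log.c" in g.get("core", set()):
        return [Suggestion("split", "core", {"log.c"}, target="util")]
    return []


initial = {"core": {"parser.c", "lexer.c", "log.c"}, "ui": {"window.c"}}
final = refine(initial, toy_certifier)
```

In practice the certifier is a person rather than a function, so the loop mainly serves to show that the grouping converges through repeated small edits rather than being produced in a single pass.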
At the end of Step 8, the ground truth architecture of the underlying software system has been generated.
Since recovering the software architecture manually is a tiresome process, there are automated methods that recover the software architecture from its implementation details. One commonly used automated method is the Bayesian learning based approach (Maqbool and Babri 2007).
The Bayesian learning based approach is used to recover the software architecture of a system automatically where its documentation is out of date or incomplete.

Bayesian Learning for Software Architecture Recovery:
According to Maqbool and Babri, Bayesian learning takes a probability-based approach to reasoning and inferring results. The Naïve Bayes classifier is one of the Bayesian learning methods and has been applied to many practical problems (Maqbool and Babri 2007). According to the Bayesian approach, the most probable target value $v_{MAP}$, given the attribute values $a_1, a_2, \ldots, a_k$, is given by:

$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_k) = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_k \mid v_j)\, P(v_j)$$

Here $f(x)$ is the target function, which takes a value $v_j$ from the set $V = \{v_1, v_2, \ldots\}$, and $a_1, a_2, \ldots, a_k$ denote the attributes.
The Naïve Bayes classifier makes the simplifying assumption that the attribute values are conditionally independent given the target value, i.e. $P(a_1, a_2, \ldots, a_k \mid v_j) = \prod_i P(a_i \mid v_j)$ (Maqbool and Babri 2007). So, according to the Naïve Bayes classifier, the most probable target value is given by:

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$

Architectural Recovery Tools:
There are open source tools available to automate the process of software architecture recovery. Architectural recovery is performed as a three-step process (Armstrong and Trudeau 1998):
1. Extraction: This is the process of extracting the details of the source code into a source model made up of lower-level artefacts such as classes, variables and functions, and the relationships between them.
2. Classification: In this process the lower-level components and their relationships are combined to form more abstract components such as files, modules and subsystems.
3. Visualisation: Diagrammatic representations are produced for further analysis.

Some commonly used architectural recovery tools are:
1. Rigi
2. Dali
3. The Software Bookshelf (PBS)
4. CIA (C Information Abstraction)
5. SNiFF+

Rigi: Rigi is a public domain tool from the University of Victoria that is used to understand large information spaces. It can extract, organize, abstract and visualize components (M et al. 1993).
It consists of a C language parser called rigiparse and a graphical tool called rigiedit.

Dali: Dali is a prototype tool developed at Carnegie Mellon University. It assists in interpreting the extracted data as architectural information (Kazman et al. 1999).

The Software Bookshelf (PBS): This tool was developed at the University of Toronto as a prototype reverse engineering tool for working on legacy systems (Finnigan et al. 1997). It contains three components: a C language parser called cfx, a relationship abstraction tool called grok and a Java-based user interface called lsedit for visualizing the architectural components.
CIA: The C Information Abstraction (CIA) system is a relational database tool developed at AT&T Bell Laboratories. It is used to extract information about the source code and store it in a relational database (Chen et al. 1990). CIA contains a tool called ciao, which programmers use to query and visualize the data in the CIA database.

SNiFF+: It was developed by the Take Five Corporation. It is an extensible and scalable programming tool for both C and C++, and is used for parsing and information retrieval.

Although the documentation of a software architecture is often incomplete, it may be possible to obtain the ground truth software architecture using reverse engineering techniques. Automated reverse engineering tools such as Rigi, one of the best-known reverse engineering tools, have reduced the manual work involved in recovering the architecture (Armstrong and Trudeau 1998).
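As a closing illustration of the Naïve Bayes rule from the Bayesian learning section, the sketch below assigns a source file to a subsystem by choosing the target value that maximises P(v_j) multiplied by the product of P(a_i | v_j). It is a minimal example under stated assumptions and not the implementation of Maqbool and Babri: the three attributes (directory, file name prefix and included header), the training data and the function names `train` and `classify` are hypothetical, and the add-one (Laplace) smoothing is an extra assumption introduced here so that unseen attribute values do not produce a zero probability.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count how often each attribute value occurs with each subsystem.

    examples: list of (attributes, subsystem) pairs, where attributes is a
    tuple of categorical values (hypothetical features of a source file).
    """
    class_counts = Counter(label for _, label in examples)
    # attr_counts[label][position][value] = occurrences in the training data
    attr_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen at each position
    for attrs, label in examples:
        for i, a in enumerate(attrs):
            attr_counts[label][i][a] += 1
            values[i].add(a)
    return class_counts, attr_counts, values


def classify(model, attrs):
    """Return the subsystem v_j that maximises P(v_j) * prod_i P(a_i | v_j)."""
    class_counts, attr_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # Work with log probabilities so that long products do not underflow.
        score = math.log(n / total)  # log P(v_j)
        for i, a in enumerate(attrs):
            count = attr_counts[label][i][a]
            # Add-one smoothed estimate of P(a_i | v_j) (an assumption here).
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (directory, name prefix, included header).
training = [
    (("net", "sock", "socket.h"), "networking"),
    (("net", "tcp", "socket.h"), "networking"),
    (("fs", "inode", "disk.h"), "filesystem"),
    (("fs", "dir", "disk.h"), "filesystem"),
]
model = train(training)
predicted = classify(model, ("net", "udp", "socket.h"))
```

With this toy data the unseen file is assigned to the "networking" subsystem, because two of its three attribute values occur only in files already labelled as networking.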