-
A Retrospective on Whole Test Suite Generation: On the Role of SBST in the Age of LLMs
Gordon Fraser, Andrea Arcuri
Keywords: Test pattern generators, Java, Software engineering, Search problems, Software algorithms, Software testing, Optimization, System testing, Python, Software, Test Suite, Large Language Models, Test Suite Generation, Search-based Software Testing, Software Engineering, Software Testing, Unit Tests, IEEE Transactions, Search Algorithm, Evolutionary Algorithms, Fitness Function, Individual Test, Industrial Systems, Open-source Projects, Call Sequences, Test Case Generation, EvoSuite, SBST, LLM, Pynguin, EvoMaster
Abstract: This paper presents a retrospective of the article “Whole Test Suite Generation”, published in the IEEE Transactions on Software Engineering in 2012. We summarize its main contributions and discuss how this work has impacted the research field of Search-Based Software Testing (SBST) over the last 12 years. The novel techniques presented in the paper were implemented in the tool EvoSuite, which has so far been the state of the art in SBST-based unit test generation for Java programs. SBST has shown practical, impactful applications, laying the foundations for tackling several other software testing problems beyond unit testing, such as system testing of Web APIs with EvoMaster. We conclude our retrospective with our reflections on what lies ahead, especially considering the important role that SBST still plays even in the age of Large Language Models (LLMs).
-
A Retrospective of ChangeDistiller: Tree Differencing for Fine-Grained Source Code Change Extraction
Beat Fluri, Michael Würsch, Martin Pinzger, Harald Gall
Keywords: Codes, Source coding, Software algorithms, Software, Data mining, Taxonomy, Syntactics, Software engineering, Heuristic algorithms, Computer bugs, Source Code, Differencing, Code Changes, Research In The Field, Software Repositories, Abstract Syntax Tree, Machine Learning, Programming Language, Mean Absolute Error, Software Development, Precision And Recall, Program Version, Language Model, Software Engineering, Open-source Projects, Refactoring, Dependency Graph, Bug Fixes, Code Elements, Source code change analysis, change types, tree edits, mining software repositories
Abstract: In the early development of source code change analysis, methodologies primarily relied on simple textual differencing, which treated code as mere text and identified changes through lines that were added, modified, or deleted. This approach overlooked the rich semantic information embedded within the code, highlighting significant limitations in textual analysis and differencing that required a more precise and language-aware foundation. Our research on ChangeDistiller pioneered the use of abstract syntax trees and associated tree edits for change analysis. We were among the first to introduce a tree-differencing algorithm for source code, enabling a fine-grained examination of modifications. ChangeDistiller has since been widely adopted by researchers in the field of mining software repositories. This paper reflects on the evolution of our technique, its influence on subsequent research, and its role in the advancement of change analysis methodologies. In addition, we explore how contemporary techniques and tools can draw on our foundational work to enhance their effectiveness.
-
A Retrospective on Mining Version Histories to Guide Software Changes
Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller
Keywords: Software, Data mining, History, Software engineering, Control systems, Software development management, Conferences, Source coding, Navigation, Codes, Software Engineering, IEEE Transactions, Mining Software, Software Repositories, Open-source, Empirical Research, Early 2000s, System Software, Software Development, Actual Knowledge, PhD Students, Data Sharing, Association Rules, Software Quality, Code Changes, Open-source Projects, Early Career Researchers, Software Projects, Association Rule Mining, Code Review, Version Control System, Software Artifacts, Bell Labs, Graph Layout, Mining software repositories, recommendation system, retrospective
Abstract: Twenty years ago we published a paper titled “Mining Version Histories to Guide Software Changes” in the IEEE Transactions on Software Engineering. The paper is considered to be one of the seminal papers of the mining software repositories (MSR) field. In this retrospective, we reflect on the original work, the field of mining software repositories and its community, and its impact on software engineering.
-
The Evolution of Automated Software Repair
Claire Le Goues, ThanhVu Nguyen, Stephanie Forrest, Westley Weimer
Keywords: Maintenance engineering, Computer bugs, Software, Codes, Standards, Manuals, Debugging, Testing, Genetic programming, Evolution (biology), Machine Learning, Gene Regulatory Networks, Software Engineering, Design Decisions, Debugging, Proof Of The Existence, Training Set, User Experience, Software Development, Random Mutations, Neutral Mutations, Synthesis Tool, Continuous Integration, Human-in-the-loop, AI Models, Bug Reports, Mutational Robustness, Automatic programming, corrections, testing and debugging, evolutionary computation
Abstract: GenProg implemented a novel method for automatically evolving patches to repair test suite failures in legacy C programs. It combined insights from genetic programming and software engineering. Many of the original design decisions in GenProg were ultimately less important than its impact as an existence proof. In particular, it demonstrated that useful patches for non-trivial bugs and programs could be generated automatically. Since the original publication, research in automated program repair has expanded to consider and evaluate many new methods, contexts and defects. As code synthesis and debugging techniques based on machine learning have become popular, it is informative to consider how views on perennial issues in program repair have changed, or remained static, over time. This retrospective discusses the issues of repair quality (including the role of tests), use cases for automated repairs (including the role of humans), and why these approaches work at all.
-
A Reflection on Change Classification in the Era of Large Language Models
Sunghun Kim, Shivkumar Shivaji, Jim Whitehead
Keywords: Computer bugs, Codes, Software, Machine learning, Source coding, Training, History, Data mining, Training data, Software measurement, Large Language Models, Machine Learning, Open-source, Source Code, Software Defect, Code Changes, Defect Prediction, Bug Fixes, Training Data, Machine Learning Techniques, Industrial Settings, Coded Text, Software Projects, Bug Reports, Classification-based bug prediction, Just-in-Time defect prediction, AI explainability, LLM bug prediction
Abstract: Change classification, today known as Just-in-Time Defect Prediction, is a technique for predicting software bugs at the change level of granularity. Several ideas came together to form change classification: predictions on code changes, using word-level textual features, use of machine learning classifiers, and leveraging open source code repositories. While change classification has led to a robust line of research, it has not yet had significant industrial adoption. A key recommendation is to explore explainability features so developers can better understand why a prediction is being made. We explore how large language models can advance this work by providing prediction explanations and bug fix suggestions.
-
Retrospective: Data Mining Static Code Attributes to Learn Defect Predictors
Tim Menzies
Keywords: Codes, Measurement, Software, Data mining, Predictive models, Software engineering, Internet, Data models, Optimization, Market research, Data Mining, Defect Prediction, Static Code, Research Community, Software Engineering, Transfer Learning, Multi-objective Optimization, Static Analysis, Static Measurements, Promising Data, Software Defect, Strangeness, Search-based software engineering, multi-objective optimization, software engineering
Abstract: Industry can get any research it wants, just by publishing a baseline result along with the data and scripts needed to reproduce that work. For instance, the paper “Data Mining Static Code Attributes to Learn Defect Predictors” presented such a baseline, using static code attributes from NASA projects. Those results were enthusiastically embraced by a software engineering research community hungry for data. At its peak (2016), this paper was SE's most cited paper (per month). By 2018, twenty percent of leading TSE papers (according to Google Scholar Metrics) incorporated artifacts introduced and disseminated by this research. This brief note reflects on what we should remember, and what we should forget, from that paper.
-
A Retrospective on How Developers Seek, Relate, and Collect Information About Code
Amy J. Ko, Brad A. Myers, Michael Coblenz, Htet Htet Aung
Keywords: Codes, Navigation, Debugging, Cognitive science, Cognition, Technological innovation, Market research, Testing, Software algorithms, Software, Empirical Studies, Development Of Tools, Software Development, Command Line, Software Engineering, Work Motivation, Integrated Development Environment, First Author, Decades Of Research, Work Settings, User-generated Content, Mental Models, Debugging, Variable Names, Code Search, Software evolution, programming environments
Abstract: In the early 2000s, software development was shifting from offline to online, and from the command line to the IDE. We discuss our 2004 paper examining the impact of this shift on developers’ program comprehension behaviors, our motivation for the work, and its impact on the last twenty years of empirical studies and developer tool innovations. We end with a discussion of the possible unintended impacts LLMs may have on program comprehension in the coming decades.
-
Software Architecture Description Revisited
Nenad Medvidović, Richard N. Taylor, Eric M. Dashofy
Keywords: Unified modeling language, Computer architecture, Software architecture, Connectors, Software systems, Analytical models, XML, Systematics, Taxonomy, Law, Software Architecture, System Software, Aspects Of System, Language Model, Unified Modeling Language, Understanding Of Architecture, System Design, Programming Language, System Architecture, Model Architecture, Design Decisions, Software Components, Architectural Style, Enterprise Resource Planning, View Of Architecture, Software architecture, modeling, architecture description language, ADL
Abstract: Many languages for modeling various aspects of software systems’ architectures have been proposed over the past three decades. In the late 1990s, we provided the first systematic foundation for understanding, classifying, and comparing the quickly emerging architecture description languages (ADLs). This culminated in a 2000 IEEE TSE publication, which has subsequently been widely referenced. In this paper, we revisit the 2000 framework and consider how it influenced both a foundational study of the suitability of the Unified Modeling Language (UML) as an ADL and the development of an extensible ADL (xADL) with a set of highly innovative features. We show how further modeling efforts led to a new and deeper understanding of software architecture itself. We conclude by analyzing a series of recent developments that have reshaped the software architecture landscape, posing new questions about the nature of architecture description.
-
Recovering Traceability Links Between Code and Documentation: A Retrospective
Giulio Antoniol, Gerardo Canfora, Gerardo Casazza, Andrea De Lucia, Ettore Merlo
Keywords: Software, Source coding, Software engineering, Codes, Unified modeling language, Maintenance, Data mining, Probabilistic logic, Computer bugs, Vectors, Trace Links, Source Code, Natural Language, System Software, Information Retrieval, Software Engineering, Emergence Of New Technologies, Design Documents, Probabilistic Model, Cut-points, Language Model, Part-of-speech, Topic Modeling, Advanced Machine Learning, Postage, Reverse Engineering, Ranked List, Artificial Intelligence Systems, Term Frequency-inverse Document Frequency, Recognition Program, Software Artifacts, Bug Reports, Enhancement Strategies, Software Maintenance, Life Cycle Processes, Text Similarity, Constant Threshold, Processing Software, Text Mining, Redocumentation, information retrieval, object orientation, program comprehension, traceability
Abstract: Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. In our seminal 2002 paper we proposed a method based on information retrieval (IR) to recover traceability links between source code and free-text documents. A premise of our work was that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. The paper paved the way for the adoption of IR in software engineering, opening a new perspective. Reflecting on the past twenty years, we briefly survey the many results that have been achieved; however, the emergence of new technologies, such as AI, poses unprecedented challenges.
-
“Estimating Software Project Effort Using Analogies”: Reflections After 28 Years
Martin Shepperd
Keywords: Software, Cognition, Training, Benchmark testing, Software engineering, Accuracy, Costs, Systematic literature review, Reflection, Project management, Software Projects, Open Science, Software Engineering, Empirical Validation, Root Mean Square Error Of Cross-validation, Machine Learning, Training Set, Training Data, Null Hypothesis, Research Community, Decision Tree, Stepwise Regression, Prediction System, Accuracy Metrics, Evidence In Favour, Digital Object Identifier, Case-based Reasoning, Prediction, project management, effort prediction, analogical reasoning, empirical validation, reproducibility
Abstract: This invited paper is the result of an invitation to write a retrospective article on a “TSE most influential paper” as part of the journal's 50th anniversary. The objective is to reflect on the progress of software engineering prediction research using the lens of a selected, highly cited research paper and 28 years of hindsight. The paper examines (i) what was achieved, (ii) what has endured and (iii) what could have been done differently with the benefit of retrospection. While many specifics of software project effort prediction have evolved, key methodological issues remain relevant. The original study emphasised empirical validation with benchmarks, out-of-sample testing and data/tool sharing. Four areas for improvement are identified: (i) stronger commitment to Open Science principles, (ii) focus on effect sizes and confidence intervals, (iii) reporting variability alongside typical results and (iv) more rigorous examination of threats to validity.