Combining data mining and case-based reasoning for intelligent decision support for pathology ordering by general practitioners

https://doi.org/10.1016/j.ejor.2007.11.003

Abstract

Pathology ordering by general practitioners (GPs) is a significant contributor to rising health care costs both in Australia and worldwide. A thorough understanding of the nature and patterns of pathology utilization is an essential requirement for effective decision support for pathology ordering. In this paper a novel methodology for integrating data mining and case-based reasoning for decision support for pathology ordering is proposed. It is demonstrated how this methodology can facilitate intelligent decision support that is both patient-oriented and deeply rooted in practical peer-group evidence. Comprehensive data collected by professional pathology companies provide a system-wide profile of patient-specific pathology requests made by many GPs, rather than a profile limited to an individual GP practice. Using real data provided by XYZ Pathology Company in Australia, comprising more than 1.5 million records of pathology requests by GPs, we illustrate how knowledge extracted from these data through data mining with Kohonen’s self-organizing maps forms a base that, with the further assistance of modern data visualization tools and online processing interfaces, can provide “peer-group consensus” evidence for solving new cases of the pathology test ordering problem. We conclude that a formal methodology integrating case-based reasoning principles, which are inherently close to GPs’ daily practice, with data-driven, computationally intensive knowledge discovery mechanisms, which can be applied to the massive amounts of pathology request data routinely available at professional pathology companies, can facilitate more informed, evidence-based decision making by doctors in the area of pathology ordering.

Introduction

Increasing use of clinical pathology services has long been recognized as a worldwide phenomenon in countries with different healthcare systems, and has attracted the attention of researchers, practitioners and governments all over the world. In Australia, general practitioners (GPs) order and manage most pathology requests (Cohen et al., 1998). According to the Bettering the Evaluation and Care of Health (BEACH) study, a nationwide survey and ongoing program on general practice activity in Australia (Britt et al., 2004), the number of pathology tests ordered per 100 consultations increased significantly, from 29.7 in 2000–2001 to 35.2 in 2003–2004, an increase of almost 20% over the four most recent years of the BEACH program.

Among the various recognized systemic factors influencing the growth of GP pathology utilization (Guibert et al., 2001), an important one is the lack of assurance about the appropriateness of doctors’ decisions when ordering pathology services (Vining and Mara, 1998, Van Walraven and Naylor, 1998, Lundberg, 1998, Smellie, 2003). Stuart et al. (2002) argue that the wide variation in test ordering, particularly when tests are used for diagnostic purposes, suggests that some tests may be unnecessary or ordered inappropriately. Smellie et al. (2002) extend this argument, suggesting that the large differences observed in general practice pathology requesting are attributable to individual clinical practice and are therefore potentially amenable to change through more consistent and better informed decision making by GPs.

According to Smellie et al. (2005), guidelines available in the area of pathology ordering (such as consensus documents and national policy statements) mostly focus on a particular disease and provide limited, sometimes very limited, advice for patient-specific interpretation. The often highly non-trivial task of interpreting the guidelines and matching a particular patient to the situations they specify is frequently left to the individual doctor’s clinical judgment, with very little further decision support. Patient-specific information based on wider professional practice is usually not reflected in guidelines. Overall, the current evidence base in this area is often too limited and rigid to support daily pathology ordering for specific patients.

As a consequence, the current inability to routinely generate specific, situationally relevant, and clinically appropriate evidence to support GPs’ daily test ordering becomes a major obstacle to effective test ordering decision support. This, in turn, prevents interventions aimed at improving the appropriateness of doctors’ pathology ordering behavior from having a long-lasting effect. Thus, there is a clear need for an effective and robust methodology for generating such evidence so that GPs can use it in their decision making.

The objective of this paper is to demonstrate how integrated use of intelligent case base classification and case retrieval methodologies can generate patient-oriented, situationally relevant, and peer-group based evidence in order to facilitate interactive decision support for pathology ordering by GPs.

Specifically, the formal methodologies discussed in this paper are data mining and case-based reasoning (CBR). Data mining is used to discover meaningful patterns hidden in large and complex datasets (Han and Kamber, 2001). One of the most frequently encountered data mining tasks is data clustering: grouping the data into classes or clusters so that objects within a cluster are highly similar to one another while objects in different clusters are dissimilar. The formal method used for clustering in this paper is Kohonen’s self-organizing (feature) maps (SOFM or SOM) (Kohonen, 1982, Kohonen, 1990, Kohonen, 1997). The SOM belongs to the class of neural-network-based tools for unsupervised learning and can be used successfully for data clustering and visualization.
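To make the clustering step concrete, the following is a minimal sketch of SOM training in plain Python. It is not claimed to be the software or the parameter settings used in this study: the grid size, the linearly decaying learning rate, the Gaussian neighborhood, and the function names `train_som`, `best_matching_unit` and `quantization_error` are illustrative assumptions. Each input vector is assigned to its best-matching unit (BMU), and the BMU and its grid neighbors are moved toward the input, which is what places similar records on nearby map nodes. The quantization error (mean distance from each input to its BMU) is included as one common SOM quality measure.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length numeric vectors (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialize every node with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def quantization_error(weights, data):
    """Mean Euclidean distance between each input and its best-matching unit."""
    return sum(math.dist(weights[best_matching_unit(weights, x)], x) for x in data) / len(data)


# Toy data: two well-separated groups of 2-D vectors (not real pathology data).
data = [[0.10, 0.10], [0.15, 0.05], [0.05, 0.12],
        [0.90, 0.95], [0.85, 0.90], [0.95, 0.88]]
som = train_som(data, rows=2, cols=2)
clusters = {}
for x in data:
    clusters.setdefault(best_matching_unit(som, x), []).append(x)
print(clusters)
print(round(quantization_error(som, data), 3))
```

On this toy input the two groups end up on different nodes; with real request data, the grid size and training schedule would have to be chosen for the data at hand.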

Case-based reasoning can be utilized to solve a new problem by remembering a previous similar situation and by reusing information and knowledge of that situation (Aamodt and Plaza, 1994). Instead of relying on general knowledge of a problem domain, or making associations along generalized relationships between problem descriptors and conclusions, CBR is able to utilize the specific knowledge of previously experienced, concrete problem situations (cases). In medical domains, CBR has mainly been applied to diagnostic and partly to therapeutic tasks (Schmidt et al., 2001).
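The retrieve and reuse steps of the CBR cycle described by Aamodt and Plaza (1994) can be sketched as follows. The case representation (a dictionary of problem attributes plus the set of tests ordered as the solution), the weighted nearest-neighbor similarity, the attribute names and the majority-vote reuse rule are hypothetical choices made for illustration, not the case structure used in this paper; the test names are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Case:
    problem: dict        # hypothetical attributes describing the patient situation
    solution: frozenset  # hypothetical solution: the set of tests that were ordered


def similarity(a, b, weights, ranges):
    """Weighted global similarity in [0, 1] between two problem descriptions."""
    total = 0.0
    for attr, w in weights.items():
        x, y = a[attr], b[attr]
        if attr in ranges:
            # Numeric attribute: 1 minus the range-normalized absolute difference.
            local = 1.0 - min(abs(x - y) / ranges[attr], 1.0)
        else:
            # Categorical attribute: exact match or nothing.
            local = 1.0 if x == y else 0.0
        total += w * local
    return total / sum(weights.values())


def retrieve(case_base, query, weights, ranges, k=3):
    """Retrieve: the k stored cases most similar to the query."""
    return sorted(case_base,
                  key=lambda c: similarity(c.problem, query, weights, ranges),
                  reverse=True)[:k]


def reuse(retrieved):
    """Reuse: propose every test ordered in a majority of the retrieved cases."""
    counts = Counter(test for case in retrieved for test in case.solution)
    return {test for test, n in counts.items() if n > len(retrieved) / 2}


weights = {"age": 1.0, "sex": 1.0, "reason": 2.0}  # hypothetical attribute weights
ranges = {"age": 100.0}                            # numeric attributes and their spans
case_base = [
    Case({"age": 62, "sex": "M", "reason": "diabetes review"}, frozenset({"HbA1c", "lipids"})),
    Case({"age": 58, "sex": "M", "reason": "diabetes review"}, frozenset({"HbA1c", "lipids", "EUC"})),
    Case({"age": 65, "sex": "F", "reason": "diabetes review"}, frozenset({"HbA1c"})),
    Case({"age": 25, "sex": "F", "reason": "fatigue"}, frozenset({"FBC", "iron studies"})),
]
query = {"age": 60, "sex": "M", "reason": "diabetes review"}
proposal = reuse(retrieve(case_base, query, weights, ranges))
print(sorted(proposal))
```

A majority vote is only one possible reuse rule; the remaining CBR steps (revising the proposal and retaining the confirmed case) are not shown.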

Data mining techniques, including clustering, have previously been combined with CBR for efficient case retrieval and case base maintenance (Yang and Wu, 2000), automated case generation (Clerkin et al., 2002), and improved case-based classification (Arshadi and Jurisica, 2005). The novelty of this paper lies in combining SOM-based data clustering and CBR to facilitate evidence-based, situationally relevant, interactive, and flexible decision support for pathology ordering by GPs, specifically addressing complex issues in case base classification and case retrieval. As the topic of the special issue is “Formal Methodologies and Tools for DSS”, the main focus of this paper is on the methodological aspects of intelligent case base classification and case retrieval for decision support; issues of actual system implementation and performance monitoring are discussed only briefly and are treated as outside the scope of the paper.
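One way to read the combination of SOM clustering and CBR is as cluster-indexed retrieval: a new case is first assigned to the patient cluster whose prototype it is closest to, and the cases in that cluster then serve as its peer group. The sketch below assumes that cluster prototypes (for example, the weight vectors of a trained SOM) are already available and summarizes, for each test, the fraction of peer cases in which it was ordered, as one hypothetical form of “peer-group consensus” evidence. The function names, prototype values and records are invented for illustration.

```python
import math
from collections import Counter


def nearest_cluster(centroids, x):
    """Index of the cluster prototype closest (Euclidean) to feature vector x."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))


def peer_group_consensus(records, centroids, query_vec):
    """Return the query's cluster and the fraction of its peer cases ordering each test."""
    cluster = nearest_cluster(centroids, query_vec)
    peers = [tests for vec, tests in records if nearest_cluster(centroids, vec) == cluster]
    counts = Counter(t for tests in peers for t in tests)
    return cluster, {t: n / len(peers) for t, n in counts.items()}


# Hypothetical prototypes, e.g. taken from a trained SOM, and scaled patient records.
centroids = [[0.2, 0.1], [0.8, 0.9]]
records = [
    ([0.25, 0.10], {"FBC"}),
    ([0.15, 0.20], {"FBC", "TSH"}),
    ([0.80, 0.85], {"HbA1c", "lipids"}),
    ([0.75, 0.95], {"HbA1c"}),
]
cluster, consensus = peer_group_consensus(records, centroids, [0.78, 0.90])
print(cluster, consensus)
```

Restricting the search to one cluster keeps retrieval cheap on large case bases, at the cost of ignoring similar cases that fall just across a cluster boundary.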

The remainder of this paper is organized as follows: Section 2 presents the decision making context, while Section 3 reflects on the strengths and limitations of the decision support tools currently available in the domain; the proposed approach for integrating data mining and CBR methodologies is discussed in Section 4; Section 5 is dedicated to tools and techniques for clustering and cluster quality assessment; a step-by-step implementation of the proposed approach is described in Section 6; Section 7 concludes with a brief discussion and directions for future research.

The content of this paper is partially based on the results reported in Zhuang et al. (2006a, 2006b).

Section snippets

Decision making context

For the purposes of this paper, we adopt the broad definition of decision support activities given by Marakas (2003): activities within an unstructured or semi-structured decision context that aim to support rather than replace the decision maker (DM), facilitate learning on the DM’s behalf, and use underlying data and models to focus on the effectiveness of the decision making process.

Arguably the main feature of clinical pathology ordering as a decision making

Existing supports for pathology ordering decisions: Strengths and limitations

At present, the main information sources available to support pathology ordering decisions are clinical guidelines, general feedback from expert pathologists, and the knowledge and experience of the GPs themselves.

Most clinical guidelines have not been developed in a format that allows for straightforward incorporation into computerized clinical decision support systems (Kidd and Mazza, 2000). In practice, clinical guidelines are commonly presented to doctors in paper format or, even when presented

Approach and methodology

With the massive amounts of test request data available in pathology laboratories, data mining techniques can be used to discover requesting patterns in the large repositories held by pathology companies. This can provide GPs with peer-group evidence of test ordering by other doctors within the system, against which they can compare their own ordering behavior. The case-based reasoning approach, on the other hand, is useful in leveraging knowledge encapsulated in previously experienced and resolved

Tools and techniques

In this paper, Kohonen’s self-organizing maps (Kohonen, 1982, Kohonen, 1990, Kohonen, 1997) are used to mine the data in order to identify homogeneous patient groups based on their demographic information and pathology consumption patterns.
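Before a SOM can be trained, raw request records have to be turned into numeric vectors that combine demographic information with pathology consumption. The sketch below assumes a simple encoding (age, a binary sex indicator, and a count per test type) followed by min-max scaling of every column to [0, 1], so that age, measured in years, does not dominate the distance computations. The record layout, the test list and the scaling choice are assumptions for illustration; they are not necessarily the variables or preprocessing used in this study.

```python
def encode(records, tests):
    """Turn hypothetical raw records into vectors: [age, is_female, count per test, ...]."""
    rows = []
    for r in records:
        counts = [r["tests"].count(t) for t in tests]
        rows.append([float(r["age"]), 1.0 if r["sex"] == "F" else 0.0] + counts)
    return rows


def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


tests = ["FBC", "LFT", "TSH"]  # illustrative test types
records = [
    {"age": 30, "sex": "F", "tests": ["FBC", "TSH"]},
    {"age": 70, "sex": "M", "tests": ["FBC", "LFT", "LFT"]},
    {"age": 50, "sex": "F", "tests": []},
]
vectors = min_max_scale(encode(records, tests))
print(vectors)
```

Min-max scaling is sensitive to outliers (a single very heavy consumer of one test compresses everyone else toward zero), so variance-based or logarithmic scaling may be preferable for skewed count data.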

Implementation

In accordance with the integrated approach described in Section 4, the aim of the data mining stage is to discover homogeneous patient clusters from massive pathology ordering data and to extract knowledge within the clusters for use by CBR at a later stage. The data provided by XYZ Pathology Company in Australia contain 1,548,122 records of pathology requests by GPs within the period from 01 May 2003 to 30 April 2004. Each record represents an individual request for one

Discussion and conclusions

In this paper we propose a formal approach that integrates data mining and CBR methodologies to provide intelligent decision support for test ordering by GPs. The rationale for integrating data mining and CBR methodologies is to discover knowledge from past data using data mining, and to retrieve and enable the use of this knowledge through CBR for the purposes of decision support. Table 13 highlights that, as far as practical aspects of decision support are concerned, in comparison with the

References (42)

  • Schmidt, R., et al., 2001. Case-based reasoning for medical knowledge-based systems. International Journal of Medical Informatics.
  • Aamodt, A., Plaza, E., 1994. Case-based reasoning: Foundational issues, methodological variations, and system...
  • Ahearn, M.D., et al., 2003. General practitioners’ perceptions of the pharmaceutical decision-support tools in their prescribing software. Medical Journal of Australia.
  • Arshadi, N., et al., 2005. Data mining for case-based reasoning in high-dimensional biological domains. IEEE Transactions on Knowledge and Data Engineering.
  • Berry, M.J.A., et al., 2000. Mastering Data Mining.
  • Bolshakova, N., Azuaje, F., 2003. Improving expression data mining through cluster validation. In: Proceedings of the...
  • Britt, H., Miller, G.C., Charles, J., Knox, S., Valenti, L., Henderson, J., Pan, Y., Bayram, C., Harrison, C., 2004....
  • Clerkin, P., Hayes, C., Cunningham, P., 2002. Automated case generation for recommender systems using knowledge...
  • Cohen, J., et al., 1998. Near-patient testing for serum cholesterol: Attitudes of general practitioners and patients, appropriateness, and costs. Medical Journal of Australia.
  • Cox, K., 2001. Evidence-based medicine and everyday reality. Medical Journal of Australia.
  • Davis, D., Cosenza, R.M., 1993. Business research for decision making. Belmont, Calif.,...
  • Deboeck, G., et al., 1998. Visual Explorations in Finance with Self-Organizing Maps.
  • Eudaptics Software: Viscovery SOMine Standard Edition 3.0. Eudaptics Software GmbH Wien,...
  • Guibert, R., Wicker, S., Horrocks, M., 2001. Background Reading for QUP-GP workshop. The development of a research...
  • Han, J., et al., 2001. Data Mining: Concepts and Techniques.
  • Kantardzic, M., 2003. Data Mining: Concepts, Models, Methods, and Algorithms.
  • Keen, P.G.W., et al., 1978. Decision Support Systems: An Organizational Perspective.
  • Kennedy, R., et al., 1998. Solving Data Mining Problems Through Pattern Recognition.
  • Kidd, M.R., et al., 2000. Clinical practice guidelines and the computer on your desk. Medical Journal of Australia.
  • Kohonen, T., 1982. Self-organized formation of topologically correct feature maps. Biological Cybernetics.
  • Kohonen, T., 1990. The self-organizing map. IEEE Proceedings.