Identifying and Prioritizing Information Quality Dimensions for Assurance in the Pre-Processing Stage of Data Storage for Business Intelligence

CAPSTONE REPORT

Presented to the Interdisciplinary Studies Program: Applied Information Management and the Graduate School of the University of Oregon in partial fulfillment of the requirement for the degree of Master of Science

Hope Angel
Information Systems Manager
Pacific Star Corporation

February 2011

University of Oregon
Applied Information Management Program
Continuing Education
1277 University of Oregon
Eugene, OR 97403-1277
(800) 824-2714

Approved by ________________________________________________________
Dr. Linda F. Ettinger
Senior Academic Director, AIM Program

Abstract

As business intelligence systems increase the amount of information stored in data warehouses, quality of content becomes more critical (Fisher, Lauria, Chengalur-Smith, & Wang, 2008). Selected literature published between 2001 and 2011 is analyzed to define key dimensions of information quality for consideration in the pre-processing stage, before data reach the warehouse, to ensure maximum quality assurance. The goal is to provide a framework to prioritize dimensions that align with business intelligence goals and objectives.

Keywords: data mining, business intelligence, information quality, information quality assurance, competitive advantages, knowledge discovery, data analytics, data warehouse

Table of Contents

Abstract
Table of Contents
List of Figures and Tables
Introduction to the Literature Review
  Purpose
  Problem
  Significance
  Audience/Outcome of Study
  Research Delimitations
    Time frame
    Selection criteria
    Outcome/Audience
    Topic focus
    Inquiry context
  Data Analysis Plan Preview
  Writing Plan Preview
Definitions
Research Parameters
  Research Design
  Research Questions and Sub-questions
  Search Strategy Report
    Search terms
    Search engines
    Search strategies
  Search Results
  Evaluation Criteria
  Documentation Approach
  Data Analysis Plan
    Coding process
  Writing Plan
Annotated Bibliography
Review of the Literature
Conclusions
References
Appendix A – Search Record
Appendix B – References Selected for Coding
List of Figures and Tables

Figure 1: The Relationship of Business Intelligence (BI) to Other Information Systems
Figure 2: The Concept of Information Quality as a Trusted Source for Decision Makers to Meet BI Goals and Objectives
Table 1: Audience Profile Indicating Categories, Profession Characteristics, and Titles
Table 2: Summary of Search Engine and Database Results
Table 3: Summary of Documentation Method and Coding Plan
Table 4: Summary of Information Quality Dimensions and Definitions

Introduction to the Literature Review

Purpose

Business intelligence (BI) is defined as a decision support system that uses data mining techniques to extract information from data warehouses and predict future patterns for decision makers (Andersson, Fries, & Johansson, 2008; Negash, 2008). BI can be thought of as "extracting and analyzing relevant information and making it accessible for support in the decision-making process" (Andersson et al., 2008, p. 3). BI extrapolates and captures information from many other systems, such as online analytical processing tools (OLAP), data mining, decision support systems (DSS), and geographic information systems (GIS), among others (Negash, 2008). Figure 1 depicts some of the information systems that are used by BI.

Figure 1. The relationship of business intelligence to other information systems (Negash, 2008).

According to Negash (2008) and McGilvray (2008), BI converts captured master data (i.e., key operational data) into useful information and, through analysis, into knowledge that is used to gain a competitive advantage. Lupu, Razvan, Sabau, and Muntean (2007) note that BI is the process of getting enough of the right information in a timely manner and in a usable form, and then analyzing the classification and metadata schemas to create a positive impact on the integrity of the business and management information systems. To best ensure this outcome, Andersson et al. (2008) propose devoting a significant portion of time in the pre-processing stage to identifying and prioritizing the dimensions that influence information quality assurance, so as to ensure quality of content for storage in data warehouses.

The purpose of this study is to address the dimensions, identified in selected literature, that most influence information quality assurance in the pre-processing stage of data stored in warehouses (Hakim, 2007a; Jafar, 2010). The goal is to identify and prioritize dimensions of information quality that align with BI goals and objectives for use in the pre-processing stage, to improve and ensure integrity and consistency before the information reaches the warehouses. Su, Peng, and Jin (2009) describe key information as a vital business asset, and report that "information quality is a critical factor for the successful development of data warehouses and implementation of data mining" (p. 332).
Information quality is not linear; identifying and prioritizing multiple dimensions such as accuracy, completeness, consistency, and timeliness, among others, is critical for effective information quality assurance strategies (Kahn, Strong, & Wang, 2002). In order for a company to support BI, Popovic, Coelho, and Jaklic (2009) state that tools such as data mining processes and information quality assurance assessments must align with and be embedded into every step of the pre-processing stage. These tools provide accurate and easily retrievable key information to improve decision making for increased performance management and competitive advantages in BI (Watson & Wixom, 2007). According to Lamont (2010) and Negash (2008), competitive advantages are a form of competitive intelligence in BI: constantly analyzing the existing market for any relevant changes, and adapting to those changes. Negash (2008) states that when combined with data mining tools, timeliness, consistency, and quality of information improve the decision-making process and create competitive advantages.

Corporate goals and existing strategies provide a basis for analyzing BI resources (Davenport & Harris, 2007). This analysis is necessary to identify competitive advantages and disadvantages, which are the strengths and weaknesses of a corporation relative to its present and likely competitors (Lamont, 2010; Negash, 2008). However, in order to maximize information quality assurance, it is crucial to understand the quantitative and qualitative value of the data available to decision makers (Seng & Chen, 2010). Computer systems, such as document management systems and enterprise content management, can only assist after the quality of information is assured by collecting, managing, storing, and retrieving content to better achieve the aims and goals of a comprehensive BI system (Negash, 2008; Olson, 2003).

In one case example, a study conducted by Lupu et al. (2007) observes dimensions that influence information quality assurance in a real-world industry project. Analyses performed on levels of information quality in the data mining process for successful decision-making recommendations focus on the dimensions influencing and affecting project development, and on solutions fulfilling dynamic BI requirements (Lupu et al., 2007). This study examines techniques similar to those presented by Lupu et al. (2007), but on a smaller scale, with specific emphasis on identifying and prioritizing dimensions of information quality that align with goals and objectives to assure information quality before the data reach the warehouses (Cong, Fan, Geerts, Jia, & Shuai, 2007). While the analysis performed by Lupu et al. (2007) of data mining standards for managing financial resources is only indirectly related to BI, it brings clarity and focus to the research problem in this study.

The data analysis process in this study is designed to identify dimensions that most influence the quality of key information in the early stages of data preparation, so that data can be correctly analyzed by data mining tools (Fayyad & Uthurusamy, 2002; Negash, 2008). Once information quality assurance is in place, Popovic et al. (2009) predict that pragmatic data mining techniques produce competitive advantage information and improve the decision-making process universally.
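To make the idea of prioritization concrete, the sketch below illustrates one way per-dimension quality scores could be computed and then ranked against BI-goal priorities. This is an illustration constructed for this review rather than a method taken from the selected literature: the record layout, the scoring rules for completeness and timeliness, and the goal weights are all assumptions that a real organization would replace with its own.

```python
from datetime import datetime

# Hypothetical records awaiting pre-processing; field names are invented.
records = [
    {"customer_id": "C001", "region": "NW", "updated": "2011-01-15"},
    {"customer_id": "C002", "region": None, "updated": "2009-06-30"},
    {"customer_id": "C003", "region": "SW", "updated": "2011-02-01"},
]

def completeness(rows):
    """Completeness score: fraction of field values that are populated."""
    values = [v for row in rows for v in row.values()]
    return sum(v is not None for v in values) / len(values)

def timeliness(rows, horizon_days=365):
    """Timeliness score: fraction of rows updated within the freshness horizon."""
    now = datetime(2011, 2, 15)  # fixed reference date keeps the example reproducible
    ages = [(now - datetime.strptime(r["updated"], "%Y-%m-%d")).days for r in rows]
    return sum(age <= horizon_days for age in ages) / len(ages)

# Assumed weights expressing how strongly each dimension aligns with the
# organization's BI goals; these would come from the business, not the code.
weights = {"completeness": 0.6, "timeliness": 0.4}
scores = {"completeness": completeness(records), "timeliness": timeliness(records)}

# Prioritize dimensions by weighted quality gap: big, important gaps first.
priority = sorted(scores, key=lambda d: weights[d] * (1 - scores[d]), reverse=True)
print(scores)    # e.g., {'completeness': 0.89, 'timeliness': 0.67}
print(priority)  # the dimension to address first comes first
```

The ranking rule (weight times quality gap) is only one plausible choice; the point is that prioritization requires both a measured score per dimension and an explicit statement of how much each dimension matters to the BI goals.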
According to English (2005) and McGilvray (2008), investing in information quality assurance is a means of showing benefits in return on investment (ROI). Thus, the underlying assumption of this study is that establishing effective information quality assurance enables capitalizing on advantages and opportunities in the form of increased ROI for BI (Keeton, Mehra, & Wilkes, 2009).

Problem

Businesses recognize that change is constant, and that adapting quickly to new demands is an opportunity to employ competitive business advantages and opportunities (Wixom & Watson, 2001). Data volumes have grown from megabytes to gigabytes to terabytes; some corporate databases are approaching one petabyte, a unit of information equal to one quadrillion bytes of memory (Davenport & Harris, 2007; Klein, 2002). However, while organizations have more data than ever at their disposal, sufficient capture, cleansing, and enhancement processes must be imposed to avoid data decay and duplication of information (McGilvray, 2008; Panin, 2006). Two specific processes are noted in the literature: (a) knowledge discovery in databases (KDD), a process of extracting and capturing useful knowledge from increasing volumes of data (English, 2009; Lupu et al., 2007; Web4All, 2010), and (b) effective document management systems that, like KDD, combine data gathering, data mapping, data storage, and knowledge management with analytical tools to track, store, and efficiently extract information from data stored in warehouses (Andersson et al., 2008; Negash, 2008).

McGilvray (2008) and Klein (2002) suggest that information quality problems are not restricted to any particular entity, and that IT teams are responsible for the quality of the systems that store and move the data, but not for the content. Thus, a well-designed management information system is a pivotal performance indicator and starting point for providing timely, effective, and intuitive knowledge for decision makers in BI systems (Gallo, 2010; McGilvray, 2008; Popovic et al., 2009; Su et al., 2009).

Seng and Chen (2010) suggest that data mining for business decisions requires an analytical approach to reducing data in order to manage, analyze, and apply it. Extract, transform, and load (ETL) is a common three-step approach designed for data transformation and integration; it is used in data mining to extract information, index it, and load it into a target database (Keeton et al., 2009; McGilvray, 2008).
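As a minimal sketch of the three ETL steps (not code from any cited source), the example below extracts rows from a small stand-in CSV source, transforms them to meet assumed target requirements, and loads them into an in-memory SQLite table standing in for the target database. The file name, column names, and transformation rules are all illustrative assumptions.

```python
import csv
import sqlite3

# Create a tiny stand-in source file so the example runs end to end.
with open("source_extract.csv", "w", newline="") as f:
    f.write("customer_id,revenue\n c001 ,1200.50\nC002,\n")

def extract(path):
    """Extract: pull raw rows out of the source system (here, a CSV file)."""
    with open(path, newline="") as src:
        return list(csv.DictReader(src))

def transform(rows):
    """Transform: normalize formats so rows meet the target system's requirements."""
    return [
        {"customer_id": r["customer_id"].strip().upper(),
         "revenue": float(r["revenue"] or 0.0)}
        for r in rows
    ]

def load(rows, conn):
    """Load: write the transformed rows into the target database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer_id TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer_id, :revenue)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract("source_extract.csv")), conn)
print(conn.execute("SELECT * FROM sales").fetchall())  # [('C001', 1200.5), ('C002', 0.0)]
```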
In the case of structured data, analysts use Enterprise Resource Planning (ERP) to create BI information by searching, analyzing, and delivering information to the decision maker (English, 2005). The data mining process starts with analysis, and understanding the characteristics of the attributes of the data is critical so that the analyst can accurately process and present the results (Jafar, 2010). Accurate information can lead to improved business performance; however, the data mining process can only generate usable patterns from data when information quality assurance is in place in the pre-processing stage (Andersson et al., 2008; English, 2009).

Significance

Improvements in technology have significantly increased the amount of data that can be stored; however, many organizations struggle with the ability to manage, analyze, and apply those data successfully to BI (Davenport & Harris, 2007). Information must be managed as a resource and as an asset; it must be recent, relevant, and an accurate reflection of real-world environments to help the business meet its goals (McGilvray, 2008). This study is significant for several reasons: (a) improving information quality assurance and data mining processes is a pressing issue for businesses, (b) focusing on the dimensions that most influence information quality assurance at the beginning allows data mining tools to successfully search through structured information, and (c) enabling smarter decision-making techniques ensures BI success, which is the goal of all businesses and organizations (English, 2005; Lee et al., 2006; Popovic et al., 2009; Wang & Wang, 2007).

According to Web4All (2010), "the most successful companies are those that can respond quickly and flexibly to market changes and opportunities; the key to this response is the effective, efficient, ease of use of data and information" (p. 1). English (2009) notes that inaccurately defined data in the early stages, mismatched definitions, and the simple reality of real-world object changes can produce obscured, misidentified, or incorrectly interpreted trend findings. McGilvray (2008) finds that poor data quality impacts project timelines, hampers data mining processes, and reduces confidence in data analysis results. As a result, BI processes fail when vital information is captured inaccurately (English, 2009; Forcada, Casals, Fuertes, Gangolells, & Roca, 2010).

Andersson et al. (2008) find that timely, accurate knowledge contributes to improved business performance. Indeed, for organizations that depend on data for decision-making processes, information quality assurance is one of the key determinants of the quality of their decisions and actions (Stvilia, Gasser, Twidale, & Smith, 2007). According to Halonen and Thomander (2008), the ability to assure quality information before data reach the data warehouse is significantly important to enhancing the decision-making process and identifying competitive advantages for BI.

Past methods for delivering BI solutions focus on quick turnarounds, rather than on embedding the implementation of guidelines into each and every step along the way (Gallo, 2010; IBM, 2010). Lupu et al. (2007) and Gallo (2010) find that simply extracting unstructured or undefined data stored in a warehouse does not provide a viable response to changing business needs. Moreover, failing to address issues of unstructured data reduces information quality both in data collection and in definitions (Forcada et al., 2010). English (2005) describes information quality assurance as critical to BI success; in fact, he states that "problems in information definition, data content, data preparation, and misinformation can cause BI processes to fail" (p. 1). Lack of information quality assurance compromises data integrity, and is prevalent in companies experiencing inefficient, non-integrated reports and analyses (Fisher, Lauria, Chengalur-Smith, & Wang, 2008; Lupu et al., 2007; Stvilia et al., 2007). In order for BI decision-making processes to be successful, information quality assurance must be in place early on (Gallo, 2010; Popovic et al., 2009; Su et al., 2009).
Lupu et al. (2007) find that dynamic analysis and sound decision making toward a competitive advantage can be achieved only by focusing on the dimensions that most influence information quality assurance in the pre-processing stage of data storage. Thus the key to maximizing information quality becomes getting the right set of structured information to the right people at the right time, for their use in decision making to achieve company goals (IBM, 2009; McGilvray, 2008).

Audience/Outcome of Study

Everyone makes decisions; enabling smarter decision-making techniques at every level of a business is what makes businesses intelligent (IBM, 2010). BI is widely used and has become a strategic initiative recognized by CIOs and business leaders as instrumental in driving business effectiveness and innovation (Rodriguez, Daniel, Casati, & Cappiello, 2010; Watson & Wixom, 2007). Cong et al. (2007) suggest that demonstrating the ability to identify dimensions that influence information quality assurance combines knowledge with information, thus producing successful BI. According to IBM (2009), businesses are most likely to reach desired outcomes when they have access to "complete, consistent, and trustworthy information" (p. 1). Thus it is critical that analysts have the right information before decisions are sorted out and weighed, in order to make full use of BI capabilities (Hakim, 2007b; Keeton et al., 2009).

Negash (2008) finds that the demand for BI applications continues to grow even when the demand for IT products does not. It is implied, then, that BI provides actionable information "delivered at the right time, at the right location, and in the right form to assist decision makers" (Negash, 2008, p. 178). However, all businesses and organizations, at one point or another, confront information quality problems (Lee, Pipino, Funk, & Wang, 2006). Knowing the business, its market, its customers, and its competition is a precursor to understanding how information quality is defined for any organization (IBM, 2010).

The audience for this study is executives and professionals within organizations who require data analysis as key performance indicators (KPI) to generate analytical solutions for increasing revenue and moving the company to the forefront of the competition (McGilvray, 2008). This group includes knowledge workers, technologists, and professionals (Gallo, 2010; Kriegel, Borgwardt, Kroger, Pryakhin, Schubert, & Zimek, 2007; McGilvray, 2008). They must have access to a reliable system for creating, processing, and enhancing their own knowledge (McGilvray, 2008). In general, this study is designed to be beneficial to key decision makers faced with dynamic corporate goals and demands (Lee et al., 2006). Due to the rapid global expansion of information-based transactions and interactions being conducted via the Internet, there is an increased demand for a workforce that is capable of performing these activities (Haag, Cummings, McCubbrey, Pinsonneault, & Donovan, 2006). In fact, Haag et al. (2006) note that knowledge workers are now estimated to outnumber all other workers in North America by at least a four-to-one margin. Table 1 illustrates the audience profile, characteristics, and professions of those who are most likely to benefit from information quality for decision making in BI.
Table 1

Audience Profile Indicating Categories, Profession Characteristics, and Titles

Broad Category of Audience: Knowledge Workers
  Characteristics: Oriented towards research and analysis of data; thus quality is essential to outcome.
  Job Titles: Chief Information Officer (CIO); Chief Knowledge Officer (CKO); Knowledge Manager (KM); Content Manager; Knowledge Steward; Program Managers; Project Managers; Project Team Members; Executives; Sales; Marketing; Finance; Legal; Human Resources

Broad Category of Audience: Knowledge Technologists
  Characteristics: Focus is on developing an increasing value of intellectual capital, gaining insight into customer preferences, and a variety of other important gains in knowledge that aid the business.
  Job Titles: Computer Analysts; Software Designers; Software Analysts; IT Professionals; Administrative Assistants

Broad Category of Audience: Knowledge Professionals
  Characteristics: Professionals who are valued for their ability to act and communicate with knowledge within a specific subject area.
  Job Titles: Teachers; Librarians; Lawyers; Architects; Practitioners; Physicians; Nurses; Engineers; Scientists

The intended outcome of this study is a framework that identifies and prioritizes key dimensions of information quality that align with BI goals and objectives to ensure the effectiveness of information quality in the pre-processing stage of data storage. Furthermore, the dimensions that the selected literature indicates are the most influential for the assurance of information quality in the pre-processing stage of data storage, within the context of BI, are addressed. Decision makers, such as knowledge workers, executives, and professionals faced with dynamic corporate goals and demands, must consider the dimensions of information quality from the perspective of the users of data (Lee et al., 2006). Thus dimensions are addressed in relation to the goal of successful decision making to gain competitive advantages for BI, as identified in the selected literature (Lamont, 2010; Negash, 2008). In this context, competitive advantage refers to the strengths of a corporation relative to its present and likely competitors, gained by constantly analyzing the existing market for any relevant changes and adapting to those changes quickly (Lamont, 2010; Negash, 2008). The framework for identifying and prioritizing key dimensions is organized around two themes: (a) discussing the role of information quality assurance, and (b) examining information quality dimensions and the effect that they have on information quality assurance in the pre-processing stage of data storage, within the context of BI.

Research Delimitations

Time frame. Thiesse, Floerkemeir, Harrison, Michahelles, and Roduner (2009) suggest that the large number of publications in recent years could all be potentially viewed as contributing to the field of BI; however, due to recent advances in BI, it is best to focus on references published within the last five to ten years. Thus, the references provided in this study are limited to publishing dates from 2001 through 2011. The focus on this period is seen as covering the rapid changes in information quality assurance and data mining that are shaping BI as it has evolved today (Davenport & Harris, 2007). Older research results are rendered obsolete by the myriad changes in the aspects aligning BI maxims with IT; thus this time frame excludes older research and practices that may not reflect current advancements in BI (Seng & Chen, 2010).
In order to reduce the likelihood of obsolete information becoming part of the focus, resources published prior to January 2001 are not used in this literature review.

Selection criteria. Literature is selected from peer-reviewed scholarly resources, as well as others deemed to be authored by an authority on the topic, including business publications, whitepapers, online academic journals, and books, using keyword searches, controlled terms, and scope notes (Bell & Smith, 2007; Ormondroyd, Engle, & Cosgrave, 2009). References selected for this literature review are directly relevant to information quality assurance and the dimensions that most influence it. Additional references are used to establish the framework for this review, such as those that associate data mining with the decision-making process of BI. They are then carefully evaluated to gauge both relevancy and credibility (Bell & Smith, 2007; Creswell, 2009). References meeting the requirements are compared to identify commonalities among dimensions that have been shown to have the most influence on information quality assurance. According to Creswell (2009), scholarly material provides a practical and theoretical context for the study and is useful for developing a framework for comparing the results of this study with those of other studies. Thus, all literature is reviewed for quality of methods, results, and conclusions and is included in or excluded from this study based on usefulness, breadth of scope, quality, publishing date, accessibility, and language (Bell & Smith, 2007; Creswell, 2009; Leedy & Ormrod, 2010). This study only includes literature that is available for viewing online or reproducible in hard copy.

Outcome/Audience. This literature review is designed to produce a framework that addresses identifying and prioritizing key dimensions of information quality in the pre-processing stage of data storage that align with unique BI goals and objectives (McGilvray, 2008). The intended outcome is based on identification of the dimensions that most influence information quality assurance, for consideration by those who are responsible for data storage and data mining processes, specifically in an IT business environment: knowledge workers, knowledge technologists, and knowledge professionals, including project managers, project team members, executives, IT professionals, and others (Haag et al., 2006; Zhao, Chen, & Yao, 2006). The audience should be familiar with the requirements for data mining within the context of a business environment in order to accurately determine how the identified dimensions should be applied for information quality assurance (Lefebvre, 2007). This study is not designed to benefit other audiences, such as educational and non-profit agencies, who may have an interest in implementing a BI system but may not be directly involved in the data mining or analysis processes (Lefebvre, 2007).

Topic focus. In order to generalize this study, literature is selected that includes the larger context of data mining for BI; however, the study is limited to one aspect of BI, namely information quality assurance in the pre-processing stage of data storage (Obenzinger, 2005).
The scope of this study is further limited to the identification of the dimensions of information quality that have the most influence on information quality assurance, specifically in terms of completeness, consistency, and trustworthiness, in the pre-processing stage of data storage, to ensure that information can be successfully retrieved and correctly analyzed by data mining tools (Wixom & Watson, 2001).

The search for dimensions of information quality that have the most influence on information quality assurance is limited to those that are relevant to the pre-processing stage of data storage within the context of BI. However, in order to address this aspect it is necessary to address its association with data storage, mining, and analysis processes for BI. There are a number of other related issues in BI systems that are excluded from this literature review. Specifically, the excluded subject areas are those that address data mining patterns and particular decision-making strategies for gaining competitive advantages in BI. Additionally, this study excludes the requirements for and steps taken in data mining and analysis processes. Also excluded are the detailed decision-making processes that are necessary for successful BI systems. The decision to exclude these areas is designed specifically to place the focus of the inquiry on the dimensions that influence information quality assurance and lead to the success of BI, not on the outcomes. Additionally, the inclusion of these areas would have been beyond the reach of this study, given the limited time for conducting it.

Inquiry context. The problem, sub-topic, and audience selection are framed based upon real-world challenges in retrieving unstructured information that is stored in data warehouses (Piatetsky-Shapiro et al., 2009). For example, many companies compete on the basis of their analytical capabilities by using BI to make better decisions and to extract maximum value from their data warehouses; the value of information, however, is not in the information itself but in how it affects the business (Davenport & Harris, 2007). Selected literature explores the role of information quality within the context of BI and the role of the dimensions of information quality for the purpose of assuring quality of content in the pre-processing stage of data storage (Knight & Burn, 2005).

Data Analysis Plan Preview

Resources that satisfy the evaluation criteria are analyzed using a qualitative approach known as content analysis (Busch, De Maret, Flynn, Kellum, Le, Meyers, Saunders, & White, 2005; Obenzinger, 2005; Ormondroyd et al., 2009). Content analysis is a widely used research tool for determining the presence of certain words or concepts within selected resources (Busch et al., 2005; Hsieh & Shannon, 2005). The approach begins with identifying research questions, selecting resources, and classifying and coding selected text into manageable categories to enable the researcher to "focus on, and code for, specific words or patterns that are indicative of the research question" (Busch et al., 2005, para. 1). The coding process is divided into eight steps and is detailed in the Research Parameters section of this paper (Busch et al., 2005):

1. Decide the level of analysis.
2. Decide how many concepts will be coded.
3. Decide whether to code for existence or frequency of a concept.
4. Decide how concepts will be distinguished from one another.
5. Develop rules for coding texts.
6. Decide what to do with irrelevant information.
7. Code the texts.
8. Analyze and report results.

Focus during coding is on identification of the dimensions that influence the quality of information in the pre-processing stage of data storage.
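The coding itself is performed manually on the selected texts, but a simplified, automated analogue may help clarify what step 3 (coding for existence or frequency of a concept) amounts to. In the sketch below, the concept list and sample passage are invented for illustration; they are assumptions, not the study's actual coding scheme.

```python
import re
from collections import Counter

# Illustrative concepts to code for; the study's real concept list is
# developed during the earlier steps of the coding process.
concepts = ["accuracy", "completeness", "consistency", "timeliness"]

sample_text = """Completeness and consistency are cited most often;
timeliness is discussed alongside accuracy and completeness."""

# Code for frequency (how often each concept appears) ...
words = re.findall(r"[a-z]+", sample_text.lower())
frequency = Counter(w for w in words if w in concepts)

# ... and for existence (whether each concept appears at all).
existence = {c: c in frequency for c in concepts}

print(frequency)   # Counter({'completeness': 2, 'consistency': 1, ...})
print(existence)   # {'accuracy': True, 'completeness': True, ...}
```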
Writing Plan Preview

The writing plan for the presentation of the results compiled during the data analysis process is designed to provide a framework of the topic (Obenzinger, 2005). The objective is to address common themes across resources for assuring information quality in terms of relevance, accuracy, timeliness, and completeness that are proven effective in real-world environments (Busch et al., 2005). Presentation of the information aligns with the thematic pattern of organization (University of North Carolina, n.d.). This approach, which organizes literature around a topic, emphasizes the development of the most influential dimensions of information quality assurance rather than the chronological development of the materials (University of North Carolina, n.d.).

The goal of the writing plan is to present the data derived from the coding process in a way that addresses identifying and prioritizing key dimensions of information quality. Theme one presents a description and discussion of the role of information quality assurance in the pre-processing stage of data storage. Anticipated sub-themes include the role of information quality, information quality for data storage, the impact of the quality of content, and the effects of information quality assurance for BI. Theme two presents an identification of key information quality dimensions for assuring information quality in the pre-processing stage of data storage. Anticipated sub-themes include a discussion of the role of each identified dimension, including which are most common and which are key.

Definitions

Special terms that are unique to the fields of information quality, data mining, and BI are used in this study. According to Lamont (2010), decision makers in BI systems often use terms that are obscurely defined and have various meanings to different team members; thus establishing clear definitions reduces miscommunication and costly mistakes. Definitions are provided in this section to ensure that readers clearly understand the contextual meaning of the terminology as used throughout this study. Key terms are defined in-text at the point at which they are introduced; others are withheld to prevent interruptions in the flow of the document and are defined in this section. The following list of terms provides a helpful collection of definitions interpreted in the context of information and data quality.

Accessibility – Accessibility is a dimension of information quality; it is the extent to which information is quickly retrievable (Kahn et al., 2002).

Accuracy – Accuracy, for the purposes of this study, is defined as the degree of the correctness of the content of the data (Davenport & Harris, 2007). It is a dimension of information quality also referred to as validity (McGilvray, 2008).

Amount of Data – Amount of data, or quantity, is a dimension of information quality that refers to the volume of information appropriate for the task at hand (Kahn et al., 2002).
Assessment – Assessment is the comparison of the actual environment and data to requirements and expectations (McGilvray, 2008).

Attribute – An attribute is additional information included with a dimension that is not used in defining the levels of the dimension (Arkady, 2007).

Believability – Believability is a dimension of information quality; it is the extent to which information is regarded as credible (Kahn et al., 2002).

Business Intelligence (BI) – Business intelligence (BI) is a decision support system that utilizes data mining techniques to extract information from data warehouses (Andersson et al., 2008).

Business Intelligence Tools – Three types of tools are referred to as BI tools: analytical software (dimensional variations in data), query tools (asking questions about patterns in data), and data mining tools (searching for significant patterns in data) (Watson & Wixom, 2007).

Business Systems – Business systems combine data gathering, data storage, and knowledge management with analytical tools to present complex internal and competitive information to planners and decision makers (Negash, 2008).

Classification Scheme – A classification scheme is the descriptive information for an arrangement or division of objects into groups based on characteristics that the objects have in common (Kriegel et al., 2007).

Compatibility – Compatibility, an information quality dimension, refers to the extent to which data can be combined with other information (English, 2005).

Competitive Advantages/Disadvantages – Competitive advantages and disadvantages are the strengths and weaknesses of a corporation relative to its present and likely competitors, identified by constantly analyzing the existing market for any relevant changes and adapting to those changes quickly (Lamont, 2010; Negash, 2008).

Competitive Intelligence – Competitive intelligence is the act of gaining perspective on developments and events aimed at yielding a competitive advantage (Lamont, 2010).

Completeness – Completeness, a dimension of information quality, is the extent to which the expected attributes of data are provided and meet the expectations of the user (Caro et al., 2008). Data are of good quality when they are complete: when the user has coverage for all needed data, when all related pieces are intact, and when the content is updated to correct any mistakes (Negash, 2008; Olson, 2009).

Concise Representation – Concise representation is a dimension of information quality that refers to the extent to which information is compactly represented.

Conformity – Conformity, an information quality dimension, is the extent to which data values conform to specified formats (Jafar, 2010).

Consistency – Consistency, a dimension of information quality, means that data across the business are in sync with each other (McGilvray, 2008).

Controlled Vocabulary – A controlled vocabulary provides a way to organize knowledge for subsequent retrieval (Stvilia et al., 2007).

Customer Relationship Management (CRM) – Customer relationship management (CRM) is a term for the methodologies, software, and Internet capabilities that help a business manage customer relationships in an organized fashion (Negash, 2008).
Data – Data consist of unconnected facts, numbers, names, codes, symbols, dates, words, and other items of that nature that are out of context and only acquire meaning through association; data are what a computer records, stores, and processes (Negash, 2008).

Data Analysis – Data analysis is an approach in which data are organized so that useful information can be extracted from them (Jafar, 2010). It embeds predictive analytics into frontline applications to improve decision making (IBM, 2010).

Database – A database consists of an organized collection of data for one or more uses (Berkley, Bowers, Jones, Madin, & Schildhauer, 2009).

Data Capture – Data capture is the extraction of or access to data (Forcada et al., 2010).

Data Categories – Data categories are groupings of data with common characteristics or features (McGilvray, 2008).

Data Cleansing – Data cleansing is updating data that are imported to the warehouse, such as correcting or removing errors and inconsistencies (McGilvray, 2008).

Data Decay – Data decay refers to a measure of the rate of negative change to the data (McGilvray, 2008).

Data Integrity – Data integrity is the process of ensuring consistency throughout all information systems and providing end-to-end management of all metadata (Gallo, 2010).

Data Mapping – Data mapping is the process of determining how data in a source data store are moved to another, target data store (Seng & Chen, 2010).

Data Mining – Data mining refers to the technology that allows the user to efficiently retrieve information from the data warehouse (Sen & Sinha, 2007). Data mining technology is used to discover hidden relationships, patterns, and interdependencies, and to generate rules to predict correlations in the data warehouse (Su et al., 2009).

Data Quality – See information quality.

Data Warehouse – A data warehouse is defined as a repository of historical data used to support decision making that allows centralized analysis, security, and control over data (Sen & Sinha, 2007; Web4All, 2010).

Decision Support – Decision support is information that is generated to support decision makers in the decision-making process (Andersson et al., 2008).

Decision Support Systems (DSS) – Decision support systems (DSS) are systems used to directly support specific decision-making processes (Parameter, 2010).

De-duplication – De-duplication is a feature of data cleansing tools or processes that identifies multiple records representing the same real-world object (McGilvray, 2008).

Dimension – A dimension is one of the perspectives that can be used to analyze the data (Kanal, 2009).

Document Management System – A document management system (DMS) is a computer system used to track and store electronic documents or images of paper documents (Parameter, 2010).

Duplication – Duplication is an information quality dimension that refers to maintaining a single representation of similar data within the data set (Jafar, 2010).

Ease of Use – Ease of use refers to the degree to which data can be accessed and is a dimension of information quality (McGilvray, 2008).

Enhancement – Enhancement refers to a feature of data cleansing tools that updates or corrects data or adds new information to existing data (Berkley et al., 2009; McGilvray, 2008).

Entity – An entity is a person, place, or thing that is of interest to the business (Negash, 2008).
Enterprise Content Management – Enterprise content management refers to the use of appropriate technology and software to collect, manage, store, and retrieve content of any kind, including documents and unstructured information, within an organization in order to better achieve the aims and goals of the business (Negash, 2008; Olson, 2003).

Environment – The environment refers to the conditions within a company that affect the way employees work and act (Panin, 2006).

Enterprise Resource Planning (ERP) – In the case of structured data, analysts use enterprise resource planning (ERP) systems to create BI information by searching, analyzing, and delivering information to the decision maker (English, 2005).

Extract, Transform, and Load (ETL) – Extract, transform, and load (ETL) is a common three-step process designed for data transformation and integration; it is used in data mining to extract data from a source system, transform and aggregate them to meet target system requirements, and load them into a target database (Keeton et al., 2009; McGilvray, 2008).

Free of Error – Free of error is a dimension of information quality that refers to the extent to which information is correct and reliable (Kahn et al., 2002).

Gigabyte – A gigabyte is a unit of information equal to one billion bytes of memory (Popovic et al., 2009).

Geographic Information Systems (GIS) – A geographic information system (GIS) integrates hardware, software, and data for capturing, managing, analyzing, and displaying all forms of geographically referenced information (Negash, 2008).

Indexing – Indexing refers to a list of records arranged in order of some attribute (Wang & Wang, 2007).

Information – Information is the meaning given to data, or the interpretation of data based on its context (English, 2005).

Information Quality – Information quality is the degree to which information and data can be a trusted source for any and/or all required users (McGilvray, 2008). While there is no single definition for information quality, researchers agree that it quantifies whether the correct information is being used to make a decision or take an action, and whether that information is good enough for the purpose of making a decision (Keeton et al., 2009; Popovic et al., 2009).

Information Quality Dimensions – Information quality dimensions are the minimum desired qualities that data should have to be considered effective for data mining techniques (English, 2009).

Information Quality Assurance – Information quality assurance is a methodology for assuring that the data retrieved are relevant for BI (Stvilia et al., 2007).

Integrity – Integrity, an information quality dimension, is the extent to which data generated by BI information systems are protected from deliberate bias or manipulation for political or personal reasons (Kahn et al., 2002).

Interpretability – Interpretability is a dimension of information quality that refers to the extent to which information is in appropriate languages, symbols, and units, and its definitions are clear (Kahn et al., 2002).

Key Performance Indicators (KPI) – KPIs are a set of quantifiable, long-term goals, measurable and key to the success of the company, that determine whether a company is reaching its performance and operational goals (Parameter, 2010).
Keyword – A keyword is a substantive word in the title of a document or a record in a database that can be used to classify or index content (Arkady, 2007).

Knowledge – Knowledge is data that have been organized, synthesized, and made useful; it is what a business uses to make decisions (McGilvray, 2008).

Knowledge Discovery in Databases (KDD) – Knowledge discovery in databases (KDD) is the process of extracting useful knowledge from volumes of data (English, 2005; Lupu et al., 2007; Web4All, 2010).

Knowledge Management – Knowledge management is used to address technologies employed for the management and analysis of unstructured information (Halonen & Thomander, 2008).

Knowledge Worker – A knowledge worker is one who uses data or information to perform his or her work or to complete job responsibilities (Halonen & Thomander, 2008; McGilvray, 2008).

Linking – Linking is a feature of data cleansing tools that matches, or links, associated records through a user-defined or common algorithm (McGilvray, 2008).

Management Information Systems – A management information system is a system that provides the information needed to manage an organization effectively (Gallo, 2010).

Master Data – Master data are a data category that describes the people, places, and things that are involved in an organization's business (McGilvray, 2008; Rodriguez et al., 2010).

Matching – Matching is a feature of data cleansing tools or processes that matches, or links, associated records through a user-defined or common algorithm (McGilvray, 2008).

Measure – A measure refers to an indicator that is an indirect predictor of performance (Kanal, 2009; McGilvray, 2008).

Media – Media refers to the various means of communication, such as user guides, Web surveys, hardcopy forms, and database entry interfaces (Lupu et al., 2007; McGilvray, 2008).

Megabyte – A megabyte is a unit of information equal to one million bytes of memory (Popovic et al., 2009).

Metadata – Metadata are a data category that labels, describes, or characterizes other data in the warehouse for ease of use in retrieving, interpreting, or using information; the term literally means "data about data" (McGilvray, 2008, p. 294).

Objectivity – Objectivity, a dimension of information quality, is the extent to which information is unbiased, unprejudiced, and impartial (Kahn et al., 2002).

On-line Analytical Processing Tools (OLAP) – On-line analytical processing tools (OLAP) are computer applications designed to search, analyze, and deliver data to assist in the decision-making process of BI (English, 2005).

Parsing – Parsing refers to the separation of character strings or free-form text fields into component parts, meaningful patterns, or attributes, which are then moved into clearly labeled and distinct fields (McGilvray, 2008; Popovic et al., 2009).

Petabyte – A petabyte is a unit of information equal to one quadrillion bytes of memory (Popovic et al., 2009).

Precision – Precision, an information quality dimension, means that data have sufficient detail (McGilvray, 2008).

Predictive Analytics – Predictive analytics is a tool used in data mining to predict future probabilities and trends (Kriegel et al., 2007; Davenport & Harris, 2007; Forcada et al., 2010).

Process – Process refers to any functions, activities, actions, tasks, or procedures that touch the data or information (English, 2005; Berkley et al., 2009; McGilvray, 2008).
Profiling – Profiling is the use of analytical techniques to discover the structure, content, and quality of data (Olson, 2003).

Reference Data – Reference data are a data category consisting of sets of values or classification schemas referred to by systems, applications, data stores, processes, and reports (McGilvray, 2008).

Relevancy – Relevancy is a standard for determining whether what is being considered in the project is associated with and meaningful to the business issue to be resolved (McGilvray, 2008).

Reliability – Reliability, a dimension of information quality, is the extent to which data are measured and collected consistently (McGilvray, 2008).

Reputation – Reputation, an information quality dimension, is the extent to which information is highly regarded in terms of its source or content (Kahn et al., 2002).

Return on Investment (ROI) – Return on investment (ROI) is a means of showing benefit from investing in data quality (English, 2005; McGilvray, 2008).

Root Cause Analysis – Root cause analysis is the study of all possible causes of a problem, issue, or condition to determine its actual cause (Cong et al., 2007; McGilvray, 2008).

Sample – A sample refers to a subset of a population or a group under study that is representative of the entire population (Arkady, 2007).

Schema – The schema refers to the logical organization of data in a database (McGilvray, 2008).

Search Engines – Search engines are software programs capable of successfully retrieving information from computer networks or databases in order to match the needs of searchers (English, 2005; Negash, 2008; Zhao et al., 2006).

Security – Security is an information quality dimension that refers to the extent to which access to information is restricted appropriately to maintain its security (Kahn et al., 2002).

Serviceability – Serviceability, an information quality dimension, is the extent to which data are consistent and follow a predictable revisions plan (Olson, 2003).

Stakeholder – A stakeholder is any individual or group that has a direct interest in, or some level of involvement with, the success of an organization and would be affected by the outcome of any decisions (Popovic, 2009).

Standardization – Standardization refers to converting data into standard formats to facilitate parsing and, thus, matching, linking, and de-duplication (McGilvray, 2008).

Strategic Early Warning – Strategic early warning is the process of monitoring the business environment for weak signals and early trends that may reveal potential changes before they become obvious to others (Gallo, 2010; Lupu et al., 2007; Rodriguez et al., 2010).

Strategic Group Analysis – Strategic group analysis identifies groups or clusters of businesses that adopt similar strategies and that tend to be affected by, and respond to, competitive actions and external events in similar ways (Gallo, 2010; McGilvray, 2008).

Strategic Intelligence – Strategic intelligence is knowledge about an organization's business environment that has implications for its long-term viability and success, usually extending several years into the future (Gallo, 2010; McGilvray, 2008).

Strategic Research – Strategic research is mission-oriented and involves the application of established scientific knowledge and methods to broad social or economic objectives, often extending over a considerable period (Gallo, 2010; McGilvray, 2008).
Synthesis – Synthesis is the process of combining data, information, and existing knowledge in order to produce a connected whole, such as a hypothesis, theory, or system (Arkady, 2008).

Target Audience – The target audience is the group of people for whom a specific study is directed (Lefebvre, 2007).

Terabyte – A terabyte is a unit of information equal to one trillion bytes (Popovic et al., 2009).

Timeliness – Timeliness, an information quality dimension, refers to the degree to which data are sufficiently up to date for the task at hand (Kahn et al., 2002; McGilvray, 2008).

Transactional Data – Transactional data are a data category that describes an internal or external event or transaction that takes place as an organization conducts its business (McGilvray, 2008).

Transformation – Transformation is any change to the data, such as during parsing and standardization (Gallo, 2010; Parameter, 2010).

Trust – Trust refers to confidence in data quality (McGilvray, 2008; Rodriguez et al., 2010).

Unstructured Data – Unstructured data are information that has no defined or standard structure that would allow for convenient storage and retrieval (Popovic et al., 2009).

Usage – Usage is a technique that inventories the current and/or future uses of the data (McGilvray, 2008).

Validity – Validity, a dimension of information quality also referred to as accuracy, refers to the determination that values in a field are or are not within a set of allowed or valid values (McGilvray, 2008).
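Several of the cleansing-related terms above (parsing, standardization, matching, and de-duplication) describe steps that are easiest to see in combination. The sketch below is an invented illustration, not a procedure from the selected literature: it parses a free-form name field, standardizes the parts into one format, and de-duplicates records by matching on the standardized key.

```python
# Invented records with a free-form "name" field in inconsistent formats.
records = [
    {"id": 1, "name": "smith, john "},
    {"id": 2, "name": "John Smith"},
    {"id": 3, "name": "Jane Doe"},
]

def parse(name):
    """Parsing: split a free-form text field into component parts."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        first, _, last = name.strip().partition(" ")
    return first, last

def standardize(name):
    """Standardization: convert the parsed parts into one standard format."""
    first, last = parse(name)
    return f"{first} {last}".title()

# Matching and de-duplication: records whose standardized names agree are
# treated as the same real-world entity, and only one representation is kept.
deduped = {}
for record in records:
    key = standardize(record["name"])
    deduped.setdefault(key, record)

print(list(deduped))  # ['John Smith', 'Jane Doe'] -- records 1 and 2 matched
```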
Research Parameters
Literature reviews are beneficial because they provide a meaningful context for a study within the framework of existing research, thereby broadening the understanding of the problem (Obenzinger, 2005). This section presents the framework and methods by which the literature review is designed and conducted (Geist, 2008). A detailed search strategy is established and outlined as a guide for searching, locating, and retrieving literature (Obenzinger, 2005). A method is defined by which resources are deemed credible and relevant to the information search (Luckey, 2009). Evaluation criteria are conveyed in terms of credibility and relevance to the topic (Obenzinger, 2005). The initial set of research questions and sub-questions, the search strategy report, the documentation approach, and the full descriptions of the data analysis and writing plans are presented (Creswell, 2009). The documentation approach outlines and details the processes used to record, classify, code, and capture all resources found.
Research Design
Obenzinger (2005) describes a literature review as a method of “providing meaningful context for a project within the universe of already existing research” (p. 1). A methodological review of past literature is a crucial endeavor for any academic research work (Levy & Ellis, 2006). Indeed, it is through the literature review that previous perspectives are synthesized and new ones are gained (Obenzinger, 2005). This inquiry is structured as a review of literature that evaluates and summarizes the most relevant information on the topic and provides meaningful context for it, in order to establish a basis for further research (Obenzinger, 2005). According to Obenzinger (2005), providing direction for further research requires the analysis of a large body of selected literature to present a “picture of current knowledge, identifying gaps or holes in the field” (p. 5). The direction for further research is expressed by focusing on the factors that most influence information quality assurance in the pre-processing stage of data storage within the context of BI.
Research Questions and Sub-questions
The design of this study is framed by a series of research questions that guide the development of both the content and the research process (Creswell, 2009). The questions are each formulated to focus on identification of the dimensions that most influence information quality assurance. The overarching research question is: What dimensions most influence information quality assurance in the pre-processing stage of data storage, in an effort to support data mining, where the goal is to produce competitive advantage information for BI (Andersson et al., 2008; Creswell, 2009; IBM, 2010; Rodriguez et al., 2010)? The guiding questions and sub-questions are listed below.
1. What is the role of information quality assurance in the pre-processing stage of data storage within the context of BI?
a. What is the role of information quality?
b. What are the benefits of information quality for data storage?
c. What is the impact of quality of content?
2. How are key dimensions identified and prioritized to assure information quality in the pre-processing stage of data storage?
a. What are the key information quality dimensions for BI?
b. How are key dimensions identified and prioritized?
c. How do key dimensions contribute to assuring information quality in the pre-processing stage for data storage?
Search Strategy Report
Although the collection of literature is focused on materials pertaining to information quality assurance in the pre-processing stage of data storage, other areas relevant to the topic are addressed in order to provide a basis for understanding within the larger context of BI (Creswell, 2009; Fink, 2010; Leedy & Ormrod, 2010). The process of literature collection focuses on information quality assurance and the role it plays in data mining for BI decision-making processes.
Search terms. Exploratory keyword searches are derived from a variety of types of literature, including books published on information quality assurance, data storage, data mining, and BI. The whitepaper “Business Intelligence: From data collection to data mining and analysis,” published by Web4All (2010), refers to keywords as IT industry standard terminology. For example, the term “data mining” dates back to the 1950s, with earlier pattern-identification methods, such as regression analysis, dating to the 1700s (Web4All, 2010). The predominant search method is to identify co-referential terms and links in books, peer-reviewed journal articles, conference proceedings, and reports that align with the role of information quality in successful data mining for BI (Creswell, 2009; Fink, 2010; Tang, Jin, & Zhang, 2008). Controlled terms and scope notes help focus the keyword searches by referring to other terms, concepts, or links connected to the much broader database search (Tang et al., 2008).
Key search terms initially used are:
• data mining;
• business intelligence;
• information quality;
• information quality assurance;
• knowledge discovery;
• competitive advantage;
• data analytics; and,
• data warehouse.
Literature resources. This study is designed as a literature review, with the goal of expanding knowledge of and adding clarification to the topic (Creswell, 2009; Fink, 2010; Leedy & Ormrod, 2010). The initial search for literature includes the following types of references: peer-reviewed academic research, business publications, whitepapers, online academic journals, and books.
Search engines. Initial specific sites searched include the UO Libraries, Web of Science, Google Scholar, Sage Journals Online, Academic Search Premier, ERIC, and Google. As the topic focus began to reveal itself, database searches expanded to include the CiteSeerX Index, ACM Digital Library, IEEE Computer Science Digital Library, JSTOR, and Project Muse.
Search strategies. The first search strategy is to utilize the University of Oregon’s (UO) libraries Web site, using a combination of controlled-term and keyword searching. Creswell (2009) suggests that a good search method is to follow leads to the specific article, or to the database with a large number of relevant results, and to refine and search again until articles relevant to the topic focus are located. The UO libraries Web site provides search access to several relevant academic search indexes, such as Academic Search Premier, JSTOR, Project Muse, and Web of Science. According to Berkley et al. (2009), another effective strategy is to perform both keyword and controlled-term searching to cover a wider range of possible results and to avoid false hits. This can be done on most search sites via the use of thesauri (Creswell, 2009). Kanal (2009) suggests employing an iterative process as another effective strategy. Thus, some searches are conducted using relevant fixed fields; text queries are added in subsequent searches to narrow the results (Berkley et al., 2009).
Search Results
Appendix A, Detailed Record of Searches, illustrates which search engines and databases are utilized, the search terms that are applied, and the results that are obtained. Categories of information include:
• Search Engine/Database, which lists the resource used for the search;
• Search Terms, denoting keywords used that are related to the topic, subtopics, and research questions;
• Number of Search Results, indicating the number of hits resulting from the search;
• Number of Eligible Titles Found, referring to the number of relevant pieces of literature that are eligible for inclusion in this study; and
• Comments, stating the rationale for continuing with or abandoning specific search engines, databases, and search terms.
Searches of the ACM Digital Library, Academic Search Premier Index/EBSCO HOST (UO Libraries), and CiteSeerX Search Index produced very good to excellent results, yielding some of the most relevant, quality literature pertaining to the topic. Searches of Google Scholar Advanced, IEEE Computer Science Digital Library, Project Muse (UO Libraries), and Web of Science (UO Libraries) produced adequate to good results, although some required membership to view the full article.
Several other search engines, including ERIC, Sage Journals Online, and Google, were abandoned because their search results were consistently unproductive for the topic; it was deemed not worth continuing the effort to use these databases and search engines. Several others were abandoned due to duplication of results or a lack of authority for the source. A summary of search results is illustrated in Table 2.
Table 2
Summary of Search Engine and Database Results
Search Engine/Database                          Eligible Titles Found
ACM Digital Library                             69
Academic Search Premier Index/EBSCO HOST        74
CiteSeerX Search Index                          79
ERIC                                            1
Google Scholar Advanced                         34
IEEE Computer Science Digital Library           31
Project Muse                                    43
Sage Journals Online                            17
Web of Science                                  45
Evaluation Criteria
The literature selected for this study comes from a variety of resources in order to focus on the dimensions that most influence information quality (Creswell, 2009; Obenzinger, 2005). All resources are collected using keyword searches of online search engines and databases. The majority of resources are drawn from the CiteSeerX Search Index, a digital library and search engine indexing over 750,000 documents that focuses primarily on the literature of computer and information science (CiteSeerX, n.d.). Results are restricted to articles, papers, conference proceedings, and books published after January 1, 2001, in an effort to reference the most current and updated information (Bell & Smith, 2007). Excluding information older than ten years is critical to this literature review in order to focus on the most influential dimensions affecting information quality (English, 2009). Keyword search and date parameter filters are set within each search engine or database. The resulting list of matches is reviewed to determine its validity and value to the topic. Abstracts are reviewed to further determine the significance of each match. Matches that meet the criteria are considered relevant and are added to BibMe, an online automatic citation creator that supports APA formatting. If a search does not produce any relevant hits, the keywords are revised and the search is repeated. If results are still not relevant to the topic after multiple search attempts with various controlled terms, the decision is made to abandon the search engine altogether. After relevant resources are identified, their credibility is examined to determine the authority of the document (Bell & Smith, 2007). Three steps are then followed to evaluate the credibility of the author, the validity of the research, and the relevance of the articles (Bell & Smith, 2007; University of Colorado at Boulder, n.d.). The first step is to evaluate authority and trustworthiness by determining the credentials, education, and experience of the author (University of Colorado at Boulder, n.d.). The second step, according to the University of Colorado at Boulder (n.d.), is to determine the validity of the research based on the author’s use of citations and references, and on whether or not the literature is classified as a peer-reviewed or refereed publication by Ulrich’s Periodicals Directory (Ulrichsweb™, n.d.).
Ulrichsweb™ is the authoritative source of bibliographic and publisher information on more than 300,000 periodicals of all types: academic and scholarly journals, Open Access publications, peer-reviewed titles, popular magazines, newspapers, newsletters, and more from around the world (Ulrichsweb™, n.d.). Finally, the third step is to evaluate relevance by determining how broadly or narrowly the article bears on the topic; that is, whether the information is applicable or generalizable to the topic (University of Colorado at Boulder, n.d.).
Documentation Approach
Search results are captured in an electronic database using the software tool BibMe. This method stores document information, including abstracts and other bibliographic detail, in APA format. Resources are hand-coded and electronically stored using the Zoho® Creator software tool. Zoho® provides the ability to create forms and fields, and to conveniently sort by author, title, date, topic area, or any other naming convention assigned to the information. Full-text articles are uploaded directly or scanned and saved in Zoho®, along with research notes and the relevant keywords used to find the resource. This system keeps all resources in one location and provides a quick, efficient tool for searching through documents when reviewing the literature. Resources are coded with the following naming convention: Topic_Year_Author_Word or Phrase (01-10 for 10 identified words or phrases)_Page Number (page on which information is found). For example, the article published by Kahn et al. (2002) is coded for information quality and data dimensions as IQ_2002_KAH_02_187, where the phrase dimensions of information quality is coded as 02. The topic field is assigned IQ, which refers to information quality; DM, which refers to data mining; DS, which refers to data storage; or BI, which refers to business intelligence. The year field refers to the year the reference was published. The first three letters of the author’s last name are added as another method for quickly locating the article. The file naming convention and the research notes allow cited articles to be found quickly when more information is needed. Text for the words or phrases is coded at the implication level, so that concepts with similar meaning are coded the same and can be distinguished from other concepts. Within each resource, the level of analysis, relevant categories, existence of concepts, and level of implication are coded as outlined in the data analysis plan.
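As an illustration of the naming convention described above, the following minimal sketch (hypothetical helper functions, not part of the BibMe or Zoho® tooling) composes and decomposes a resource code such as IQ_2002_KAH_02_187:

    # Compose and parse resource codes of the form Topic_Year_Author_Phrase_Page,
    # e.g., IQ_2002_KAH_02_187 (Kahn et al., 2002; phrase 02 = dimensions of
    # information quality; page 187).
    TOPICS = {"IQ": "information quality", "DM": "data mining",
              "DS": "data storage", "BI": "business intelligence"}

    def make_code(topic, year, author, phrase, page):
        """Build a code from the topic tag, year, first three letters of the
        author's last name, two-digit phrase number, and page number."""
        assert topic in TOPICS
        return "%s_%d_%s_%02d_%d" % (topic, year, author[:3].upper(), phrase, page)

    def parse_code(code):
        """Split a code back into its named fields."""
        topic, year, author, phrase, page = code.split("_")
        return {"topic": TOPICS[topic], "year": int(year), "author": author,
                "phrase": int(phrase), "page": int(page)}

    print(make_code("IQ", 2002, "Kahn", 2, 187))  # IQ_2002_KAH_02_187
    print(parse_code("IQ_2002_KAH_02_187"))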
A summary of the documentation coding plan is illustrated in Table 3.
Table 3
Summary of Documentation Method and Coding Plan
Column   Variable                                  Code Description
1-2      Topic                                     BI = Business Intelligence, IQ = Information Quality, DM = Data Mining, DS = Data Storage
3        (space marker)                            __
4-7      Year                                      2001 through 2011
8        (space marker)                            __
9-11     Author                                    First three letters of author’s last name
12       (space marker)                            __
13-14    Word or phrase (implication level of concepts with similar meaning coded the same)    01 = assurance, 02 = dimensions of information quality, 03 = decision-making, 04 = guidelines for information quality, 05 = framework for data management, 06 = data mining processes, 07 = business intelligence systems, 08 = benchmarks for effectiveness, 09 = information quality recommendations
15       (space marker)                            __
16-18    Page Number                               Page on which specific information is found
Data Analysis Plan
Creswell (2009) and Obenzinger (2005) describe qualitative data analysis as a form of content analysis: a non-linear, iterative, progressive process in which the key is coding, sorting, and sifting through the resources that satisfy the evaluation criteria. The key components of an analysis plan appropriate for application in a literature review synthesize old perspectives with new ones by identifying research questions, selecting resources, and coding text into manageable categories (Busch et al., 2005; Ormondroyd et al., 2009; Obenzinger, 2005). The process of coding is basically one of selective reduction; by reducing the text to categories consisting of a word, set of words, or phrases, the researcher can focus on, and code for, specific words or patterns that are indicative of the research question (Busch et al., 2005). Key concepts addressing information quality assurance in the pre-processing stage of data storage, which ensures the quality of content before information reaches the warehouse, are identified, classified, and coded in the selected literature found in the Annotated Bibliography. The results are analyzed and synthesized to address identifying and prioritizing key dimensions of information quality for assurance in the pre-processing stage of data storage, with the underlying goal of increasing competitive advantages in the decision-making process across the broader context of BI (Negash, 2008). The data analysis process is conducted in two stages on a single set of literature listed for coding in the Annotated Bibliography, according to the eight coding steps for conceptual analysis (Busch et al., 2005). Stage one is the process of selecting literature from the Annotated Bibliography that is most applicable to the development of this study (Ormondroyd et al., 2009). Stage two is reading and coding the selected literature to identify terms and phrases that are relevant to the purpose and research goals (Obenzinger, 2005). After references are identified, classified, and coded, the results are presented in the Conclusions section (Busch et al., 2005; Ormondroyd, 2009; Obenzinger, 2005).
Coding process. An eight-step process for coding text is used to identify the dimensions that most influence information quality assurance in the pre-processing stage of data storage within the context of BI (Busch et al., 2005).
1. Determine level of analysis – Analysis is conducted using words or phrases.
The concept of BI is identified by the following sets of words or phrases: BI, decision makers, decision making, decision support system (DSS), informational advantage, competitive intelligence, competitive intelligence advantages, and competitive advantages. The concept of data or information quality is identified by the following words and sets of words or phrases: data, information, profiling, information systems, accuracy, completeness, consistency, integrity, and relevancy. The concept of data mining is identified by the following sets of words or phrases: data mining, patterns, extracting information, and transformation.
2. Decide how many concepts will be coded – A pre-defined set of three concepts is created, as detailed in step one (Determine level of analysis). The concepts are: BI, data or information quality, and data mining. Additional concepts may be added to introduce a level of coding flexibility that permits important new material to be incorporated into the coding process (Busch et al., 2005).
3. Decide whether to code for existence or frequency of a concept – The text is coded for existence. For example, each dimension coded in relation to the concept of information quality assurance is counted once, no matter how many times it appears (a minimal sketch of this existence coding follows the list).
4. Decide how to distinguish among concepts – Text is coded on the implication level of concepts with similar meaning. For example, information quality and data quality are similar enough to be coded as implying the same thing and thus do not require separate categories.
5. Develop rules for coding text – The coding process is streamlined and organized to ensure consistent and coherent coding throughout the text. For example, information is referred to as data, and both are coded in the same category.
6. Decide what to do with irrelevant information – Information deemed irrelevant to this study is considered immaterial and is disregarded without impacting the outcome of the coding.
7. Code the texts – Terms and phrases are coded by hand first, and then entered in Zoho®, a free software tool used here for qualitative coding. Once key terms and phrases are established and entered, the program examines the texts for data matching the parameters.
8. Analyze results – After the data are coded, conclusions and generalizations are summarized as specified in the Writing Plan section of this study and reported in the Review of the Literature section.
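A minimal sketch of the existence coding described in steps three and seven might look as follows; the keyword sets are abridged from step one, and the whole-word matching is a simplification rather than a description of the Zoho® workflow:

    import re

    # Abridged keyword sets from step one of the coding process.
    CONCEPTS = {
        "BI": ["business intelligence", "decision making", "decision support",
               "competitive advantage"],
        "information quality": ["information quality", "data quality",
                                "accuracy", "completeness", "consistency"],
        "data mining": ["data mining", "patterns", "extracting information",
                        "transformation"],
    }

    def code_for_existence(text):
        """Code each concept once if any of its keywords appears, no matter
        how many times it occurs (existence, not frequency)."""
        lowered = text.lower()
        return {concept: any(re.search(r"\b" + re.escape(kw) + r"\b", lowered)
                             is not None for kw in keywords)
                for concept, keywords in CONCEPTS.items()}

    sample = ("Data mining extracts patterns, but accuracy and completeness "
              "determine whether results support decision making.")
    print(code_for_existence(sample))
    # {'BI': True, 'information quality': True, 'data mining': True}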
Writing Plan
The final step in presenting the results of the data analysis process is to reflect on the themes in relation to the needs of the audience and to describe the outcome of the study (Busch et al., 2005). This writing plan is designed to present and discuss the results compiled during the data analysis process through the use of a rhetorical pattern, as suggested by Obenzinger (2005). In particular, the writing plan in this study is designed thematically (Busch et al., 2005; Creswell, 2009; Obenzinger, 2005). An overview of current research is presented, as revealed by the conceptual analysis, followed by context-based inferences to create meaning (Obenzinger, 2005). The objective is the development of a set of guidelines addressing the dimensions with the most influence on information quality assurance in real-world BI environments, in relation to two overarching themes: (a) the role of information quality assurance in the pre-processing stage of data storage, and (b) key dimensions of information quality for assurance within the context of BI. Relevant dimensions identified during the identification, classification, and coding process are discussed individually. Two themes have been identified as most important to the study. Theme one examines the critical role of quality information in the pre-processing stage of data storage within the context of BI. The first theme is informed by broad searches of previous research on the importance of information quality in the pre-processing stage of data storage (Watson & Wixom, 2007). Popovic et al. (2009) propose and test a model of the relationship between BI and information quality, and investigate in more detail the potential differential impact of BI on two dimensions of information quality: the quality of content and the effects of quality assurance. Su et al. (2009) examine a methodology to determine key information quality dimensions, and provide models to examine how the precision, timeliness, and integrity of source data affect information quality in the pre-processing stage of data storage. Negash (2008) discusses a BI framework for the cleanup, search, analysis, and delivery of unstructured data, and explores a matrix of on-line analytical processing (OLAP) tools for use in the case of unstructured or semi-structured data for BI systems. The second theme is informed by previous research from Cong et al. (2007) and McGilvray (2008), among others, and identifies the role of the dimensions with the most influence on information quality in the pre-processing stage of data storage. Research is focused on identifying and describing dimensions of information quality that influence quality of content in the early stages of data preparation for storage and management (Negash, 2008; Olson, 2009). The role of dimensions is further analyzed as a way to identify and prioritize key dimensions for assuring information quality early in data storage within the context of BI. The objectives of each key dimension, along with those dimensions that should be considered by decision makers to gain competitive advantages, are described. The goal of the writing plan is to organize the presentation of the results of the coding process in such a way as to address identifying and prioritizing key dimensions of information quality for assurance in the pre-processing stage of data storage (English, 2009; McGilvray, 2008). An outline of the thematic presentation format is as follows:
1. Theme one: The role of information quality assurance in the pre-processing stage of data storage within the context of BI.
1.1. Examining the role of information quality.
1.2. Discussing information quality for data storage.
1.3. Discussing the impact of the quality of content.
2. Theme two: Key information quality dimensions for assurance in the pre-processing stage of data storage.
2.1. Examining key information quality dimensions.
2.2. Identifying and prioritizing key dimensions.
2.3. Examining key dimensions for assurance in the pre-processing stage of data storage.
Annotated Bibliography
The references presented in this annotated bibliography are those judged to be the most significant to the identification of the dimensions that influence information quality assurance for data storage within the context of BI, specifically addressing problems prior to storage in data warehouses (Obenzinger, 2005; Ormondroyd et al., n.d.). This section, which consists of 24 key references, provides the citations selected for use in the Review of the Literature and represents the core data set for coding as part of the larger content analysis (Luckey, 2009). A few additional references are coded (see Appendix B). The references represent current knowledge about the dimensions that influence information quality assurance in the pre-processing stage of data storage in an effort to support data mining, where the goal is to produce competitive advantages for BI systems (Obenzinger, 2005; Ormondroyd et al., n.d.). Each entry contains an abstract pulled directly from the reference, along with a content summary, a credibility assessment, and a consideration of the reference’s relevance to this study (Stacks & Karper, 2008).
Caro, A., Calero, C., Caballero, I., & Piattini, M. (2008). A proposal for a set of attributes relevant for web portal data quality. Software Quality Journal, 16(4), 513-542. doi:10.1007/s11219-008-9046-7
Abstract. Data quality is a critical issue in today’s interconnected society. Advances in technology are making the use of the Internet an ever-growing phenomenon with the creation of applications such as Web portals. These applications are important data resources and means of accessing information which many users employ to make decisions. Quality is a very important factor in any data software. As quality is a broad concept, quality models are typically used to assess the quality of a software product. From the software point of view, there is a widely accepted standard proposed by ISO/IEC (the ISO/IEC 9126) for a quality model for data software products. Similar proposals for data quality are non-existent. Although proposals of data quality models exist, none focus specifically on web portal data quality and the user’s perspective. In this paper, the authors propose a set of 33 attributes which are relevant for portal data quality. These have been obtained from a literature review and through a validation process carried out by means of a survey. Although these attributes do not conform to a usable model, they might be a good starting point for constructing one.
Comments. This article provides a framework for understanding the evolution of data quality. The authors discuss a variety of applications available for quality assurance and examine the broad topic of quality overall. Caro, Calero, Caballero, and Piattini are professors of computer science, with a combined peer-reviewed publication count of over 350 articles pertaining to data quality software product applications. The article supports the content development of the study by focusing on identifying dimensions that most influence information quality. Software Quality Journal is listed as an academic/scholarly refereed journal on Ulrichsweb™ and thus is considered a credible resource. It is classified within theme one to support the context of information quality.
Cong, G., Fan, W., Geerts, F., Jia, X., & Ma, S. (2007).
Improving data quality: Consistency and accuracy. Proceedings of the 33rd International Conference on Very Large Databases (VLDB), Vienna, Austria, 315-326. Retrieved from http://www.vldb.org/conf/2007/papers/research/p315-cong.pdf
Abstract. Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty, or inconsistent, database D, automated methods are applied to make it consistent, i.e., to find a repair D' that satisfies the constraints and minimally differs from D. Equally important is ensuring that the automatically generated repair D' is accurate, or makes sense, i.e., that D' differs from the correct data only within predefined boundaries. This paper studies effective methods for improving both data consistency and accuracy. A class of conditional functional dependencies (CFDs), proposed to specify the consistency of the data, is examined; CFDs are able to capture inconsistencies and errors beyond what their traditional counterparts can catch. To improve the consistency of the data, two algorithms are proposed: one for automatically computing a repair D' that satisfies a given set of CFDs, and the other for incrementally finding a repair in response to updates to a clean database. Both problems are intractable; thus a statistical method is developed that guarantees that the repairs found by the algorithms are accurate above a predefined rate without incurring excessive user interaction.
Comments. This article helps to clarify data mining techniques as well as the need for information quality; it is critical to the study because it presents a framework for data mining of information for BI. The article is highly technical, but it is heavily cited with references that inform the need for information quality assurance for data mining in BI. The authors discuss real-world data situations in which inconsistencies, conflicts, and errors affect data quality. The authors are professors of Web data management and are also all software engineers for the Database Group at the University of Edinburgh. The authors have over 50 years of combined experience in data cleansing and information quality. The Proceedings of the VLDB is a scholarly, peer-reviewed journal listed on Ulrichsweb™. Based on these criteria, this article is deemed a credible resource and is generalizable to the broad topic of information quality.
Davenport, T.H., & Harris, J.G. (2007). The architecture of business intelligence. In Competing on analytics: The new science of winning (chapter 8). Boston, MA: Harvard Business School Press. Retrieved from http://www.accenture.com/NR/rdonlyres/15DCFF6A-4DE0-44D8-B778-630BE3A677A2/0/ArchBIAIMS.pdf
Abstract. Many companies today are collecting and storing a mind-boggling quantity of data. The numbers are hard to fathom: in just a few years, the common terminology for data volumes has grown past projected amounts. However, while organizations have more data than ever at their disposal, they rarely impose sufficient order on it and thus get limited value from all that information. Further, many IT departments lack the capabilities to do more than support and maintain basic transactional and reporting capabilities. In short, while improvements in technology’s ability to store data have been astonishing, most organizations struggle to manage, analyze, and apply it.
Comments.
The authors provide an overview of the need for information quality assurance practices and indicate that, given the large increase in data volume, it is crucial to have an information quality management system in place from the beginning. Besides authoring 13 books (including the first books written on knowledge management) and hundreds of articles for refereed journals, Davenport was named one of the top 25 consultants in the world by Consulting Magazine in 2003. In 2007 and 2008, he was named one of the 100 most influential people in the IT industry by Ziff-Davis publishers, one of the world’s premier publishers of technology-based digital content products. Harris is a senior executive research fellow; her work is published in numerous refereed journals and is quoted extensively by the Wall Street Journal, Forbes Magazine, CIO Magazine, and many others. This book is considered a credible resource based on the professional and academic achievements of both authors and the extensive use of citations from and references to peer-reviewed published works. The content in this book provides a framework for the critical need for information quality in the pre-processing stages of data storage. It is classified within theme two and is generalizable to the topic of BI.
English, L. (2009). Information quality applied: Best practices for improving business information, processes and systems. New York, NY: John Wiley & Sons, Inc.
Abstract. In this book, the author takes a hands-on approach, showing how to apply the concepts outlined in his first book, Improving Data Warehouse and Business Information Quality, to specific business areas like marketing, sales, finance, and human resources. The book presents real-world scenarios with examples for melding data quality concepts to specific business areas such as supply chain management, product and service development, customer care, and others. Step-by-step instructions, practical techniques, and helpful templates from the author enable the application of best practices so that businesses can begin immediate modeling of quality initiatives.
Comments. The author’s explanation of how to ensure information quality for BI and data mining focuses on maintaining the quality and accuracy of business data by conducting information quality assessments in the pre-processing stage, allowing time for correction initiatives and adequate preparation for mining. He offers IT, database, and business managers step-by-step instructions for setting up methodical and effective procedures. Templates are included for businesses to model their own quality initiatives. A companion Web site provides templates, updates to the book, and links to related sites. English has extensive academic experience with information quality systems and management and is an internationally recognized speaker, educator, author, and consultant in knowledge management and information quality improvement. He developed an information quality system that was the basis for Six Sigma, and he was awarded the 1998 Individual Achievement Award for his contributions to Information Management. His refereed published works are regularly cited in peer-reviewed journals in over 40 countries on six continents. Thus this is deemed a valid resource that is significantly relevant to framing the context of information quality assurance for BI.
This book provides a framework for assuring quality information in the pre-processing stage of data mining and is classified within theme one.
Fisher, C., Lauria, E., Chengalur-Smith, S., & Wang, R. (2008). Introduction to information quality (4th ed.). Cambridge, MA: MIT Press.
Abstract. This book educates readers about the critical issues in data and information quality that have been plaguing information systems for many years. Researchers have only recently begun to address data quality as a discipline in its own right, and a body of data quality literature has just begun to appear. Researchers at the Massachusetts Institute of Technology (MIT) began a total data quality management program and have hosted ten international conferences on information quality aimed at practitioners, academicians, and researchers. This book is built on two primary sources. After an extensive literature review and study, a survey on the importance of data quality knowledge and skills was completed by 110 data quality researchers and practitioners, all data quality leaders in their own right, at the International Conference on Information Quality held at MIT. The results of these studies led to a consensus on the most critical skills necessary to begin performing information quality work. An introduction to those critical skills and knowledge areas is the primary topic of this book. The second source is the research into data and information quality of the four authors, who collectively have published over 100 articles.
Comments. The book discusses the need to address data quality practices in businesses and organizations. The authors are convinced that an organized discipline for data and information quality is required. The contents of this book provide a broad basis for understanding the concepts and philosophy of data and information quality. Tools and techniques are introduced that are essential for a data quality analyst to make improvements. Authority is established based on the credentials, education, and experience of the authors: all hold a Ph.D. in information science; all are regularly published and frequently cited in peer-reviewed journals; and all have at least 20 years of experience in the IT arena working for multinational firms including Microsoft, IBM, and Hewlett-Packard. Validity is established by the use of multiple citations and references from refereed publications. The book is relevant to the topic of information quality assurance and is classified within theme one.
Hakim, L. (2007a). Information quality management: Theory and applications. Hershey, PA: Idea Group Publishing.
Abstract. This book provides insights and support for professionals and researchers working in the field of information and knowledge management and information quality, and for practitioners and managers of manufacturing and service industries concerned with the management of information.
Comments. This book offers tips for information quality assurance and helps to structure and inform the key dimensions influential for information quality that are identified in this study. It suggests ways in which different professionals working in information quality management can manage information effectively. It offers advice and recommendations, and it describes best practices beneficial to knowledge management professionals.
The author’s education and professional experience in information quality and management spans industry, research, and development across various academic institutions. His research is extensively published in refereed journals; he is the author of more than 60 papers published in peer-reviewed journals and books. He is considered an expert in the field. His use of citations and references to peer-reviewed work establishes validity. Thus, this book is a trusted resource based on the author’s extensive professional and academic experience with knowledge management. Relevant text and key information are classified within theme one; results are extrapolated and coded in the dataset.
Jafar, M.J. (2010). A tools-based approach to teaching data mining. Journal of Information Technology Education: Innovations in Practice, 9, 2-24. Retrieved from http://jite.org/documents/Vol9/JITEv9IIPp001-024Jafar740.pdf
Abstract. Data mining is an emerging field of study in Information Systems programs. Although the content has been streamlined, the underlying technology is still in a state of flux. The paper describes how Microsoft Excel’s data mining add-ins, as a front end to Microsoft’s cloud computing and SQL Server 2008 BI platforms as back ends, are used to teach a senior-level data mining methods class. The content presented and the hands-on experience gained have broader applications in other areas, such as accounting, finance, general business, and marketing. Business students benefit from learning data mining methods and the use of data mining tools and algorithms to analyze data for the purpose of decision support in their areas of specialization. Capabilities newly introduced to faculty currently teaching a BI course are highlighted. This set of integrated tools allowed focus on teaching the analytical aspects of data mining and the usage of algorithms through practical hands-on demonstrations, homework assignments, and projects. As a result, students gained a conceptual understanding of data mining and the application of data mining algorithms for the purpose of decision support. Without such a set of integrated tools, it would have been prohibitive for faculty to provide comprehensive coverage of the topic with practical hands-on experience. The availability of this set of tools transformed the role of a student from a programmer of data mining algorithms to a BI analyst. Students now understand the algorithms and use tools to (1) perform elementary data analysis, (2) configure and use data mining computing engines to build, test, compare, and evaluate various mining models, and (3) use the mining models to analyze data and predict outcomes for the purpose of decision support. If it were not for the underlying technologies that were used, it would have been impossible to cover such material in a one-semester course and provide students with much-needed hands-on experience in data mining. Finally, the paper describes how utilizing the cloud as a computing platform transformed the role of a student from “doing low-level IT” in a data mining course to that of a BI analyst using tools to analyze data for the purpose of decision support.
Comments. The author teaches students how to analyze data and use data mining tools to predict outcomes for the purpose of decision support.
This informs the broader context of BI in the literature review and is relevant to the study because it focuses on using a set of integrated tools together with the analytical aspects of data mining to benefit BI. The author is a professor of computer information systems and is published extensively in refereed publications. He is considered an expert in the field of data mining by the academic community of higher education. The validity of this article is established because the journal is listed as an academic, scholarly refereed journal on Ulrichsweb™ and the article is heavily cited with prior research in the area of data mining and data analysis. The article is relevant to data mining in general and is classified within theme one.
Keeton, K., Mehra, P., & Wilkes, J. (2009). Do you know your IQ? A research agenda for information quality in systems. ACM SIGMETRICS Performance Evaluation Review, 37(3), 1-6. Retrieved from http://www.sigmetrics.org/sigmetrics/workshops/papers_hotmetrics/session1_4.pdf
Abstract. Information quality (IQ) is a measure of how fit information is for a purpose. Sometimes called Quality of Information (QoI) by analogy with Quality of Service (QoS), it quantifies whether the correct information is being used to make a decision or take an action. Not understanding when information is of adequate quality can lead to bad decisions and catastrophic effects, including system outages, increased costs, lost revenue – and worse. Quantifying information quality can help improve decision making, but the ultimate goal should be to select or construct information producers that have the appropriate balance between information quality and the cost of providing it. In this paper, a brief introduction to the field of information quality is presented, the case for applying information quality metrics in the systems domain is argued, and a research agenda to explore this space is proposed.
Comments. The authors indicate the need to determine whether information is good enough to lead to results that will help decision makers inform their process. They note that poor information can lead to bad results, but good information may be costly to acquire. As such, the authors introduce the field of information quality and suggest ways it can be measured and used. Keeton is a senior researcher in the Storage and Information Management Platform Lab at HP Labs. Her research focuses on simplifying the management of enterprise information systems. Mehra has over 20 years of large-scale systems and software design experience with HP Labs and has won numerous awards and honors for articles published in refereed journals. Wilkes was with HP Labs for 26 years before he left to join Google. He has written three books, and his publications appear in refereed journals. His research in self-managing storage systems paved the way for open cloud computing. This article is published in a peer-reviewed journal listed on Ulrichsweb™; thus it meets the evaluation criteria for validity. It is a pivotal article for establishing the context of the decision-making process, and it is classified within theme one.
Kriegel, H.P., Borgwardt, K.M., Kroger, P., Pryakhin, A., Schubert, M., & Zimek, A. (2007). Future trends in data mining. Data Mining and Knowledge Discovery, 15(1), 87-97. doi:10.1007/s10618-007-0067-9
Abstract.
Over recent years, data mining has been establishing itself as one of the major disciplines in computer science, with growing industrial impact. Undoubtedly, research in data mining will continue and even increase over coming decades. In this article, we sketch a vision of the future of data mining. Starting from the classic definition of data mining, topics that will set trends in data mining are discussed.
Comments. The authors provide an excellent classic description of data mining in this article. They address data mining approaches to complex objects as well as dynamic real-world systems. Furthermore, they discuss pre-processing as the most important and essential part of data mining. Pre-processing is a critical part of information quality, and the authors conclude that the techniques used in pre-processing can deeply influence the results of the actual data mining analysis. The authors are all part of an international research group that focuses on database and information management systems. The group is ranked by ACM SIGKDD (the Special Interest Group on Knowledge Discovery and Data Mining) among the top ten in the world, second in Europe, and first in Germany. Data Mining and Knowledge Discovery is listed as an academic/scholarly refereed journal on Ulrichsweb™ and is heavily cited in prior research studies; thus it meets the evaluation criteria for validity. The context of the article is relevant to the study, and it is classified within theme one.
Lee, Y.W., Pipino, L.L., Funk, J.D., & Wang, R.Y. (2009). Journey to data quality. Cambridge, MA: MIT Press.
Analysis of their research in conducting in- depth analyses of the role of data security in enterprise information quality at Massachusetts Institute of Technology (MIT) is published in numerous books and IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 55 refereed journals. This book is deemed valid to the study based on authority, and myriad citations and peer reviewed references throughout the book. It is relevant to and supports the framework of data quality, and is classified within theme one. Lupu, A.R., Razvan, B., Sabau, G., & Muntean, M. (2007). Influence factors of business intelligence in the context of ERP projects, International Journal of Education and Information Technologies, 2(1), 90-94. Retrieved from http://www.naun.org/journals/educationinformation/eit-15.pdf Abstract. BI projects are very dynamic and during their development may encounter many environmental, technological, and personnel changes. All of these changes determine the need for progressive planning and an iterative development approach. This article presents the development of a real industry BI project in a company that used an ERP system. It focuses on the main factors that influence and affect project development and it analyses the system evolution from technical point of view. The description of this particular experience is useful to all those who are involved in building BI solutions to reveal success factors. Comments. The authors establish the context of BI in an integrated business environment and present a case study involving a real experience of developing a large BI project, along with the analysis of difficulties and problems. Technical solutions are provided along with direction for future research in BI. The real-world examples clarify relationships and further the understanding of BI systems. The authors are professors of data and information systems integration and are internationally recognized leaders in the field, as noted by the International Journal of Education and Information Technologies. They have published numerous articles in refereed journals, and are respected speakers IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 56 on the topic world-wide. This resource is considered credible because it is published in a peer reviewed journal listed on Ulrichsweb™, and the authority of the authors establishes validity. The relevance of the article is generalizable to BI in a broad sense, and is classified within theme two because it helps to establish the framework for the overall context of information quality in BI. McGilvray, D.M. (2008). Executing data quality projects: Ten steps to quality data and trusted information. Burlington, MA: Morgan Kaufmann Publishers. Abstract. In this book the author presents a thorough understanding of significance of information quality in the world today. She describes the impact of information quality on the ability to make effective business decisions, and notes that with flawed, incomplete, or misleading data, information cannot be trusted to further business goals and objectives. Comments. This book provides a systematic approach for improving and creating data and information quality within businesses. It provides a central role in identifying dimensions that influence information quality. It explains a methodology that combines a conceptual framework for understanding information quality with the techniques, tools, and instructions for improving and creating information quality. 
The author presents a ten-step process for implementing the concepts she describes. McGilvray has extensive professional experience in information quality management and data governance and is recognized as a leader in the field by Fortune 50 organizations. She is an accomplished program manager and facilitator, and she is an internationally respected expert on data profiling, metrics, quality, audits, benchmarking, and tool acquisition and implementation. The use of citations and references from peer-reviewed journals throughout her book establishes the validity of this resource. Relevant text provides a basis for understanding the necessity of information quality assurance in the pre-processing stage. The context of the book is relevant to the topic of information quality, and it is classified within theme one.
Negash, S. (2008). Handbook on decision support systems 1: Business intelligence. In International handbooks on information systems (chapter 45). Berlin, Germany: Springer-Verlag. doi:10.1007/978-3-540-48713-5
Abstract. Business intelligence (BI) is a data-driven decision support system (DSS) that combines data gathering, data storage, and knowledge management with analysis to provide input to the decision-making process. The term originated in 1989; prior to that, many of its characteristics were part of executive information systems. BI emphasizes analysis of large volumes of data about the company and its operations. It includes competitive intelligence (monitoring competitors) as a subset. In computer-based environments, BI uses a large database, typically stored in a data warehouse or data mart, as its source of information and as the basis for sophisticated analysis. Analyses range from simple reporting to slice-and-dice, drill-down, answering ad hoc queries, real-time analysis, and forecasting. A large number of vendors provide analysis tools. Perhaps the most useful of these is the dashboard. Recent developments in BI include business performance measurement (BPM), business activity monitoring (BAM), and the expansion of BI from being a staff tool to being used by people throughout the organization (BI for the masses). In the long term, BI techniques and findings will be embedded into business processes.
Comments. The author presents a definition of BI, describes its purpose, and provides an architectural framework. The costs and benefits of BI systems are weighed, and competitive analyses are presented. Focus is placed on techniques and applications that support informed actions by decision makers. The author is well respected as an expert in the field of BI, with over 75 published articles in peer-reviewed journals and refereed conference proceedings. The author cites peer-reviewed references throughout the article. The information provided in this article is classified within theme two and is generalizable to a broad framework of BI. This resource is considered credible because it is published in a peer-reviewed journal listed on Ulrichsweb™, and the authority of the author establishes validity. This article adds to knowledge of BI in general and therefore is deemed relevant to the study.
Olson, J.E. (2003). Data quality: The accuracy dimension. San Francisco, CA: Morgan Kaufmann Publishers.
Abstract. This book describes techniques for assessing the quality of corporate data and improving its accuracy using the data profiling method.
Corporate data is increasingly important as companies continue to find new ways to use it. Likewise, improving the accuracy of data in information systems is becoming a major goal as companies realize how much it affects their bottom line. Data profiling is a new technology that supports and enhances the accuracy of databases throughout major IT shops. The author explains data profiling and shows how it fits into the larger picture of data quality.
Comments. This book provides a thorough understanding of data accuracy in real-world environments and provides a framework for data profiling. It describes analytical tools appropriate for assessing data accuracy. The author has over 36 years of experience developing commercial software and tools for data management systems. He is an early pioneer of data profiling and has developed concepts for building an understanding of databases at the content, structure, and quality levels. He is considered an expert in the field of database management systems by publishers and the data management arena. The book is heavily cited, and references to peer-reviewed journals appear throughout it; thus it is deemed a valid and credible resource for this study. The content is relevant to data quality and mining and is classified within theme one.
Olson, J.E. (2009). Database archiving: How to keep lots of data for a very long time. San Francisco, CA: Morgan Kaufmann Publishers.
Abstract. This book is about database archiving for large database applications. The types of organizations that benefit from building a database archiving practice are any that have long-term retention requirements and lots of data. This includes most public companies and those that are private but work in industries requiring retention of data (such as the medical, insurance, or banking fields). It also includes educational and government organizations.
Comments. This book represents the author’s view of the current state of thinking on the topic of database archiving. Database archiving is a new and growing field within data management. The author points out that data archived today will take years to grow old enough to expose some of the flaws in current thinking, and that it is critical to establish database archiving practices now. Olson has over 36 years of experience developing commercial software and tools for data management systems and is considered an expert in the field of database management systems. Similar to Data quality: The accuracy dimension, this book is heavily cited, and refereed journal references appear throughout it. The content is relevant to data quality and mining; thus it is deemed a valid and credible resource for this study, and it is classified within theme one.
Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., & Zaki, M. (2009). What are the grand challenges for data mining? SIGKDD Explorations, 8(2), 70-77. doi:10.1145/1233321.1233330
Abstract. The authors create grand challenge problems for data mining, and then propose criteria for solutions. They consider possible grand challenge problems from multimedia mining, link mining, large-scale modeling, text mining, and proteomics.
Comments. This article builds a framework for understanding problems facing data mining processes. The authors take a real-world perspective to create potential problems and then consider solutions on a broad scale.
Their research spans many different approaches to data mining that address the need for tools and techniques for intelligent data understanding. Piatetsky-Shapiro is considered to be one of the founders of the data mining and knowledge discovery fields and has extensive experience developing data analysis models for banks, insurance companies, and pharmaceutical companies. He has served as an expert witness and provided expert opinions in several cases. He has over 60 publications in refereed journals, including two best-selling books and several edited collections on topics related to data mining and knowledge discovery. Djeraba has produced over 150 publications in book chapters, conferences, and peer-reviewed journals. Getoor's research interests are in machine learning, databases, and artificial intelligence, with over 150 publications in refereed arenas. Grossman is involved in open source projects in data-intensive computing; his research accomplishments include scaling tree-based classifiers to very large data sets and the introduction of infrastructures for deploying statistical data mining models for BI. Feldman and Zaki are researchers specializing in the development of text mining tools and applications; they have over 75 combined papers on the topic published in refereed journals. This resource is considered credible because it is published in a peer-reviewed journal listed on Ulrichsweb™. Authority and validity are established. It is deemed relevant to the topic of data mining and is classified within theme one.

Popovic, A., Coelho, P.S., & Jaklic, J. (2009). The impact of business intelligence system maturity on information quality. Information Research, 14(4), 1-14. Retrieved from http://informationr.net/ir/14-4/paper417.html

Abstract. A model of the relationship between BI systems and information quality is proposed and tested. The potential differential impact of BI systems' maturity on two aspects of information quality, content quality and media quality, is investigated in more detail. The results indicate that the implementation of a BI system positively affects both aspects of information quality as conceptualized in the model. However, the effect of BI systems' maturity is greater on media quality than on content quality. Since most of the information quality problems in knowledge-intensive activities relate to content quality, it is reasonable to expect that the implementation of BI systems would adequately address these problems. However, the effects of implementing such systems seem to be more focused on media quality outcomes. Based on the findings, it is suggested that projects implementing BI systems need to focus more on ensuring content quality.

Comments. The authors discuss the implementation of BI systems and whether or not BI adequately addresses all the information quality problems that knowledge workers most often encounter. The focus is on whether the implementation of BI technologies and related data management activities contributes to the ability to access information, and whether it focuses adequately on the content aspects of information quality. The authors have published over 60 papers in refereed journals, with main research focuses on Web-based information systems applications, techniques, and tools for decision makers. The article is heavily cited and references peer-reviewed journals.
Information Research is listed as an academic, scholarly refereed journal on Ulrichsweb™; therefore, this article is deemed credible for use in the study. This pivotal article focuses on the problems decision makers most often face, is relevant to the study, and is classified within theme two.

Rodriguez, C., Daniel, F., Casati, F., & Cappiello, C. (2010). Toward uncertain business intelligence: The case of key indicators. IEEE Internet Computing, 14(4), 32-40. http://doi.ieeecomputersociety.org/10.1109/MIC.2010.59

Abstract. Enterprises widely use decision support systems (DSS) and, in particular, BI techniques for monitoring and analyzing operations to understand areas where the business is not performing well. These tools are often unsuitable in scenarios involving Web-enabled, intercompany cooperation and IT outsourcing, however. The authors analyze how these scenarios impact information quality in BI applications and lead to nontrivial research challenges. They describe the idea of uncertain events and key indicators and present a model to express and store uncertainty and a tool to compute and visualize uncertain key indicators.

Comments. The authors summarize the factors that are critical to a company's performance, and how those key indicators can be used to detect problems and trigger business decisions. The specificity of the indicators increases knowledge, which in turn leads to ensuring information quality for effective BI. The authors are well-known researchers, particularly for their work with Intelligent Business Operations Management in the Information Services and Process Innovation Lab at HP Labs. Combined, they have over 250 papers published in books, in conference proceedings, and in refereed journals. IEEE Internet Computing is listed as an academic, scholarly refereed journal on Ulrichsweb™, and thus this is considered to be a credible and valid resource for the study. The article's content and extensive bibliographic information are generalizable to the topic of BI, and the article is classified within theme two.

Sen, A., & Sinha, A.P. (2007). Toward developing data warehousing process standards: An ontology-based review of existing methodologies. IEEE Transactions on Systems, Man and Cybernetics: Part C, Applications and Reviews, 37(1), 17-31. http://dx.doi.org/10.1109/TSMCC.2006.886966

Abstract. A data warehouse is developed using a data warehousing process (DWP) methodology. Currently, there are a large number of methodologies available in the data warehousing market, in part due to the lack of any centralized attempts at creating platform-independent DWP standards. For the development of such standards, it is very important that current practices being followed by the data warehousing industry are first examined. In this study, 30 commercial data warehousing methodologies are reviewed and the standard practices they have adopted with respect to DWP are analyzed. The study provides valuable insights into the prevailing standard practices for different DWP tasks (system development, requirements analysis, architecture design, data modeling, ETL, data extraction, and end-user application design) and identifies important directions for future research on DWP standardization.

Comments. In this article, the authors provide a framework for understanding data mining and data warehouses. The authors foresee the need to develop a methodology that standardizes current practices.
The authors have over 100 papers between them that are published in refereed journals. They are both well known in the field and are considered respected researchers in data mining standards. The journal is listed as an academic, scholarly refereed journal on Ulrichsweb™; therefore, it is deemed credible for use in the study and is classified within theme one.

Seng, J.L., & Chen, T.C. (2010). An analytic approach to select data mining for business decisions. Expert Systems with Applications, 37(12), 8042-8057. http://dx.doi.org/10.1016/j.eswa.2010.05.083

Abstract. Due to improvements in information technology and the growth of the Internet, businesses are able to collect and store huge amounts of data. Using data mining technology to aid the data processing, information retrieval, and knowledge generation process has become one of the critical missions of businesses. Proper use of data mining tools is now the primary user concern. Since not every user completely understands the theory of data mining, choosing the best solution from the functions that data mining tools provide is not easy. A selection model of data mining algorithms is proposed. By analyzing the content of business decisions and applications, user requirements are mapped to a specific data mining category and algorithm. This method makes algorithm selection faster and more reasonable, improving the efficiency of applying data mining tools to solve business problems.

Comments. The authors present a selection model of data mining designed to save users time and money by analyzing the content of a business decision and presenting a specific data mining strategy. They believe that their method improves efficiency in applying data mining tools to solve business problems. This article clarifies the relationship between data mining and information quality, which is central to the study as it defines the framework for data mining strategies. Expert Systems with Applications is listed as an academic, scholarly refereed journal on Ulrichsweb™, establishing validity and authority. This article is relevant to the broader topic of the BI decision-making process and is classified within theme two.

Stvilia, B., Gasser, L., Twidale, M.B., & Smith, L.C. (2007). A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58(12), 1720-1733. doi:10.1002/asi.20652

Abstract. One of the main components of information quality (IQ) assurance is an IQ measurement model design. One cannot manage information quality without first being able to measure it meaningfully and establish a causal connection between the source of IQ change, the IQ problem types, the types of activities affected, and their implications. A better understanding is needed of the roots of IQ change through the development of a systematic, predictive, reusable IQ assessment framework. The framework should enable effective IQ reasoning through the disambiguation of IQ problem resources, and through the rapid and inexpensive development of context-specific IQ measurement models. A general IQ assessment framework is proposed in contrast to context-specific IQ assessment models, which usually focus on a few variables determined by local needs. The proposed framework consists of comprehensive typologies of IQ problems, related activities, and a taxonomy of IQ dimensions organized in a systematic way based on sound theories and practices.
The framework can be used as a knowledge resource and as a guide for developing IQ measurement models for many different settings.

Comments. Sources of information quality problems are analyzed and solutions are identified with the use of decision-tree models. Types of activities affected by information quality problems are discussed, and direction for future research is presented in the form of case studies. Specific research interests of the authors include information quality, metadata and ontologies, information retrieval, and digital data curation. Together they contributed to the design of the Theory of Information Quality, and they are known as experts in the field of information quality. The Journal of the American Society for Information Science and Technology is listed as an academic, scholarly refereed journal on Ulrichsweb™; thus validity is established and the article is deemed credible. The authors present a framework for information quality assessment that contributes to the understanding of the focus of this study. It is central to the framework for describing and identifying information quality measures and therefore is classified within theme one.

Su, Y., Peng, J., & Jin, Z. (2009). Modeling information quality risk for data mining in data warehouses. Human & Ecological Risk Assessment, 15(2), 332-350. doi: 10.1109/ICISE.2009.755

Abstract. Information quality (IQ) is a critical factor for the success of many activities in the information age, including the development of data warehouses and implementation of data mining. The issue of IQ risk is recognized during the process of data mining; however, there is no formal methodological approach to dealing with such issues. Consequently, it is essential to measure the risk of IQ in a data warehouse to ensure success in implementing data mining. This article presents a methodology to determine three IQ risk characteristics: accuracy, comprehensiveness, and non-membership. The methodology provides a set of quantitative models to examine how the quality risks of source information affect the quality of the information outputs produced. It can be used to determine how quality risks associated with diverse data resources affect the derived data.

Comments. The authors discuss their development of quantitative models to confirm information quality risks for data mining in data warehouses. This establishes a connection between information quality and data mining; the connection helps to describe the larger context within which decision making resides. The study also proposes that two important system design factors, control transparency and outcome feedback, will incrementally influence perceived information quality. The quality checks listed in this paper are presented in the form of risk measures to have in place prior to data mining. The analysis process is usable in business data mining environments to determine whether mined information identifies datasets with acceptable quality. The authors have extensive experience designing models for information quality assurance and have published over 100 papers in refereed journals. Human & Ecological Risk Assessment is listed as an academic, scholarly refereed journal on Ulrichsweb™, establishing validity for this resource. Based on these criteria, the article is deemed credible and is classified within theme one.

Watson, H.J., & Wixom, B.H. (2007). The current state of business intelligence. Computer, 40(9), 96-99. http://dx.doi.org/10.1109/MC.2007.331

Abstract. BI is now widely used to describe analytic applications. BI has become a strategic initiative and is now recognized by CIOs and business leaders as instrumental in driving business effectiveness and innovation. BI is a process that includes two primary activities: getting data in and getting data out. Getting data in, traditionally referred to as data warehousing, involves moving data from a set of source systems into an integrated data warehouse. Getting data in delivers limited value to an enterprise; only when users and applications access the data and use it to make decisions does the organization realize the full value from its data warehouse. Thus, getting data out receives most attention from organizations. This second activity, which is commonly referred to as BI, consists of business users and applications accessing data from the data warehouse to perform enterprise reporting, OLAP, querying, and predictive analytics.

Comments. This article describes the BI process, beginning with a description of the role, characteristics, benefits, and suitability of data warehouses. Successes and failures of data warehouses are presented, and data analysis and knowledge discovery are defined. Data mining is described in detail, and a sample of data mining applications is presented. Watson helped develop much of the conceptual foundation for decision support systems (DSS) in the 1970s and applied his knowledge and expertise to executive information systems in the 1980s, making him a recognized leader in information management and one of the world's leading scholars and authorities on decision support. He is the author of over 25 books and over 100 articles in scholarly refereed journals. Wixom is also recognized as a leader in the industry and has published over 70 papers in peer-reviewed journals. Computer is listed as an academic, scholarly refereed journal on Ulrichsweb™; therefore, the article is deemed a credible resource for this study. It provides a framework of the BI process that is central to the framework of this study.

Zhao, Y., Chen, Y., & Yao, Y. (2006). User-centered interactive data mining. IEEE International Conference on Cognitive Informatics 2006, 457-466. http://dx.doi.org/10.1109/COGINF.2006.365532

Abstract. While many data mining models concentrate on automation and efficiency, interactive data mining models should focus on adaptive and effective communications between human users and computer systems. The crucial point is not how intelligent users are, or how efficient systems are, but how well these two parts can be connected, adapted, understood, and trusted. Some fundamental issues, including the processes and forms of interactive data mining, roles, requirements, and complexities of interactive data mining systems, are discussed in this paper.

Comments. This article provides a framework for the efficiency of data mining systems. The authors explore the requirements and forms of different data mining systems, with a focus on the connection between users and systems. Zhao has published over 70 papers in peer-reviewed journals and refereed conference proceedings; his research interests are in data analysis and computational engineering.
Chen and Yao have published numerous papers in refereed journals; their interests include data mining methodologies and conceptual data analyses. This article is deemed credible because it is published in a peer-reviewed journal. It is classified within theme two to establish a connection between data mining and the decision-making process of BI.

Review of the Literature

The underlying assumption of this study is that establishing effective information quality in the pre-processing stage assures capitalization of advantages and opportunities in the form of increased ROI and competitive advantage gains for BI (Keeton, Mehra, & Wilkes, 2009). Thus the review of the literature begins by examining the impact of information quality assurance. Next, two primary themes are examined: the first theme frames the context of the importance of information quality assurance in the pre-processing stage of data storage; the second theme describes the key dimensions with the most influence on information quality in the pre-processing stage of data storage.

Information Quality Assurance

Business decisions are based on data regardless of whether that information is poor or high quality (McGilvray, 2008). However, according to English (2008), Keeton et al. (2009), McGilvray (2008), and others, effective business decisions and actions are made when they are based on high-quality information. Information quality is the degree to which information and data are a trusted source for decision makers to effectively run the business, to serve customers, and to achieve and meet goals and objectives (McGilvray, 2008). Thus assuring information quality for decision makers is essential to successful BI (Davenport & Harris, 2007). Figure 2 depicts the concept of information quality assurance for competitive advantages in BI as data passing through an information quality dimension filter; the resulting information aids in the decision-making process to ensure BI goals and objectives are met (K. Brown, AIM Program instructor, personal communication, November 28, 2010).

Figure 2. The concept of information quality as a trusted source for decision makers to meet BI goals and objectives (K. Brown, AIM Program instructor, personal communication, November 28, 2010)

Lefebvre (2007) contends that successful decision makers are familiar with information quality assurance and data mining techniques in the business environment in order to benefit from focusing on the dimensions that most influence information quality assurance. Moreover, according to Kriegel et al. (2007) and Lefebvre (2007), the degree to which BI is successful depends on the objective characteristics of the audience and the focus placed on identifying and prioritizing specific key dimensions that align with goals and objectives. For example, media consumption habits, attitudes, and personal Web site preferences are characteristics of audiences that must be systematically and quantifiably identified and prioritized in order to achieve a higher degree of BI success (Kriegel et al., 2007; Lefebvre, 2007). Thus, the specific audience for this study is broadly described as business and IT professionals, managers, and non-management specialists who are involved in increasing competitive advantages for BI through informed decision making (Lefebvre, 2007).
Although various aspects of quality and information exist, there is a critical need for a methodology that assures a uniquely consistent definition, identification, and prioritization of quality of content for individual BI systems (Kahn et al., 2002). Key dimensions identified from such a methodology provide priorities for assessing and improving information quality procedures (Cong et al., 2007; Lupu et al., 2007). Information quality assurance affects the level of success of a business, and thus is the most important aspect of any company (Davenport & Harris, 2007). By developing and improving quality of content, businesses gain an understanding of the different policies and practices in information quality assurance followed by organizations across the world (Fisher et al., 2008). BI strategies can be formulated to keep ahead of the competition through the framework used to develop assurance guidelines (Hakim, 2007a; Negash, 2008). Good data are needed to inform the design of the decision-making process and to monitor and evaluate quantitative progress toward goals and objectives; poor or unstructured data can mislead decision makers and result in loss of competitive advantages (Jafar, 2010). According to Fisher et al. (2008) and Negash (2008), attention to key information quality dimensions ensures that goals and objectives are informed by valid information and that BI systems are collecting and organizing data in the same manner. Furthermore, Negash (2008) notes that data are unique to each business; thus, if data are correct and well managed, competitive advantages increase.

Information quality assurance requires continuous assessment; as such, successfully planning and implementing information quality assurance is an iterative process (McGilvray, 2008). Assuring information quality means that data must adequately represent dimensions that are inherent to the BI system goals and objectives (Olson, 2003). The dimensions are ubiquitous and influence information quality regardless of the unique BI system plan (McGilvray, 2008). For example, in the real world, plans are implemented and processes are designed to produce quantifiable results (Keeton et al., 2009). The BI system collects and analyzes the results for the decision-making process by identifying and prioritizing the dimensions that fundamentally influence and assure information quality within the context of its goals and objectives (Andersson et al., 2008). Thus information quality is defined as the accuracy with which the BI system represents the real world (McGilvray, 2008; Negash, 2008).

Information Quality Awareness

The quality of data and the validity of results for BI systems rely on assuring information quality in the pre-processing stage of data storage (Lupu et al., 2007). However, the continued growth of data warehouse storage capabilities increases the volume of information available to decision makers, which may not always be of the highest quality; as a result, data mining processes and applications require a framework for assuring quality of content (Zhao et al., 2006). To remedy unstructured or low-quality data, Olson (2003) calls for information quality awareness as a way to bridge the gap between unstructured and structured data.
The significant amount of research on information quality in the last decade is generating greater awareness of the importance of quality of content, particularly in the pre-processing stage of data storage (Popovic et al., 2009; Stvilia et al., 2007). BI technology is changing and expanding, both in the scope of the data it collects and analyzes and in the range of employees using it (Rodriguez et al., 2010). Today, virtually every software application feeds data into warehouses, permitting focus on the current picture rather than on something that took place months or years ago (Popovic et al., 2009).

McGilvray (2008) and Olson (2003) note two major trends toward an environment in which information quality assurance is commonplace. The first trend is the increasing number of legal and regulatory data quality constraints on businesses, which require information quality assurance to align with stated goals and objectives (Caro et al., 2008; Lee et al., 2009; Olson, 2003). According to Hakim (2007a), there is a direct correlation between the recent regulatory requirements for information quality standards and the increase in the number of assurance processes for BI systems, the results of which are significantly improved competitive advantages. For example, the Sarbanes-Oxley Act of 2002 requires that businesses protect investors by improving the accuracy and reliability of the information they produce, or face large fines and corporate disgrace (Sarbanes & Oxley, 2002). These standards reduce the risks of incompatibility and incompetence, and promote conformity for compliance, accuracy, and best practices (Seng & Chen, 2010). Another example of regulatory requirements for assuring information quality is the Capital Requirements Directive, which requires that data and information be accurate, complete, and appropriate for the task at hand (Rodriguez et al., 2010). BI systems deploy policies and procedures to manage and measure risk, as well as to meet standards critical to legal and regulatory compliance (Caro et al., 2008).

The second trend is the need for businesses to increase competitive advantages by making data available for decision support through BI and data warehousing (McGilvray, 2008). The emergence of data warehouses, the advances in data mining, the increased capabilities of hardware and software, and the growth of the Internet present complex competitive information to decision makers; the overarching goal is to improve competitive advantages through quality of content (Lee et al., 2009). The basis for competition has changed from tangible products to intangible information, and that information represents collective knowledge used to produce and deliver products and services to meet goals and objectives (Stvilia et al., 2007). BI systems, then, are tools for maximizing competitive advantages by reducing redundancy, increasing efficiency, and ensuring better data integrity by streamlining information assurance processes (Popovic et al., 2009). Increasing competitive advantages provides decision makers with current information to make effective, rapid decisions that maximize profit and decrease overhead (Rodriguez et al., 2010).

The Information Quality Challenge

Businesses need information that can be trusted to be correct and current to meet goals and objectives (Olson, 2003).
Negash (2008) notes that the increasing pressure on businesses to justify ROI is met with the challenge of competitive intelligence: it is not the amount of information available to decision makers that ensures competitive advantages so much as the ability to differentiate useful data from misinformation. Information quality problems are caused by human, process, and systems issues, and are not restricted to older systems (McGilvray, 2008). For example, normal business activities such as correction activities, duplication of work, and handling returns are indicative of data quality problems (Olson, 2003). BI systems create, update, and delete data, and while IT teams are responsible for the quality of the systems that store and move the data, they are not completely responsible for content (McGilvray, 2008). In fact, according to McGilvray (2008), both IT and BI need clearly articulated requirements for the development of quality processes for effective data management.

Quality information is the most valuable asset of a firm; thus capitalizing on information quality assurance from BI systems enables decision makers to understand the capabilities available in a company to increase ROI by meeting goals and needs (Negash, 2008; Popovic et al., 2009). According to English (2005) and McGilvray (2008), investing in information quality assurance is a means of showing benefits in return on investment (ROI). However, a business must first identify and prioritize dimensions of information quality that align with corporate needs and goals to reach the required level of data accuracy within the corporation's critical data warehouses, and then keep it at that level (Olson, 2003).

The Role of Information Quality Assurance in the Pre-Processing Stage of Data Storage within the Context of Business Intelligence

Information quality assurance in the pre-processing stage of data storage guards against erroneous data or information of marginal quality becoming factors in data mining and analysis procedures (Olson, 2003). According to Watson and Wixom (2007), too much information can be as ineffective as unstructured or poor-quality data. Focusing on key information and ensuring it is of useful quality is the role of assurance plans for data storage (Lee et al., 2006). By collecting more, businesses end up with less; too many fields to check mean more fields to define and more rules to implement (Cong et al., 2007). Assuring quality of content in data storage redesigns the processes of building data warehouse applications and automates the processes of measuring significant and structured information (Caro et al., 2008). Ensuring that necessary data quality guidance is developed and implemented within BI structures for consistency and accuracy is a major role of information quality assurance in data storage, and according to Stvilia et al. (2007), one that indicates the effectiveness of the decision-making process for knowledge workers in BI.

A solid, scalable information quality assurance plan for data storage is the essence of effective BI (Popovic et al., 2009). Assuring quality of content for data storage and management maintains integrity for BI decision makers by ensuring that inconsistencies and discrepancies are non-existent (Davenport & Harris, 2007).
In particular, information quality assurance for data storage ensures that results for the decision-making process are factual, present solutions for achieving or exceeding BI goals and objectives, and provide clarity for BI knowledge workers (English, 2009).

Information quality. Information quality produces a clear competitive advantage for companies in both the public and private sectors (Lee et al., 2009). According to Lee et al. (2009), Knight and Burn (2005), and Keeton et al. (2009), the role of information quality is to:

• maximize objectivity and integrity of information;
• adopt a basic standard of quality and implement criteria into information quality practices; and,
• ensure compliance with legal and regulatory standards.

Information quality ensures objective, unbiased, and consistent data for substantively accurate identification of information sources (Knight & Burn, 2005). According to Davenport and Harris (2007), strategies for information quality policies and programs support business needs, goals, and objectives by defining, measuring, analyzing, and improving the quality of data. Assurance for data storage also prioritizes requirements so that the resulting systems produce information that better serves the needs of knowledge workers in the decision-making process (Hakim, 2007a).

Information quality assurance benefits. Assuring quality of content ensures that results from the data mining process are of high quality and meet or exceed BI goals and objectives (Jafar, 2010). Thus information quality assurance aids BI knowledge workers in ensuring that quality is effectively managed in the data storage process (Olson, 2003). Evaluating the quality of information before using it in the decision-making process ensures integrity for BI (Kahn et al., 2002). Collection of high-quality data requires planning in the pre-processing stage of data storage to ensure accurate, consistent, reliable results for the decision-making process (Su et al., 2009). According to Su et al. (2009), poor-quality data are caused by human, process, and system issues, and it is often difficult to perceive the extent to which these problems affect business systems. However, poor-quality data cost as much as or more to produce than meaningful, quality data: an under- or over-designed solution to a problem results in a considerable expenditure of time and wasted money for decision makers (English, 2009). Thus the importance of information quality assurance before data reach the warehouse is measured by the degree to which information and data are viewed as trusted sources for achieving company goals (Watson & Wixom, 2007).

Sen and Sinha (2007) note that many businesses are ensuring quality within decision-making processes but still struggle with the critical task of assuring information quality for data before it is stored in warehouses. Lee et al. (2009) point out that while IT teams are responsible for the quality of the systems that store and move the data, they are not responsible for the content. Moreover, Piatetsky-Shapiro et al. (2009) state that both IT and BI systems need clearly articulated information quality processes in the pre-processing stage of data storage for successful data mining and management.

The impact of quality of content. Information quality assurance impacts business decisions and actions by providing data in the form of intangible information (McGilvray, 2008).
According to Jafar (2010), the importance of assuring high-quality information in the pre-processing stage is often misunderstood, with the implicit assumption that the data mining process correctly represents the business when, in fact, the quality of the final results is only representative of the level of quality in the early stages of data storage. That is, data are mined to discover knowledge about a business, and ultimately afford competitive advantages for BI systems (Panin, 2006; Seng & Chen, 2010). Importantly, results for decision makers are a reflection of the quality of the data captured during the pre-processing stage of data storage (Sen & Sinha, 2007). Data mining tools and procedures, such as decision trees or neural networks, are only effective when information quality assurance procedures are in place in the pre-processing stage (Zhao et al., 2006). According to Lupu et al. (2007), an understanding of the processes that are used to capture, generate, use, and store data is essential to information quality assurance in the pre-processing stage of data storage.

The Need to Define and Prioritize Key Information Quality Dimensions for Assuring Quality of Content in the Pre-Processing Stage of Data Storage

Information quality is not linear and thus has many dimensions (English, 2009; McGilvray, 2008). Information quality assurance initiatives combine information from different sources in such a way that new and better uses are made of the resulting information (Olson, 2003). A clear understanding of the dimensions of information quality that most closely align with BI system goals and objectives provides ways to effectively measure and manage the quality of data and information in the early stages of storage in data warehouses (English, 2009; Fisher et al., 2008; McGilvray, 2008).

Defining key information quality dimensions. Information quality is a multi-dimensional concept in which dimensions, or elements used in assessing subjective quality of content, are its measures (Olson, 2003). Once identified, dimensions are prioritized by BI systems by determining their suitability for goals and objectives (English, 2005). According to Olson (2003), the measurement of information quality effectiveness via the use of dimensions enables BI systems to focus on success from a decision-making perspective. A dimension is a way of classifying and prioritizing BI information and needs (McGilvray, 2008). According to McGilvray (2008), dimensions are used to define, measure, and manage the quality of data and content for data storage. BI systems measure dimensions of information quality to establish procedures and standards for meeting needs, goals, and objectives (Rodriguez et al., 2010). Oversimplified dimensions or poorly implemented processes do not align with true BI needs and trigger false results for the decision-making process (Su et al., 2009). Thus it is critical that BI systems focus on key dimensions that benefit the information quality assurance process by identifying and prioritizing those in alignment with BI goals and objectives (Cong et al., 2007; English, 2009; Watson & Wixom, 2007).

An information quality dimension provides a way to measure and manage the quality of data and information (McGilvray, 2008). According to McGilvray (2008), each dimension requires different tools, techniques, and processes to measure it.
Differentiating the dimensions of quality helps match dimensions to business needs and goals (Caro et al., 2008; Stvilia et al., 2007). Dimensions that are the most meaningful to goals and objectives should be the focus; however, if a business is unsure where to begin information quality efforts, the dimensions of perception, relevance, and trust provide insight into issues by surveying knowledge workers and obtaining their points of view (Fisher et al., 2008; McGilvray, 2008). Those results articulate the BI problem and enable prioritization of the information quality efforts (Davenport & Harris, 2007).

According to McGilvray (2008) and Stvilia et al. (2007), in order to plan ways to assure information quality, understanding common information quality dimensions is requisite. Businesses begin with a list of common dimensions, such as those listed below, and prioritize according to goals and objectives (Hakim, 2007a; McGilvray, 2008; Negash, 2008; Olson, 2003). According to McGilvray (2008), dimensions used to assess information quality are grouped into four categories, as follows:

• Intrinsic Information Quality: Accuracy, Objectivity, Believability, Reputation
• Contextual Information Quality: Relevancy, Value-Added, Timeliness, Completeness, Amount of Information
• Representational Information Quality: Interpretability, Ease of Understanding, Concise Representation, Consistent Representation
• Accessibility Information Quality: Accessibility, Access Security

Identifying and prioritizing key dimensions. Information quality occurs along dimensions and is defined by the needs of the customer (Cong et al., 2007; McGilvray, 2008). Knowledge workers must understand the dimensions and the dynamic nature of information quality to effectively identify and prioritize those useful as components of their decision-making processes (Negash, 2008). Understanding the key information quality dimensions is the first step toward data quality assurance (Olson, 2003). Segregating data flaws by dimension allows companies to apply improvement techniques, using information quality assurance tools, to improve both the data and the processes that create and manipulate that information before it reaches the warehouse (English, 2009).

Information quality assurance begins with understanding the dimensions and, moreover, identifying the key dimensions that align with BI goals (Cong et al., 2007). The dimensions are absolute, but the perception of the dimensions defines information quality (Hakim, 2007a; Keeton et al., 2009). The potential success of BI strategies for improving and ensuring successful decision-making processes lies in identifying, defining, and prioritizing information quality dimensions (McGilvray, 2008). Keeton et al. (2009) state that understanding the key information quality dimensions is the first step towards information quality assurance. Keeton et al. (2009) and Olson (2003) note that the ability to segregate unstructured data by dimension or classification allows analysts to apply improvement techniques, using information quality tools, to improve the quality of the information and the processes that create and manipulate that information. Selecting the dimensions of information quality to be quantified within the context of user, environment, and task is critical to information quality assurance within the context of BI (McGilvray, 2008; Olson, 2003).
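For illustration, the categorize-and-prioritize step described above can be sketched in code. The fragment below is a minimal sketch only: it encodes the four categories listed above as a simple data structure and ranks candidate dimensions by how strongly they align with one BI system's goals. The alignment weights and the choice of candidate dimensions are hypothetical, invented for this example rather than drawn from McGilvray (2008) or any other cited source; in practice such weights would come from stakeholder input.

```python
# Illustrative sketch only (weights are hypothetical, not from the study):
# the four categories of information quality dimensions as a structure,
# plus a toy prioritization step against one BI system's goals.

IQ_CATEGORIES = {
    "intrinsic": ["accuracy", "objectivity", "believability", "reputation"],
    "contextual": ["relevancy", "value-added", "timeliness",
                   "completeness", "amount of information"],
    "representational": ["interpretability", "ease of understanding",
                         "concise representation", "consistent representation"],
    "accessibility": ["accessibility", "access security"],
}

# Hypothetical weights (0 to 1) expressing how strongly each candidate
# dimension aligns with the BI system's goals and objectives.
goal_alignment = {
    "accuracy": 0.9, "completeness": 0.8, "timeliness": 0.7,
    "relevancy": 0.6, "objectivity": 0.5, "accessibility": 0.4,
}

def prioritize(alignment, top_n=5):
    """Rank candidate dimensions by their goal-alignment weight."""
    return sorted(alignment.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

for dim, weight in prioritize(goal_alignment):
    # Look up which published category the dimension belongs to.
    category = next(c for c, dims in IQ_CATEGORIES.items() if dim in dims)
    print(f"{dim:15s} weight={weight:.1f} ({category})")
```

The data structure merely records the published categories; the weighting step is where a unique BI system's goals and objectives enter, which is consistent with the literature's emphasis on prioritizing dimensions per system rather than adopting a fixed ranking.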
Dimensions are assigned a value and ranking for analyzing priorities, addressing limitations within the context of the unique BI system, and realistically determining achievable goals for competitive advantages (Davenport & Harris, 2007; Keeton et al., 2009). Fisher et al. (2008) note that by assigning a dimension value and rank, a business can better manage information quality assurance to ensure that end-user needs are met in the pre-processing stage of data storage.

Contextualizing key dimensions for assurance in the pre-processing stage. Information is critical for successful decision making, and it is effective when the quality of content is assured (English, 2005). The concept seems obvious, but the definition of information quality varies significantly depending on the business, the goal, or the objective (Olson, 2009). It is important that the information quality dimensions that best address BI needs and goals be chosen for successful data management; the scope of the effort required for a particular project is better assessed this way (Popovic et al., 2009; Rodriguez et al., 2010). The ultimate objective of assuring quality of content is establishing a data warehouse that contains relevant and accurate information about a business environment (Sen & Sinha, 2007). Assuring quality of content in the pre-processing stage of data storage involves, at a minimum, data integrity and accuracy (Pipino et al., 2002). Identifying and prioritizing key dimensions in order to evaluate information quality and assure quality of content is effective when prescribed for each unique business environment dimension (Stvilia et al., 2007). Thus data quality is assured when measured along several dimensions and contextualized by unique BI goals and objectives (Popovic et al., 2009; Rodriguez et al., 2010).

Conclusions

The purpose of this study is to address key dimensions of information quality, as identified in selected literature, necessary for data quality assurance within the context of BI (Hakim, 2007a; Jafar, 2010). The goal is to produce a framework for identifying and prioritizing key dimensions unique to each BI system's goals and objectives to ensure integrity and consistency of information for assurance in the pre-processing stage of data storage. Kahn et al. (2002) provide a set of general guidelines for structuring a comprehensive information quality assurance framework, which includes the following steps:

1. Develop BI goals and objectives;
2. Identify and prioritize dimensions of information quality that align with goals and objectives;
3. Implement and maintain an assurance plan for all information quality processes and procedures;
4. Review and approve documentation by appropriate knowledge workers; and,
5. Define and communicate each key dimension for data quality management to stakeholders.

This study focuses on Step 2, identifying and prioritizing key dimensions of information quality assurance for data storage and management, for use within the context of each uniquely distinct BI system.

Summary of 10 Widely Accepted Key Dimensions for Information Quality Assurance

The fundamental key dimensions of information quality are those that most closely align with unique BI goals and objectives (Kahn et al., 2002).
In fact, Davenport and Harris (2007) and McGilvray (2008) note that established companies build on existing strengths by transforming dimensions of information quality into strategies after identifying those key to goal alignment. Thus identifying key dimensions of information quality and continuously prioritizing them based on current BI goals and objectives significantly contributes to an effective decision-making process and increases competitive advantages for BI (Negash, 2008). Panin (2006) and Piatetsky-Shapiro et al. (2009) note that investing in identifying and prioritizing dimensions of information quality distinguishes effective BI systems from ineffective ones. The value of information quality assurance, then, is not in the level of quality of content; rather, the value is in how it affects the decision-making and competitive advantage processes for BI (Kriegel et al., 2007).

Table 4 presents a summary of the 10 most widely accepted key information quality dimensions for consideration at the pre-processing stage, to meet BI needs and goals. There are over 30 widely accepted dimensions of information quality; however, most experts in the field agree that while the process of prioritizing dimensions is unique to a specific BI system's set of goals and objectives, those listed in Table 4 are key (Keeton et al., 2009; Knight & Burn, 2005).

Table 4
Summary of Key Dimensions and Definitions of Information Quality

Dimension: Definition
Accessibility: The extent to which data is retrieved as needed
Accuracy: A measure of the correctness of the content of the data
Completeness: The extent to which data is not missing and is of sufficient breadth and depth for the task at hand
Free of Error: The extent to which data is correct and reliable
Interpretability: The extent to which data is in appropriate languages, symbols, and units, and the definitions are clear
Objectivity: The extent to which data is unbiased, unprejudiced, and impartial
Relevancy: The extent to which data is applicable and helpful for the task at hand
Reliability: The extent to which data is regarded as true and credible
Timeliness: The extent to which the data is sufficiently up-to-date for the task at hand
Value Added: The extent to which data is beneficial and provides advantages from its use

Two Selected Processes for Aligning Key Dimensions with Business Goals and Objectives

Prioritizing the key dimensions, then, creates niches for BI systems based on timeliness and opportunity; being the first to determine and respond to market changes and needs increases competitive advantages (English, 2009). Selected literature indicates that aligning key dimensions with goals and objectives requires forethought by knowledge workers and decision makers (Cong et al., 2007; English, 2005; English, 2009; McGilvray, 2008; Olson, 2003; Stvilia et al., 2007). Awareness of fundamental key dimensions provides a logical structure for identifying and prioritizing the components that contribute to assuring information quality at the pre-processing stage for specific goals (Stvilia et al., 2007). Moreover, according to McGilvray (2008), it provides an understanding of the complex environment in which information quality problems are created, and enables organized thinking for BI systems to plan and create quality data and implement improvements as needed.
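Before turning to the two processes, it may help to see how dimensions such as those in Table 4 can be operationalized as measurements and rolled up into a single figure. The sketch below is illustrative only: it computes toy measures for two Table 4 dimensions (completeness as the fraction of non-missing values, and free-of-error as the fraction of present values passing a validation rule) over a small hypothetical record set, then combines them with hypothetical weights into a weighted index. None of the records, rules, or weights come from the cited sources; the aggregation step only loosely echoes the index-building approach of Stvilia et al. (2007) discussed next.

```python
# Illustrative sketch only: toy measures for two Table 4 dimensions,
# combined into one weighted quality index. Records, validation rule,
# and weights are hypothetical, not drawn from the study's sources.

records = [
    {"customer_id": "C001", "order_total": 250.0},
    {"customer_id": "C002", "order_total": None},   # missing value
    {"customer_id": None, "order_total": -40.0},    # invalid total
]

def completeness(rows, field):
    """Fraction of rows in which the field is present (not None)."""
    return sum(row[field] is not None for row in rows) / len(rows)

def free_of_error(rows, field, is_valid):
    """Fraction of present values that pass the validation rule."""
    present = [row[field] for row in rows if row[field] is not None]
    return sum(is_valid(v) for v in present) / len(present) if present else 0.0

scores = {
    "completeness": completeness(records, "order_total"),
    "free_of_error": free_of_error(records, "order_total", lambda v: v >= 0),
}
weights = {"completeness": 0.6, "free_of_error": 0.4}  # hypothetical priorities
index = sum(scores[d] * weights[d] for d in scores)

print(scores)                                   # completeness 0.67, free_of_error 0.5
print(f"weighted quality index = {index:.2f}")  # 0.60
```

Measuring each dimension separately before aggregating reflects the literature's point that each dimension requires its own tools and techniques (McGilvray, 2008); the weights are where a specific BI system's goals and objectives would enter.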
Regardless of how data are structured, it is important that businesses are consistently clear on what dimensions are, and what dimensions are not, when defining BI needs during the assessment stage of the information quality assurance improvement cycle (Caro et al., 2008; Davenport & Harris, 2007; English, 2009). McGilvray (2008) describes a process for identifying key dimensions first, and then prioritizing those in alignment with specific goals and objectives. Furthermore, McGilvray (2008) states that once key dimensions are in place the process for the continuous assessment, maintenance, and improvement of information is critical for producing assuring information quality. The process consists of a set of concrete instructions for planning and implementing information and data quality improvement projects (McGilvray, 2008). According to McGilvray (2008), each step contains general principles, directions, advice, and examples for assessment, IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 87 awareness, and action. The first step is to define business need and identify and prioritize dimension to focus on what is relevant and critical to meet objectives (McGilvray, 2008). Stvilia et al. (2007) present another process, and propose that key dimensions must align with and be connected to the BI system to assure information quality. Stvilia et al. (2007) claim that incomplete, ambiguous, inaccurate, inconsistent, or redundant data that is not corrected in the pre-processing stage of data storage is a result of not identifying and prioritizing key dimensions. The central part of Stvilia et al.’s (2007) framework is a taxonomy of information quality dimensions. The taxonomy consists of 22 information quality dimensions organized into three categories based on information quality variance: intrinsic information quality (cultural norms and conventions); relational, or contextual, information quality (immediate context or object of information quality assessment); and, reputational information quality (cultural or community related structure). In addition to a taxonomy of information quality dimensions, the framework consists of a set of 41 general metric functions implemented as Java codes used to develop context-specific information quality metrics. The framework serves as a valuable knowledge resource and guide for assuring information quality by establishing connections among information quality dimensions. Moreover, the framework provides a predictive mechanism to identify information quality problems early on (Stvilia et al., 2007). According to Stvilia et al. (2007), the first step is to identify the business goals and objectives. Next, a set of relevant information quality dimensions is selected from the framework that aligns with goals. Finally, the information quality dimensions are aggregated into an index for each information quality dimension for assuring high-level quality in the pre-processing stage of data storage. IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 88 References Andersson, D., Fries, H., & Johansson, P. (2008). Business intelligence: The impact on decision support and decision making processes (Unpublished master’s thesis). Jonkoping University, Norway. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-1159 Arkady, M. (2007). Data quality assessment. Bradley Beach, NJ: Technics Publications, LLC. Bell, C., & Smith, T. (2007). Critical evaluation of information sources. 
Retrieved from http://libweb.uoregon.edu/guides/findarticles/credibility.html Berkley, C., Bowers, S., Jones, M.B., Madin, J.S., & Schlidhauer, M. (2009). Improving data discovery for metadata repositories through semantic search. Complex, Intelligent and Software Intensive Systems, 16(19), 1152-1159. http://doi.ieeecomputersociety.org/10.1109/CISIS.2009.122 Busch, C., De Maret, P., Flynn, T., Kellum, R., Le, S., Meyers, B., Saunders, M., & White, R., (2005). Content analysis. Retrieved from http://writing.colostate.edu/guides/research/content Caro, A., Calero, C., Caballero, I., & Piattini, M. (2008) A proposal for a set of attributes relevant for web portal data quality, Software Quality Journal, 16(4), 513-542. doi:10.1007/s11219-008-9046-7 CiteSeer. About CiteSeer. Retrieved from CiteSeer Web site: http://citeseer.ist.psu.edu/about/site;jsessionid=6D156B1258F653C3D4CC178ABC6943 80 Cong, G., Fan, W., Geerts, F., Jia, X., & Shuai, M. (2007). Improving data quality: Consistency and accuracy. Proceedings of the 33rd International Conference on Very Large Databases IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 89 (VLDB), Vienna, Austria, 2007, 315-326. Retrieved from http://www.vldb.org/conf/2007/papers/research/p315-cong.pdf Creswell, J.W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Thousand Oaks, CA: SAGE Publications, Inc. Davenport, T.H., & Harris, J.G. (2007). The architecture of business intelligence. In Competing on analytics: The new science of winning. (chapter 8). Boston, MA: Harvard Business School Press. Retrieved from http://www.accenture.com/NR/rdonlyres/15DCFF6A- 4DE0-44D8-B778-630BE3A677A2/0/ArchBIAIMS.pdf English, L. (2005). Information quality for business intelligence and data mining: Assuring quality for strategic information uses. [White paper]. Retrieved from http://infoimpact.com/articles/IQBI&DataMining.pdf English, L. (2009). Information quality applied: Best practices for improving business information, processes and systems. New York, NY: John Wiley & Sons, Inc. Fayed, U.M., & Uthurusamy, R. (2002). Evolving data mining into solutions for insights. Communications of the ACM, 45(8), 28-21. Retrieved from http://sce.uhcl.edu/boetticher/ML_DataMining/p28-fayyad.pdf Fink, A. (2010). Conducting research literature reviews (3rd ed.). Thousand Oaks, CA: SAGE Publications, Inc. Fisher, C., Lauria, E., Chengalur-Smith, S., & Wang, R. (2008). Introduction to information quality (4th ed.). Cambridge, MA: MIT Press. Forcada, N., Casals, M., Fuertes, A., Gangolells, M., & Roca, X. (2010). A web-based system for sharing and disseminating research results: The underground construction case study. Automation in Construction, 19(4), 458-474. doi:10.1016/i.autocon.209.12.018 IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 90 Gallo, J. (2010, September 21). A business context for agile business intelligence [Web log comment]. Retrieved from http://www.b-eye-network.com/view/14384 Geist, M. (2008). Enhancing home computer user information security: Factors to consider in the design of anti-phishing applications. Retrieved from http://aim.uoregon.edu/research/pdfs/2008-geist.pdf Haag, S., Cummings, M., McCubbrey, D., Pinsonneault, A., & Donovan, R. (2006). Management information systems for the information age (3rd ed.). Whitby, Ontario, Canada: McGraw-Hill Ryerson. Hakim, L. (2007a). Information quality management: Theory and applications. Hershey, PA: Idea Group Publishing. Hakim, L. (2007b). 
Challenges of managing information quality in service organizations. Hershey, PA: Idea Group Publishing. Halonen, R., & Thomander, H. (2008). Measuring knowledge transfer success by D&M. Sprouts: Working Papers on Information Systems, 8(41). Retrieved from http://sprouts.aisnet.org/8-41 Hsieh, H.F., & Shannon, S.E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277-1288. doi: 10.1177/1049732305276687 IBM (2009). Business intelligence for business users: How IT can make business intelligence easy for everyone. [White paper]. Retrieved from http://public.dhe.ibm.com/software/data/sw- library/cognos/pdfs/whitepapers/wp_c8v4_bi_for_bus_users.pdf IBM. (2010). The new promise of business intelligence. [White paper]. Retrieved from http://www.itbusinessedge.com/offer.aspx?o=00630554BIwp&pc IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 91 =defoffsliverbi Jafar, M.J. (2010). A tools-based approach to teaching data mining. Journal of Information Technology Education: Innovations in Practice, 9, 2-24. Retrieved from http://jite.org/documents/Vol9/JITEv9IIPp001-024Jafar740.pdf Kahn, B.K., Strong, D.M., & Wang, R.Y. (2002). Information quality benchmarks: Product and service performance, Communications of the ACM, 45(4), 184-192. doi: 10.1145/505999.56007 Kanal, L.N. (2009). Problem-solving models and search strategies for pattern recognition. Pattern Analysis and Machine Intelligence, 1(2), 193-201. doi:10.1109/TPAMI.1979.4766905 Keeton, K., Mehra, P., & Wilkes, J. (2009). Do you know your IQ: A research agenda for information quality in systems. ACM Sigmetrics Performance Evaluation Review, 37(3), 1-6. Retrieved from http://www.sigmetrics.org/sigmetrics/workshops/papers_hotmetrics/session1_4.pdf Klein, B.D. (2002). When do users detect information quality problems on the World Wide Web? American Conference in Information Systems, 41(4), 9-18. Retrieved from http://sighci.org/amcis02/RIP/Klein.pdf Knight, S., & Burn, J. (2005). Developing a framework for assessing information quality on the World Wide Web. Informing Science Journal, 8(1), 159-172. Retrieved from http://inform.nu/Articles/Vol8/v8p159-172Knig.pdf Kriegel, H. P., Borgwardt, K.M., Kroger, P., Pryakhin, A., Schubert, M., & Zimek, A., (2007). Future trends in data mining, Data Mining and Knowledge Discovery, 15(1), 87-97. doi:10.1007/s10618-007-0067-9 IDENTIFYING AND PRIORITIZING INFORMATION QUALITY 92 Lamont, J. (2010). Competitive intelligence: Capturing a wider view. KM World, 19(10), 14-15. Retrieved from http://www.kmworld.com/Articles/PrintArticle.aspx?ArticleID=70849 Lee, Y.W., Pipino, L.L., Funk, J.D., & Wang, R.Y. (2009). Journey to Data Quality. Cambridge, MA: MIT Press. Leedy, P.D., & Ormrod, J.E. (2010). Practical research: Planning and design (9th ed.). Upper Saddle River, NJ: Allyn & Bacon. Lefebvre, R. C. (2007). The new technology: The consumer as participant rather than target audience. SMq, 13(3), 31-42. Retrieved from http://www.scribd.com/doc/38464538/SMQ-The-Consumer-as-Participant-2007 Levy, Y., & Ellis, T.J. (2006). Towards a framework of literature review process in support of information systems research. Proceedings of the 2006 Informing Science and IT Education Joint Conference. Retrieved from http://www.informingscience.org/proceedings/InSITE2006/ProcLevy180.pdf Luckey, T.S. (2009). Key stages of disaster recovery planning for time-critical business information technology systems. 
Luckey, T.S. (2009). Key stages of disaster recovery planning for time-critical business information technology systems. Retrieved from http://aim.uoregon.edu/research/pdfs/2009-luckey.pdf
Lupu, A.R., Razvan, B., Sabau, G., & Muntean, M. (2007). Influence factors of business intelligence in the context of ERP projects. International Journal of Education and Information Technologies, 2(1), 90-94. Retrieved from http://www.naun.org/journals/educationinformation/eit-15.pdf
McGarry, K. (2005). A survey of interestingness measures for knowledge discovery. The Knowledge Engineering Review, 1(1), 1-24. doi:10.1017/S0269888905000408
McGilvray, D.M. (2008). Executing data quality projects: Ten steps to quality data and trusted information. Burlington, MA: Morgan Kaufmann Publishers.
Negash, S. (2008). Business intelligence. In Handbook on decision support systems (International handbooks on information systems, chapter 45). Berlin, Germany: Springer-Verlag. doi:10.1007/978-3-540-48713-5
Obenzinger, H. (2005). What can a literature review do for me? Retrieved from Stanford University: http://ual.stanford.edu/pdf/uar_literaturereviewhandout.pdf
Olson, J.E. (2003). Data quality: The accuracy dimension. San Francisco, CA: Morgan Kaufmann Publishers.
Olson, J.E. (2009). Database archiving: How to keep lots of data for a very long time. Burlington, MA: Morgan Kaufmann Publishers.
Ormondroyd, J., Engle, M., & Cosgrave, T. (2009). Critically analyzing information sources. Retrieved from Cornell University, Olin & Uris Libraries Web site: http://olinuris.library.cornell.edu/ref/research/skill26.htm
Panin, Z. (2006). Business intelligence in support of business strategy. Proceedings of the 7th WSEAS International Conference on Mathematics & Computers in Business & Economics, Croatia, 6, 19-23. Retrieved from http://www.wseas.us/e-library/conferences/2006cavtat/papers/528-109.pdf
Parmenter, D. (2010). Key performance indicators: Developing, implementing, and using winning KPIs. Hoboken, NJ: John Wiley & Sons, Inc.
Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., & Zaki, M. (2009). What are the grand challenges for data mining? SIGKDD Explorations, 8(2), 70-77. doi:10.1145/1233321.1233330
Pipino, L.L., Lee, Y.W., & Wang, R.Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211-218. doi:10.1145/505248.506010
Popovic, A., Coelho, P.S., & Jaklic, J. (2009). The impact of business intelligence system maturity on information quality. Information Research, 14(4), 1-14. Retrieved from http://informationr.net/ir/14-4/paper417.html
Power, D.J. (2004). A brief history of decision support systems. DSSResources.com, 4(1). Retrieved from http://dssresources.com/history/dsshistory.html
Redman, T., & Daugherty, M. (2001). Data quality: The field guide. Burlington, MA: Elsevier, Inc.
Rodriguez, C., Daniel, F., Casati, F., & Cappiello, C. (2010). Toward uncertain business intelligence: The case of key indicators. IEEE Internet Computing, 14(4), 32-40. http://doi.ieeecomputersociety.org/10.1109/MIC.2010.59
Sarbanes, P., & Oxley, M. (2002). A guide to the Sarbanes-Oxley act. Retrieved from http://www.soxlaw.com/
Sen, A., & Sinha, A.P. (2007). Toward developing data warehousing process standards: An ontology-based review of existing methodologies. IEEE Transactions on Systems, Man and Cybernetics: Part C, Applications and Reviews, 37(1), 17-31. http://dx.doi.org/10.1109/TSMCC.2006.886966
Seng, J.L., & Chen, T.C. (2010). An analytic approach to select data mining for business decisions. Expert Systems with Applications, 37(12), 8042-8057. http://dx.doi.org/10.1016/j.eswa.2010.05.083
Stacks, G., & Karper, E. (2008). Annotated bibliographies. Retrieved from The Owl at Purdue: http://owl.english.purdue.edu/owl/resource/614/01/
Stvilia, B., Gasser, L., Twidale, M.B., & Smith, L.C. (2007). A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58(12), 1720-1733. doi:10.1002/asi.20652
Su, Y., Peng, J., & Jin, Z. (2009). Modeling information quality risk for data mining in data warehouses. Human & Ecological Risk Assessment, 15(2), 332-350. doi:10.1109/ICISE.2009.755
Tang, J., Jin, R., & Zhang, J. (2008). A topic modeling approach and its integration into the random walk framework for academic search. Proceedings of the Eighth IEEE International Conference on Data Mining (ICDM). doi:10.1109/ICDM.2008.71
Thiesse, F., Floerkemeier, C., Harrison, M., Michahelles, F., & Roduner, C. (2010). Technology, standards, and real-world deployments of the EPC network. IEEE Internet Computing, 2(9), 36-43. Retrieved from http://www.im.ethz.ch/publications/tech_standards_realworld_epc.pdf
Ulrichsweb™ (n.d.). Ulrich's Periodicals Directory. Retrieved from Ulrichsweb™ Web site: http://www.ulrichsweb.com.libproxy.uoregon.edu/ulrichsweb/Search/fullCitation.asp?navPage=1&tab=1&serial_uid=196771&vendor=SFX&
University of Colorado at Boulder. (n.d.). How do I…? Retrieved from http://ucblibraries.colorado.edu/how/evaluate.htm
University of North Carolina. (n.d.). Writing center: Literature reviews. Retrieved from University of North Carolina at Chapel Hill, Writing Center Web site: http://www.unc.edu/depts/wcweb/handouts/literature_review.html
Wang, S., & Wang, H. (2007). Mining data quality in completeness. Proceedings of the 2007 International Conference on Information Quality (MIT IQ Conference Center), 1-6. CiteSeerX: 10.1.1.90.4260
Watson, H.J., & Wixom, B.H. (2007). The current state of business intelligence. Computer, 40(9), 96-99. http://dx.doi.org/10.1109/MC.2007.331
Web4All. (2010). Business intelligence: From data collection to data mining and analysis. Proceedings from the 7th International Cross-Disciplinary Conference on Web Accessibility. Retrieved from http://wps.prenhall.com/wps/media/objects/2519/2580469/addit_chmatl/TURBMC04_0131854615App.pdf
Wixom, B.H., & Watson, H.J. (2001). An empirical investigation of the factors affecting data warehousing success. MIS Quarterly, 25(1), 17-41. Retrieved from http://hinf551edwcase.wikispaces.com/file/view/3250957.pdf
Zhao, Y., Chen, Y., & Yao, Y. (2006). User-centered interactive data mining. IEEE International Conference on Cognitive Informatics 2006, 457-466. http://dx.doi.org/10.1109/COGINF.2006.365532

Appendix A – Search Record

Detailed Record of Searches

Each block below lists a search engine or database, comments on the resource, and each search term with the number of results returned and the number of eligible titles found.

ACM Digital Library
Comments: This library is an excellent starting resource to search for the focused topic.
Information + quality: 117,347 results, 12 eligible titles
Data + mining: 57,595 results, 14 eligible titles
Business + intelligence: 26,091 results, 9 eligible titles
Knowledge + discovery: 44,302 results, 8 eligible titles
Data + analytics: 2,059 results, 3 eligible titles
Data + warehouse: 8,119 results, 3 eligible titles
Competitive + advantage: 4,221 results, 6 eligible titles
Information + quality + mining + business + intelligence: 3,159 results, 9 eligible titles
Information + quality + assurance: 9,084 results, 11 eligible titles

Academic Search Premier Index – EBSCO HOST (UO Libraries)
Comments: This index is a good resource for a starting point and is worth continued exploration with the focused topic.
Information + quality: 52,404 results, 13 eligible titles
Data + mining: 12,150 results, 7 eligible titles
Business + intelligence: 5,733 results, 8 eligible titles
Knowledge + discovery: 4,353 results, 4 eligible titles
Data + analytics: 934 results, 3 eligible titles
Data + warehouse: 1,661 results, 9 eligible titles
Competitive + advantage: 799 results, 4 eligible titles
Information + quality + mining + business + intelligence: 1,441 results, 11 eligible titles
Information + quality + assurance: 1,766 results, 9 eligible titles

CiteSeerx Search Index
Comments: This search engine is a very good resource for the topic.
Information + quality: 218,812 results, 12 eligible titles
Data + mining: 42,737 results, 5 eligible titles
Business + intelligence: 22,023 results, 11 eligible titles
Knowledge + discovery: 35,208 results, 11 eligible titles
Data + analytics: 83,255 results, 15 eligible titles
Data + warehouse: 8,021 results, 3 eligible titles
Competitive + advantage: 8,221 results, 9 eligible titles
Information + quality + data mining + business + intelligence: 91,337 results, 7 eligible titles
Information + quality + assurance: 247,558 results, 15 eligible titles

ERIC
Comments: This is not a very helpful resource for the focused topic.
Information + quality: 3,021 results, 1 eligible title
Data + mining: 124 results, 0 eligible titles
Business + intelligence: 89 results, 0 eligible titles
Knowledge + discovery: 186 results, 0 eligible titles
Data + analytics: 13 results, 0 eligible titles
Data + warehouse: 33 results, 0 eligible titles
Competitive + advantage: 19 results, 1 eligible title
Information + quality + mining + business + intelligence: 0 results, 0 eligible titles
Information + quality + assurance: 134 results, 0 eligible titles

Google Scholar Advanced
Comments: Possibly a good resource; worth continuing effort with this search engine, especially with more defined parameters.
Information + quality: 473,000 results, 6 eligible titles
Data + mining: 567,000 results, 5 eligible titles
Business + intelligence: 312,000 results, 7 eligible titles
Knowledge + discovery: 689,000 results, 9 eligible titles
Knowledge + discovery: 142 results, 2 eligible titles
Data + analytics: 77 results, 1 eligible title
Data + warehouse: 45 results, 1 eligible title
Competitive + advantage: 291 results, 2 eligible titles
Information + quality + mining + business + intelligence: 105 results, 1 eligible title
Information + quality + assurance: 180 results, 2 eligible titles

IEEE Computer Science Digital Library
Comments: This is a good resource for academic articles.
Information + quality: 106 results, 6 eligible titles
Data + mining: 202 results, 3 eligible titles
Business + intelligence: 68 results, 1 eligible title
Knowledge + discovery: 142 results, 5 eligible titles
Data + analytics: 77 results, 1 eligible title
Data + warehouse: 45 results, 1 eligible title
Competitive + advantage: 187 results, 3 eligible titles
Information + quality + mining + business + intelligence: 105 results, 2 eligible titles
Information + quality + assurance: 180 results, 3 eligible titles
Information + quality + assurance: 33,886 results, 9 eligible titles

Project Muse (UO Libraries)
Comments: This is a good resource for the topic. Worth further exploration, especially with refined parameters.
Information + quality: 27,082 results, 8 eligible titles
Data + mining: 1,612 results, 6 eligible titles
Business + intelligence: 4,891 results, 7 eligible titles
Knowledge + discovery: 12,308 results, 7 eligible titles
Data + analytics: 178 results, 3 eligible titles
Data + warehouse: 287 results, 3 eligible titles
Competitive + advantage: 344 results, 1 eligible title
Information + quality + mining + business + intelligence: 175 results, 5 eligible titles
Information + quality + assurance: 1,437 results, 4 eligible titles

Sage Journals Online
Comments: This search engine is not a productive website for articles related to this topic.
Information + quality: 147 results, 4 eligible titles
Data + mining + techniques: 1,934 results, 2 eligible titles
Business + intelligence: 4,472 results, 1 eligible title
Knowledge + discovery: 9,079 results, 2 eligible titles
Data + analytics: 385 results, 3 eligible titles
Data + warehouse: 719 results, 3 eligible titles
Competitive + advantage: 211 results, 1 eligible title
Information + quality + mining + business + intelligence: 17 results, 2 eligible titles
Information + quality + assurance: 3,467 results, 0 eligible titles

Web of Science (UO Libraries)
Comments: This index is a good resource for academic articles.
Information + quality: 72,744 results, 7 eligible titles
Data + mining: 16,470 results, 11 eligible titles
Business + intelligence: 903 results, 9 eligible titles
Knowledge + discovery: 8,644 results, 3 eligible titles
Data + analytics: 303 results, 6 eligible titles
Data + warehouse: 1,255 results, 5 eligible titles
Competitive + advantage: 988 results, 2 eligible titles
Information + quality + assurance: 2,613 results, 4 eligible titles

Appendix B – References Selected for Coding

Andersson, D., Fries, H., & Johansson, P. (2008). Business intelligence: The impact on decision support and decision making processes (Unpublished master's thesis). Jonkoping University, Sweden. Retrieved from http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-1159
Caro, A., Calero, C., Caballero, I., & Piattini, M. (2008). A proposal for a set of attributes relevant for web portal data quality. Software Quality Journal, 16(4), 513-542. doi:10.1007/s11219-008-9046-7
Cong, G., Fan, W., Geerts, F., Jia, X., & Shuai, M. (2007). Improving data quality: Consistency and accuracy. Proceedings of the 33rd International Conference on Very Large Databases (VLDB), Vienna, Austria, 315-326. Retrieved from http://www.vldb.org/conf/2007/papers/research/p315-cong.pdf
Davenport, T.H., & Harris, J.G. (2007). The architecture of business intelligence. In Competing on analytics: The new science of winning (chapter 8). Boston, MA: Harvard Business School Press. Retrieved from http://www.accenture.com/NR/rdonlyres/15DCFF6A-4DE0-44D8-B778-630BE3A677A2/0/ArchBIAIMS.pdf
English, L. (2005). Information quality for business intelligence and data mining: Assuring quality for strategic information uses [White paper]. Retrieved from http://infoimpact.com/articles/IQBI&DataMining.pdf
English, L. (2009). Information quality applied: Best practices for improving business information, processes and systems. New York, NY: John Wiley & Sons, Inc.
Fisher, C., Lauria, E., Chengalur-Smith, S., & Wang, R. (2008). Introduction to information quality (4th ed.). Cambridge, MA: MIT Press.
Hakim, L. (2007a). Information quality management: Theory and applications. Hershey, PA: Idea Group Publishing.
Jafar, M.J. (2010). A tools-based approach to teaching data mining. Journal of Information Technology Education: Innovations in Practice, 9, 1-24. Retrieved from http://jite.org/documents/Vol9/JITEv9IIPp001-024Jafar740.pdf
Kahn, B.K., Strong, D.M., & Wang, R.Y. (2002). Information quality benchmarks: Product and service performance. Communications of the ACM, 45(4), 184-192. doi:10.1145/505248.506007
Keeton, K., Mehra, P., & Wilkes, J. (2009). Do you know your IQ: A research agenda for information quality in systems. ACM SIGMETRICS Performance Evaluation Review, 37(3), 1-6. Retrieved from http://www.sigmetrics.org/sigmetrics/workshops/papers_hotmetrics/session1_4.pdf
Klein, B.D. (2002). When do users detect information quality problems on the World Wide Web? Americas Conference on Information Systems, 41(4), 9-18. Retrieved from http://sighci.org/amcis02/RIP/Klein.pdf
Knight, S., & Burn, J. (2005). Developing a framework for assessing information quality on the World Wide Web. Informing Science Journal, 8(1), 159-172. Retrieved from http://inform.nu/Articles/Vol8/v8p159-172Knig.pdf
Kriegel, H.P., Borgwardt, K.M., Kroger, P., Pryakhin, A., Schubert, M., & Zimek, A. (2007). Future trends in data mining. Data Mining and Knowledge Discovery, 15(1), 87-97. doi:10.1007/s10618-007-0067-9
Lee, Y.W., Pipino, L.L., Funk, J.D., & Wang, R.Y. (2009). Journey to data quality. Cambridge, MA: MIT Press.
Lefebvre, R.C. (2007). The new technology: The consumer as participant rather than target audience. Social Marketing Quarterly, 13(3), 31-42. Retrieved from http://www.scribd.com/doc/38464538/SMQ-The-Consumer-as-Participant-2007
Lupu, A.R., Razvan, B., Sabau, G., & Muntean, M. (2007). Influence factors of business intelligence in the context of ERP projects. International Journal of Education and Information Technologies, 2(1), 90-94. Retrieved from http://www.naun.org/journals/educationinformation/eit-15.pdf
McGilvray, D.M. (2008). Executing data quality projects: Ten steps to quality data and trusted information. Burlington, MA: Morgan Kaufmann Publishers.
Negash, S. (2008). Business intelligence. In Handbook on decision support systems (International handbooks on information systems, chapter 45). Berlin, Germany: Springer-Verlag. doi:10.1007/978-3-540-48713-5
Olson, J.E. (2003). Data quality: The accuracy dimension. San Francisco, CA: Morgan Kaufmann Publishers.
Olson, J.E. (2009). Database archiving: How to keep lots of data for a very long time. Burlington, MA: Morgan Kaufmann Publishers.
Panin, Z. (2006). Business intelligence in support of business strategy. Proceedings of the 7th WSEAS International Conference on Mathematics & Computers in Business & Economics, Croatia, 6, 19-23. Retrieved from http://www.wseas.us/e-library/conferences/2006cavtat/papers/528-109.pdf
Piatetsky-Shapiro, G., Djeraba, C., Getoor, L., Grossman, R., Feldman, R., & Zaki, M. (2009). What are the grand challenges for data mining? SIGKDD Explorations, 8(2), 70-77. doi:10.1145/1233321.1233330
Pipino, L.L., Lee, Y.W., & Wang, R.Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211-218. doi:10.1145/505248.506010
Popovic, A., Coelho, P.S., & Jaklic, J. (2009). The impact of business intelligence system maturity on information quality. Information Research, 14(4), 1-14. Retrieved from http://informationr.net/ir/14-4/paper417.html
Rodriguez, C., Daniel, F., Casati, F., & Cappiello, C. (2010). Toward uncertain business intelligence: The case of key indicators. IEEE Internet Computing, 14(4), 32-40. http://doi.ieeecomputersociety.org/10.1109/MIC.2010.59
Sen, A., & Sinha, A.P. (2007). Toward developing data warehousing process standards: An ontology-based review of existing methodologies. IEEE Transactions on Systems, Man and Cybernetics: Part C, Applications and Reviews, 37(1), 17-31. http://dx.doi.org/10.1109/TSMCC.2006.886966
Seng, J.L., & Chen, T.C. (2010). An analytic approach to select data mining for business decisions. Expert Systems with Applications, 37(12), 8042-8057. http://dx.doi.org/10.1016/j.eswa.2010.05.083
Stvilia, B., Gasser, L., Twidale, M.B., & Smith, L.C. (2007). A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58(12), 1720-1733. doi:10.1002/asi.20652
Su, Y., Peng, J., & Jin, Z. (2009). Modeling information quality risk for data mining in data warehouses. Human & Ecological Risk Assessment, 15(2), 332-350. doi:10.1109/ICISE.2009.755
Watson, H.J., & Wixom, B.H. (2007). The current state of business intelligence. Computer, 40(9), 96-99. http://dx.doi.org/10.1109/MC.2007.331
Zhao, Y., Chen, Y., & Yao, Y. (2006). User-centered interactive data mining. IEEE International Conference on Cognitive Informatics 2006, 457-466. http://dx.doi.org/10.1109/COGINF.2006.365532