Data Mining & Data Warehousing

Data Mining, Privacy, and Data Security

Introduction: With more and more information accessible in electronic form and available on the Web, and with increasingly powerful data mining tools being developed and put into use, there are growing concerns that data mining may pose a threat to our privacy and data security. However, it is important to note that most of the major data mining applications do not even touch personal data. Prominent examples include applications involving natural resources, the prediction of floods and droughts, meteorology, astronomy, geography, geology, biology, and other scientific and engineering data.

Furthermore, most studies in data mining focus on the development of scalable algorithms and also do not involve personal data. The focus of data mining technology is on the discovery of general patterns, not on specific information regarding individuals. In this sense, we believe that the real privacy concerns lie with unconstrained access to individual records, especially in applications such as credit card and banking systems, which must access privacy-sensitive information. For those data mining applications that do involve personal data, in many cases, simple methods such as removing sensitive IDs from data may protect the privacy of most individuals. Numerous data security-enhancing techniques have been developed recently. In addition, there has been a great deal of recent effort on developing privacy-preserving data mining methods. In this section, we look at some of the advances in protecting privacy and data security in data mining.
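As an illustration of the simple method mentioned above, removing sensitive IDs before mining, the following is a minimal Python sketch. The field names (name, ssn, card_number) and the sample records are hypothetical and chosen only for illustration; a real data set defines its own identifying fields, and deciding which fields count as sensitive is itself an assumption made here.

```python
# Hypothetical identifying fields; a real data set will have its own.
SENSITIVE_FIELDS = {"name", "ssn", "card_number"}


def strip_identifiers(records, sensitive=SENSITIVE_FIELDS):
    """Return copies of the records with directly identifying fields removed."""
    return [
        {key: value for key, value in record.items() if key not in sensitive}
        for record in records
    ]


# Made-up sample records, used only to show the effect of the function.
records = [
    {"name": "A. Example", "ssn": "000-00-0000", "age": 34, "purchase": 120.50},
    {"name": "B. Example", "ssn": "000-00-0001", "age": 51, "purchase": 75.00},
]

# The originals are left untouched; mining would run on the cleaned copies.
cleaned = strip_identifiers(records)
```

Here only the non-identifying attributes (age and purchase) remain available for pattern discovery. Note that the remaining attributes can sometimes identify a person when combined, which is one reason this kind of removal protects most individuals rather than all of them.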

In 1980, the Organization for Economic Co-operation and Development (OECD) established a set of international guidelines, referred to as fair information practices. These guidelines aim to protect privacy and data accuracy. They cover aspects relating to data collection, use, openness, security, quality, and accountability. They include the following principles:

Purpose specification and use limitation: The purposes for which personal data are collected should be specified at the time of collection, and the data collected should not exceed the stated purpose. Data mining is typically a secondary purpose of the data collection. It has been argued that attaching a disclaimer that the data may also be used for mining is generally not accepted as sufficient disclosure of intent. Due to the exploratory nature of data mining, it is impossible to know what patterns may be discovered; therefore, there is no certainty over how they may be used.

Openness: There should be a general policy of openness about developments, practices, and policies with respect to personal data. Individuals have the right to know the nature of the data collected about them, the identity of the data controller (who is responsible for ensuring that the principles are followed), and how the data are being used.

Security Safeguards: Personal data should be protected by reasonable security safeguards against such risks as loss or unauthorized access, destruction, use, modification, or disclosure of data.

Individual Participation: An individual should have the right to learn whether the data controller has data relating to him or her, and if so, what those data are. The individual may also challenge such data. If the challenge is successful, the individual has the right to have the data erased, corrected, or completed. Typically, inaccurate data are only detected when an individual experiences some repercussion from them, such as the denial of credit or the withholding of a payment. The organization involved usually cannot detect such inaccuracies because it lacks the necessary contextual knowledge.