The term Knowledge Discovery in Databases, or KDD for short, refers to the broad process of finding knowledge in data, and emphasizes the “high-level” application of particular data mining methods.
The unifying goal of the KDD process is to extract knowledge from data in the context of large databases.
It does this by applying data mining methods (algorithms) to identify what is deemed knowledge, as specified by measures and thresholds, working on a database together with any required preprocessing, subsampling, and transformation of that database.
Some people don’t differentiate data mining from knowledge discovery, while others view data mining as one essential step in the knowledge discovery process. The steps involved in the knowledge discovery process are as follows −
- Data Cleaning − In this step, noise and inconsistent data are removed.
- Data Integration − In this step, multiple data sources are combined.
- Data Selection − In this step, data relevant to the analysis task are retrieved from the database.
- Data Transformation − In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
- Data Mining − In this step, intelligent methods are applied in order to extract data patterns.
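- Pattern Evaluation − In this step, data patterns are evaluated against interestingness measures to identify those that actually represent knowledge.
- Knowledge Presentation − In this step, the mined knowledge is presented to the user using visualization and knowledge representation techniques.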
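
The steps above can be sketched as a small pipeline. The following is a minimal illustration, not a reference implementation: the toy purchase records, the function names, and the choice of co-occurring item pairs with a minimum support threshold as the "pattern" are all assumptions made for the example. Real KDD systems use far richer methods at each step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (customer, item) records from two separate sources.
store_a = [("c1", "bread"), ("c1", "milk"), ("c2", "bread"),
           ("c2", " Milk "), ("c3", None)]
store_b = [("c3", "bread"), ("c3", "milk"), ("c4", "eggs"), ("c4", "bread")]

def clean(records):
    """Data cleaning: drop records with a missing item, normalize item names."""
    return [(c, item.strip().lower()) for c, item in records if item]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [rec for src in sources for rec in src]

def select(records, items_of_interest):
    """Data selection: keep only the records relevant to the analysis task."""
    return [(c, i) for c, i in records if i in items_of_interest]

def transform(records):
    """Data transformation: aggregate records into one item set per customer."""
    baskets = {}
    for c, i in records:
        baskets.setdefault(c, set()).add(i)
    return list(baskets.values())

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {pair: count / n_baskets
            for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}

def present(patterns):
    """Knowledge presentation: format the surviving patterns for a reader."""
    return [f"{a} & {b}: support {s:.2f}"
            for (a, b), s in sorted(patterns.items())]

def run_kdd(min_support=0.5):
    records = integrate(clean(store_a), clean(store_b))
    records = select(records, {"bread", "milk", "eggs"})
    baskets = transform(records)
    patterns = evaluate(mine(baskets), len(baskets), min_support)
    return present(patterns)

print("\n".join(run_kdd()))
```

Here `min_support` plays the role of a threshold on an interestingness measure: lowering it lets more candidate patterns through the evaluation step, while raising it keeps only the most frequent ones.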
The following diagram shows the process of knowledge discovery −