Data Mining

DATA MINING and DATA SCIENCE FRAMEWORK

The art of data mining involves finding patterns, correlations, and anomalies in large data sets in order to predict future outcomes. These insights can support a variety of business goals, such as increasing profitability, reducing expenses, improving customer engagement, and mitigating risk.

FRAMEWORK


KDD (Knowledge Discovery in Databases)

The KDD process uses data mining (DM) methods to extract knowledge from a database, where what counts as knowledge is defined by specified measures and thresholds. The process also includes any required pre-processing, subsampling, and transformation of the database.

KDD helps businesses identify customer segments, detect market trends, and personalize marketing strategies. Financial institutions use it for fraud detection and investment decision-making. In manufacturing, it helps optimize production and supply chain management. Social media platforms use KDD to analyze user behavior and personalize content. In research, it uncovers patterns in scientific data.
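
A toy example can make the steps of a KDD run concrete. The Python sketch below is a hypothetical illustration, not a reference implementation: the function name `kdd_frequent_pairs`, the shopping-basket data, and the choice of item-pair support as the measure (with `min_support` as the threshold) are all assumptions. The mining step is a stripped-down form of frequent-itemset counting.

```python
import random
from collections import Counter
from itertools import combinations


def kdd_frequent_pairs(transactions, sample_size, min_support, seed=0):
    """Toy KDD run: subsample, pre-process, transform, mine, then evaluate.

    Hypothetical example: the measure is pair support, the threshold is min_support.
    """
    # Subsampling: draw a reproducible random sample of records.
    rng = random.Random(seed)
    sample = rng.sample(transactions, min(sample_size, len(transactions)))

    # Pre-processing: normalize item names and drop empty records.
    cleaned = [[item.strip().lower() for item in t if item.strip()] for t in sample]
    cleaned = [t for t in cleaned if t]

    # Transformation: represent each record as a set of distinct items.
    baskets = [frozenset(t) for t in cleaned]
    if not baskets:
        return {}

    # Data mining: count how often each pair of items occurs together.
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1

    # Evaluation: keep only the pairs whose support meets the threshold.
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


transactions = [
    ["Bread", "Milk"],
    ["bread", "Butter", "milk"],
    ["Beer", "Diapers"],
    ["Milk", " bread "],
    [],
]
print(kdd_frequent_pairs(transactions, sample_size=5, min_support=0.5))
# {('bread', 'milk'): 0.75}
```

Lowering `min_support` admits more, weaker patterns; in this sketch the threshold is what decides which patterns count as knowledge.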

SEMMA

SEMMA is a sequence of steps developed by the SAS Institute. The acronym SEMMA stands for Sample, Explore, Modify, Model, and Assess, and refers to the process of conducting a data mining project.

SEMMA offers an easy-to-understand process that supports the organized and adequate development and maintenance of DM projects. It thus provides a structure for their conception, creation, and evolution, helping to present solutions to business problems and to define the DM business goals.
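
SEMMA's five steps map naturally onto a small pipeline. The Python sketch below is a hypothetical illustration using only the standard library, not SAS tooling: the data (store visits per month versus a purchase flag), the train/test split inside `sample`, mean imputation in `modify`, and the one-feature decision stump in `model` are all assumptions chosen to keep each step short.

```python
import random
import statistics


def sample(records, frac=1.0, seed=0):
    """Sample: draw a reproducible random subset, then split it into train/test halves."""
    rng = random.Random(seed)
    chosen = rng.sample(records, max(2, int(len(records) * frac)))
    half = len(chosen) // 2
    return chosen[:half], chosen[half:]


def explore(rows):
    """Explore: basic profile of the feature and the class balance."""
    xs = [x for x, _ in rows if x is not None]
    return {
        "n": len(rows),
        "missing_x": len(rows) - len(xs),
        "mean_x": statistics.mean(xs),
        "positive_rate": statistics.mean(label for _, label in rows),
    }


def modify(rows, fill_value):
    """Modify: impute missing feature values with a fill value."""
    return [(fill_value if x is None else x, label) for x, label in rows]


def model(rows):
    """Model: a decision stump that predicts 1 when x >= threshold.

    The threshold is the candidate value with the best training accuracy.
    """
    def accuracy(t):
        return sum((x >= t) == bool(label) for x, label in rows) / len(rows)
    return max(sorted({x for x, _ in rows}), key=accuracy)


def assess(threshold, rows):
    """Assess: accuracy of the stump on held-out rows."""
    return sum((x >= threshold) == bool(label) for x, label in rows) / len(rows)


# Hypothetical data: x = store visits per month, label = 1 if the customer bought.
records = [(v, int(v >= 5)) for v in range(1, 11)] * 2 + [(None, 1), (None, 0)]
train, test = sample(records, frac=1.0, seed=1)
profile = explore(train)
fill = profile["mean_x"]  # imputation value learned from training data only
threshold = model(modify(train, fill))
print(profile, threshold, assess(threshold, modify(test, fill)))
```

Learning the fill value from the training half only keeps the Assess step honest, since the held-out rows play no part in building the model.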


CRISP-DM

The CRoss Industry Standard Process for Data Mining (CRISP-DM) is a process model that serves as a foundation for the data science process.

CRISP-DM comprises six stages:

-Business understanding

-Data understanding

-Data preparation

-Modeling

-Evaluation

-Deployment

CRISP-DM is comprehensive and well documented. All of its stages are clearly organized, structured, and defined, so a project can be easily understood or revised.
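
Because every CRISP-DM stage is explicitly defined, a project's position in the process can be tracked. The Python sketch below is hypothetical: the names `Phase`, `TRANSITIONS`, and `Project` are invented, and the allowed moves encode the forward path plus the feedback loops commonly drawn in the CRISP-DM reference diagram (between business and data understanding, between data preparation and modeling, and from evaluation back to business understanding).

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business understanding"
    DATA_UNDERSTANDING = "Data understanding"
    DATA_PREPARATION = "Data preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Allowed moves (an assumption of this sketch): the forward path plus the
# feedback loops usually shown in the CRISP-DM diagram.
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),  # in this sketch, a new cycle means a new Project
}


class Project:
    """Records the sequence of phases a project passes through."""

    def __init__(self):
        self.history = [Phase.BUSINESS_UNDERSTANDING]

    @property
    def phase(self):
        return self.history[-1]

    def move_to(self, next_phase):
        # Reject jumps the process model does not allow, e.g. skipping evaluation.
        if next_phase not in TRANSITIONS[self.phase]:
            raise ValueError(f"cannot move from {self.phase.value} to {next_phase.value}")
        self.history.append(next_phase)


project = Project()
for step in [Phase.DATA_UNDERSTANDING, Phase.DATA_PREPARATION, Phase.MODELING,
             Phase.DATA_PREPARATION, Phase.MODELING,  # one prep/model iteration
             Phase.EVALUATION, Phase.DEPLOYMENT]:
    project.move_to(step)
print(" -> ".join(p.value for p in project.history))
```

The recorded `history` is what makes a project easy to review afterwards: it shows not only where the project ended up but how often it looped back.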

OSEMN Approach

OSEMN stands for Obtain (collecting relevant data), Scrub (cleaning and preprocessing it), Explore (analyzing and visualizing it), Model (building predictive models), and Interpret (presenting the insights). The approach helps teams extract actionable insights from large volumes of data quickly and make informed decisions.

Following OSEMN enables organizations to make data-driven decisions, identify patterns and trends, detect anomalies, and optimize processes to improve overall operational efficiency. Businesses can then use these insights to drive innovation, improve customer experiences, and achieve their strategic objectives.


TEAM DATA SCIENCE PROCESS

The Team Data Science Process (TDSP) is a methodology for delivering predictive analytics solutions and intelligent applications in a cost-effective and timely manner. It emphasizes teamwork, collaboration, and continuous learning, and provides guidance on how different team roles can work together effectively. TDSP, developed by Microsoft, incorporates best practices from Microsoft and other industry leaders to support the successful implementation of data science initiatives. Its goal is to help organizations get the full benefit of their analytics programs.

Key components of the TDSP:

  • A data science lifecycle definition.
  • A standardized project structure.
  • Infrastructure and resources recommended for data science projects.
  • Tools and utilities recommended for project execution.
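
A standardized project structure means every project starts from the same skeleton, so any team member knows where code, documents, and data live. The Python sketch below creates such a skeleton with `pathlib`; the folder names in `PROJECT_LAYOUT` and the function `create_project` are hypothetical, loosely inspired by the idea of a TDSP project template rather than copied from Microsoft's official template.

```python
import tempfile
from pathlib import Path

# Hypothetical folder layout, loosely modeled on the idea of a TDSP project
# template; these names are illustrative assumptions, not an official list.
PROJECT_LAYOUT = {
    "code": ["data_acquisition", "modeling", "deployment"],
    "docs": ["project", "data_report", "model_report"],
    "sample_data": ["raw", "processed"],
}


def create_project(root, name):
    """Create the same folder skeleton for every project, plus a README stub."""
    base = Path(root) / name
    for top, subfolders in PROJECT_LAYOUT.items():
        (base / top).mkdir(parents=True, exist_ok=True)
        for sub in subfolders:
            (base / top / sub).mkdir(exist_ok=True)
    (base / "README.md").write_text(f"# {name}\n", encoding="utf-8")
    return base


# Usage: build the skeleton in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(tmp, "churn_prediction")
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project).as_posix())
```

Because every `mkdir` call uses `exist_ok=True`, running the function again on an existing project is harmless, which makes it safe to use in a setup script.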
