Data 2 Decision

With Machine Learning

Introduction to Apache Spark

January 12, 2022 (updated January 13, 2022)
Sammy Ongaya
Big Data Pre-processing

When it comes to big data storage and processing, two widely used tools are Hadoop and Apache Spark. Hadoop is a distributed storage and processing framework that uses MapReduce. One of the limitations of Hadoop is the speed of executing big data…

Introduction to Big Data

January 11, 2022 (updated January 12, 2022)
Sammy Ongaya
Big Data Pre-processing

In today’s world, more data is being generated than ever before, owing to advances in technology that have enabled faster processing and transmission of data. Big data is simply data that’s too big to fit in traditional…

Data Augmentation

January 11, 2022 (updated May 15, 2022)
Sammy Ongaya
Data Pre-processing

Data augmentation is a technique for generating extra data in order to improve the performance of a machine learning model. Most machine learning algorithms, especially neural networks, perform well with large and varied datasets. Sometimes the challenge lies…

Data Leakage

January 11, 2022 (updated March 8, 2022)
Sammy Ongaya
Data Pre-processing

Data scientists and machine learning engineers often end up developing models that suffer from data leakage without noticing it. The model performs very well on the validation set but fails when deployed to production. Data leakage is…

Data Sampling

January 9, 2022 (updated January 11, 2022)
Sammy Ongaya
Data Pre-processing

In this era of big data, we often end up with large datasets that we need to analyse and model, which can take too much time. To avoid this, we need to select a small…

Outliers in Data

January 7, 2022 (updated May 15, 2022)
Sammy Ongaya
Data Pre-processing

Outliers affect the quality of data. Outliers are data points that lie far away from the majority of the other data points. They may be the result of measurement error or of variability in the population distribution. Some statistical measures such as the mean…

Handling Imbalanced Data

January 7, 2022 (updated March 7, 2022)
Sammy Ongaya
Data Pre-processing

In machine learning classification problems, an imbalanced dataset affects the performance of the model. Imbalanced data occurs when the classes in the target variable have unequal distributions in the dataset. If the target variable has two classes A and B, if…

Handling Missing Data

January 5, 2022 (updated January 7, 2022)
Sammy Ongaya
Data Pre-processing

Missing data is a common data quality issue that every data practitioner has to deal with. Missing data affects the accuracy of analysis and the performance of the model. Most machine learning algorithms are sensitive to missing data. In the…

Introduction to Data Reduction

January 4, 2022 (updated March 7, 2022)
Sammy Ongaya
Data Pre-processing

Data reduction is the process of reducing the amount of data, in terms of either its volume or its number of features. Reducing volume can result in storage efficiency. In machine learning modelling we…

Introduction to Data Transformation

January 4, 2022
Sammy Ongaya
Data Pre-processing

Data transformation is the process of converting data from one format to another that is useful to business users for decision making. In data management, data transformation is a key component of data integration and pre-processing. Data transformation involves…
