Data augmentation is a technique of generating extra data with the purpose of improving the performance of machine learning model. Most machine learning algorithms especially neural networks performs well with large and varied sets of data, sometimes the challenge lies in collecting quality data. The benefits of Data augmentation is two-fold as it reduces the cost of acquiring new datasets and improves the performance of machine learning model due to variability and big enough data which reduces chances of model overfitting. . Data augmentation has been widely applied in the field of computer vision, natural language process and speech recognition. In this post we will briefly look at different data augmentation techniques as applied to computer vision.


Benefits of Data Augmentation

  1. Improves model performance. Some machine learning models works better with large data
  2. Reduces the chances of model overfitting. This is a results of large enough data and variability
  3. Reduces data imbalance in classification data
  4. Reduces the cost of acquiring new data

Data Augmentation Algorithms

  1. Neural style transfer
  2. Reinforcement learning
  3. Generative Adversarial Networks (GANs)

Data Augmentation Techniques in Images

  1. Cropping
  2. Grayscaling
  3. Colour modification
  4. Translation
  5. Flipping
  6. … and many other techniques


Data augmentation is an important task in machine learning process. It helps in improving the performance of machine learning model by reducing overfitting. In this post we have looked at what’s data augmentation, its benefits and various algorithms for data augmentation. In the next post we will look at Introduction to big data and data pre-processing with Spark. To learn about data leakage check our post here.

Data Augmentation

Post navigation

0 0 votes
Article Rating
Notify of
Inline Feedbacks
View all comments
Would love your thoughts, please comment.x