Data 2 Decision

With Machine Learning

Tag: Data Pre-processing

Spark SQL with PySpark

January 14, 2022 (updated January 18, 2022)
Sammy Ongaya
Big Data Pre-processing

Structured Query Language (SQL) is the foundational language of relational databases. It is the primary language for manipulating and organizing data in database systems, and it is familiar to most data practitioners. Spark provides Spark SQL, which is responsible for executing SQL ...

Pandas on Spark with PySpark

January 14, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

Pandas is a powerful Python data analysis library and a standard tool for working with data in Python. It comes with many functions for manipulating data and uses the DataFrame as its core data structure.

Spark DataFrames in PySpark

January 14, 2022
Sammy Ongaya
Big Data Pre-processing

A Spark DataFrame is a collection of items organized in rows and columns, resembling a table in a relational database. DataFrames are a Spark data structure implemented on top of Spark's Resilient Distributed Datasets (RDDs) and heavily optimized internally. DataFrames can be created ...

RDDs in Spark with PySpark

January 13, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

A Resilient Distributed Dataset (RDD) is a fault-tolerant collection of elements that is partitioned across the nodes of a cluster and can be operated on in parallel. The RDD is the main abstraction provided by Spark. An RDD is created when a file in HDFS ...

Spark with Python

January 13, 2022
Sammy Ongaya
Big Data Pre-processing

Spark is a popular open-source, distributed, in-memory big data processing engine. It is valued for its speed and for its support of several programming languages, including Scala, Java, Python and R. Spark was originally written in Scala. The ...

Introduction to Apache Spark

January 12, 2022 (updated January 13, 2022)
Sammy Ongaya
Big Data Pre-processing

Two of the most widely used tools for big data storage and processing are Hadoop and Apache Spark. Hadoop is a distributed storage and processing framework that uses MapReduce. One of the limitations of Hadoop is the speed of executing big data ...

Introduction to Big Data

January 11, 2022 (updated January 12, 2022)
Sammy Ongaya
Big Data Pre-processing

More data is being generated today than ever before, driven by advances in technology that enable faster processing and transmission of data. Big data is simply data that’s too big to fit in traditional ...

Data Augmentation

January 11, 2022 (updated May 15, 2022)
Sammy Ongaya
Data Pre-processing

Data augmentation is a technique for generating extra data in order to improve the performance of a machine learning model. Most machine learning algorithms, especially neural networks, perform well with large and varied datasets. Sometimes the challenge lies ...

Data Leakage

January 11, 2022 (updated March 8, 2022)
Sammy Ongaya
Data Pre-processing

Data scientists and machine learning engineers often end up developing models that suffer from data leakage without noticing it. Such a model performs very well on the validation set but fails when deployed to production. Data leakage is ...

Data Sampling

January 9, 2022 (updated January 11, 2022)
Sammy Ongaya
Data Pre-processing

In this era of big data, we often end up with large datasets that we need to analyse and model, which can take too much time. To avoid this, we need to select a small ...
