Data 2 Decision

With Machine Learning

Category: Machine Learning

Feature Scaling

January 28, 2022
Sammy Ongaya
Feature Engineering

Feature scaling is a technique for normalizing or standardizing data into a range suitable for fitting a machine learning algorithm. If we don’t scale our data, for example when we have a variable called age with values in range 12…
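
The two approaches named in the excerpt, normalizing and standardizing, can be sketched in plain Python. This is a minimal illustration rather than the post's own code: the function names and the sample `ages` and `incomes` values are hypothetical, and min-max scaling to [0, 1] and z-score standardization are assumed to be the intended techniques.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Normalize values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different scales.
ages = [12, 25, 40, 68]
incomes = [15000, 32000, 58000, 120000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
standardized_ages = standardize(ages)
```

After scaling, both features lie in the same [0, 1] range, so neither dominates a distance-based or gradient-based model purely because of its units.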

Feature Encoding

January 27, 2022 (updated March 8, 2022)
Sammy Ongaya
Feature Engineering, Uncategorized

One key challenge with most machine learning algorithms is their inability to work with categorical variables; they need the features to be converted to numeric form. The process of transforming categorical features to numerical form is referred to…
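
As one possible illustration of converting a categorical feature to numeric form, the sketch below applies one-hot encoding in plain Python. The excerpt does not say which encoding the post uses, so one-hot encoding is an assumption here, and the `one_hot_encode` helper and the `colors` data are hypothetical.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
```

Each row contains exactly one 1, in the column of its category, so no artificial ordering is imposed on the categories.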

Feature Selection

January 24, 2022 (updated January 27, 2022)
Sammy Ongaya
Feature Engineering

Feature selection is a feature engineering process (sometimes considered a data pre-processing method) of choosing the variables with high predictive power from the dataset. It can be conducted for both supervised and unsupervised learning. For the supervised approach we statistically evaluate…

Dimensionality Reduction

January 20, 2022 (updated March 15, 2022)
Sammy Ongaya
Feature Engineering

Working with high-dimensional data often poses a challenge commonly referred to as the “curse of dimensionality”. It is difficult to analyse, visualize and model high-dimensional data, hence the need to transform the data from a high dimension to a low dimension. The process…

Feature Engineering

January 18, 2022 (updated March 22, 2022)
Sammy Ongaya
Feature Engineering

Feature engineering is the process of using domain knowledge and scientific techniques to select and transform the most important features/variables that will yield an optimum machine learning or statistical model. The main goal of feature engineering is to increase the…

Spark SQL with PySpark

January 14, 2022 (updated January 18, 2022)
Sammy Ongaya
Big Data Pre-processing

Structured Query Language (SQL) is the foundational language of relational databases. It is the primary language for manipulating and organizing data in database systems and is familiar to most data practitioners. Spark provides us with Spark SQL, which is responsible for executing SQL…

Pandas on Spark with PySpark

January 14, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

Pandas is a powerful Python data analysis library and a standard tool for working with data in Python. It comes with a large set of functions for manipulating data and uses the DataFrame as a core data structure.

Spark DataFrames in PySpark

January 14, 2022
Sammy Ongaya
Big Data Pre-processing

A Spark DataFrame is a collection of items organized in rows and columns, resembling a table in a relational database. DataFrames are Spark’s data structure implemented on top of Spark’s Resilient Distributed Datasets (RDDs) and are heavily optimized internally. DataFrames can be created…

RDDs in Spark with PySpark

January 13, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

A Resilient Distributed Dataset (RDD) is a fault-tolerant collection of elements that is partitioned across the nodes of a cluster and can be operated on in parallel. The RDD is the main abstraction provided by Spark. An RDD is created when a file in HDFS…

Spark with Python

January 13, 2022
Sammy Ongaya
Big Data Pre-processing

Spark is a popular open-source, distributed, in-memory big data processing engine. It is loved for its speed and for its support for many programming languages, such as Scala, Java, Python and R. Spark was originally written in Scala. The…
