Data 2 Decision

With Machine Learning


Category: Big Data Pre-processing

Spark SQL with PySpark

January 14, 2022 (updated January 18, 2022)
Sammy Ongaya
Big Data Pre-processing

Structured Query Language (SQL) is the foundational language of relational databases. It is the primary language for manipulating and organizing data in database systems, and it is familiar to most data practitioners. Spark provides Spark SQL, which is responsible for executing SQL…


Pandas on Spark with PySpark

January 14, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

Pandas is a powerful Python data analysis library and a standard tool for working with data in Python. It offers a large set of functions for manipulating data, and the DataFrame is its core data structure.


Spark DataFrames in PySpark

January 14, 2022
Sammy Ongaya
Big Data Pre-processing

A Spark DataFrame is a collection of items organized in rows and columns, resembling a table in a relational database. DataFrames are Spark's data structure implemented on top of Spark's Resilient Distributed Datasets (RDDs) and are heavily optimized internally. DataFrames can be created…


RDDs in Spark with PySpark

January 13, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

A Resilient Distributed Dataset (RDD) is a fault-tolerant collection of elements that is partitioned across the nodes of a cluster and can be operated on in parallel. The RDD is the main abstraction provided by Spark. An RDD is created when a file in HDFS…

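To make the idea of a partitioned collection operated on in parallel more concrete, here is a minimal sketch in plain Python. It is not PySpark and not how Spark is implemented: the helper names `partition` and `parallel_map_reduce` are hypothetical, threads on one machine stand in for cluster nodes, and fault tolerance is not modelled at all. It assumes a non-empty input list.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(items, num_partitions):
    """Split a list into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_map_reduce(items, map_fn, reduce_fn, num_partitions=4):
    """Map every element partition by partition, then combine the results."""
    # Drop empty chunks, which appear when there are fewer items than partitions.
    chunks = [c for c in partition(items, num_partitions) if c]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Each worker maps and reduces one partition independently of the others.
        partials = list(
            pool.map(lambda chunk: reduce(reduce_fn, map(map_fn, chunk)), chunks)
        )
    # Combine the per-partition results into a single value.
    return reduce(reduce_fn, partials)


numbers = list(range(1, 11))
total_of_squares = parallel_map_reduce(numbers, lambda x: x * x, lambda a, b: a + b)
print(total_of_squares)  # 385
```

The combining function should be associative (addition, `max`, and so on), because each partition is reduced on its own before the partial results are merged.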

Spark with Python

January 13, 2022
Sammy Ongaya
Big Data Pre-processing

Spark is a popular open-source, distributed, in-memory big data processing engine. It is valued for its speed and for supporting several programming languages, including Scala, Java, Python and R. Spark was originally written in Scala. The…


Introduction to Apache Spark

January 12, 2022 (updated January 13, 2022)
Sammy Ongaya
Big Data Pre-processing

For big data storage and processing, two widely used tools are Hadoop and Apache Spark. Hadoop is a distributed storage and processing framework that uses MapReduce. One of the limitations of Hadoop is the speed of executing big data…


Introduction to Big Data

January 11, 2022 (updated January 12, 2022)
Sammy Ongaya
Big Data Pre-processing

More data is being generated today than ever before, driven by advances in technology that have enabled faster processing and transmission of data. Big data is simply data that’s too big to fit in traditional…


Data 2 Decision
© 2023