Data 2 Decision

With Machine Learning

Tag: Apache Spark

Spark SQL with PySpark

January 14, 2022 (updated January 18, 2022)
Sammy Ongaya
Big Data Pre-processing

Structured Query Language (SQL) is a foundational language for relational databases. It is the primary language for manipulating and organizing data in database systems, and it is familiar to most data practitioners. Spark provides Spark SQL, which is responsible for executing SQL …

Pandas on Spark with PySpark

January 14, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

Pandas is a powerful Python library that makes working with data easier, and it is a standard tool for data analysis in Python. It comes with a large set of functions for manipulating data, and the DataFrame is its core data structure.

RDDs in Spark with PySpark

January 13, 2022 (updated January 20, 2022)
Sammy Ongaya
Big Data Pre-processing

A Resilient Distributed Dataset (RDD) is a fault-tolerant collection of elements that is partitioned across the nodes of a cluster and can be operated on in parallel. The RDD is the main abstraction provided by Spark. An RDD is created when a file in HDFS …
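
To make "partitioned and operated on in parallel" concrete, here is a minimal sketch in plain Python. It is not Spark's API: the helpers `partition` and `parallel_map` are hypothetical names made up for illustration, threads on one machine stand in for cluster nodes, and fault tolerance is not modelled at all.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, num_partitions):
    """Split a list into num_partitions contiguous, roughly equal chunks."""
    size, extra = divmod(len(items), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_map(func, partitions):
    """Apply func to every element, running one worker task per partition."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map keeps the partitions in their original order.
        return list(pool.map(lambda part: [func(x) for x in part], partitions))


numbers = list(range(1, 11))           # illustrative data, not from the post
partitions = partition(numbers, 3)     # three "nodes"
squared = parallel_map(lambda x: x * x, partitions)
total = sum(sum(part) for part in squared)
print(partitions)
print(total)
```

In PySpark itself the rough counterpart would be `sc.parallelize(numbers, 3)` followed by `map` and `sum` on the resulting RDD, with Spark deciding where each partition lives.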

Spark with Python

January 13, 2022
Sammy Ongaya
Big Data Pre-processing

Spark is a popular open-source, distributed, in-memory big data processing engine. It is loved for its speed and for its support of many programming languages, such as Scala, Java, Python, and R. Spark was originally written in Scala. The …

Introduction to Apache Spark

January 12, 2022 (updated January 13, 2022)
Sammy Ongaya
Big Data Pre-processing

When it comes to big data storage and processing, the tools commonly used are Hadoop and Apache Spark. Hadoop is a distributed storage and processing engine that uses MapReduce. One of the limitations of Hadoop is the speed of executing big data …
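
For readers new to the MapReduce model mentioned above, the following is a single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It only illustrates the programming model and is not Hadoop code: the function names and the input lines are assumptions made up for this example, and nothing here is distributed.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework runs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


# Made-up input standing in for lines of a large file.
lines = ["spark is fast", "hadoop is distributed", "spark is flexible"]

pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle_phase(pairs).items())
print(counts)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data between them, which is where much of the cost of a MapReduce job tends to come from.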
