COURSE INFORMATION

 
 

COURSE

Spark for Python

DESCRIPTION

Spark gives you a single engine for exploring large amounts of data, running machine learning algorithms, and then taking the same code to production.

PREREQUISITES

  • Knowledge of Python

  • Basic knowledge of Java

CURRICULUM

INTRODUCTION TO SPARK


  • What does Donald Rumsfeld have to do with data analysis?
  • Why is Spark so cool?
  • An introduction to RDDs - Resilient Distributed Datasets
  • Built-in libraries for Spark
  • Installing Spark
  • The PySpark Shell
  • Transformations and Actions
  • See it in Action : Munging Airlines Data with PySpark - I
  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables




RESILIENT DISTRIBUTED DATASETS


  • RDD Characteristics: Partitions and Immutability
  • RDD Characteristics: Lineage, RDDs know where they came from
  • What can you do with RDDs?
  • Create your first RDD from a file
  • Average distance travelled by a flight using map() and reduce() operations
  • Get delayed flights using filter(), cache data using persist()
  • Average flight delay in one step using aggregate()
  • Frequency histogram of delays using countByValue()
  • See it in Action : Analyzing Airlines Data with PySpark - II




BASIC SEARCH & OPTIMIZATION ALGORITHMS


  • Brute-force search introduction
  • Brute-force search example
  • Stochastic search introduction
  • Stochastic search example
  • Hill climbing introduction
  • Hill climbing example




ADVANCED RDDS: PAIR RESILIENT DISTRIBUTED DATASETS


  • Special Transformations and Actions
  • Average delay per airport using reduceByKey(), mapValues() and join()
  • Average delay per airport in one step using combineByKey()
  • Get the top airports by delay using sortBy()
  • Lookup airport descriptions using lookup(), collectAsMap(), broadcast()
  • See it in Action : Analyzing Airlines Data with PySpark - III




ADVANCED SPARK: ACCUMULATORS, SPARK SUBMIT, MAPREDUCE, BEHIND THE SCENES


  • Get information from individual processing nodes using accumulators
  • See it in Action : Using an Accumulator variable
  • Long running programs using spark-submit
  • See it in Action : Running a Python script with Spark-Submit
  • Behind the scenes: What happens when a Spark script runs?
  • Running MapReduce operations
  • See it in Action : MapReduce with Spark




JAVA AND SPARK


  • The Java API and Function objects
  • Pair RDDs in Java
  • Running Java code
  • Installing Maven
  • See it in Action : Running a Spark Job with Java




PAGERANK: RANKING SEARCH RESULTS


  • What is PageRank?
  • The PageRank algorithm
  • Implement PageRank in Spark
  • Join optimization in PageRank using Custom Partitioning
  • See it in Action : The PageRank algorithm using Spark




SPARK SQL


  • DataFrames: RDDs + Tables
  • See it in Action : DataFrames and Spark SQL




MLLIB IN SPARK: BUILD A RECOMMENDATIONS ENGINE


  • Collaborative filtering algorithms
  • Latent Factor Analysis with the Alternating Least Squares method
  • Music recommendations using the Audioscrobbler dataset
  • Implement code in Spark using MLlib




SPARK STREAMING


  • Introduction to streaming
  • Implement stream processing in Spark using DStreams
  • Stateful transformations using sliding windows
  • See it in Action : Spark Streaming




GRAPH LIBRARIES


  • The Marvel social network using Graphs




INTERVIEW WITH SINGAPORE EXPERT


  • Background of Expert
  • Information and Communication Technology in Singapore





COURSE FEES

$309.98

You may be eligible for a subsidy of up to 100% of your course fees.

Check here.

INSTRUCTIONS FOR ENROLMENT

1. Check if you are eligible for subsidies here and follow the instructions.

2. Purchase the e-module below:

3. Purchase the face-to-face lesson below:

4. Once you have completed 90% of your e-module, the Gen Infiniti team will contact you to schedule your in-person lesson.

Drop us an email at connect@geninfinitiacademy.com if you face any issues enrolling in the course.

 
 


CONNECT WITH US

connect@geninfinitiacademy.com

+65 6909 1888

+65 6909 1889

Customer Service Hotline: +65 8660 4591

VISIT US

7 Jurong West Avenue 5

#02-01 Singapore 649486


© 2020 Gen Infiniti Academy