Session Spotlight
Apache Spark Core Concepts: Introduction to Distributed Data Processing
Monday, January 16, 2023 - 1:00 PM CST, for 1 hour.
Regular, 60-minute presentation
Room: Campsite 4
spark
pyspark
big data
distributed computing
optimization
Big data is only getting bigger, and being able to make quick, data-driven decisions at scale is more important than ever. That’s why thousands of organizations in both industry and academia use Apache Spark for scalable computing. This talk introduces Spark concepts in an approachable, visual manner that will leave you with a strong foundation for using this powerful data processing and analytics engine.
Prerequisites
None - this talk is designed to be approachable by everyone.
Takeaways
- Learn strategies for optimizing Spark jobs
- Visualize data partitions, data shuffling, drivers & executors, and the layers of Spark computation
- Gain a firm understanding of parallel processing frameworks like Apache Spark