Mastering Spark for Data Science
Authors: Andrew Morgan, Antoine Amend, Matthew Hallett, David George

Unlock the complexities of lightning-fast data science

About This Book
Develop and apply advanced analytical techniques with Spark
Learn how to tell a compelling story in data science using Spark's ecosystem
Explore data at scale and work with cutting-edge data science methods

Who This Book Is For
This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, who are looking for a challenge and want to learn cutting-edge techniques. It assumes working knowledge of data science, common machine learning methods, and popular data science tools, and that you have previously run proof-of-concept studies and built prototypes.

What You Will Learn
Learn the design patterns that integrate Spark into industrialized data science pipelines
Understand how commercial data scientists design scalable and reusable code for data science services
Get a grasp of new cutting-edge data science methods so you can study trends and causality
Find out how to use Spark as a universal ingestion engine and as a web scraper
Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining
Get to know the best practices for performing Extended Exploratory Data Analysis, commonly used in commercial data science teams
Grasp advanced Spark concepts, as well as solution design patterns and integration architectures
Demonstrate powerful data science pipelines
Get detailed guidance on how to run Spark in production

In Detail
The purpose of data science is to transform the world using data, and this goal is mainly achieved by disrupting and changing real processes in real industries. To operate at this level, you need to be able to build data science solutions of substance: ones that solve real problems and that run reliably enough for people to trust and act on.
Spark has emerged as the big data platform of choice for data scientists. This book dives deep into Spark to deliver production-grade data science solutions that are innovative, disruptive, and reliable enough to be trusted. We demonstrate the process by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights. We use the core Spark APIs and take a deep dive into advanced libraries, including Spark SQL, visual streaming, MLlib, and more.

We introduce advanced techniques and methods to help you build data science solutions, and show you how to construct commercial-grade data products. Using a sequence of tutorials that deliver a working news intelligence service, we explain advanced Spark architectures, unveil sophisticated data science methods, demonstrate how to work with geographic data in Spark, and explain how to tune Spark algorithms so they scale linearly.