Book Description
Scala has been witnessing wide-scale adoption over the past few years, particularly in the field of data science and analytics. Spark, which is built on Scala, has also gained recognition, and is now being used widely in production. This book is designed to help you leverage the power of Scala and Spark to make sense of big data.
Scala and Spark for Big Data Analytics begins by introducing you to Scala and helping you understand the object-oriented and functional programming concepts required for Spark application development. You’ll then move on to Spark and cover its basic abstractions, the Resilient Distributed Dataset (RDD) and the DataFrame. This will help you develop scalable, fault-tolerant streaming applications by analyzing structured and unstructured data using Spark SQL, GraphX, and Spark Structured Streaming. In the sections that follow, you’ll explore advanced topics, such as monitoring, configuration, debugging, testing, and deployment, which will further help you to manage your data effectively.
After this, you’ll learn to use the SparkR and PySpark APIs to develop impactful applications, and deploy Zeppelin for interactive data analytics. In the concluding chapters, you’ll use Alluxio to facilitate in-memory data processing.
By the end of this book, you’ll have a clear understanding of Spark and be able to perform full-stack data analytics regardless of the amount of data.
Key Features
- Experience Scala’s sophisticated type system, combining functional programming and object-oriented concepts
- Work on an array of applications, ranging from simple batch jobs to stream processing and machine learning
- Perform large-scale data analysis by exploring both common and complex use cases
What you will learn
- Get an in-depth understanding of Scala collection APIs
- Work with RDD and DataFrame to learn Spark’s core abstractions
- Analyze structured and unstructured data using Spark SQL and GraphX
- Build scalable and fault-tolerant streaming applications using Spark Structured Streaming
- Discover machine learning (ML) best practices for classification, regression, dimensionality reduction, and recommendation systems, and build predictive models with widely used algorithms in Spark MLlib and ML
- Develop clustering models to cluster a vast amount of data
- Get to grips with tuning, debugging, and monitoring Spark applications
- Deploy Spark applications on real clusters using Standalone, Mesos, and Yet Another Resource Negotiator (YARN)
Who this book is for
If you want to learn how to perform data analysis by harnessing the power of Spark, this is the book for you. Prior knowledge of Spark or Scala is not required. Programming experience, particularly with other Java Virtual Machine (JVM) languages, will help you grasp the concepts more easily.
Table of Contents
- Introduction to Scala
- Object-Oriented Scala
- Functional Programming Concepts
- Collection APIs
- Tackle Big Data – Spark Comes to the Party
- Start Working with Spark – REPL and RDDs
- Special RDD Operations
- Introduce a Little Structure – Spark SQL
- Stream Me Up, Scotty – Spark Streaming
- Everything is Connected – GraphX
- Learning Machine Learning – Spark MLlib and Spark ML
- My Name is Bayes, Naive Bayes
- Time to Put Some Order – Cluster Your Data with Spark MLlib
- Text Analytics Using Spark ML
- Spark Tuning
- Time to Go to ClusterLand – Deploying Spark on a Cluster
- Testing and Debugging Spark
- PySpark and SparkR