The Big Data and Hadoop training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. It covers in depth the Hadoop Distributed File System (HDFS), single-node and multi-node Hadoop clusters, Hadoop 2.x, Flume, Sqoop, MapReduce, Pig, Hive, HBase, ZooKeeper, Oozie, and related tools. The course is designed for professionals aspiring to a career in Big Data analytics using the Hadoop framework. Software professionals, analytics professionals, ETL developers, project managers, and testing professionals are the key beneficiaries. Other professionals who want to acquire a solid foundation in Hadoop architecture can also opt for this course.

Introduction to Big Data

  • Overview of Big Data technologies and their role in analytics
  • Big Data challenges & solutions
  • Data Science vs Data Engineering
  • The four V's of Big Data: volume, velocity, variety, and veracity

Unix & Java

  • Introduction to the UNIX shell
  • Basic UNIX commands: create, copy, move, delete, etc.
  • Basics of the Java programming language
  • Java architecture: JVM, JRE, JIT
  • Control structures
  • OOP concepts in Java
  • String classes, arrays, and exception handling
  • Collection classes

Apache HDFS

  • Understanding the problem statement and the challenges of handling very large data sets, to see why a distributed file system is needed
  • Understanding how the HDFS architecture solves these problems
  • Understanding configuration and creating a directory structure to solve the given problem statement
  • Setting up appropriate permissions to secure data for the right users
  • Setting up a Java development environment with the HDFS libraries to use the HDFS Java APIs

Apache MapReduce

  • What is MapReduce?
  • Input and output formats
  • Data types in MapReduce
  • Flow of MapReduce jobs
  • Word count in MapReduce
  • How to use custom input formats
  • Use cases for structured data sets
  • Writing custom classes
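
To make the flow of a MapReduce job concrete, the word count exercise can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, and reduce phases; it does not use the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "Hadoop stores big data"]
    print(word_count(sample))
```

In a real Hadoop job the mapper and reducer run on different nodes and the framework performs the shuffle; here all three steps run in a single process so the data flow is easy to follow.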


Apache Hive

  • What is Hive?
  • Architecture of Hive
  • Tables in Hive and loading data
  • Query optimization
  • Partitioning and bucketing
  • Joins in Hive
  • Indexing in Hive
  • File formats in Hive
  • How to read JSON files in Hive


Apache Sqoop

  • What is Sqoop?
  • Relation between SQL & Hadoop
  • Performing a Sqoop import
  • Incremental and conditional imports
  • Performing a Sqoop export


Apache Pig

  • What are Pig and ETL?
  • Introduction to the Pig architecture
  • Introduction to Pig Latin
  • How to perform ETL on any kind of data (Pig eats everything)
  • Use cases of Pig
  • Joins in Pig
  • Co-grouping in Pig

Introduction to NoSQL Databases & Oozie

  • What is HBase?
  • Architecture of HBase
  • CRUD operations in HBase
  • Retrieval of HBase data
  • Introduction to Apache Oozie (scheduler tool)

Introduction to Programming in Scala

  • Basic data types and literals used
  • Operators and methods used in Scala
  • Classes in Scala
  • Traits in Scala
  • Control structures in Scala
  • Collections in Scala
  • Scala libraries

Introduction to Spark

  • Limitations of MapReduce in Hadoop
  • Batch vs. real-time analytics
  • Applications of stream processing
  • Spark vs. the Hadoop ecosystem

Using RDD for Creating Applications in Spark

  • Features of RDDs
  • How to create RDDs
  • RDD operations and methods
  • Explain RDD functions and describe how to write Spark programs in Scala
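
Two RDD features covered in this module, immutability and lazy evaluation, can be illustrated with a toy class in plain Python. `MiniRDD` is a hypothetical stand-in written for this outline, not part of Spark; it only borrows the method names `map`, `filter`, `collect`, and `count` from the real RDD API to show the difference between transformations (recorded, not executed) and actions (which trigger the work).

```python
class MiniRDD:
    """Toy stand-in for an RDD: immutable data plus lazily applied steps."""

    def __init__(self, data, steps=()):
        self._data = tuple(data)    # source data is never modified
        self._steps = tuple(steps)  # recorded transformations, not yet run

    def map(self, fn):
        # Transformation: returns a new MiniRDD and does no work yet.
        return MiniRDD(self._data, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also only recorded.
        return MiniRDD(self._data, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: replay every recorded step and return the result.
        items = list(self._data)
        for kind, fn in self._steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def count(self):
        # Action: number of elements after all transformations.
        return len(self.collect())


if __name__ == "__main__":
    numbers = MiniRDD(range(1, 6))
    even_squares = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(even_squares.collect())
    print(numbers.collect())  # the original data set is unchanged
```

A real RDD is additionally partitioned across the cluster and can be recomputed from its lineage after a failure; the toy class leaves both aspects out.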

Running SQL Queries Using Spark SQL

  • Explain the importance and features of Spark SQL
  • Describe methods to convert RDDs to DataFrames
  • Explain the concepts of Spark SQL
  • Describe the concept of Hive integration

Spark ML Programming

  • Explain the use cases and techniques of Machine Learning (ML)
  • Describe the key concepts of Spark ML
  • Explain the concepts of an ML Dataset, an ML algorithm, and model selection via cross-validation
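
As a rough illustration of model selection via cross-validation, the sketch below splits sample indices into k folds and picks the candidate with the best mean validation score. It is plain Python rather than the Spark ML API (Spark ML provides a `CrossValidator` class for this purpose); `k_fold_indices`, `select_model`, and the `score_fn` callback are hypothetical names introduced only for this example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def select_model(candidates, score_fn, n_samples, k=3):
    """Return (best_candidate, mean_score) over k validation folds.

    score_fn(candidate, train, validation) is an assumed callback that
    trains on the train indices and returns a score (higher is better)
    measured on the validation indices.
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        folds = list(k_fold_indices(n_samples, k))
        mean_score = sum(score_fn(candidate, tr, va) for tr, va in folds) / len(folds)
        if mean_score > best_score:
            best, best_score = candidate, mean_score
    return best, best_score
```

Each sample appears in exactly one validation fold, so every candidate is scored on data it was not trained on before the best one is chosen.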