Big data usually refers to data sets too large for commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data encompasses unstructured, semi-structured, and structured data, although the main focus is on unstructured data. The "size" of big data is a constantly moving target; as of 2012 it ranged from a few dozen terabytes to many exabytes of data.

Overview of Big Data Technologies and their role in Analytics

Big Data challenges & solutions

Data Science vs Data Engineering

The Four V's of Big Data

Unix & Java

  • Introduction to the UNIX shell
  • Basic UNIX commands: create, copy, move, delete, etc.
  • Basics of the Java programming language
  • Architecture: JVM, JRE, JIT
  • Control structures
  • OOP concepts in Java
  • String classes, arrays, and exception handling
  • Collection classes

Apache HDFS

  • Understanding the problem statement and the challenges of handling such large data, to see the need for a distributed file system
  • Understanding how the HDFS architecture solves these problems
  • Understanding configuration and creating a directory structure to solve the given problem statement
  • Setting up appropriate permissions to secure data for the appropriate users
  • Setting up a Java development environment with the HDFS libraries to use the HDFS Java APIs

Apache MapReduce

  • What is MapReduce?
  • Input and output formats
  • Data types in MapReduce
  • Flow of MapReduce jobs
  • Word count in MapReduce
  • How to use custom input formats
  • Use case for structured data sets
  • Writing custom classes
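
As a rough illustration of the job flow and the word count topics above, the sketch below walks through the map, shuffle, and reduce phases in plain Java. It deliberately uses no Hadoop classes, so it runs on a bare JDK; the class name `WordCountSketch` and its method names are hypothetical, and a real job would use the Hadoop MapReduce API and its own data types instead.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class for illustration; plain Java only, no Hadoop dependencies.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group the emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values that were grouped under one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run the three phases in order over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("big data big ideas", "data moves")));
    }
}
```

In this sketch everything runs in one process; in a Hadoop job the map and reduce steps run as distributed tasks and the framework performs the grouping between them.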

Apache Hive

  • What is Hive?
  • Architecture of Hive
  • Tables in Hive with load functions
  • Query optimization
  • Partitioning and bucketing
  • Joins in Hive
  • Indexing in Hive
  • File formats in Hive
  • How to read JSON files in Hive
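
To make the partitioning and bucketing topic above a little more concrete, here is a minimal sketch in plain Java of how rows are laid out: a partition becomes a `column=value` subdirectory, and a bucket is chosen by hashing the bucketing column modulo the number of buckets. The class and method names are hypothetical, and Java's `hashCode()` is only an assumed stand-in for Hive's own hash function, which depends on the column type and Hive version.

```java
// Hypothetical helper names for illustration; plain Java, no Hive dependencies.
class HiveLayoutSketch {

    // A partition is stored as a subdirectory named column=value under the table directory.
    static String partitionDir(String tableDir, String column, String value) {
        return tableDir + "/" + column + "=" + value;
    }

    // Assumption: Object.hashCode() stands in for Hive's hash function.
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int bucketFor(Object key, int numBuckets) {
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // The table path and column names below are made up for the example.
        System.out.println(partitionDir("/warehouse/sales", "country", "IN"));
        for (int id : new int[] {7, 12, -1}) {
            System.out.println("id " + id + " -> bucket " + bucketFor(id, 4));
        }
    }
}
```

Partitioning lets a query skip whole directories when it filters on the partition column, while bucketing spreads rows with the same key into the same file, which can help joins and sampling.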

Apache Sqoop

  • What is Sqoop?
  • Relation between SQL & Hadoop
  • Performing a Sqoop import
  • Incremental and conditional imports
  • Performing a Sqoop export

Apache Pig

  • What are Pig & ETL?
  • Introduction to the Pig architecture
  • Introduction to Pig Latin
  • How to perform ETL on any kind of data (Pig eats everything)
  • Use cases of Pig
  • Joins in Pig
  • Co-grouping in Pig

Introduction to NoSQL Databases & Oozie

  • What is HBase?
  • Architecture of HBase
  • CRUD operations in HBase
  • Retrieval of HBase data
  • Introduction to Apache Oozie (scheduler tool)

Introduction to Programming in Scala

  • Basic data types and literals
  • Operators and methods used in Scala
  • Classes in Scala
  • Traits in Scala
  • Control structures in Scala
  • Collections in Scala
  • Scala libraries

Introduction to Spark

  • Limitations of MapReduce in Hadoop
  • Batch vs. real-time analytics
  • Applications of stream processing
  • Spark vs. the Hadoop ecosystem

Using RDDs for Creating Applications in Spark

  • Features of RDDs
  • How to create RDDs
  • RDD operations and methods
  • RDD functions and how to write different programs in Scala
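
One idea behind the RDD operations listed above is that transformations are lazy and only an action triggers computation. The sketch below shows that behaviour by analogy only: it uses Java streams from the JDK, not Spark, where intermediate operations play the role of transformations and a terminal operation plays the role of an action. The class name `LazyPipelineSketch` and its members are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class for illustration; an analogy built on Java streams, not Spark code.
class LazyPipelineSketch {

    // Counts how many times the map function has actually been executed.
    static final AtomicInteger mapCalls = new AtomicInteger();

    // "Transformations": describe the pipeline (square, then keep even values) without running it.
    static Stream<Integer> transform(List<Integer> data) {
        return data.stream()
                   .map(x -> { mapCalls.incrementAndGet(); return x * x; })
                   .filter(x -> x % 2 == 0);
    }

    // "Action": the terminal operation forces the pipeline to execute.
    static List<Integer> collect(Stream<Integer> pipeline) {
        return pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = transform(Arrays.asList(1, 2, 3, 4));
        System.out.println("map calls before the action: " + mapCalls.get());
        System.out.println("result: " + collect(pipeline));
        System.out.println("map calls after the action: " + mapCalls.get());
    }
}
```

Unlike a stream, which can be consumed only once, an RDD can be reused across several actions, so the analogy covers laziness only.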

Running SQL Queries Using Spark SQL

  • Explain the importance and features of Spark SQL
  • Describe methods to convert RDDs to DataFrames
  • Explain the concepts of Spark SQL
  • Describe the concept of Hive integration

Spark ML Programming

  • Explain the use cases and techniques of Machine Learning (ML)
  • Describe the key concepts of Spark ML
  • Explain the concepts of an ML dataset, an ML algorithm, and model selection via cross-validation