Big Data Training: Apache Hadoop, Spark and Scala

Big data is the term for a collection of data sets so large and complex that they become difficult to capture, store, process, retrieve and analyze using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. In practice, what counts as big data may vary from company to company, depending on its size, capacity, competence, human resources, techniques and so on.

Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is an open-source data management platform with scale-out storage and distributed processing.
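To make the "simple programming model" concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps applied to a word count. It is plain Python rather than Hadoop code, and the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    emitted = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())

counts = word_count(["big data needs big clusters", "hadoop processes big data"])
print(counts)
```

In a real cluster the mappers and reducers run on different machines and the shuffle moves data over the network; the sketch only shows the shape of the computation.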

Certified Bigdata Analyst

Big Data Training Syllabus

Understanding Big Data and Hadoop

  • Limitations and Solutions of existing Data Analytics Architecture
  • Hadoop Features
  • Hadoop Ecosystem
  • Hadoop 2.x core components
  • Hadoop Storage: HDFS
  • Hadoop Processing: MapReduce Framework
  • Hadoop Different Distributions

Hadoop Architecture and HDFS

  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability
  • A Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single-Node and Multi-Node Cluster Setup
  • Hadoop Administration

Hadoop MapReduce Framework

  • MapReduce Use Cases
  • Hadoop 2.x MapReduce Architecture
  • YARN MR Application Execution Flow
  • Anatomy of MapReduce Program
  • Input Splits
  • Relation between Input Splits and HDFS Blocks
  • MapReduce: Combiner & Partitioner
  • Counters, Distributed Cache
  • MRUnit, Reduce Join
  • Custom Input Format
  • Sequence Input Format
  • XML File Parsing Using MapReduce
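As a rough illustration of the combiner and partitioner topics listed above, the sketch below pre-aggregates one mapper's output locally and then assigns each key to a reducer by hashing. It is plain Python, not Hadoop code. The use of zlib.crc32 is an assumption made so the result is stable across runs; Hadoop's default HashPartitioner uses the key's Java hashCode() instead. The names partition and combine are hypothetical.

```python
import zlib
from collections import Counter

def partition(key, num_reducers):
    """Pick a reducer index for a key: non-negative hash modulo reducer count.

    crc32 is a stand-in for the key's hash (an assumption for this sketch).
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def combine(pairs):
    """Combiner: sum the counts per word on the mapper side before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

# Output of a single mapper: four records for two distinct words.
mapper_output = [("big", 1), ("data", 1), ("big", 1), ("big", 1)]
combined = combine(mapper_output)  # fewer records cross the network

# Route each combined record to one of three reducers.
num_reducers = 3
buckets = {r: [] for r in range(num_reducers)}
for word, count in combined:
    buckets[partition(word, num_reducers)].append((word, count))
print(combined, buckets)
```

The combiner reduces shuffle traffic, while the partitioner guarantees that every record for a given key reaches the same reducer.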

Hive

  • Hive Background
  • Hive Vs Pig
  • Hive Architecture and Components
  • Metastore in Hive, Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Partitions and Buckets
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data, Querying Data, Managing Outputs
  • Hive Script, Hive UDF
  • Retail Use Case in Hive
  • Hive Demo on Healthcare Data Set
  • HiveQL: Joining Tables, Dynamic Partitioning
  • Custom Map/Reduce Scripts
  • Hive Indexes and Views
  • Hive Query Optimizers
  • Hive: Thrift Server, User Defined Functions

HBase

  • Introduction to NoSQL Databases and HBase
  • HBase Vs RDBMS
  • HBase Components
  • HBase Architecture
  • Run Modes and Configuration
  • HBase Cluster Deployment
  • HBase Data Model
  • HBase Shell
  • HBase Client API
  • Data Loading Techniques
  • ZooKeeper Data Model
  • ZooKeeper Service
  • Demos on Bulk Loading
  • Getting and Inserting Data, Filters in HBase

Apache Spark & Scala

  • Apache Spark & Scala
  • What is Apache Spark
  • Spark Ecosystem
  • Spark Components
  • Spark as a Polyglot
  • Why Scala
  • SparkContext
  • RDD
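One idea behind the RDD topic is that transformations such as map and filter are lazy, and work only happens when an action such as collect is called. The toy class below imitates that behaviour in plain Python so it runs without a Spark installation. ToyRDD is a hypothetical stand-in written for this sketch; it is not Spark's API and has no partitioning, distribution or fault tolerance.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = steps  # pending transformations, not yet executed

    def map(self, fn):
        """Transformation: record a map step and return a new ToyRDD."""
        return ToyRDD(self._data, self._steps + (("map", fn),))

    def filter(self, fn):
        """Transformation: record a filter step and return a new ToyRDD."""
        return ToyRDD(self._data, self._steps + (("filter", fn),))

    def collect(self):
        """Action: apply every recorded step in order and return a list."""
        items = iter(self._data)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

calls = []  # records each element the square function actually processes

def square(x):
    calls.append(x)
    return x * x

rdd = ToyRDD([1, 2, 3, 4]).map(square).filter(lambda x: x % 2 == 0)
ran_before_action = len(calls)  # still zero: nothing has executed yet
result = rdd.collect()          # the action triggers the whole pipeline
print(ran_before_action, result)
```

In Spark itself the starting RDD would come from a SparkContext, and the recorded steps would be executed in parallel across the cluster.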

About Pig

  • MapReduce Vs Pig
  • Programming Structure in Pig
  • Pig Running Modes
  • Pig components, Pig Execution
  • Pig Latin Program, Data Models in Pig
  • Pig Data Types, Shell and Utility Commands
  • Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators
  • Specialized Joins in Pig
  • Built-In Functions (Eval Functions, Load and Store Functions, Math Functions, String Functions, Date Functions)
  • Pig UDF, Piggybank
  • Parameter Substitution (Pig Macros and Pig Parameter Substitution)
  • Pig Streaming
  • Testing Pig Scripts with PigUnit
  • Aviation Use Case in Pig
  • Pig Demo on Healthcare Data Set

Oozie, Sqoop and Flume

  • Flume and Sqoop
  • Oozie Components, Oozie Workflow
  • Scheduling with Oozie
  • Oozie Co-ordinator
  • Oozie Commands, Oozie Web Console
  • Oozie for MapReduce, Pig, Hive, and Sqoop
  • Combined Flow of MR, Pig, Hive in Oozie
  • Hadoop Project Demo
  • Hadoop Integration with Talend