Hadoop
Hadoop Online Training
- Understanding Big Data
- What is Big Data?
- Big Data characteristics
- Hadoop Distributions
- Hortonworks
- Cloudera
- Pivotal HD
- Greenplum
- Introduction to Apache Hadoop
- Flavors of Hadoop: BigInsights, Google BigQuery, etc.
- Hadoop Eco-system components: Introduction
- MapReduce
- HDFS
- Apache Pig
- Apache Hive
- HBase
- Apache Oozie
- Flume
- Sqoop
- Apache Mahout
- Kiji
- Lucene
- Solr
- KiteSDK
- Impala
- Chukwa
- Shark
- Cascading
- Understanding Hadoop Cluster
- Hadoop Core-Components
- NameNode
- JobTracker
- TaskTracker
- DataNode
- SecondaryNameNode
- HDFS Architecture
- Why a 64 MB block size?
- Why Block?
- Why replication factor 3?
- Discuss NameNode and DataNode
- Discuss JobTracker and TaskTracker
- Typical workflow of Hadoop application
- Rack Awareness
- Network Topology
- Assignment of Blocks to Racks and Nodes
- Block Reports
- Heart Beat
- Block Management Service
- Anatomy of File Write
- Anatomy of File Read
- Heart Beats and Block Reports
- Discuss Secondary NameNode
- Usage of FsImage and Edits log
- Map Reduce Overview
- Best practices to set up a Hadoop cluster
- Cluster Configuration
- core-default.xml
- hdfs-default.xml
- mapred-default.xml
- hadoop-env.sh
- slaves
- masters
- Need for the *-site.xml files
- Map Reduce Framework
- Why Map Reduce?
- Use cases where Map Reduce is used
- Hello world program with Weather Use Case
- Setup environment for the programs
- Possible ways of writing a MapReduce program, with sample code; compare the approaches and discuss the best one
- Configured, Tool, GenericOptionsParser, and queue usage
- Demo for calculating maximum and minimum temperature
- Limitations of the traditional way of solving word count on a large dataset
- The MapReduce way of solving the problem (see the word-count sketch after this list)
- Complete overview of MapReduce
- Split Size
- Combiners
- Multi Reducers
- Parts of Map Reduce
- Algorithms
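The word-count discussion above maps directly onto code. Here is a minimal sketch of the classic Hadoop MapReduce word-count job in Java, written against the org.apache.hadoop.mapreduce API; the class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, and the input and output paths are taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emit (word, 1) for every token in the input split.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer (also reused as the combiner): sum the counts per word.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner, as covered above
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar and run with hadoop jar wordcount.jar WordCount <input> <output>; the same skeleton extends to the maximum/minimum temperature demo by swapping the map and reduce logic.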
- Apache Hadoop Single Node Installation Demo
- Namenode format
- Apache Hadoop Multi Node Installation Demo
- Add nodes dynamically to a cluster, with demo
- Remove nodes dynamically from a cluster, with demo
- Safe Mode
- Hadoop cluster modes
- Standalone Mode
- Pseudo-distributed Mode
- Fully distributed mode
- Revision
- HDFS Practicals (HDFS Commands)
- Map Reduce Anatomy
- Job Submission
- Job Initialization
- Task Assignments
- Task Execution
- Schedulers
- Quiz
- Map Reduce Failure Scenarios
- Speculative Execution
- Sequence File
- Input File Formats
- Output File Formats
- Writable DataTypes
- Custom Input Formats
- Custom keys and values using Writables (see the sketch after this list)
- Walkthrough of the installation process using Cloudera Manager
- Example list: sample programs available with the installation
- Demo of the TeraGen, WordCount, and inverted index examples
- Debugging Map Reduce Programs
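As a companion to the custom keys and Writable data types listed above, this is a minimal sketch of a custom WritableComparable composite key. The StationYearKey name and its station/year fields are hypothetical; the pattern is what matters: write and read the fields in the same order, and give the key an ordering for the shuffle sort.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;

    public class StationYearKey implements WritableComparable<StationYearKey> {
      private final Text station = new Text();
      private final IntWritable year = new IntWritable();

      public void set(String stationId, int yr) {
        station.set(stationId);
        year.set(yr);
      }

      @Override
      public void write(DataOutput out) throws IOException {
        station.write(out);     // serialize fields in a fixed order
        year.write(out);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        station.readFields(in); // deserialize in the same order
        year.readFields(in);
      }

      @Override
      public int compareTo(StationYearKey other) {
        int cmp = station.compareTo(other.station);
        return cmp != 0 ? cmp : Integer.compare(year.get(), other.year.get());
      }

      @Override
      public int hashCode() {
        // Used by the default HashPartitioner to spread keys across reducers.
        return station.hashCode() * 163 + year.get();
      }

      @Override
      public boolean equals(Object o) {
        if (!(o instanceof StationYearKey)) return false;
        StationYearKey k = (StationYearKey) o;
        return station.equals(k.station) && year.get() == k.year.get();
      }

      @Override
      public String toString() {
        return station + "\t" + year;
      }
    }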
- Map Reduce Advanced Concepts
- Partitioning and custom Partitioners (illustrated after this list)
- Joins
- Multi outputs
- Counters
- MRUnit test cases
- MR Design patterns
- Distributed Cache
- Command line implementation
- MapReduce API implementation
- Map Reduce advanced concepts: examples
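A minimal custom Partitioner sketch for the partitioning topic above, assuming Text keys and IntWritable values from a job like the word count; the routing rule (first letter of the key) is purely illustrative.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
      @Override
      public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (numPartitions <= 1) {
          return 0;
        }
        // Illustrative rule: keys starting with a-m go to the first reducer,
        // everything else goes to the last one.
        String k = key.toString().toLowerCase();
        return (!k.isEmpty() && k.charAt(0) <= 'm') ? 0 : numPartitions - 1;
      }
    }

It is wired into a job with job.setPartitionerClass(AlphabetPartitioner.class) and job.setNumReduceTasks(2) or more, which is also where the multi-reducer behaviour from the earlier list comes into play.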
- Introduction to course Project
- Data loading techniques
- Hadoop Copy commands
- put, get, copyFromLocal, copyToLocal, mv, chmod, rmr, rmr -skipTrash, distcp, ls, lsr, df, du, cp, moveFromLocal, moveToLocal, text, touchz, tail, mkdir, help (a Java FileSystem equivalent of put/get appears after this list)
- Flume
- Sqoop
- Demo for Hadoop Copy Commands
- Sqoop Theory
- Demo for Sqoop
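The shell copy commands above also have Java equivalents through the HDFS FileSystem API; this short sketch mirrors put and get. The paths are hypothetical, and the Configuration object is expected to pick up core-site.xml and hdfs-site.xml from the classpath.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopy {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads *-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        // Equivalent of: hadoop fs -put /tmp/localfile.txt /data/
        fs.copyFromLocalFile(new Path("/tmp/localfile.txt"), new Path("/data/localfile.txt"));

        // Equivalent of: hadoop fs -get /data/localfile.txt /tmp/copy-back.txt
        fs.copyToLocalFile(new Path("/data/localfile.txt"), new Path("/tmp/copy-back.txt"));

        fs.close();
      }
    }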
- Need for Pig
- Why was Pig created?
- Introduction to skew Join
- Why use Pig when MapReduce is available?
- Pig use cases
- Pig built in operators
- Pig storage schema
- Operators
- Load
- Store
- Dump
- Filter
- Distinct
- Group
- CoGroup
- Join
- Stream
- Foreach Generate
- Parallel
- Limit
- ORDER
- CROSS
- UNION
- SPLIT
- Sampling
- Complex
- Bag
- Tuple
- Atom
- Map
- Integers
- Float
- Chararray
- byteArray
- Double
- Describe
- Explain
- Illustrate
- Filter Function
- Eval Function
- Macros
- Demo
- Storage Handlers
- Pig Practicals and Usecases
- Demo with a schema
- Demo without a schema (an embedded-Pig sketch follows this list)
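To keep the examples in one language, here is an embedded-Pig sketch using the PigServer Java API rather than the Grunt shell. The weather.csv input, its schema, and the output directory are assumptions; the script itself exercises LOAD with a schema, FILTER, GROUP, FOREACH ... GENERATE, and STORE from the operator list above.

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class EmbeddedPig {
      public static void main(String[] args) throws Exception {
        // LOCAL mode for a quick test; ExecType.MAPREDUCE would run on the cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("records = LOAD 'weather.csv' USING PigStorage(',') "
            + "AS (station:chararray, yr:int, temp:int);");
        pig.registerQuery("valid = FILTER records BY temp IS NOT NULL;");
        pig.registerQuery("grouped = GROUP valid BY station;");
        pig.registerQuery("max_temp = FOREACH grouped GENERATE group, MAX(valid.temp);");
        pig.store("max_temp", "max_temp_out");   // writes the result relation to a directory
      }
    }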
- Hive Background
- What is Hive?
- Pig Vs Hive
- Where to Use Hive?
- Hive Architecture
- Metastore
- Hive execution modes
- External, Managed, Native, and Non-native tables
- Hive Partitions
- Dynamic Partitions
- Static Partitions
- Buckets
- Hive DataModel
- Hive DataTypes
- Primitive
- Complex
- Create Managed Table
- Load Data
- Insert overwrite table
- Insert into Local directory
- CTAS
- Insert Overwrite table select
- Inner Joins
- Outer Joins
- Skew Joins
- Multi-table Inserts
- Multiple files, directories, table inserts
- SerDe
- View
- Index
- UDF
- UDAF
- Hive Practicals (a JDBC sketch follows)
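For the Hive practicals above, a hedged Java sketch using JDBC against HiveServer2. The connection string (localhost:10000, the default database), the credentials, the weather table, and the /data/weather.csv path are all assumptions; the statements illustrate CREATE TABLE, LOAD DATA, and a grouped SELECT in HiveQL.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveJdbcExample {
      public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Assumed HiveServer2 endpoint and credentials.
        try (Connection con = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

          stmt.execute("CREATE TABLE IF NOT EXISTS weather (station STRING, yr INT, temp INT) "
              + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
          stmt.execute("LOAD DATA INPATH '/data/weather.csv' OVERWRITE INTO TABLE weather");

          try (ResultSet rs = stmt.executeQuery(
                   "SELECT station, MAX(temp) FROM weather GROUP BY station")) {
            while (rs.next()) {
              System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
            }
          }
        }
      }
    }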
- Oozie Architecture
- Workflow designing in Oozie
- Oozie practicals
- YARN Architecture
- Hadoop Classic vs YARN
- YARN Demo
- Flume Architecture
- Flume Practicals
- ZooKeeper
- Introduction to NoSQL Databases
- The NoSQL Landscape
- Introduction to HBase
- HBase vs RDBMS
- Create a table in HBase using the HBase shell
- Where to use HBase?
- Where not to use HBase?
- Write files to HBase (see the Java client sketch after this list)
- Major Components of HBase
- HBase Master
- HRegionServer
- HBase Client
- Zookeeper
- Region
- HBase Practicals
- HBase -ROOT- catalog table
- CAP Theorem
- Compaction
- Sharding
- Sparse Datastore
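A minimal Java client sketch for writing to HBase, as referenced above. The weather table, the cf column family, and the row-key format are hypothetical, and hbase-site.xml is assumed to be on the classpath so the client can locate ZooKeeper.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // picks up hbase-site.xml
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("weather"))) {
          Put put = new Put(Bytes.toBytes("station1-2014"));              // row key
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("max_temp"),   // family:qualifier
              Bytes.toBytes("42"));                                       // value
          table.put(put);
        }
      }
    }

The table and column family must already exist, for example created from the HBase shell with create 'weather', 'cf'.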
- Cassandra Architecture
- Bigtable and Dynamo
- Distributed hash tables, P2P, fault tolerance
- Data Modelling
- Column Families
- Installation Demo on Cassandra
- Practicals
- Real time Project Analysis
- Design
- Implementation
- Execution
- Debugging
- Optimization Techniques
- Which one to use where
- Amazon Web Services (Hadoop on the Cloud): multi-node installation
- EMR and S3
- Storm Architecture
- Real time use case with Storm
- Spark
- What is Spark?
- Understanding Spark
- Spark Architecture
- RDD
- Hadoop RDD
- RDDs Partitioning
- Lazy Evaluation
- Caching
- Spark Context
- Transformations: map, flatMap, filter
- Actions
- Serialization
- Scala
- Scala Features
- Scala Functions
- Collections and Combiners
- Spark with Scala
- Spark with Yarn
- Spark on Cluster mode
- Spark CLI
- Spark programming with the Java API (see the sketch after this list)
- Spark Streaming
- Spark SQL
- Spark SQL Context
- Spark SQL with Hive
- Spark MLlib algorithms (K-Means clustering, ...)
- Spark GraphX Overview
- Hands On and Usecases
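A short word-count sketch with the Spark Java API mentioned above (Spark 2.x style, where flatMap returns an Iterator), tying together an RDD, transformations (flatMap, mapToPair, reduceByKey), an action (saveAsTextFile), and local mode. The input and output paths come from the command line; setMaster("local[*]") is only for the sketch and would normally be left to spark-submit.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(args[0]);          // lazily evaluated, as discussed above
        JavaRDD<String> words = lines.flatMap(l -> Arrays.asList(l.split("\\s+")).iterator());
        JavaPairRDD<String, Integer> counts = words
            .mapToPair(w -> new Tuple2<>(w, 1))
            .reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile(args[1]);                        // the action triggers the computation
        sc.stop();
      }
    }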
- Impala Architecture
- Impala Practicals
- Adhoc Querying in Impala
- Compression Techniques (a job configuration sketch follows this list)
- Snappy
- LZO
- Bzip2
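A small configuration sketch in Java for the compression techniques above, shown against the MapReduce Job API. It turns on Snappy for intermediate map output and for the final job output; the helper class name is hypothetical and Snappy is just one of the codecs listed.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CompressionConfig {
      public static void configure(Job job) {
        Configuration conf = job.getConfiguration();

        // Compress intermediate map output to cut shuffle traffic.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);

        // Compress the final job output as well.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
      }
    }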
- Image processing in Hadoop
- Certification Preparation Guidelines
- Best practices to set up a Hadoop cluster
- Commissioning and Decommissioning Nodes
- Benchmarking the Hadoop cluster
- Admin monitoring tools
- Routine Admin tasks
- Kafka Architecture
- Kafka use-case execution (a Java producer sketch follows)
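A minimal Kafka producer sketch in Java for the use case above. The broker address (localhost:9092), the topic name, and the message contents are assumptions; the properties shown are the standard bootstrap.servers and serializer settings.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
          // Hypothetical topic and message for illustration.
          producer.send(new ProducerRecord<>("clickstream", "user42", "page=/home"));
        }
      }
    }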