Self Learning Kit on HADOOP
What is APACHE HADOOP?
Apache Hadoop is an open source framework for efficiently storing and processing very large datasets and running data-intensive computations. It provides a software platform for distributed storage and processing that can handle large volumes of structured and unstructured data. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common occurrences and should be handled automatically by the framework.
The Hadoop framework is mostly written in Java, with some native code in C. It was developed by Doug Cutting and Mike Cafarella in 2006.
Do we need to pay even a single dime to learn APACHE HADOOP?
The answer to this question is a clear no. Below is a track we can follow using free learning resources on the web.
Learning HADOOP online for free involves 9 key steps:
- History, Origins and Naming of Hadoop
- Understanding What Hadoop Is
- Scalability and Hadoop. Why Hadoop Scales?
- SQL Comparison with Hadoop
- HDFS File System
- HADOOP Framework Architecture, Ecosystem
- Hadoop Installation
- Running and Processing Hadoop Jobs
- Performing Data Computations with Hadoop
History and Origins of Hadoop
Naming of HADOOP
Understanding HADOOP as a Layman in 5 minutes:
What is HADOOP? Simplified Concept
Why HADOOP Scales?
How does Hadoop Compare with Traditional SQL?
HDFS Explained in 90 Seconds (Hadoop Distributed File System)
Issues in HDFS – What happens if a node fails?
Issue of Data Redundancy in HDFS
How NameNode helps in HDFS?
Recommendation:
Complete Playlist – Intro to Hadoop and MapReduce by Udacity
HADOOP Architecture and Framework
The APACHE HADOOP framework, or ecosystem, is composed of 5 main modules:
- HDFS (Hadoop Distributed File System) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
- YARN (Yet Another Resource Negotiator, introduced in 2012) – a platform responsible for managing computing resources in clusters and using them to schedule users' applications, that is, their jobs and tasks.
- MapReduce – a framework that helps programs perform parallel computation on data. It is an implementation of the MapReduce programming model for large-scale data processing, in which the input is converted into key-value pairs that are then processed and aggregated.
- Hadoop Common – contains libraries and utilities needed by other Hadoop modules.
- Hadoop Ozone (introduced in 2020) – an object store for Hadoop.
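To make the key-value idea behind MapReduce more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

The point of the split is that each `map_phase` call needs only its own line and each `reduce_phase` call needs only the values for its own key, which is what lets a framework spread the work across many nodes.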
HDFS Explained for a Layman
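As a rough illustration of how HDFS splits a file into blocks and keeps several copies of each block, here is a toy model in Python. It assumes the HDFS defaults of a 128 MB block size and a replication factor of 3. The round-robin placement and the function names are simplifications invented for this sketch; real HDFS uses a rack-aware placement policy, and the block map below only loosely mirrors the metadata that the NameNode keeps.

```python
import math

BLOCK_SIZE_MB = 128  # default HDFS block size (Hadoop 2.x and later)
REPLICATION = 3      # default HDFS replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return how many blocks are needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy placement: put each block on `replication` distinct nodes, round-robin.

    This is an illustrative assumption, not the real HDFS placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    block_map = {}
    for block_id in range(num_blocks):
        block_map[block_id] = [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
    return block_map


def surviving_replicas(block_map, failed_node):
    """Return the replicas of every block that remain after one node fails."""
    return {
        block_id: [node for node in nodes if node != failed_node]
        for block_id, nodes in block_map.items()
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # a 300 MB file
    block_map = place_replicas(blocks, ["dn1", "dn2", "dn3", "dn4"])
    print(block_map)
    print(surviving_replicas(block_map, "dn2"))
```

In this model, losing any single node still leaves at least two copies of every block, which is the intuition behind data redundancy in HDFS.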
Important Guides and Resources for Hadoop Installations
The installation of Hadoop is divided into five basic steps: 1) install a Java environment, 2) install SSH, 3) install Homebrew, 4) install Hadoop through Homebrew, and 5) run a health check. The Homebrew steps apply to macOS; the Windows and Ubuntu guides use their own installers. Below are important online resources and guides that will help you install HADOOP:
- How to Install Hadoop on Mac OS X El Capitan
- Setup Hadoop HDFS on MAC
- Install Hadoop on Mac – Ultimate Step by Step Guide
- How to install Hadoop on Mac OS
- Hadoop Installation on Mac
- Install Hadoop 3.2.1 on Windows 10 Step by Step Guide
- Installing Hadoop 3.2.1 Single node cluster on Windows 10
- Single Node Cluster Hadoop Installation on Windows
- How to Install and Configure Hadoop on Ubuntu 20.04
- Hadoop Installation: Setting up a Single Node Hadoop Cluster
- Setting Up A Multi Node Cluster In Hadoop 2.X
Refer: What is Hadoop & How to Install Hadoop on macOS?
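Before or after following one of the guides above, it can help to confirm that the tools from the installation steps are actually on your PATH. The sketch below is a small Python pre-flight check, not part of Hadoop. The command names `java`, `ssh`, `brew` and `hadoop` are assumptions based on the macOS/Homebrew route, so adjust the list for Windows or Ubuntu.

```python
import shutil

# Command names assumed for the macOS/Homebrew route; adjust for your platform.
REQUIRED_TOOLS = ["java", "ssh", "brew", "hadoop"]


def check_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a dict mapping each command name to True if it is found on PATH."""
    return {tool: which(tool) is not None for tool in tools}


def missing_tools(status):
    """List the commands that were not found."""
    return [tool for tool, found in status.items() if not found]


if __name__ == "__main__":
    status = check_tools()
    for tool, found in status.items():
        print(f"{tool}: {'found' if found else 'missing'}")
    if missing_tools(status):
        print("Install the missing tools before continuing.")
```

This only checks that the commands exist; it does not verify versions or that the Hadoop daemons are running.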
List of Free Online Big Data HADOOP Courses
- Intro to Hadoop and MapReduce by UDACITY/CLOUDERA
- Deploying a Hadoop Cluster by UDACITY – Analyze Data with Hadoop and MapReduce
- Hadoop Platform and Application Framework by UC San Diego on Coursera
- Big Data Specialisation by UC San Diego on Coursera
Free Hadoop Tutorials and Articles
Popular Free Hadoop Video Tutorials on YouTube
- Learn Hadoop In 10 Hours-Hadoop Tutorial For Beginners by EDUREKA
- Hadoop Tutorial For Beginners 2022, Full Course In 10 Hours by SIMPLILEARN
- Big Data & Hadoop Full Course-Hadoop Training by INTELLIPAAT
Most Popular Hadoop Communities on LINKEDIN
- Hadoop Users
- Apache Hadoop India Community
- Hadoop Developers
- Apache Hadoop Professionals
- Big Data Hadoop | Spark | Kafka Jobs
Important HADOOP Discussions on QUORA
- How should I start learning Hadoop?
- How long does it usually take to learn Hadoop?
- What are the prerequisites to learn Hadoop for a newbie?
Common Commercial Applications of Hadoop are:
- Analysis of Customer Log or Clickstream Data in Real Time
- Retail Analytics – Targeted Promotions and Marketing
- Targeted Online Advertising by Analysis of Facebook, Twitter, Instagram Data
- Sentiment Analysis and Customer Segmentation
- Machine learning, Image processing
- Real Time Message Processing
- Web Crawling