Introduction to Hadoop
Apache™ Hadoop® is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. As an open-source framework, Hadoop can be installed on commodity or industry-standard servers; hardware can be added or replaced in a cluster, and virtual machines (VMs) can also be cloned. Hadoop is economical in the sense that costs stay relatively low because the same software runs across the whole infrastructure.
At the heart of the Hadoop framework lies MapReduce, a software programming framework that is an integral part of Hadoop. MapReduce is built around two functions: Map and Reduce. The map job takes a set of data and converts it into another set of data, in which individual elements are broken down into tuples (key/value pairs). The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the name MapReduce implies, the reduce job is always performed after the map job.(1) In addition to simplifying the processing of big data sets, MapReduce provides programmers with a common way to define and orchestrate complex processing tasks across clusters of computers.
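The map-then-reduce flow described above can be sketched in a few lines of plain Python. This is a toy, single-process illustration of the programming model only (word counting, the canonical example), not Hadoop itself: in a real cluster the framework distributes the map and reduce phases across many machines and performs the grouping-by-key ("shuffle") between them.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: break each input record into (key, value) tuples."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: combine the tuples for each key into a smaller set of tuples.
    (In Hadoop, the framework groups pairs by key before reducing.)"""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["Hadoop stores data", "Hadoop processes data"]
result = reduce_phase(map_phase(docs))
print(result)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Note how the reduce step cannot start until the map output has been grouped by key, which is why the reduce job always follows the map job.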
Because Hadoop works alongside existing databases and analytical infrastructures, there is no need to worry about it displacing any information; rather, it handles datasets and tasks that would be a problem for a legacy database. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Hadoop clusters are known for boosting the speed of data analysis applications, and they are highly scalable: if a cluster's processing power is overwhelmed by growing volumes of data, additional cluster nodes can be added to increase throughput. Hadoop clusters are also highly resistant to failure because each piece of data is copied onto other cluster nodes, which ensures that the data is not lost if one node fails.(2)
The base Apache Hadoop framework is composed of the following modules:
- Hadoop Common – contains libraries and utilities needed by other Hadoop modules. 
- Hadoop Distributed File System (HDFS) – a distributed, Java-based file system for storing large volumes of data. HDFS is a scalable, fault-tolerant storage system that works closely with a wide variety of concurrent data access applications, coordinated by YARN. 
- Hadoop YARN – a resource management platform responsible for managing computing resources in clusters and using them to schedule users' applications. 
- Hadoop MapReduce – an implementation of the MapReduce programming model for large scale data processing. 
An excellent course on the basics of Hadoop is offered by Big Data University, and I highly recommend it as a starting point for learning and understanding Big Data. The course covers the fundamentals of Apache Hadoop and the concept of Big Data, and it provides all the testing materials and software for free. It is now accredited by IBM, meaning you can earn a professionally recognised IBM badge, certified by Pearson VUE. More information is available here.
1. IBM – What is MapReduce [Internet]. www-01.ibm.com. 2016 [cited 20 April 2016]. Available from: https://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce/
2. What is Hadoop cluster? – Definition from WhatIs.com [Internet]. SearchBusinessAnalytics. 2016 [cited 20 April 2016]. Available from: http://searchbusinessanalytics.techtarget.com/definition/Hadoop-cluster