Introduction to Hadoop

Name: _____________________

Date: _____________________

Instructions: Answer all questions. Write your answers clearly in the space provided.

Question 1:

Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require . . . . . . . . storage on hosts.

A. RAID
B. Standard RAID levels
C. ZFS
D. Operating system
Answer: _________
Question 2:

. . . . . . . . is a framework for performing remote procedure calls and data serialization.

A. Drill
B. BigTop
C. Avro
D. Chukwa
Answer: _________
Question 3:

All of the following accurately describe Hadoop, EXCEPT . . . . . . . .

A. Open-source
B. Real-time
C. Java-based
D. Distributed computing approach
Answer: _________
Question 4:

Sun also has the Hadoop Live CD . . . . . . . . project, which allows running a fully functional Hadoop cluster using a live CD.

A. OpenOffice.org
B. OpenSolaris
C. GNU
D. Linux
Answer: _________
Question 5:

Above the file systems comes the . . . . . . . . engine, which consists of one JobTracker, to which client applications submit MapReduce jobs.

A. MapReduce
B. Google
C. Functional programming
D. Facebook
Answer: _________
Question 6:

. . . . . . . . is the most popular high-level Java API in the Hadoop ecosystem.

A. Scalding
B. HCatalog
C. Cascalog
D. Cascading
Answer: _________
Question 7:

What license is Hadoop distributed under?

A. Apache License 2.0
B. Mozilla Public License
C. Shareware
D. Commercial
Answer: _________
Question 8:

Facebook tackles big data with . . . . . . . . based on Hadoop.

A. 'Project Prism'
B. 'Prism'
C. 'Project Big'
D. 'Project Data'
Answer: _________
Question 9:

Point out the wrong statement in each set of options (A to D and E to H).

A. Hadoop's processing capabilities are huge, and its real advantage lies in the ability to process terabytes and petabytes of data
B. Hadoop uses a programming model called "MapReduce"; all programs must conform to this model in order to work on the Hadoop platform
C. The programming model, MapReduce, used by Hadoop is difficult to write and test
D. All of the mentioned
E. Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering
F. Amazon Web Service Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering
G. Scalding is a Scala API on top of Cascading that removes most Java boilerplate
H. All of the mentioned
Answer: _________
Question 10:

. . . . . . . . has the world's largest Hadoop cluster.

A. Apple
B. Datamatics
C. Facebook
D. None of the mentioned
Answer: _________
Question 11:

IBM and . . . . . . . . have announced a major initiative to use Hadoop to support university courses in distributed computer programming.

A. Google Latitude
B. Android (operating system)
C. Google Variations
D. Google
Answer: _________
Question 12:

Point out the correct statement in each set of options (A to D, E to H, and I to L).

A. Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data
B. Hive is a relational database with SQL support
C. Pig is a relational database with SQL support
D. All of the mentioned
E. Hadoop does need specialized hardware to process the data
F. Hadoop 2.0 allows live stream processing of real-time data
G. In the Hadoop programming framework output files are divided into lines or records
H. None of the mentioned
I. Hadoop is an ideal environment for extracting and transforming small volumes of data
J. Hadoop stores data in HDFS and supports data compression/decompression
K. The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems
L. None of the mentioned
Answer: _________
Question 13:

As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including . . . . . . . .

A. Improved data storage and information retrieval
B. Improved extract, transform and load features for data integration
C. Improved data warehousing functionality
D. Improved security, workload management, and SQL support
Answer: _________
Question 14:

Which of the following platforms does Hadoop run on?

A. Bare metal
B. Debian
C. Cross-platform
D. Unix-like
Answer: _________
Question 15:

. . . . . . . . is a general-purpose computing model and runtime system for distributed data analytics.

A. MapReduce
B. Drill
C. Oozie
D. None of the mentioned
Answer: _________
Question 16:

What was Hadoop written in?

A. Java (software platform)
B. Perl
C. Java (programming language)
D. Lua (programming language)
Answer: _________
Question 17:

Hadoop is a framework that works with a variety of related tools. Common cohorts include . . . . . . . .

A. MapReduce, Hive and HBase
B. MapReduce, MySQL and Google Apps
C. MapReduce, Hummer and Iguana
D. MapReduce, Heron and Trumpet
Answer: _________
Question 18:

What was Hadoop named after?

A. Creator Doug Cutting's favorite circus act
B. Cutting's high school rock band
C. The toy elephant of Cutting's son
D. A sound Cutting's laptop made during Hadoop development
Answer: _________
Question 19:

The Hadoop list includes the HBase database, the Apache Mahout . . . . . . . . system, and matrix operations.

A. Machine learning
B. Pattern recognition
C. Statistical classification
D. Artificial intelligence
Answer: _________
Question 20:

What is the primary function of Apache ZooKeeper in Hadoop?

A. Configuration management
B. Real-time data processing
C. High-speed data ingestion
D. Data storage in HBase
Answer: _________
Question 21:

Which programming paradigm is central to Hadoop processing?

A. Object-oriented programming (OOP)
B. Procedural programming
C. Declarative programming
D. Functional programming
Answer: _________
Question 22:

What is the purpose of the ResourceManager in Hadoop YARN?

A. Manage storage layer
B. Manage computation resources
C. Execute MapReduce jobs
D. Manage Hadoop ecosystem tools
Answer: _________
Question 23:

Which Hadoop ecosystem tool is used for querying and analyzing large datasets?

A. Pig
B. Sqoop
C. Hive
D. HBase
Answer: _________
Question 24:

What is the role of the Hadoop JobTracker in MapReduce?

A. Manage resource allocation
B. Manage task execution in MapReduce jobs
C. Manage HDFS metadata
D. Manage ZooKeeper configurations
Answer: _________
Question 25:

In Hadoop, what is the purpose of the DataNode?

A. Store data blocks
B. Manage computation resources
C. Execute MapReduce jobs
D. Manage HDFS metadata
Answer: _________
Question 26:

Which Apache project in Hadoop is designed for real-time data streaming?

A. Spark
B. Flume
C. HBase
D. Drill
Answer: _________
Question 27:

What is the function of the Hadoop ResourceManager?

A. Manage resource allocation
B. Store data blocks
C. Execute MapReduce jobs
D. Manage Hadoop ecosystem tools
Answer: _________
Question 28:

In Hadoop, what is the purpose of the Hadoop Distributed File System (HDFS)?

A. Real-time data processing
B. Store and manage large volumes of data
C. Distribute computation resources
D. Manage configuration files
Answer: _________
Question 29:

Which component of Hadoop is responsible for resource management?

A. ResourceManager
B. NameNode
C. DataNode
D. Secondary NameNode
Answer: _________
Question 30:

What is Apache ZooKeeper used for in Hadoop?

A. Data ingestion
B. Configuration management
C. Querying and analysis
D. Real-time data processing
Answer: _________
Question 31:

What is the purpose of Apache Ambari in the Hadoop ecosystem?

A. Data processing
B. Data integration
C. Cluster management and monitoring
D. Scripting in Hadoop
Answer: _________
Question 32:

Which Apache project is commonly used for real-time data processing in Hadoop?

A. Apache Spark
B. Apache Flume
C. Apache HBase
D. Apache Drill
Answer: _________
Question 33:

What does the term "Hadoop ecosystem" refer to?

A. The hardware infrastructure
B. The software stack built on top of Hadoop
C. The process of installing Hadoop
D. The networking configuration
Answer: _________
Question 34:

In Hadoop, what does the term "JobTracker" refer to?

A. Manages resource allocation
B. Manages task execution in MapReduce jobs
C. Manages HDFS metadata
D. Manages ZooKeeper configurations
Answer: _________
Question 35:

What is the purpose of the Hadoop Secondary NameNode?

A. Manage computation resources
B. Keep periodic checkpoints of the HDFS metadata
C. Failover for the NameNode
D. Manage ZooKeeper configurations
Answer: _________
Question 36:

Which Apache project is used for workflow automation in Hadoop?

A. Oozie
B. Flume
C. Spark
D. NiFi
Answer: _________
Question 37:

What is the main function of the Hadoop Distributed File System (HDFS)?

A. Real-time data processing
B. Store and manage large volumes of data
C. Distribute computation resources
D. Manage configuration files
Answer: _________
Question 38:

Which Hadoop ecosystem tool is used for managing and monitoring Hadoop clusters?

A. Ambari
B. Pig
C. Drill
D. Cassandra
Answer: _________
Question 39:

In the context of Hadoop, what does the term "shuffle" refer to?

A. A data compression technique
B. The process of distributing data to reducers
C. A phase in MapReduce processing
D. An optimization in HBase storage
Answer: _________
Question 40:

. . . . . . . . can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.

A. MapReduce
B. Mahout
C. Oozie
D. All of the mentioned
Answer: _________
Question 41:

Hive also supports custom extensions written in . . . . . . . .

A. C#
B. Java
C. C
D. C++
Answer: _________
Question 42:

. . . . . . . . hides the limitations of Java behind a powerful and concise Clojure API for Cascading.

A. Scalding
B. HCatalog
C. Cascalog
D. All of the mentioned
Answer: _________
Question 43:

The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to . . . . . . . .

A. SQL
B. JSON
C. XML
D. All of the mentioned
Answer: _________
Question 44:

. . . . . . . . is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.

A. Pig Latin
B. Oozie
C. Pig
D. Hive
Answer: _________
Question 45:

. . . . . . . . jobs are optimized for scalability but not latency.

A. MapReduce
B. Drill
C. Oozie
D. Hive
Answer: _________
Question 46:

According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?

A. Big data management and data mining
B. Data warehousing and business intelligence
C. Management of Hadoop clusters
D. Collecting and storing unstructured data
Answer: _________
Question 47:

Which of the following genres of software does Hadoop belong to?

A. Distributed file system
B. JAX-RS
C. Java Message Service
D. Relational Database Management System
Answer: _________
Question 48:

What is Hadoop primarily designed for?

A. Real-time data processing
B. Batch processing of large datasets
C. Structured data storage
D. In-memory caching
Answer: _________
Question 49:

What is the core component of Hadoop responsible for distributed storage?

A. YARN
B. HDFS
C. MapReduce
D. Hive
Answer: _________
Question 50:

Which programming language is commonly used for writing MapReduce programs?

A. Python
B. Java
C. Ruby
D. C++
Answer: _________
Question 51:

What does HDFS stand for in the context of Hadoop?

A. Hadoop Distributed File System
B. High-level Data Format System
C. Hadoop Data Flow System
D. High-Density File Storage
Answer: _________
Question 52:

In Hadoop, what is the purpose of a NameNode?

A. Store data blocks
B. Manage computation resources
C. Manage metadata of HDFS
D. Execute MapReduce jobs
Answer: _________
Question 53:

What is the default replication factor in HDFS?

A. 1
B. 2
C. 3
D. 4
Answer: _________
Question 54:

Which Apache project is used for data ingestion from relational databases?

A. Sqoop
B. Flume
C. Pig
D. Hive
Answer: _________
Question 55:

What does the term "MapReduce" refer to in Hadoop?

A. A specific Hadoop cluster
B. A programming model and processing engine
C. A data storage format in Hadoop
D. A type of Hadoop server
Answer: _________
Question 56:

In Hadoop, what is the primary role of the ResourceManager in YARN?

A. Manage storage layer
B. Manage computation resources
C. Execute MapReduce jobs
D. Manage Hadoop ecosystem tools
Answer: _________
Question 57:

Which Hadoop component is suitable for handling and processing structured data?

A. HDFS
B. Hive
C. HBase
D. MapReduce
Answer: _________

Answer Key

1: A
2: C
3: B
4: B
5: A
6: D
7: A
8: A
9: C, E
10: C
11: D
12: A, F, J
13: D
14: C
15: A
16: C
17: A
18: C
19: A
20: A
21: D
22: B
23: C
24: B
25: A
26: B
27: A
28: B
29: A
30: B
31: C
32: A
33: B
34: B
35: B
36: A
37: B
38: A
39: C
40: A
41: B
42: C
43: A
44: C
45: D
46: A
47: A
48: B
49: B
50: B
51: A
52: C
53: C
54: A
55: B
56: B
57: B