Introduction To Hadoop

Name: _____________________

Date: _____________________

Instructions: Answer all questions. Write your answers clearly in the space provided.

Question 1:

Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require . . . . . . . . storage on hosts.

A. RAID

B. Standard RAID levels

C. ZFS

D. Operating system

Answer: _________

Question 2:

. . . . . . . . is a framework for performing remote procedure calls and data serialization.

A. Drill

B. BigTop

C. Avro

D. Chukwa

Answer: _________

Question 3:

All of the following accurately describe Hadoop, EXCEPT . . . . . . . .

A. Open-source

B. Real-time

C. Java-based

D. Distributed computing approach

Answer: _________

Question 4:

Sun also has the Hadoop Live CD . . . . . . . . project, which allows running a fully functional Hadoop cluster using a live CD.

A. OpenOffice.org

B. OpenSolaris

C. GNU

D. Linux

Answer: _________

Question 5:

Above the file systems comes the . . . . . . . . engine, which consists of one Job Tracker, to which client applications submit MapReduce jobs.

A. MapReduce

B. Google

C. Functional programming

D. Facebook

Answer: _________

Question 6:

. . . . . . . . is the most popular high-level Java API in the Hadoop ecosystem.

A. Scalding

B. HCatalog

C. Cascalog

D. Cascading

Answer: _________

Question 7:

What license is Hadoop distributed under?

A. Apache License 2.0

B. Mozilla Public License

C. Shareware

D. Commercial

Answer: _________

Question 8:

Facebook tackles big data with . . . . . . . . based on Hadoop.

A. 'Project Prism'

B. 'Prism'

C. 'Project Big'

D. 'Project Data'

Answer: _________

Question 9:

Point out the wrong statement.

A. Hadoop's processing capabilities are huge, and its real advantage lies in the ability to process terabytes and petabytes of data

B. Hadoop uses a programming model called "MapReduce"; all programs must conform to this model in order to work on the Hadoop platform

C. The programming model, MapReduce, used by Hadoop is difficult to write and test

D. All of the mentioned

E. Elastic MapReduce (EMR) is Facebook's packaged Hadoop offering

F. Amazon Web Service Elastic MapReduce (EMR) is Amazon's packaged Hadoop offering

G. Scalding is a Scala API on top of Cascading that removes most Java boilerplate

H. All of the mentioned

Answer: _________

Question 10:

. . . . . . . . has the world's largest Hadoop cluster.

A. Apple

B. Datamatics

C. Facebook

D. None of the mentioned

Answer: _________

Question 11:

IBM and . . . . . . . . have announced a major initiative to use Hadoop to support university courses in distributed computer programming.

A. Google Latitude

B. Android (operating system)

C. Google Variations

D. Google

Answer: _________

Question 12:

Point out the correct statement.

A. Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data

B. Hive is a relational database with SQL support

C. Pig is a relational database with SQL support

D. All of the mentioned

E. Hadoop does need specialized hardware to process the data

F. Hadoop 2.0 allows live stream processing of real-time data

G. In the Hadoop programming framework output files are divided into lines or records

H. None of the mentioned

I. Hadoop is an ideal environment for extracting and transforming small volumes of data

J. Hadoop stores data in HDFS and supports data compression/decompression

K. The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems

L. None of the mentioned

Answer: _________

Question 13:

As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including . . . . . . . .

A. Improved data storage and information retrieval

B. Improved extract, transform and load features for data integration

C. Improved data warehousing functionality

D. Improved security, workload management, and SQL support

Answer: _________

Question 14:

Which of the following platforms does Hadoop run on?

A. Bare metal

B. Debian

C. Cross-platform

D. Unix-like

Answer: _________

Question 15:

. . . . . . . . is a general-purpose computing model and runtime system for distributed data analytics.

A. MapReduce

B. Drill

C. Oozie

D. None of the mentioned

Answer: _________

Question 16:

What was Hadoop written in?

A. Java (software platform)

B. Perl

C. Java (programming language)

D. Lua (programming language)

Answer: _________

Question 17:

Hadoop is a framework that works with a variety of related tools. Common cohorts include . . . . . . . .

A. MapReduce, Hive and HBase

B. MapReduce, MySQL and Google Apps

C. MapReduce, Hummer and Iguana

D. MapReduce, Heron and Trumpet

Answer: _________

Question 18:

What was Hadoop named after?

A. Creator Doug Cutting's favorite circus act

B. Cutting's high school rock band

C. The toy elephant of Cutting's son

D. A sound Cutting's laptop made during Hadoop development

Answer: _________

Question 19:

The Hadoop list includes the HBase database, the Apache Mahout . . . . . . . . system, and matrix operations.

A. Machine learning

B. Pattern recognition

C. Statistical classification

D. Artificial intelligence

Answer: _________

Question 20:

What is the primary function of Apache ZooKeeper in Hadoop?

A. Configuration management

B. Real-time data processing

C. High-speed data ingestion

D. Data storage in HBase

Answer: _________

Question 21:

Which programming paradigm is central to Hadoop processing?

A. Object-oriented programming (OOP)

B. Procedural programming

C. Declarative programming

D. Functional programming

Answer: _________

Question 22:

What is the purpose of the ResourceManager in Hadoop YARN?

A. Manage storage layer

B. Manage computation resources

C. Execute MapReduce jobs

D. Manage Hadoop ecosystem tools

Answer: _________

Question 23:

Which Hadoop ecosystem tool is used for querying and analyzing large datasets?

A. Pig

B. Sqoop

C. Hive

D. HBase

Answer: _________

Question 24:

What is the role of the Hadoop JobTracker in MapReduce?

A. Manage resource allocation

B. Manage task execution in MapReduce jobs

C. Manage HDFS metadata

D. Manage ZooKeeper configurations

Answer: _________

Question 25:

In Hadoop, what is the purpose of the DataNode?

A. Store data blocks

B. Manage computation resources

C. Execute MapReduce jobs

D. Manage HDFS metadata

Answer: _________

Question 26:

Which Apache project in Hadoop is designed for real-time data streaming?

A. Spark

B. Flume

C. HBase

D. Drill

Answer: _________

Question 27:

What is the function of the Hadoop ResourceManager?

A. Manage resource allocation

B. Store data blocks

C. Execute MapReduce jobs

D. Manage Hadoop ecosystem tools

Answer: _________

Question 28:

In Hadoop, what is the purpose of the Hadoop Distributed File System (HDFS)?

A. Real-time data processing

B. Store and manage large volumes of data

C. Distribute computation resources

D. Manage configuration files

Answer: _________

Question 29:

Which component of Hadoop is responsible for resource management?

A. ResourceManager

B. NameNode

C. DataNode

D. Secondary NameNode

Answer: _________

Question 30:

What is Apache ZooKeeper used for in Hadoop?

A. Data ingestion

B. Configuration management

C. Querying and analysis

D. Real-time data processing

Answer: _________

Question 31:

What is the purpose of Apache Ambari in the Hadoop ecosystem?

A. Data processing

B. Data integration

C. Cluster management and monitoring

D. Scripting in Hadoop

Answer: _________

Question 32:

Which Apache project is commonly used for real-time data processing in Hadoop?

A. Apache Spark

B. Apache Flume

C. Apache HBase

D. Apache Drill

Answer: _________

Question 33:

What does the term "Hadoop ecosystem" refer to?

A. The hardware infrastructure

B. The software stack built on top of Hadoop

C. The process of installing Hadoop

D. The networking configuration

Answer: _________

Question 34:

In Hadoop, what does the term "JobTracker" refer to?

A. Manages resource allocation

B. Manages task execution in MapReduce jobs

C. Manages HDFS metadata

D. Manages ZooKeeper configurations

Answer: _________

Question 35:

What is the purpose of the Hadoop Secondary NameNode?

A. Manage computation resources

B. Store a backup of the entire HDFS metadata

C. Failover for the NameNode

D. Manage ZooKeeper configurations

Answer: _________

Question 36:

Which Apache project is used for workflow automation in Hadoop?

A. Oozie

B. Flume

C. Spark

D. NiFi

Answer: _________

Question 37:

What is the main function of the Hadoop Distributed File System (HDFS)?

A. Real-time data processing

B. Store and manage large volumes of data

C. Distribute computation resources

D. Manage configuration files

Answer: _________

Question 38:

Which Hadoop ecosystem tool is used for managing and monitoring Hadoop clusters?

A. Ambari

B. Pig

C. Drill

D. Cassandra

Answer: _________

Question 39:

In the context of Hadoop, what does the term "shuffle" refer to?

A. A data compression technique

B. The process of distributing data to reducers

C. A phase in MapReduce processing

D. An optimization in HBase storage

Answer: _________

Question 40:

. . . . . . . . can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.

A. MapReduce

B. Mahout

C. Oozie

D. All of the mentioned

Answer: _________

Question 41:

Hive also supports custom extensions written in . . . . . . . .

A. C#

B. Java

C. C

D. C++

Answer: _________

Question 42:

. . . . . . . . hides the limitations of Java behind a powerful and concise Clojure API for Cascading.

A. Scalding

B. HCatalog

C. Cascalog

D. All of the mentioned

Answer: _________

Question 43:

The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to . . . . . . . .

A. SQL

B. JSON

C. XML

D. All of the mentioned

Answer: _________

Question 44:

. . . . . . . . is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.

A. Pig Latin

B. Oozie

C. Pig

D. Hive

Answer: _________

Question 45:

. . . . . . . . jobs are optimized for scalability but not latency.

A. MapReduce

B. Drill

C. Oozie

D. Hive

Answer: _________

Question 46:

According to analysts, for what can traditional IT systems provide a foundation when they're integrated with big data technologies like Hadoop?

A. Big data management and data mining

B. Data warehousing and business intelligence

C. Management of Hadoop clusters

D. Collecting and storing unstructured data

Answer: _________

Question 47:

Which of the following categories of software does Hadoop fall under?

A. Distributed file system

B. JAX-RS

C. Java Message Service

D. Relational Database Management System

Answer: _________

Question 48:

What is Hadoop primarily designed for?

A. Real-time data processing

B. Batch processing of large datasets

C. Structured data storage

D. In-memory caching

Answer: _________

Question 49:

What is the core component of Hadoop responsible for distributed storage?

A. YARN

B. HDFS

C. MapReduce

D. Hive

Answer: _________

Question 50:

Which programming language is commonly used for writing MapReduce programs?

A. Python

B. Java

C. Ruby

D. C++

Answer: _________

Question 51:

What does HDFS stand for in the context of Hadoop?

A. Hadoop Distributed File System

B. High-level Data Format System

C. Hadoop Data Flow System

D. High-Density File Storage

Answer: _________

Question 52:

In Hadoop, what is the purpose of a NameNode?

A. Store data blocks

B. Manage computation resources

C. Manage metadata of HDFS

D. Execute MapReduce jobs

Answer: _________

Question 53:

What is the default replication factor in HDFS?

A. 1

B. 2

C. 3

D. 4

Answer: _________

Question 54:

Which Apache project is used for data ingestion from relational databases?

A. Sqoop

B. Flume

C. Pig

D. Hive

Answer: _________

Question 55:

What does the term "MapReduce" refer to in Hadoop?

A. A specific Hadoop cluster

B. A programming model and processing engine

C. A data storage format in Hadoop

D. A type of Hadoop server

Answer: _________

Question 56:

In Hadoop, what is the primary role of the ResourceManager in YARN?

A. Manage storage layer

B. Manage computation resources

C. Execute MapReduce jobs

D. Manage Hadoop ecosystem tools

Answer: _________

Question 57:

Which Hadoop component is suitable for handling and processing structured data?

A. HDFS

B. Hive

C. HBase

D. MapReduce

Answer: _________

Answer Key

1: A

2: C

3: B

4: B

5: A

6: D

7: A

8: A

9: C, E

10: C

11: D

12: A, F, J

13: D

14: C

15: A

16: C

17: A

18: C

19: A

20: A

21: D

22: B

23: C

24: B

25: A

26: B

27: A

28: B

29: A

30: B

31: C

32: A

33: B

34: B

35: B

36: A

37: B

38: A

39: C

40: A

41: B

42: C

43: A

44: C

45: D

46: A

47: A

48: B

49: B

50: B

51: A

52: C

53: C

54: A

55: B

56: B

57: B
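
Illustration for Questions 39, 40, and 55

Several questions describe MapReduce as a programming model with a shuffle phase between map and reduce. The sketch below is a plain-Python word count that imitates those three phases in a single process. It does not use any Hadoop API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key so each key reaches one reducer call."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse the list of counts for each word into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "hadoop processes data"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run as separate tasks on different machines, and the shuffle would move data across the network; here everything runs in one process purely to show the data flow.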