Databases for Big Data

Part 1 — Multiple Choice Questions

Question 01

Which statements accurately describe the limitations of traditional RDBMSs and the characteristics of NoSQL databases?

⚑ Select all that apply

A. RDBMSs prioritize data consistency and adherence to a formal database schema.

B. NoSQL databases aim to overcome limitations of traditional RDBMSs, but they encompass a wide range of systems with diverse concepts.

C. RDBMSs are best suited for handling large-scale datasets and numerous parallel transactions.

D. Horizontal scaling, or scaling out, involves extending the capacity of individual nodes in a clustered architecture.

E. NoSQL databases excel in horizontal scaling to cope with demands of Big Data, cloud computing, and responsive web applications.

A is correct — RDBMSs are built around strict schemas and ACID-compliant consistency, which is both a strength and a scalability limitation. B is correct — "NoSQL" is an umbrella term covering key-value, document, column-family, and graph stores — diverse systems united mainly by what they are not (SQL/relational). E is correct — horizontal scaling (adding more machines) is a core advantage of NoSQL over traditional RDBMSs. C is wrong — this is the opposite of reality; RDBMSs struggle at large scale, which is exactly why NoSQL was developed. D is wrong — horizontal scaling means adding more nodes; D describes vertical scaling (upgrading individual nodes).

✓ Correct answers: A, B & E

Question 02

Which factors contribute to the challenges faced by traditional RDBMSs in handling extensive horizontal scaling?

⚑ Select all that apply

A. RDBMSs prioritize data consistency at all times, which may lead to overhead and hinder scalability.

B. Vertical scaling, or scaling up, involves adding nodes to a clustered architecture to increase capacity.

C. Traditional RDBMSs are optimized for intensive read/write operations on small- or medium-sized datasets.

D. NoSQL databases provide the necessary performance and availability for handling large-scale data with a clustered architecture.

E. Horizontal scaling requires RDBMSs to balance workloads among multiple nodes in a cluster, which can be challenging due to their transaction management approach.

A is correct — maintaining ACID consistency across distributed nodes requires expensive coordination (e.g., two-phase commit), which creates overhead that limits scalability. E is correct — distributing transactions across multiple nodes is fundamentally difficult for RDBMSs because their transaction model was designed for single-machine operation. B is wrong — vertical scaling means upgrading the same machine (more RAM, faster CPU), not adding new nodes; B has the definition backwards. C is wrong — this describes a strength of RDBMSs, not a challenge; optimisation for smaller datasets doesn't explain horizontal scaling difficulties. D is wrong — this describes NoSQL's advantage, not an RDBMS challenge.

✓ Correct answers: A & E
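
The coordination cost mentioned for option A can be made concrete with a toy two-phase commit. This is a minimal sketch, not any database's implementation: the `Participant` class, the `two_phase_commit` function, and the message counter are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One node taking part in a distributed transaction (hypothetical model)."""
    name: str
    can_commit: bool = True
    state: str = "idle"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: the node votes on whether it is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        self.log.append("prepare")
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("commit")

    def rollback(self) -> None:
        self.state = "aborted"
        self.log.append("rollback")


def two_phase_commit(participants):
    """Run 2PC and return (committed?, number of coordinator requests sent)."""
    messages = 0
    votes = []
    for p in participants:      # Phase 1: ask every node to prepare
        votes.append(p.prepare())
        messages += 1
    decision = all(votes)
    for p in participants:      # Phase 2: broadcast the shared decision
        if decision:
            p.commit()
        else:
            p.rollback()
        messages += 1
    return decision, messages


ok, msgs_ok = two_phase_commit(
    [Participant("n1"), Participant("n2"), Participant("n3")]
)
failed, msgs_failed = two_phase_commit(
    [Participant("n1"), Participant("n2", can_commit=False)]
)
```

Every transaction costs two coordinator requests per node, and a single "no" vote aborts the work on all nodes, which is the overhead the explanation refers to.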

Question 03

Which statements accurately describe the transaction mechanisms and scalability characteristics of traditional RDBMSs and NoSQL databases?

⚑ Select all that apply

A. Traditional RDBMSs prioritize strong consistency and ACID properties, which can sometimes hinder scalability.

B. NoSQL databases often sacrifice consistency for scalability, prioritizing eventual consistency models.

C. Traditional RDBMSs excel in handling large-scale datasets with their efficient horizontal scaling capabilities.

D. NoSQL databases are primarily designed for small to medium-sized datasets and struggle with horizontal scaling.

E. Both traditional RDBMSs and NoSQL databases utilize sharding techniques to distribute data across multiple nodes for improved scalability.

A is correct — ACID (Atomicity, Consistency, Isolation, Durability) guarantees are expensive to maintain across distributed systems, limiting RDBMS scalability. B is correct — many NoSQL systems follow the BASE model (Basically Available, Soft state, Eventually consistent) as a trade-off that enables greater scale. C is wrong — this is the opposite; RDBMSs are weak at horizontal scaling for large datasets. D is wrong — this is exactly backwards; NoSQL was built for large-scale datasets. E is wrong — sharding is primarily a NoSQL/distributed DB technique; traditional RDBMSs are not designed around sharding and implementing it is complex and often problematic.

✓ Correct answers: A & B
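
The "eventually consistent" part of BASE from option B can be sketched with two replicas that accept writes independently and reconcile later. This is a toy model under stated assumptions: the `Replica` class is hypothetical, timestamps are plain integers, and conflicts are resolved with last-write-wins, which is only one of several strategies real systems use.

```python
class Replica:
    """A replica that accepts writes locally and reconciles later (toy model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key: str, value: str, timestamp: int) -> None:
        # Last-write-wins: keep the value with the highest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key: str):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other: "Replica") -> None:
        # Anti-entropy pass: pull every entry; write() resolves conflicts.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica("a"), Replica("b")
a.write("cart", "1 item", timestamp=1)
b.write("cart", "2 items", timestamp=2)  # later write lands on another replica

stale_read = a.read("cart")              # replicas disagree for a while
a.sync_from(b)
b.sync_from(a)
converged = a.read("cart") == b.read("cart") == "2 items"
```

Both replicas stay writable the whole time; the price is the window in which `a` still returns the older value.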

Question 04

Which statements accurately describe the approaches to schema flexibility and data modeling in traditional RDBMSs and NoSQL databases?

⚑ Select all that apply

A. Traditional RDBMSs enforce a rigid schema structure, making it challenging to adapt to evolving data requirements.

B. NoSQL databases offer schema flexibility, allowing for dynamic and schema-less data modeling.

C. Traditional RDBMSs typically use denormalization techniques to optimize query performance.

D. NoSQL databases rely heavily on normalization to ensure data consistency and reduce redundancy.

E. Both traditional RDBMSs and NoSQL databases support flexible data modeling but prioritize different trade-offs in terms of consistency and performance.

A is correct — RDBMSs require a predefined schema; adding or changing columns requires migrations, which is costly when requirements evolve rapidly. B is correct — document stores like MongoDB allow each record to have different fields; the schema emerges from the data rather than being enforced upfront. C is wrong — denormalization can be used in RDBMSs for performance, but it is not a typical/default approach; RDBMSs are built around normalization, not denormalization. D is wrong — NoSQL databases typically favour denormalization (embedding data) over normalization, which is the opposite of what D says. E is wrong — RDBMSs do not support flexible data modeling; their rigid schemas are the defining limitation being contrasted here.

✓ Correct answers: A & B
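
The contrast between options A and B can be tried out with Python's built-in `sqlite3` module standing in for an RDBMS and plain dictionaries standing in for documents. This is only a sketch: the `users` table and its fields are made up for illustration, and the dictionaries mimic the shape of documents rather than any document database's API.

```python
import sqlite3

# Relational side: the set of columns is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

try:
    # Rejected: the column does not exist until the schema is migrated.
    conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Bob', '@bob')")
    insert_rejected = False
except sqlite3.OperationalError:
    insert_rejected = True

# A schema migration is needed before the new field can be stored.
conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Bob', '@bob')")
rows = conn.execute("SELECT id, name, twitter FROM users ORDER BY id").fetchall()

# Document side: each record carries its own fields, with no migration step.
documents = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Bob", "twitter": "@bob", "tags": ["admin"]},
]
fields_per_document = [sorted(doc.keys()) for doc in documents]
```

After the migration every relational row has a `twitter` column (empty for Ada), whereas the two documents simply have different sets of fields.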

Question 05

Which statements accurately describe the fault tolerance and availability features of traditional RDBMSs and NoSQL databases?

⚑ Select all that apply

A. Traditional RDBMSs often rely on replication and failover mechanisms to ensure high availability in case of node failures.

B. NoSQL databases typically employ distributed architectures with built-in redundancy to ensure fault tolerance.

C. Traditional RDBMSs are less prone to data loss due to their stringent consistency guarantees.

D. NoSQL databases prioritize consistency over availability, which may result in higher resilience to network partitions.

E. Both traditional RDBMSs and NoSQL databases offer mechanisms for data replication and fault tolerance, but they employ different strategies to achieve high availability.

A is correct: RDBMSs commonly use replication (primary-replica setups) and failover to stay available when a node fails. B is correct: NoSQL systems such as Cassandra and MongoDB are designed from the ground up as distributed, replicated architectures that tolerate node failures. C is wrong: consistency guarantees govern what reads observe, not whether data survives a failure; durability comes from logging, replication, and backups, so a node failure can still lose data under any consistency model. D is wrong: this is backwards; many NoSQL systems favour availability over strong consistency when a network partition occurs (the trade-off described by the CAP theorem). E is not in the answer key: taken literally it is true, since both families replicate data and differ in strategy, but the question is looking for the mechanism each family is specifically known for, which A and B state.

✓ Correct answers: A & B
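
The replication-plus-failover idea in option A can be sketched as a small primary/replica group. This is a toy model with hypothetical names (`Node`, `ReplicatedStore`, `fail_over`); it assumes synchronous replication to every live node and promotes the first survivor, whereas real systems typically run an election and may replicate asynchronously.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data = {}


class ReplicatedStore:
    """Toy primary/replica group with synchronous replication and failover."""

    def __init__(self, names) -> None:
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def fail_over(self) -> None:
        # Promote the first live node; real systems run an election instead.
        survivors = [n for n in self.nodes if n.alive]
        if not survivors:
            raise RuntimeError("no live nodes left")
        self.primary = survivors[0]

    def write(self, key: str, value: str) -> None:
        if not self.primary.alive:
            self.fail_over()
        # Synchronous replication: copy the write to every live node.
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def read(self, key: str):
        if not self.primary.alive:
            self.fail_over()
        return self.primary.data.get(key)


store = ReplicatedStore(["db-1", "db-2", "db-3"])
store.write("balance", "100")
store.nodes[0].alive = False                # the primary crashes
value_after_crash = store.read("balance")   # served by the promoted replica
new_primary = store.primary.name
```

Because the write was copied to the replicas before the crash, the promoted node can answer the read without data loss.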

Question 06

Which statement accurately describes the characteristics of Key/Value NoSQL databases?

A. Key/value NoSQL databases offer a complex data model with intricate relationships between entities.

B. They provide a slow response time due to their reliance on disk-based storage for data retrieval.

C. Key/value NoSQL databases are not suitable for horizontal scaling, as they struggle to distribute data across multiple nodes.

D. They excel in handling structured data with predefined schemas, allowing for complex querying and indexing.

E. Key/value NoSQL databases are very fast, offer a simple data model, and are able to scale horizontally, but they may face challenges in modeling many data structures as key-value pairs.

E is the complete and accurate description — key/value stores (like Redis, DynamoDB) are extremely fast (often in-memory), scale horizontally with ease by hashing keys to nodes, but their simplicity is also a limitation: complex data structures, relationships, and queries are awkward to model as flat key-value pairs. A is wrong — the model is deliberately simple, not complex. B is wrong — many key/value stores are in-memory, making them among the fastest databases available. C is wrong — horizontal scaling is actually a key strength, achieved through consistent hashing. D is wrong — key/value stores are schema-less and not designed for complex queries or indexing.

✓ Correct answer: E
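
The consistent hashing mentioned in the explanation of option C can be sketched as a hash ring. This is an illustrative toy, not the scheme of any particular product: the `HashRing` class, the node names, and the choice of 100 virtual nodes per machine are all assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self._ring = []          # sorted list of (position, node name)
        self._vnodes = vnodes
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


keys = [f"session:{i}" for i in range(2000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-d")                      # scale out by one node
after = {key: ring.node_for(key) for key in keys}
moved_keys = [key for key in keys if before[key] != after[key]]
moved_fraction = len(moved_keys) / len(keys)
```

Only the keys that now belong to the new node change owner; the rest stay where they were, which is what makes adding machines cheap for a key/value store.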

Question 07

Which statement accurately describes the characteristics of Tuple Stores?

A. Tuple stores enforce a rigid schema structure, requiring all tuples to have the same length and semantic ordering.

B. They store pairwise combinations of a key and a value, similar to key-value stores.

C. Tuple stores are optimized for quick retrieval of individual records based on unique keys.

D. They provide a schema-less data model, allowing for flexibility in the length and semantic ordering of tuples.

E. Tuple stores are suitable for applications requiring frequent updates to individual tuple elements without affecting the overall structure.

D is correct: tuple stores are schema-less; unlike relational tables, tuples can have varying numbers of elements and different semantic orderings, which gives them far more flexibility than a rigid tabular format. A is wrong: a rigid schema is the opposite of what tuple stores offer; flexibility in length and ordering is their defining characteristic. B is wrong: tuples hold ordered sequences of multiple elements, not just pairwise key-value combinations; that describes key-value stores. C is wrong: quick single-record retrieval by unique key is the hallmark of key-value stores, not tuple stores. E is wrong: tuple stores are not specifically optimised for frequent in-place updates to individual elements.

✓ Correct answer: D
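
The flexibility described in option D can be illustrated with a toy tuple store in which tuples of different lengths and meanings live side by side. The `TupleStore` class, the `ANY` wildcard, and the template-matching lookup are assumptions of this sketch (loosely modelled on tuple-space style matching), not the interface of a specific product.

```python
from typing import Any

ANY = object()  # wildcard marker used in templates (a convention of this sketch)


class TupleStore:
    """Toy schema-less tuple store: tuples may differ in length and meaning."""

    def __init__(self) -> None:
        self._tuples = []

    def put(self, *elements: Any) -> None:
        self._tuples.append(tuple(elements))

    def find(self, *template: Any):
        # A tuple matches when it has the template's length and every
        # non-wildcard position is equal.
        return [
            t for t in self._tuples
            if len(t) == len(template)
            and all(want is ANY or want == have for want, have in zip(template, t))
        ]


store = TupleStore()
store.put("person", "Ada", 36)
store.put("person", "Bob", 41, "bob@example.com")  # longer tuple, same store
store.put("reading", 21.5)                          # different meaning entirely

three_field_people = store.find("person", ANY, ANY)
readings = store.find("reading", ANY)
```

Nothing forces the two `person` tuples to share a length, which is exactly what a relational table would not allow.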

Question 08

Which statements accurately describe the suitability of a Document NoSQL database?

⚑ Select all that apply

A. This database is suitable for efficiently handling binary data due to its support for flat or nested schema.

B. It is well-suited for scenarios requiring joins between multiple documents to retrieve related data.

C. The database is ideal for situations where updates on multiple documents need to be performed within a single transaction.

D. CRUD operations, including Create, Read, Update, and Delete, can be efficiently executed using this database.

E. Schema changes are likely with this database due to its flexible schema structure, allowing for adaptation to evolving data requirements.

D is correct — document databases like MongoDB are specifically designed for efficient CRUD operations on self-contained documents, with rich querying and indexing support. E is correct — the schema-flexible nature of document DBs means you can add, remove, or change fields across documents over time without costly migrations, making them ideal when requirements evolve. A is wrong — document databases store structured/semi-structured data (JSON, BSON); they are not specifically optimised for raw binary data, which is better handled by object stores or blob storage. B is wrong — joins across documents are a known weakness; document DBs encourage embedding related data in a single document to avoid joins. C is wrong — multi-document ACID transactions are complex in document DBs and were historically unsupported; they are a strength of RDBMSs instead.

✓ Correct answers: D & E
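
Options D and E can be illustrated with a small in-memory collection of self-contained documents. This is a sketch under stated assumptions: the `Collection` class and its four methods are hypothetical and deliberately do not imitate MongoDB's actual API; they only show CRUD on documents whose fields can change over time.

```python
import copy
from typing import Any


class Collection:
    """Toy in-memory document collection (not MongoDB's API)."""

    def __init__(self) -> None:
        self._docs = {}
        self._next_id = 1

    def create(self, doc: dict) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = {**copy.deepcopy(doc), "_id": doc_id}
        return doc_id

    def read(self, **criteria: Any) -> list:
        # Match documents whose top-level fields equal every criterion.
        return [
            copy.deepcopy(d) for d in self._docs.values()
            if all(d.get(k) == v for k, v in criteria.items())
        ]

    def update(self, doc_id: int, changes: dict) -> bool:
        if doc_id not in self._docs:
            return False
        self._docs[doc_id].update(copy.deepcopy(changes))
        return True

    def delete(self, doc_id: int) -> bool:
        return self._docs.pop(doc_id, None) is not None


orders = Collection()
first = orders.create({"customer": "Ada", "items": [{"sku": "A1", "qty": 2}]})
second = orders.create({"customer": "Bob", "items": [], "gift": True})

orders.update(first, {"status": "shipped"})   # new field, no migration needed
shipped = orders.read(status="shipped")
orders.delete(second)
remaining = orders.read()
```

The order's line items are embedded in the order document itself, so reading an order needs no join, and adding the `status` field touches only the one document that needs it.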

Question 09

Which statements accurately describe characteristics and advantages of graph databases?

⚑ Select all that apply

A. Graph databases are primarily focused on storing records in a tabular format, similar to traditional relational databases.

B. They apply graph theory, which involves the study of mathematical structures used to model pairwise relations between objects.

C. Graph databases are primarily used for modeling tabular data structures, making them less suitable for applications involving complex relationships.

D. Graph structures, consisting of nodes and edges, have gained popularity for modeling social networks due to their ability to represent pairwise relations.

E. Unlike other approaches, graph databases emphasize increased relational modeling, allowing for easy representation of one-to-one, one-to-many, and many-to-many structures.

B is correct — graph databases are founded on graph theory; nodes represent entities and edges represent relationships, drawing directly from mathematical graph structures. D is correct — social networks are the canonical use case for graph databases; the friend/follow/like relationships map perfectly to nodes and edges. A is wrong — tabular storage is the hallmark of relational databases; graph databases store nodes and edges, not rows and columns. C is wrong — this is the opposite; graph databases are specifically designed for complex relationships, which is exactly where they outperform other models. E is partially misleading — graph databases do handle many relationship types well, but they are distinct from relational databases; calling it "relational modeling" conflates two separate concepts.

✓ Correct answers: B & D
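
The nodes-and-edges model from options B and D can be sketched with an adjacency list and a breadth-first traversal. The people and friendships below are made up, and `within_hops` is a hypothetical helper; a graph database would express the same question in its own query language.

```python
from collections import deque

# Edges of a small, made-up social graph (undirected "friend" relation).
friendships = [
    ("ada", "bob"), ("bob", "cy"), ("cy", "dee"), ("ada", "eve"),
]

graph = {}
for left, right in friendships:
    graph.setdefault(left, set()).add(right)
    graph.setdefault(right, set()).add(left)


def within_hops(start: str, max_hops: int) -> dict:
    """Breadth-first search: map each reachable person to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph[person]:
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]
    return distance


friends_of_friends = within_hops("ada", 2)
```

The traversal follows edges directly from node to node, which is the kind of relationship-heavy query that would need repeated self-joins in a tabular model.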

Question 10

Which of the following are correct about MongoDB?

⚑ Select all that apply

A. If shards are unavailable, partial reads and writes help availability.

B. Config servers process all requests.

C. Mongos process all requests.

D. Config servers in a sharded cluster are implemented as replica sets.

E. Config servers decide how a query will be distributed.

A is correct: when some shards are unavailable, a sharded cluster can still serve reads and writes directed at the remaining shards rather than failing completely. C is correct: in a sharded cluster, mongos is the query router; client requests go through it and are directed to the appropriate shard(s). B is wrong: config servers store cluster metadata (which data lives on which shard) but do not process client requests; that is the role of mongos. D is not in the answer key, although the statement itself matches current MongoDB documentation, which requires config servers to be deployed as a replica set (since MongoDB 3.4); treat its exclusion as a quirk of this quiz rather than a fact about MongoDB. E is wrong: config servers hold the routing metadata, but it is mongos that uses that metadata to decide how to distribute a query.

✓ Correct answers: A & C
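
The division of labour between routing metadata and the router, and the partial availability from option A, can be sketched as follows. Everything here is a hypothetical model: `chunk_map` plays the part of the metadata held by config servers, `route` plays the part of mongos, and `user_id` is an invented integer shard key. It does not use or imitate MongoDB's real interfaces.

```python
class ShardUnavailable(Exception):
    pass


class Shard:
    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self.docs = {}


# "Config metadata" in this sketch: which shard owns which shard-key range.
# Ranges are half-open [low, high) over a hypothetical integer key user_id.
chunk_map = [((0, 100), "shard-a"), ((100, 200), "shard-b")]
shards = {"shard-a": Shard("shard-a"), "shard-b": Shard("shard-b")}


def route(user_id: int) -> Shard:
    """Router role: consult the metadata, then forward to the owning shard."""
    for (low, high), shard_name in chunk_map:
        if low <= user_id < high:
            shard = shards[shard_name]
            if not shard.available:
                raise ShardUnavailable(shard_name)
            return shard
    raise KeyError(f"no chunk covers user_id={user_id}")


def write(user_id: int, doc: dict) -> None:
    route(user_id).docs[user_id] = doc


def read(user_id: int):
    return route(user_id).docs.get(user_id)


write(42, {"name": "Ada"})
write(150, {"name": "Bob"})

shards["shard-b"].available = False   # one shard goes down
still_readable = read(42)             # targets shard-a, so it succeeds
try:
    read(150)                         # targets the unavailable shard
    failed_on_down_shard = False
except ShardUnavailable:
    failed_on_down_shard = True
```

The metadata never handles a request itself; the router reads it to pick a shard, and operations aimed at healthy shards keep working while one shard is down.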

Part 2 — Complete the Missing Words

Question 11

Complete the missing words to accurately describe MongoDB based on its features.

MongoDB is a NoSQL database that utilizes a _____ storage model, making it well-suited for handling semi-structured and unstructured data.

It stores data in _____ format, allowing for flexible schema design and accommodating evolving data requirements.

MongoDB supports _____ operations, enabling efficient querying and indexing of data.

It also provides horizontal scalability through _____ replication and sharding, allowing for seamless distribution of data across multiple nodes.

The four answers are: document-oriented (MongoDB stores self-contained JSON documents, not rows in tables), JSON (technically BSON internally, but MongoDB's interface and query language use JSON format), CRUD (Create, Read, Update, Delete — the four fundamental database operations), and horizontal (sharding distributes data across many machines horizontally, not just upgrading one machine vertically).

✓ Correct answers: document-oriented · JSON · CRUD · horizontal

Question 12

Complete the missing words to accurately describe MongoDB commands.

To create a new database in MongoDB, you would use the _____ command, specifying the name of the database.

To display a list of available databases, you would use the _____ command.

To switch to a specific database, you would use the _____ command, specifying the name of the database.

To drop a database from MongoDB, you would use the _____ command, specifying the name of the database.

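The four answers the quiz expects are: db.createDatabase (create a new database), showDatabases (list all available databases), useDatabase (switch the active database context), and dropDatabase (permanently remove a database). These are the quiz's labels rather than literal shell syntax. In the MongoDB shell (mongosh) there is no createDatabase command: a database is created implicitly by switching to it with use <name> and then writing data to it. Databases are listed with show dbs (or show databases), the active database is changed with use <name>, and the current database is removed with db.dropDatabase().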

✓ Correct answers: db.createDatabase · showDatabases · useDatabase · dropDatabase
