Introduction to Data Science

Question 01

What is data science about?

A. Analyzing small data sets to improve personal decision-making

B. Extraction of useful information and knowledge from large volumes of data to improve business decision-making

C. Collecting data for statistical analysis only

D. Creating data visualization tools

E. Using data to make informed decisions

Data science is fundamentally about extracting meaningful insights from large data sets (B) to support smarter decisions; at its broadest, it also encompasses using any data to make better decisions (E). Options A, C, and D are too narrow: A limits the field to small data sets and personal decisions, C to collecting data for statistics only, and D to building visualization tools. Data science typically deals with large, complex data and aims to create value from it.

✓ Correct answers: B & E

Question 02

Which of the following is/are NOT task(s) included in using data?

A. Collect

B. Store

C. Manage

D. Sample

E. Serialize

The core lifecycle tasks for using data are Collect → Store → Manage → Analyze → Use. Sampling (D) is a statistical technique applied within analysis, not a top-level data task. Serialization (E) is a programming/storage encoding concept (like JSON or Pickle), not a standard data workflow step. Both are real operations but don't belong to the canonical task taxonomy taught in this course.

✓ Correct answers: D & E
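To make the distinction in Question 02 concrete, here is a minimal sketch of what sampling and serialization look like in practice. The `records` list is hypothetical illustration data, not part of the course material, and the seed is fixed only so the run is repeatable.

```python
import json
import random

# Hypothetical records standing in for an already collected dataset.
records = [{"id": i, "value": i * 10} for i in range(1, 11)]

# Sampling: draw a random subset of the records for analysis.
rng = random.Random(42)  # fixed seed so the sample is repeatable
sample = rng.sample(records, k=3)

# Serialization: encode the records as a JSON string for storage or transfer.
encoded = json.dumps(records)

# Deserialization restores an equal Python structure.
decoded = json.loads(encoded)

print(len(sample), "of", len(records), "records sampled")
print("round trip preserved the data:", decoded == records)
```

Both operations act on data that has already been collected and stored, which is consistent with treating them as supporting techniques rather than top-level tasks.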

Question 03

Which of the following describes "Machine-generated" data?

A. Data generated from human interactions with systems

B. Data generated from software systems and hardware devices

C. Data generated from social media interactions

D. Data generated from emails and messages

E. Data generated from sensors

Machine-generated data comes directly from automated systems without direct human authorship. Software logs, hardware device outputs (B), and IoT sensor readings (E) are classic examples — they're produced continuously and automatically. Social media, emails, and messages are created by humans (even if delivered digitally), so they're classified as human-generated data.

✓ Correct answers: B & E

Question 04

Which of the following is an example of unstructured data?

A. Relational database tables

B. Banking transactions

C. Electronic health records

D. Textual or binary data stored as BLOBs in a DBMS

E. Structured datasets managed using a DBMS

Unstructured data lacks a predefined schema or model. BLOBs (Binary Large OBjects) stored in a database (D) — such as images, PDFs, videos, or raw text — have no inherent row/column structure. All other options (relational tables, banking records, EHRs, DBMS-managed datasets) have well-defined schemas that make them structured or semi-structured data.

✓ Correct answer: D
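As an illustration of option D in Question 04, the sketch below stores raw bytes in a BLOB column using Python's built-in `sqlite3` module. The table name, column names, and byte payload are assumptions made up for the example. The DBMS can store and return the bytes, but it knows nothing about their internal structure.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)

# Arbitrary bytes standing in for an image, PDF, or other binary file.
payload = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])
conn.execute(
    "INSERT INTO documents (name, content) VALUES (?, ?)",
    ("scan.bin", payload),
)

# The name column is structured and queryable; the BLOB comes back as opaque bytes.
stored_name, stored_content = conn.execute(
    "SELECT name, content FROM documents WHERE id = 1"
).fetchone()
conn.close()

print(stored_name, len(stored_content), "bytes")
```

Note that the row itself is structured (it has an id and a name), while the BLOB payload is the unstructured part.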

Question 05

What are the 5 V's of Big Data?

A. Volume, Velocity, Variety, Veracity, Value

B. Volume, Versatility, Variety, Validity, Value

C. Velocity, Versatility, Veracity, Value, Variety

D. Volume, Versatility, Validity, Value, Veracity

E. Volume, Velocity, Versatility, Variety, Value

The canonical 5 V's framework characterises Big Data challenges: Volume (scale of data), Velocity (speed of generation), Variety (different types/formats), Veracity (trustworthiness/quality of data), and Value (the business worth extracted). Words like "Versatility" or "Validity" don't belong to the standard framework — watch out for these distractors.

✓ Correct answer: A

Question 06

Which of the following is a task of Prescriptive Analytics?

A. Linear Regression

B. Statistical hypothesis testing

C. Graph-theoretic computations

D. Linear programming

E. Sequence rule mining

Prescriptive analytics answers "What should we do?": it recommends optimal actions. Linear programming (D) is a classic optimisation technique used to prescribe the best decision under constraints. Graph-theoretic computations (C), such as shortest path or network flow, are used to prescribe routing or resource allocation decisions. Linear regression is a predictive technique, statistical hypothesis testing supports descriptive and diagnostic analysis, and sequence rule mining is a pattern-discovery technique.

✓ Correct answers: C & D
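To show how linear programming "prescribes" a decision, here is a small sketch that solves a two-variable problem by brute force: it intersects every pair of constraint lines and keeps the best feasible corner point. The problem data (maximise 3x + 5y subject to three resource limits) is a textbook-style example chosen for illustration, not something from the course. Real solvers use methods such as simplex; corner enumeration only works here because the problem is tiny and its feasible region is bounded.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
# The last two rows encode x >= 0 and y >= 0.
constraints = [
    (1, 0, 4),
    (0, 2, 12),
    (3, 2, 18),
    (-1, 0, 0),
    (0, -1, 0),
]


def objective(x, y):
    """Hypothetical profit to maximise."""
    return 3 * x + 5 * y


def feasible(x, y, tol=1e-9):
    """True if the point satisfies every constraint (within a tolerance)."""
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


best_point, best_value = None, float("-inf")
for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue  # parallel boundary lines never intersect
    # Cramer's rule for the intersection of the two boundary lines.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y) and objective(x, y) > best_value:
        best_point, best_value = (x, y), objective(x, y)

print("recommended decision:", best_point, "objective:", best_value)
```

The output is a recommended action (the values of x and y), which is what distinguishes prescriptive analytics from prediction or description.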

Question 07

What does association rule mining aim to discover?

A. Predictive models for future trends

B. Statistical significance in datasets

C. Relationships between items in a dataset

D. Sequence patterns in data

E. Clustering of similar data points

Association rule mining (e.g., the Apriori algorithm) finds co-occurrence relationships between items — things that tend to appear together. The classic example: "customers who buy bread and butter also tend to buy milk." This is expressed as rules like bread, butter → milk, with metrics like support and confidence. It doesn't predict sequences (D, which is sequence mining) or cluster points (E, which is clustering).

✓ Correct answer: C
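The support and confidence metrics mentioned in Question 07 can be computed directly from their definitions. The sketch below does this for the rule {bread, butter} → {milk} on a tiny, made-up list of transactions; it is not the Apriori algorithm itself, which additionally searches for frequent itemsets efficiently.

```python
# Hypothetical shopping baskets used only for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


antecedent = {"bread", "butter"}
consequent = {"milk"}

rule_support = support(antecedent | consequent)
rule_confidence = confidence(antecedent, consequent)

print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here two of the five baskets contain all three items, and three contain bread and butter, so the rule holds in two out of three of the baskets where it applies.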

Question 08

What is the main goal of descriptive analytics?

A. To answer why something happened

B. To predict what will happen in the future

C. To determine what actions to take

D. To summarize and describe past data

E. To optimize data processing techniques

The four analytics types each answer a different question: Descriptive → "What happened?" (D), Diagnostic → "Why did it happen?" (A), Predictive → "What will happen?" (B), Prescriptive → "What should we do?" (C). Descriptive analytics uses tools like dashboards, reports, and summary statistics to characterise historical data — it's the foundation for all other analytics types.

✓ Correct answer: D
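A minimal sketch of descriptive analytics, using Python's standard `statistics` module on a hypothetical week of daily sales figures. It only summarises what already happened; it makes no prediction or recommendation.

```python
import statistics

# Hypothetical daily sales for one past week (illustration only).
daily_sales = [120, 135, 128, 150, 142, 138, 160]

# Summary statistics that describe the historical data.
summary = {
    "count": len(daily_sales),
    "mean": statistics.mean(daily_sales),
    "median": statistics.median(daily_sales),
    "min": min(daily_sales),
    "max": max(daily_sales),
    "stdev": statistics.stdev(daily_sales),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A dashboard or report would present the same kind of figures, usually alongside charts.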

Question 09

What type of data is generated from social media interactions?

A. Machine-generated data

B. Human-generated data

C. Structured data

D. Unstructured data

E. Metadata

Social media content (posts, comments, likes, photos, videos) is created by humans (B), making it human-generated. It is also predominantly unstructured (D) — free-form text, images, and multimedia that don't fit neatly into rows and columns. While social platforms store some structured metadata (timestamps, user IDs), the content itself is unstructured. It is not machine-generated (A), which would imply automated/sensor origin.

✓ Correct answers: B & D

Question 10

Which of the following questions is NOT an example of asking good questions from the data?

A. What patterns can you learn from a given dataset?

B. What do people really want to know?

C. What datasets might get you to your answers?

D. How to ignore irrelevant data?

E. How to group similar data points together?

Good data science questions are framed around what you want to discover or achieve. Questions A, B, C, and E are all goal-oriented: finding patterns, understanding user needs, sourcing the right data, or grouping data. Option D — "How to ignore irrelevant data?" — is a technical preprocessing step, not a question that guides the analytics process. Good questions should drive what you're investigating, not pre-emptively filter what you'll look at.

✓ Correct answer: D