Q1. Which of the following is an example of unstructured data?
A) Excel table
B) Relational database
C) Email text
D) CSV file
Q2. What does data integration aim to provide?
A) Data duplication
B) Unified view of data
C) Data deletion
D) Data compression
Q3. Schema matching identifies:
A) Data storage location
B) Data size
C) Query speed
D) Semantic relationships between schemas
Q4. Which is NOT a goal of data preprocessing?
A) Improve data quality
B) Remove noise
C) Increase data duplication
D) Standardize formats
Q5. In mediator-based integration, users query:
A) Local schema
B) Global schema
C) Data warehouse
D) Both A and B
Q6. What is a key limitation of structured data?
A) No schema
B) Hard to query
C) Schema rigidity
D) No storage
Q7. Which component translates queries in mediator architecture?
A) Mediator
B) Wrapper
C) Database
D) API
Q8. Which approach does NOT store data centrally?
A) Data warehouse
B) Mediator-based integration
C) Data marts
D) OLAP system
Q9. Given the following schemas, which is the correct mapping?
Schema A: Customer(ID, Name, Email)
Schema B: Client(ClientID, FullName, ContactEmail)
A) ID → FullName, Name → ClientID, Email → ContactEmail
B) ID → ClientID, Name → FullName, Email → ContactEmail
C) ID → ContactEmail, Name → ClientID, Email → FullName
D) No valid mapping
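As a study aid, an attribute-level schema mapping can be written down as a simple lookup table and applied to records. The sketch below deliberately uses two hypothetical schemas, Employee(EmpNo, Surname) and Staff(StaffID, LastName), rather than the ones in Q9; the names `ATTRIBUTE_MAPPING` and `translate_record` are illustrative assumptions, not part of any standard tool.

```python
# Hypothetical schemas (not the ones from Q9):
#   Schema X: Employee(EmpNo, Surname)
#   Schema Y: Staff(StaffID, LastName)
# A mapping pairs each source attribute with its target attribute.
ATTRIBUTE_MAPPING = {"EmpNo": "StaffID", "Surname": "LastName"}


def translate_record(record, mapping):
    """Rename the keys of a source record to the target schema's attribute names."""
    unmapped = [attr for attr in record if attr not in mapping]
    if unmapped:
        raise KeyError(f"no mapping for attributes: {unmapped}")
    return {mapping[attr]: value for attr, value in record.items()}


employee = {"EmpNo": 7, "Surname": "Haddad"}
staff = translate_record(employee, ATTRIBUTE_MAPPING)
print(staff)  # {'StaffID': 7, 'LastName': 'Haddad'}
```

A mapping is only valid if each attribute is paired with the attribute that carries the same meaning; pairing an identifier with a name field would still run, but would produce semantically wrong records.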
Q10. Two schemas contain: Schema A: Product(Price) and Schema B: Item(Cost)
What is the relationship between "Price" and "Cost"?
A) Structural match
B) Semantic match
C) Duplicate attribute
D) No relation
Q11. Which case represents structural heterogeneity?
A) Same data, different meaning.
B) Same attribute, different units.
C) Different schema structures.
D) Missing values.
Q12. A company collects the following data: customer transactions (tables), social media comments, and JSON API responses. Which classification of the data is correct?
A) All are structured.
B) Structured, unstructured, semi-structured.
C) Unstructured, structured, structured.
D) Semi-structured, structured, unstructured.
Q13. A website's system integrates data from a hospital database, a bank system, and an e-commerce system. Each system is managed independently with its own rules. What is the correct description?
A) Centralized data system.
B) Distributed but controlled system.
C) Single schema environment.
D) Autonomous distributed data sources.
Q14. Which scenario BEST fits structured data?
A) Medical image archive.
B) Email inbox.
C) Relational database of patients.
D) Video streaming platform.
Q15. A dataset contains missing values, different date formats, and duplicate records. Which preprocessing step BEST addresses these issues?
A) Data cleaning.
B) Data preprocessing.
C) Both A and B.
D) None of these.
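To make the three issues named in Q15 concrete, here is a minimal sketch on a tiny made-up dataset. The records, the two accepted date formats, the `UNKNOWN` placeholder, and the helper names `normalize_date` and `tidy` are all assumptions chosen for illustration; a real pipeline would pick its own rules.

```python
from datetime import datetime

# Hypothetical raw records: (customer_id, signup_date, city)
RAW = [
    ("c1", "2024-01-05", "Riyadh"),
    ("c2", "05/01/2024", None),      # different date format, missing city
    ("c1", "2024-01-05", "Riyadh"),  # exact duplicate of the first row
]

# Assumed input formats for this sketch: ISO and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(text):
    """Parse a date in any accepted format and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def tidy(rows, default_city="UNKNOWN"):
    """Fill missing cities, standardize dates, and drop exact duplicates."""
    seen, result = set(), []
    for customer_id, date_text, city in rows:
        row = (
            customer_id,
            normalize_date(date_text),
            city if city is not None else default_city,
        )
        if row not in seen:  # keep only the first occurrence of each row
            seen.add(row)
            result.append(row)
    return result


cleaned = tidy(RAW)
print(cleaned)
```

Filling a missing value with a placeholder is only one possible policy; dropping the row or imputing a value are equally common choices.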
Q16. Which component is responsible for accessing individual data sources?
A) Global schema.
B) Mediator.
C) Wrapper.
D) Query interface.
Q17. Before performing schema matching between two systems, preprocessing is required mainly to:
A) Remove inconsistencies and standardize data.
B) Increase data size.
C) Replace schema mapping.
D) Eliminate data sources.
Q18. What is the main role of machine learning in schema matching?
A) Store data in databases.
B) Automatically learn schema correspondences.
C) Replace data preprocessing.
D) Remove duplicate data.
Q19. An automated schema matching system produces several incorrect matches. What is the BEST approach to improve accuracy?
A) Use semi-automatic matching.
B) Manual mapping.
C) Automatic mapping.
D) Ignore the results.
Q20. Which of the following is a major challenge in data integration?
A) Data uniformity across all systems.
B) Single centralized database.
C) Identical data formats.
D) Structural and semantic heterogeneity.
Today's leaked midterm: Data Integration and Data Warehousing.