Research questions
In general, the nature of a research question depends on the goal and type of analysis pursued. Table 1 summarizes a useful framework (source: Fig. 3.9 in Francom 2024).
Type | Aims | Approach | Evaluation |
---|---|---|---|
Exploratory | Explore: gain insight | Inductive, data-driven, and iterative | Associative |
Predictive | Predict: validate associations | Semi-deductive, data- and theory-driven, and iterative | Model performance, feature importance, and associative |
Inferential | Explain: test hypotheses | Deductive, theory-driven, and non-iterative | Causal |
EXPLORATORY Research questions
Question 1
Is there a pattern in the World Bank (WB) project document corpus¹ that reveals non-random variation in the frequency of certain words, phrases, or policy concepts over time?
¹ Here, “WB project documents” refers to Project Development Objectives (PDOs), the brief descriptive texts that state each project's objectives.
Hypothesis
The hypothesis is that the WB project document corpus exhibits non-random variation in the frequency of specific policy concepts² over time. The question is approached through data-driven analysis: patterns observed in the text data guide how the data are interrogated, rather than a predetermined assumption.
² Such concepts may reflect a “policy focus,” “sector,” “strategy,” or “emerging priority” within the development funding landscape.
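A minimal sketch of one way to test this hypothesis for a single term, assuming each PDO comes with the fiscal year of its project. It swaps in a permutation version of Pearson's chi-square test of homogeneity (not necessarily the method used in this study): the statistic measures how unevenly the share of PDOs mentioning the term is spread across years, and shuffling the year labels shows how large that statistic gets by chance alone. The function names, the whole-word matching rule, and the toy corpus are hypothetical; the toy texts are invented for illustration and are not real PDOs.

```python
import random
import re
from collections import Counter


def mentions(texts, term):
    """Flag each text that contains `term` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [bool(pattern.search(t)) for t in texts]


def chi_square_stat(years, flags):
    """Pearson chi-square statistic for the (fiscal year x mentions term / does not) table."""
    n = len(flags)
    total_hits = sum(flags)
    n_by_year = Counter(years)
    hits_by_year = Counter(y for y, f in zip(years, flags) if f)
    stat = 0.0
    for year, n_y in n_by_year.items():
        cells = ((hits_by_year[year], total_hits), (n_y - hits_by_year[year], n - total_hits))
        for observed, col_total in cells:
            expected = n_y * col_total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def permutation_test(years, texts, term, n_perm=999, seed=0):
    """Test whether the share of PDOs mentioning `term` varies across years more than chance.

    Shuffling the year labels breaks any link between time and wording; the p-value is the
    share of shuffles whose statistic is at least as large as the observed one.
    """
    flags = mentions(texts, term)
    observed = chi_square_stat(years, flags)
    rng = random.Random(seed)
    shuffled = list(years)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(shuffled, flags) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical toy corpus: 10 PDOs per fiscal year; "climate" appears only from FY2012 on.
years, texts = [], []
for fy, n_climate in [(2010, 0), (2011, 0), (2012, 8), (2013, 8)]:
    for i in range(10):
        years.append(fy)
        texts.append("strengthen climate resilience" if i < n_climate else "improve access to services")

stat, p = permutation_test(years, texts, "climate")
print(f"chi-square = {stat:.2f}, permutation p = {p:.4f}")
```

A permutation p-value avoids relying on the chi-square approximation, which can be poor when yearly counts are small. If many terms are screened this way, the resulting p-values would also need a multiple-comparison correction.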
Question 2
Is there any external input (whether or not captured in other official documents) that could help “explain,” or at least correlate with, the non-random patterns observed in the text data?
For instance, a sudden rise in the popularity of a particular policy goal or catchphrase might influence the choice of Project Development Objectives (PDOs) in operations during a specific period.
Hypothesis
To frame this question for empirical investigation, I explored the possible correlation between the World Development Reports (WDRs)³ and the frequency trends of specific sector and theme words in the PDO text data. The alternative hypothesis tested in this analysis is that a WDR has a “traction effect” on the PDOs of subsequent fiscal years.
³ The World Development Report (WDR) is the flagship report that the World Bank Group has published annually since 1978.
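One hypothetical way to operationalize the “traction effect” is a lagged correlation: code, for each year, whether that year's WDR focused on a given theme, then correlate that indicator with the share of PDOs mentioning the theme word zero, one, two, or more fiscal years later. The sketch below assumes report years and fiscal years can be aligned one-to-one and uses invented numbers; `wdr_flag`, `term_share`, and the range of lags are illustrative, not the study's actual variables.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


def lagged_correlation(wdr_flag, term_share, lag):
    """Correlate the WDR indicator in year t with the PDO term share in year t + lag."""
    years = [t for t in sorted(wdr_flag) if t + lag in term_share]
    xs = [wdr_flag[t] for t in years]
    ys = [term_share[t + lag] for t in years]
    return pearson(xs, ys)


# Hypothetical inputs for one theme word, FY2000-FY2011 (all values invented):
# wdr_flag[t] = 1 if the WDR of year t focused on the theme (hand-coded by the analyst);
# term_share[t] = share of PDOs in fiscal year t that mention the theme word.
fiscal_years = range(2000, 2012)
wdr_flag = {t: 1 if t in (2003, 2009) else 0 for t in fiscal_years}
term_share = {t: 0.05 for t in fiscal_years}
term_share.update({2004: 0.15, 2005: 0.12, 2010: 0.15, 2011: 0.12})

correlations = {lag: lagged_correlation(wdr_flag, term_share, lag) for lag in range(4)}
for lag, r in correlations.items():
    print(f"lag {lag}: r = {r:+.2f}")
best_lag = max(correlations, key=correlations.get)
print("strongest association at lag", best_lag)
```

A peak at a positive lag is consistent with a traction effect but, in the terms of Table 1, remains an associative result rather than a causal one. With only one observation per year, such correlations are also fragile and worth checking across several theme words.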
PREDICTIVE Research questions
Question 3
Given that feature tagging in the WB project document corpus is incomplete (for example, some projects lack sector or theme tags), can predictive classification techniques help address such data limitations?
Hypothesis
The hypothesis is that certain machine learning (ML) techniques can help enhance the quality of the source data. Some illustrative analysis has been conducted to predict missing sector or theme tags from the text of the PDO description plus other available metadata variables, testing various ML algorithms.
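A minimal sketch of tag imputation, assuming a multinomial Naive Bayes classifier with add-one smoothing as a simple stand-in (the algorithms actually tested in the study may differ). Tagged PDOs serve as training data, and each untagged PDO receives the most probable sector given its words. Folding categorical metadata in as `key=value` pseudo-tokens is an assumption of this sketch, and the PDO snippets, regions, and sector labels are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text, metadata=None):
    """Lower-cased word tokens, plus hypothetical 'key=value' pseudo-tokens for metadata."""
    tokens = re.findall(r"[a-z]+", text.lower())
    for key, value in (metadata or {}).items():
        tokens.append(f"{key}={value}")
    return tokens


class NaiveBayesTagger:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(token_lists, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, tokens):
        """Return the label with the highest log posterior score."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            denom = self.total_words[label] + self.alpha * v
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tagged PDOs: (text, metadata, sector tag); all invented for illustration.
tagged = [
    ("Increase access to safe water supply and sanitation services", {"region": "AFR"}, "Water"),
    ("Improve water resource management in river basins", {"region": "SAR"}, "Water"),
    ("Improve learning outcomes in primary schools", {"region": "AFR"}, "Education"),
    ("Expand access to secondary education for girls", {"region": "SAR"}, "Education"),
    ("Increase renewable electricity generation capacity", {"region": "EAP"}, "Energy"),
    ("Improve reliability of electricity distribution", {"region": "AFR"}, "Energy"),
]
untagged = [
    ("Strengthen teacher training to improve learning in schools", {"region": "EAP"}),
    ("Improve electricity supply reliability", {"region": "EAP"}),
]

tagger = NaiveBayesTagger().fit([tokenize(t, m) for t, m, _ in tagged], [s for _, _, s in tagged])
imputed = [tagger.predict(tokenize(t, m)) for t, m in untagged]
print(imputed)
```

Before imputing real tags, accuracy would need to be checked on held-out tagged PDOs, which corresponds to the “model performance” criterion listed for predictive analysis in Table 1.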
For now, the primary aim of the study is to EXPLORE (e.g., trends over time in phrase occurrence) and, to a lesser extent, to PREDICT (e.g., using ML to improve the quality of metadata variables). Potential follow-up questions will be shaped by the findings of this initial exploratory phase.