Research questions

In general, the nature of a research question depends on the goal and type of analysis pursued. Table 1 summarizes a useful framework (source: Fig. 3.9 in Francom 2024).

Table 1: Overview of analysis types
| Type | Aims | Approach | Evaluation |
|---|---|---|---|
| Exploratory | Explore: gain insight | Inductive, data-driven, and iterative | Associative |
| Predictive | Predict: validate associations | Semi-deductive, data-/theory-driven, and iterative | Model performance, feature importance, and associative |
| Inferential | Explain: test hypotheses | Deductive, theory-driven, and non-iterative | Causal |

EXPLORATORY Research questions

Question 1

Is there a pattern in the World Bank (WB) project document corpus[1] that reveals non-random variation in the frequency of certain words, phrases, or policy concepts over time?

[1] Here, “WB project documents” refers to Project Development Objectives (PDOs), which are brief descriptive texts.

Hypothesis

The hypothesis is that the WB project document corpus exhibits non-random variation in the frequency of specific policy concepts[2] over time. The question is approached through a data-driven analysis: patterns observed in the text data, rather than a predetermined assumption, guide further interrogation of the data.

[2] Such concepts include a “policy focus,” “sector,” “strategy,” or “emerging priority” within the development funding landscape.
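One way to make “non-random variation over time” testable is sketched below, assuming a permutation test (not necessarily the method used in this study): for a given term, count the PDOs that mention it in each fiscal year, compute a Pearson chi-square statistic on the resulting year-by-mention table, and compare it with the statistics obtained after randomly reassigning the mentions across documents. The function names, the simple substring matching, and the toy corpus are hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells with zero expected count
                stat += (observed - expected) ** 2 / expected
    return stat


def mention_table(years, mentions, year_levels):
    """2 x K table: row 0 counts documents mentioning the term, row 1 those that do not."""
    hits = Counter(y for y, m in zip(years, mentions) if m)
    misses = Counter(y for y, m in zip(years, mentions) if not m)
    return [[hits[y] for y in year_levels], [misses[y] for y in year_levels]]


def term_year_permutation_test(docs, term, n_perm=2000, seed=0):
    """Permutation test of H0: the share of documents mentioning `term` does not depend on year.

    `docs` is a list of (year, text) pairs; `term` is matched as a lowercase substring
    (a hypothetical simplification). Returns the observed statistic and a p-value.
    """
    years = [year for year, _ in docs]
    mentions = [term.lower() in text.lower() for _, text in docs]
    levels = sorted(set(years))
    observed = chi_square_stat(mention_table(years, mentions, levels))
    rng = random.Random(seed)
    shuffled = list(mentions)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffling mentions keeps docs-per-year and total mentions fixed,
        # but breaks any link between year and mention.
        rng.shuffle(shuffled)
        if chi_square_stat(mention_table(years, shuffled, levels)) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical toy corpus (invented for illustration, not actual PDO data).
toy_docs = [(2010, "Improve access to safe water")] * 5 + [
    (2020, "Strengthen climate resilience of water services")
] * 5
stat, p_value = term_year_permutation_test(toy_docs, "climate")
print(f"chi-square = {stat:.1f}, permutation p = {p_value:.3f}")
```

A small p-value only flags that a term's frequency is unlikely to be constant across years; it says nothing about why, which is the subject of the next question.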

Question 2

Is there any external factor (whether or not captured in other official documents) that could help “explain”, or at least correlate with, the non-random patterns observed in the text data?

For instance, a sudden rise in the popularity of a particular policy goal or catchphrase might influence how Project Development Objectives (PDOs) are worded in operations from a specific period.

Hypothesis

To frame this question for empirical investigation, I explored the possible correlation between the World Development Reports (WDRs)[3] and the frequency trends of specific sector and theme words in the PDO text data. The “alternative” hypothesis tested (against a null of no effect) is that a WDR has a “traction effect” on the PDOs of subsequent fiscal years, i.e., that words related to its theme become more frequent in PDOs after its publication.

[3] World Development Reports (WDRs) are the flagship reports the World Bank Group has published annually since 1978.
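A minimal sketch of how a lagged “traction effect” could be screened for, assuming one already has (a) the yearly share of PDOs mentioning a term and (b) the years in which a WDR addressed the related topic; both inputs and all names here are hypothetical. It correlates the term share with an indicator for “a WDR on this topic was published `lag` years earlier”. With only a handful of years this is a descriptive screen, not a causal test, which per Table 1 would belong to inferential analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def lagged_wdr_correlation(share_by_year, wdr_years, lag=1):
    """Correlate a term's yearly PDO share with a 'WDR published `lag` years earlier' flag.

    share_by_year: {fiscal_year: share of PDOs mentioning the term} (hypothetical input)
    wdr_years: years in which a WDR focused on the related topic (hypothetical input)
    """
    years = sorted(share_by_year)
    published_before = [1.0 if (year - lag) in wdr_years else 0.0 for year in years]
    shares = [share_by_year[year] for year in years]
    return pearson(published_before, shares)


# Hypothetical toy series (invented numbers, not actual results).
toy_share = {2000: 0.02, 2001: 0.02, 2002: 0.08, 2003: 0.03, 2004: 0.02}
toy_wdr_years = {2001}
for lag in (0, 1, 2):
    print(lag, round(lagged_wdr_correlation(toy_share, toy_wdr_years, lag), 2))
```

Comparing several lags, rather than only `lag=1`, helps distinguish a delayed response from a term that was already trending before the report appeared.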

PREDICTIVE Research questions

Question 3

Given the incomplete tagging of features (such as sector and theme) in the WB project document corpus, can predictive classification techniques help address this data limitation?

Hypothesis

The hypothesis is that certain machine learning (ML) techniques can improve the quality of the source data. An illustrative analysis tested various ML algorithms for predicting missing sector or theme tags from the text of the PDO description, together with other available metadata variables.
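As an illustration of the kind of classifier involved, here is a minimal multinomial naive Bayes tagger written from scratch. Naive Bayes is swapped in as one simple text-classification technique; it is not necessarily among the algorithms tested in this study, and unlike the described analysis it uses only the PDO text, not other metadata. The class name, sector labels, and training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior for `text`."""
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / n_docs)  # log prior
            denom = self.total_words[label] + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tagged PDOs (invented examples); labels stand in for sector tags.
tagged = [
    ("Improve access to safe drinking water and sanitation", "Water"),
    ("Expand water supply services in urban areas", "Water"),
    ("Increase electricity generation capacity", "Energy"),
    ("Expand access to reliable electricity in rural areas", "Energy"),
]
texts, labels = zip(*tagged)
tagger = NaiveBayesTagger().fit(texts, labels)
print(tagger.predict("Improve rural electricity access"))  # → Energy
```

Before filling real gaps, one would hold out part of the already-tagged PDOs to estimate accuracy, in line with the “model performance” evaluation listed for predictive analysis in Table 1.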


For now, the primary aim of the study is to EXPLORE (e.g., trends over time in phrase occurrence) and, to a lesser extent, to PREDICT (e.g., using ML to improve the quality of metadata variables). Potential follow-up questions will be shaped by the findings of this initial exploratory phase.

References

Francom, Jerid. 2024. An Introduction to Quantitative Text Analysis for Linguistics: Reproducible Research Using R. 1st ed. London: Routledge. https://doi.org/10.4324/9781003393764.