
ML models for tax revenue forecasting in Bangladesh

Dr. Nasim Ahmed:

Accurate tax collection forecasts are crucial for effective fiscal planning, budget stability, and overall macroeconomic health.

In Bangladesh, where revenue performance significantly influences public investment, social programs, and debt management, improving these forecasts can help minimize fiscal surprises and enable more informed policymaking.

Machine learning (ML) provides tools for modeling complex, nonlinear relationships in fiscal data and for incorporating high-frequency, alternative datasets (e.g., trade flows, electronic transactions).

ML methods like tree ensembles, gradient boosting, and neural networks (including LSTM/GRU for sequences) are designed to learn nonlinear mappings and interactions from data and have shown improved accuracy in tax and corporate tax-payment forecasting across various settings.

ML also enables the combination of different data sources (macroeconomic series, customs throughput, VAT receipts, and electronic payment traces) to improve short-term and medium-term predictions.

The primary dependent variable is monthly or quarterly tax revenue, including total amounts and a breakdown by tax type: income tax, VAT, and customs duties. The National Board of Revenue (NBR) and the Ministry of Finance (MoF) regularly publish revenue reports with detailed breakdowns that are useful for setting model targets and benchmarking.

Additional covariates should include:

High-frequency economic indicators: proxies for industrial production, import/export volumes (customs throughput), electricity consumption, and port container traffic.

Fiscal policy and administrative signals: tax rate changes, major amnesty programs, enforcement efforts, and public holidays.

Financial indicators: inflation, exchange rates, interest rates, and credit growth.

Behavioral and administrative data: the number of returns filed electronically, audit yields, and arrears collections.

Alternative data sources: POS/e-payment volumes and big-data proxies for consumption, where legally accessible.

Data cleaning and feature engineering are essential, including adjustments for seasonality, treatment of structural breaks like policy reforms or one-time collections, lagged features to track tax payment timing, and smoothing noisy monthly data.
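
The lagged features and structural-break treatment described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name build_features, the choice of 1-month and 12-month lags, and the break_index reform marker are assumptions made for the example, not part of any NBR dataset or method.

```python
def build_features(revenue, lags=(1, 12), break_index=None):
    """Turn a monthly revenue list into (features, target) rows.

    revenue: monthly collections, oldest first (hypothetical data).
    lags: how many months back each lagged feature looks.
    break_index: position of a known policy reform, or None.
    """
    rows = []
    max_lag = max(lags)
    for t in range(max_lag, len(revenue)):
        # Lagged collections capture payment timing and last year's level.
        features = {f"lag_{k}": revenue[t - k] for k in lags}
        # Month-of-year index (0-11) as a crude seasonality signal,
        # assuming the series starts in the first month of the fiscal year.
        features["month"] = t % 12
        # Step dummy: 1 from the reform month onward, 0 before.
        features["post_reform"] = int(break_index is not None and t >= break_index)
        rows.append((features, revenue[t]))
    return rows
```

Each row pairs a feature dictionary with the revenue figure it is meant to predict, so the output can be passed to whichever model is being tested.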

The NBR’s published collection series and fiscal reports offer baseline series and targets for model training and validation.

A multi-model approach typically works best: start with simple statistical baselines like seasonal ARIMA and exponential smoothing as benchmarks, then compare those with machine learning models.
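
To make the idea of a benchmark concrete, here is a minimal Python sketch of two very simple baselines: a seasonal naive forecast and simple exponential smoothing. They are stand-ins for the seasonal ARIMA and exponential smoothing models mentioned above, not implementations of them; the function names and the smoothing weight alpha=0.3 are assumptions for illustration.

```python
def seasonal_naive(history, horizon, season=12):
    """Forecast each future month with the value from the same month last year."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def simple_exp_smoothing(history, alpha=0.3):
    """One-step-ahead forecast from simple exponential smoothing.

    alpha is an assumed smoothing weight; in practice it would be tuned.
    """
    level = history[0]
    for value in history[1:]:
        # New level = weighted mix of the latest value and the old level.
        level = alpha * value + (1 - alpha) * level
    return level
```

Any ML model that cannot beat these baselines on held-out months is probably not worth its added complexity.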

Tree-based ensembles (XGBoost, Random Forest, LightGBM): handle mixed data types, are resistant to outliers, and provide feature-importance metrics. Prior work applying XGBoost to the Bangladesh revenue series demonstrated promising predictive performance for total revenue.

Neural sequence models (LSTM, GRU): well suited to capturing temporal dependencies and longer-range patterns in tax payments (e.g., quarterly corporate tax installments).

Hybrid approaches: combine ARIMA for modeling linear seasonality and trends with machine learning models to capture residual nonlinear structures.

Ensembles and stacking: often lower forecast errors by blending complementary models.
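
One simple way to blend complementary models is a weighted average in which models with lower validation error receive more weight. The Python sketch below shows that idea only; it is not stacking with a trained meta-learner, and the model names and error figures in the usage note are hypothetical.

```python
def inverse_error_weights(validation_errors):
    """Weights proportional to 1 / error, normalised to sum to 1.

    validation_errors: dict of model name -> positive validation MAE.
    """
    inverse = {name: 1.0 / err for name, err in validation_errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def blend(forecasts, weights):
    """Weighted average of per-model forecasts for the same periods."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][h] for name in forecasts)
        for h in range(horizon)
    ]
```

For example, if a hypothetical "arima" model has a validation MAE of 2.0 and a hypothetical "xgb" model has 1.0, the second receives twice the weight of the first in the blended forecast.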

Probabilistic forecasting: use quantile regression forests or Bayesian neural networks to produce prediction intervals, which are crucial for effective fiscal risk management.
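
Quantile regression forests and Bayesian neural networks need specialised libraries, so the Python sketch below swaps in a much simpler technique to show what a prediction interval is: it adds empirical quantiles of past forecast errors to a point forecast. The function names and the 80% default coverage are assumptions for illustration.

```python
def empirical_quantile(values, q):
    """Linearly interpolated quantile of a list, with 0 <= q <= 1."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def residual_interval(point_forecast, past_errors, coverage=0.8):
    """Prediction interval from past forecast errors (actual minus forecast)."""
    tail = (1 - coverage) / 2
    low = point_forecast + empirical_quantile(past_errors, tail)
    high = point_forecast + empirical_quantile(past_errors, 1 - tail)
    return low, high
```

This approach assumes future errors will resemble past ones, which is exactly the assumption that breaks down after a reform or shock, so the resulting intervals should be read as a rough guide.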

Model choice should balance predictive accuracy with interpretability. Tree ensembles offer transparent variable importance and partial dependence plots useful for policymakers; deep nets may improve accuracy but require more data and careful regularization.

Divide historical data into rolling windows to preserve temporal order (walk-forward validation). Evaluate using metrics relevant to fiscal planning: mean absolute error, mean absolute percentage error, and especially the coverage of prediction intervals.
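
A minimal Python sketch of walk-forward evaluation and the three metrics follows. It uses an expanding training window; a fixed-length rolling window would pass only the most recent observations to the forecaster instead. The forecaster argument is a placeholder for any model, and all function names are assumptions for illustration.

```python
def walk_forward(series, forecaster, initial_train):
    """Expanding-window, one-step-ahead evaluation.

    forecaster: any function mapping a history list to a single forecast.
    Returns the actuals and forecasts for the held-out periods.
    """
    actuals, forecasts = [], []
    for t in range(initial_train, len(series)):
        forecasts.append(forecaster(series[:t]))  # only past data is visible
        actuals.append(series[t])
    return actuals, forecasts


def mae(actuals, forecasts):
    """Mean absolute error, in the same units as the revenue series."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def mape(actuals, forecasts):
    """Mean absolute percentage error; assumes no actual value is zero."""
    total = sum(abs((a - f) / a) for a, f in zip(actuals, forecasts))
    return 100 * total / len(actuals)


def interval_coverage(actuals, lows, highs):
    """Share of actuals that fall inside their prediction intervals."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lows, highs))
    return hits / len(actuals)
```

If a model's 80% intervals contain the actual outcome far less than 80% of the time, the model is overconfident, whatever its point accuracy.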

Perform back-testing around shock periods (e.g., pandemic year, tax reforms, strikes) to assess robustness.

Stress-test models with counterfactuals such as simulated VAT rate changes, import shocks, or customs disruptions.

Implementing ML forecasting in Bangladesh requires both institutional and technical steps.

Data integration pipeline: automated ingestion of NBR monthly returns, customs data, macroeconomic releases, and e-transaction logs with clear data governance and privacy safeguards.

Dashboarding and scenario tools: interactive dashboards that display point forecasts, intervals, and scenario outcomes (e.g., lower imports, higher inflation).

Model governance: version control, documentation, periodic retraining, and back-testing protocols to identify model drift.
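
One simple form of drift monitoring is to track a rolling percentage error and raise a flag when it exceeds a tolerance. The Python sketch below illustrates this; the function name, the three-month window, and the 10% threshold are assumed values that a revenue agency would replace with its own.

```python
def drift_alerts(actuals, forecasts, window=3, threshold_pct=10.0):
    """Flag periods where the rolling MAPE exceeds a tolerance.

    window and threshold_pct are assumed values, not recommendations.
    Assumes no actual value is zero.
    Returns the indices (into actuals) at which an alert fires.
    """
    ape = [100 * abs((a - f) / a) for a, f in zip(actuals, forecasts)]
    alerts = []
    for end in range(window, len(ape) + 1):
        rolling = sum(ape[end - window:end]) / window
        if rolling > threshold_pct:
            alerts.append(end - 1)  # index of the latest month in the window
    return alerts
```

An alert is a prompt to investigate and possibly retrain, not proof that the model has failed; a single one-time collection can trigger it too.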

Despite the promise, several constraints remain significant.

Data quality and the informal economy: large informal sectors and underreporting make observed collections a noisy indicator of actual tax capacity. ML cannot fix core measurement errors without additional data.

Structural breaks and policy shifts: sudden tax reforms, exemption changes, or institutional disruptions (such as strikes or reorganization of revenue agencies) cause regime shifts that can weaken model performance unless they are specifically accounted for.

Causal interpretation: ML can capture predictive associations but does not by itself establish causal elasticities; policymakers still need causal analysis for policy decisions.

Ethics and privacy: using transaction-level data requires strict safeguards to ensure compliance with privacy and legal standards.

Resource constraints: developing and maintaining ML systems demands investment in IT infrastructure and human capital.

ML can enhance short-term accuracy, offer probabilistic risk estimates, and incorporate new data streams, helping fiscal actors manage uncertainty and develop adaptive policies.

Combining ML forecasts with economic judgment and causal policy analysis will produce the most reliable and policy-relevant revenue estimates, leading to better budgets and more resilient public finances.

(The author is Associate Professor of Public Policy, Bangladesh Institute of Governance and Management, affiliated with the University of Dhaka.)