The features behind Voidly's 7-day shutdown forecast: political calendar, sanctions timelines, and network telemetry
The shutdown-forecasting article covers the model architecture — the ARIMA + XGBoost ensemble, per-country calibration, and the reliability scoring that sits alongside every published probability. This article is about what the model actually sees: how 47 input features are constructed for each (country, forecast_date) pair before a single prediction is made.
The key insight behind the feature set is that internet shutdowns almost never happen randomly. They are correlated with political stress events — elections under authoritarian pressure, mass protest movements, international crises — and network telemetry often shows early warning signs hours before a government announces or executes a shutdown. The feature engineering task is to convert that intuition into a consistent, automatable signal across 200 countries with varying data quality and institutional transparency.
What follows is a section-by-section breakdown of all four feature groups: political calendar, sanctions and diplomatic pressure, network telemetry, and historical pattern features. For each group we show representative code, explain the data sources, and note where coverage gaps force the model to handle missingness explicitly.
Feature categories overview
The 47 features are organized into four groups. The table below shows how many features each group contributes and which single feature within each group carries the highest XGBoost importance score.
| Feature group | Feature count | Top feature (by XGBoost importance) |
|---|---|---|
| Political calendar | 14 | days_to_nearest_election |
| Sanctions / diplomatic pressure | 11 | ofac_designation_30d |
| Network telemetry | 13 | probe_measurement_rate_delta |
| Historical pattern | 9 | shutdown_history_3y |
Political calendar features
Political events are the strongest leading indicators of internet shutdowns in the training corpus. The political calendar group contains 14 features, but three dominate XGBoost importance rankings: election proximity, election competitiveness, and protest intensity.
days_to_nearest_election
For every country, Voidly ingests election dates from two sources: the Electoral Integrity Project API and Wikipedia election calendars, scraped monthly and diff-checked against the prior month's version. The feature is signed: a value of -3 means the election was three days ago; a value of +5 means the election is five days away. Shutdowns cluster heavily in the -3 to +2 day window relative to election day — a pattern consistent across every high-risk region in the training corpus.
election_competitiveness_score
A 0–1 score derived from two external indices: the EIU Democracy Index and the V-Dem electoral fraud index. The score is highest in hybrid regimes where elections occur but are neither free nor fully controlled — the category with the highest empirical shutdown rate. Fully authoritarian states with no real elections and full democracies with no shutdown history both produce lower scores. The feature captures the "competitive enough to threaten the incumbent but not free enough to stop interference" dynamic that precedes most election-adjacent shutdowns.
protest_intensity_7d
A rolling 7-day GDELT protest event count, normalized by country population. GDELT events are filtered to CAMEO codes in the 14x series (protest / demonstration events), then weighted by news source diversity: an event reported by 20 different outlets receives higher weight than the same event reported by one state-adjacent agency. The result is a daily intensity score per country that rises meaningfully during genuine protest waves and stays low during ordinary political activity.
import math
import requests
from datetime import date, timedelta

# fetch_eip_elections, fetch_wikipedia_elections, merge_election_calendars and the
# EIU_DEMOCRACY_INDEX / VDEM_ELECTORAL_FRAUD / POPULATION_LOOKUP tables are
# pipeline helpers and lookup tables that are not shown here.

def build_political_features(country_code: str, forecast_date: date) -> dict:
    """
    Constructs political calendar features for a (country, date) pair.
    Returns a dict of feature_name -> float for the XGBoost input vector.
    """
    features = {}

    # --- days_to_nearest_election ---
    # Pull from Electoral Integrity Project API (primary) + Wikipedia (fallback)
    eip_elections = fetch_eip_elections(country_code)
    wiki_elections = fetch_wikipedia_elections(country_code)
    all_elections = merge_election_calendars(eip_elections, wiki_elections)
    if all_elections:
        deltas = [(e - forecast_date).days for e in all_elections]
        # Signed: negative = past, positive = future
        # Use the nearest election in either direction
        features['days_to_nearest_election'] = min(deltas, key=abs)
    else:
        features['days_to_nearest_election'] = float('nan')  # handled by XGBoost default split

    # --- election_competitiveness_score ---
    # The formula below expects both indices on a 0-1 scale
    eiu_score = EIU_DEMOCRACY_INDEX.get(country_code, float('nan'))
    vdem_fraud = VDEM_ELECTORAL_FRAUD.get(country_code, float('nan'))
    if not (math.isnan(eiu_score) or math.isnan(vdem_fraud)):
        # High fraud + mid-democracy = high competitiveness risk
        features['election_competitiveness_score'] = vdem_fraud * (1 - abs(eiu_score - 0.5))
    else:
        features['election_competitiveness_score'] = float('nan')

    # --- protest_intensity_7d ---
    # GDELT query: CAMEO code 14x events for country, past 7 days
    window_start = forecast_date - timedelta(days=7)
    gdelt_url = (
        f"https://api.gdeltproject.org/api/v2/doc/doc"
        f"?query=sourceCountry:{country_code}+eventRootCode:14"
        f"&mode=artlist&maxrecords=250&startdatetime={window_start:%Y%m%d}000000"
        f"&enddatetime={forecast_date:%Y%m%d}235959&format=json"
    )
    resp = requests.get(gdelt_url, timeout=10)
    articles = resp.json().get('articles', []) if resp.ok else []

    # Weight by unique source domain count (diversity proxy)
    domains = {a['domain'] for a in articles}
    raw_count = len(articles)
    diversity_weight = min(len(domains) / max(raw_count, 1) * 2, 1.0)

    # Normalize to events per million inhabitants
    population = POPULATION_LOOKUP.get(country_code, 1_000_000)
    features['protest_intensity_7d'] = (raw_count * diversity_weight) / (population / 1_000_000)

    return features

Sanctions and diplomatic pressure features
The intuition behind the sanctions feature group is that external diplomatic pressure is a leading indicator of the political crisis conditions that precede shutdowns. OFAC designations are among the earliest public signals that a government is under serious international pressure — they often precede the protest escalation or political crisis that eventually triggers a connectivity event.
ofac_designation_30d
Count of new OFAC SDN designations for individuals or entities in the target country in the past 30 days. This feature is driven by the same daily conditional GET pipeline described in the OFAC SDN integration article — the system tracks additions to the SDN list and tags each new entry with a country code derived from the entity's nationality, program, and address fields. A surge of new designations against a country's officials or state-linked entities is a meaningful signal of escalating pressure.
ofac_total_sdn_count
Total SDN entries associated with the country as of the forecast date. This is a baseline-level feature rather than a time-delta feature: countries with a large existing SDN footprint (Iran, Russia, North Korea, Syria) have a higher structural shutdown risk that ofac_designation_30d alone would not capture in quiet periods between designation surges.
un_resolution_90d
Count of UN Security Council resolutions mentioning the country in the past 90 days, sourced from the UN Digital Library API. Resolutions are a lagging rather than leading indicator — they typically follow crises already underway — but they correlate with sustained international pressure that can precede escalation.
diplomatic_isolation_score
Derived from UN General Assembly voting alignment scores using a DW-NOMINATE-style analysis of UNGA roll-call votes. Countries that consistently vote against a majority coalition receive a higher isolation score, which correlates empirically with higher shutdown propensity in the training data. The score is updated annually as each UNGA session concludes.
from datetime import date, timedelta

def build_sanctions_features(country_code: str, forecast_date: date) -> dict:
    """
    Constructs OFAC / sanctions features for a (country, date) pair.
    Queries the pre-processed SDN change feed stored in the local Postgres instance.
    """
    features = {}
    window_30d_start = forecast_date - timedelta(days=30)
    window_90d_start = forecast_date - timedelta(days=90)

    # ofac_designation_30d: new SDN entries in the past 30 days
    rows_30d = DB.execute(
        """
        SELECT COUNT(*) AS cnt
        FROM sdn_change_feed
        WHERE country_code = %s
          AND change_type = 'ADD'
          AND change_date >= %s
          AND change_date < %s
        """,
        (country_code, window_30d_start, forecast_date)
    ).fetchone()
    features['ofac_designation_30d'] = rows_30d['cnt'] if rows_30d else 0

    # ofac_total_sdn_count: total active SDN entries as of forecast_date
    rows_total = DB.execute(
        """
        SELECT COUNT(*) AS cnt
        FROM sdn_entries
        WHERE country_code = %s
          AND effective_date <= %s
          AND (removal_date IS NULL OR removal_date > %s)
        """,
        (country_code, forecast_date, forecast_date)
    ).fetchone()
    features['ofac_total_sdn_count'] = rows_total['cnt'] if rows_total else 0

    # un_resolution_90d: UNSC resolutions mentioning the country in past 90 days
    rows_un = DB.execute(
        """
        SELECT COUNT(*) AS cnt
        FROM un_security_council_resolutions
        WHERE %s = ANY(country_mentions)
          AND resolution_date >= %s
          AND resolution_date < %s
        """,
        (country_code, window_90d_start, forecast_date)
    ).fetchone()
    features['un_resolution_90d'] = rows_un['cnt'] if rows_un else 0

    # diplomatic_isolation_score: annual UNGA vote alignment score
    features['diplomatic_isolation_score'] = UNGA_ISOLATION_SCORES.get(
        (country_code, forecast_date.year), float('nan')
    )

    return features

Network telemetry features
The network telemetry group contains the most operationally immediate signals — features that can change within hours as a shutdown begins. Because they are computed from real-time data streams, they require careful baseline normalization to distinguish genuine anomalies from routine measurement variance.
bgp_withdrawal_rate_7d
Count of BGP prefix withdrawals from country ASNs over the past 7 days, normalized by the expected routing table size for that country (the 90-day median of active prefixes). Data comes from RIPE NCC RIS and RouteViews, aggregated via the IODA API. A high value means the country's ASNs have been withdrawing routes at an elevated rate relative to their normal BGP behavior — a pattern that precedes full shutdowns when the withdrawals are clustered rather than distributed across unrelated ASes.
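The normalization described above can be sketched in a few lines. This is a minimal illustration, not Voidly's pipeline code: the function name and the two input sequences (daily withdrawal counts and daily active-prefix counts, ordered oldest to newest) are hypothetical, and returning NaN when data is insufficient is an assumption chosen to match how the other features signal missingness.

```python
import math
from statistics import median
from typing import Sequence


def bgp_withdrawal_rate_7d(
    daily_withdrawals: Sequence[int],
    daily_active_prefixes: Sequence[int],
) -> float:
    """
    Hypothetical helper: 7-day withdrawal count divided by the 90-day
    median of active prefixes. Both sequences are ordered oldest to newest.
    """
    if len(daily_withdrawals) < 7 or not daily_active_prefixes:
        return math.nan  # assumption: too little data means a missing feature

    withdrawals_7d = sum(daily_withdrawals[-7:])
    # Median rather than mean, so a single day of route flapping does not
    # distort the expected routing table size.
    expected_table_size = median(daily_active_prefixes[-90:])
    if expected_table_size <= 0:
        return math.nan
    return withdrawals_7d / expected_table_size
```

With 10 withdrawals per day against a stable table of 100 prefixes, the sketch returns 0.7; the same 70 withdrawals against 10,000 prefixes would return 0.007, which is why the raw count alone is not comparable across countries.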
probe_measurement_rate_delta
The ratio of the current probe measurement rate (measurements per hour from probes in the country) to the 30-day baseline rate. A sudden drop below 0.6 — meaning the probes in a country are completing fewer than 60% of their normal measurement volume — is a strong precursor signal. The mechanism is that network throttling or partial shutdowns cause probes to time out or fail to establish connections, reducing their throughput before a full shutdown cuts them off entirely. This feature uses Voidly's own probe telemetry rather than external data sources, which gives it a higher freshness cadence than most other features in the set.
blocking_rate_trend
The linear regression slope of the daily blocking rate over the past 14 days, measured in units of blocking-rate-per-day. A rising slope above +0.01 per day indicates that an increasing fraction of probe measurements are returning blocking signals — DNS interference, TLS interruption, or HTTP anomalies — which predicts further escalation. A flat or declining slope is neutral or negative for shutdown probability.
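For readers who want to see the slope computation concretely, here is a small sketch using the closed-form ordinary least squares estimate. The function name and the plain list input are assumptions for illustration; the production feature may be computed differently (for example in SQL or with a numerical library).

```python
from typing import Sequence


def blocking_rate_trend(daily_blocking_rates: Sequence[float]) -> float:
    """
    OLS slope of blocking rate against day index, in blocking-rate units
    per day. Input is ordered oldest to newest (14 values in the article).
    """
    n = len(daily_blocking_rates)
    if n < 2:
        raise ValueError("need at least two daily values to fit a slope")

    mean_x = (n - 1) / 2  # mean of the day indices 0..n-1
    mean_y = sum(daily_blocking_rates) / n
    cov_xy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in enumerate(daily_blocking_rates)
    )
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    return cov_xy / var_x
```

A series that rises from 0.10 by 0.02 each day yields a slope of roughly 0.02, which is above the +0.01 per day level discussed above; a constant series yields a slope of zero.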
throttling_incident_count_7d
Count of confirmed throttling incidents in the past 7 days, defined as sustained bandwidth collapse on one or more named services without a corresponding BGP withdrawal. Throttling before a full shutdown is a documented pattern in both Iran and Russia — the government uses targeted bandwidth suppression as a warning or soft measure before authorizing a harder connectivity cut. This feature captures that escalation ladder.
-- TimescaleDB query: probe_measurement_rate_delta for a (country, date) pair
-- Computes the ratio of the last 24 hours' measurement count to the
-- per-day average over the preceding 30 days
-- Run at forecast_date for each country in the scoring batch
WITH recent AS (
    -- No GROUP BY: the aggregate returns one row even when zero measurements
    -- completed, so a country that has gone fully dark yields 0, not an empty result
    SELECT COUNT(*) AS measurements_recent
    FROM probe_measurements
    WHERE country_code = :country_code
      AND measured_at >= :forecast_date::date - INTERVAL '24 hours'
      AND measured_at <  :forecast_date::date
),
baseline AS (
    -- 30 full days ending where the recent window begins, so that
    -- dividing by 30.0 gives a true per-day average
    SELECT COUNT(*) / 30.0 AS measurements_per_day_baseline
    FROM probe_measurements
    WHERE country_code = :country_code
      AND measured_at >= :forecast_date::date - INTERVAL '31 days'
      AND measured_at <  :forecast_date::date - INTERVAL '1 day'
)
SELECT
    :country_code AS country_code,
    r.measurements_recent,
    b.measurements_per_day_baseline,
    CASE
        WHEN b.measurements_per_day_baseline = 0 THEN NULL
        ELSE r.measurements_recent / b.measurements_per_day_baseline
    END AS probe_measurement_rate_delta
FROM recent r
CROSS JOIN baseline b;

A probe_measurement_rate_delta of 1.0 means the country's probes are running at normal throughput. A value of 0.4 means only 40% of normal measurement volume is completing. The feature starts contributing meaningfully to shutdown probability once it falls below 0.6, which corresponds to a 40% drop from baseline.
Historical pattern features
The historical pattern group captures what a country's own shutdown record implies about future probability. The strongest effect in the training data is simple recidivism: a country that has executed an internet shutdown before is far more likely to do so again than a country with no shutdown history, even controlling for political regime type.
shutdown_history_3y
Count of verified shutdown events in the past 3 years, sourced from the Voidly event archive and cross-checked against OONI, CensoredPlanet, and Freedom House's Freedom on the Net database. Only events reaching Corroborated or Confirmed tier in the Voidly verification system are counted. The feature is a raw count rather than a binary flag because the count itself carries information: a country with 8 shutdowns in 3 years (Myanmar post-coup) is categorically different from one with 1 (a country experimenting for the first time).
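The counting rule can be illustrated with a short sketch. Everything here is hypothetical scaffolding: the ShutdownEvent record, its field names, and the choice of 3 × 365 days as the three-year window are assumptions, not the schema of the Voidly event archive. Only the two tier names come from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable

# Tiers that count toward the feature, per the description above.
COUNTED_TIERS = {"Corroborated", "Confirmed"}


@dataclass(frozen=True)
class ShutdownEvent:
    """Hypothetical archive record; field names are illustrative only."""
    country_code: str
    start_date: date
    tier: str


def shutdown_history_3y(
    events: Iterable[ShutdownEvent],
    country_code: str,
    forecast_date: date,
) -> int:
    """Raw count of qualifying events in the 3 years before forecast_date."""
    window_start = forecast_date - timedelta(days=3 * 365)  # assumed window length
    return sum(
        1
        for e in events
        if e.country_code == country_code
        and e.tier in COUNTED_TIERS
        # Half-open window: events on forecast_date itself are excluded
        and window_start <= e.start_date < forecast_date
    )
```

Excluding events dated on or after forecast_date keeps the feature free of lookahead, mirroring the half-open date windows used in the sanctions queries.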
annual_shutdown_cycle
A cyclical encoding of day-of-year using sine and cosine transforms. Some countries — notably Iran and Ethiopia — show seasonal shutdown patterns correlated with recurring political cycles: exam periods, protest anniversaries, or legislative calendars. A simple day-of-year integer feature would not capture the cyclical topology (day 1 and day 365 are adjacent, not far apart), so the sin/cos pair is used to represent position on the annual circle without discontinuity.
import math
from datetime import date

def compute_annual_shutdown_cycle(forecast_date: date) -> tuple[float, float]:
    """
    Returns (sin_doy, cos_doy) for the forecast_date's day-of-year.
    The pair encodes cyclical position on the annual calendar without
    discontinuity at the year boundary.

    Approximate values in a 365-day year:
      Day 1   (Jan 1) -> sin≈0.0,  cos≈1.0
      Day 91  (Apr 1) -> sin≈1.0,  cos≈0.0
      Day 182 (Jul 1) -> sin≈0.0,  cos≈-1.0
      Day 274 (Oct 1) -> sin≈-1.0, cos≈0.0
    """
    doy = forecast_date.timetuple().tm_yday  # 1-366
    days_in_year = 366 if _is_leap_year(forecast_date.year) else 365
    angle = 2 * math.pi * doy / days_in_year
    return math.sin(angle), math.cos(angle)

def _is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Usage in the feature pipeline:
sin_doy, cos_doy = compute_annual_shutdown_cycle(forecast_date)
features['annual_shutdown_cycle_sin'] = sin_doy
features['annual_shutdown_cycle_cos'] = cos_doy

days_since_last_shutdown
An exponential decay feature based on time since the last verified shutdown event. Recent shutdowns raise short-term probability more than distant ones — a country that shut down 10 days ago is more likely to be in an ongoing crisis than one where the last event was 900 days ago. The decay is parameterized with a half-life of 90 days, which was selected via cross-validation on the training corpus.
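A half-life parameterization means the feature value halves every 90 days: value = 0.5^(days / 90). The sketch below shows that transform under stated assumptions. The function name is hypothetical, and returning 0.0 for a country with no recorded shutdown is an assumption (it is the limit of the decay as the gap grows); the actual pipeline may encode that case as missing instead.

```python
from datetime import date
from typing import Optional

HALF_LIFE_DAYS = 90.0  # half-life stated in the article


def shutdown_recency_decay(
    last_shutdown: Optional[date],
    forecast_date: date,
) -> float:
    """
    Exponential decay on days since the last verified shutdown.
    1.0 on the day of the event, 0.5 after one half-life, approaching 0.
    """
    if last_shutdown is None:
        return 0.0  # assumption: no history behaves like an infinitely old event
    days = (forecast_date - last_shutdown).days
    if days < 0:
        raise ValueError("last_shutdown must not be after forecast_date")
    return 0.5 ** (days / HALF_LIFE_DAYS)
```

Applied to the two cases in the paragraph above, a shutdown 10 days ago gives a value of about 0.93, while one 900 days ago gives about 0.001, so the recent event dominates by roughly three orders of magnitude.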
XGBoost feature importance
Feature importance is measured using mean absolute SHAP values across the 200-country holdout test set, averaged over all country-day pairs in a one-year evaluation window. SHAP (SHapley Additive exPlanations) values are used rather than the XGBoost built-in gain importance because SHAP is consistent across model architectures and does not overweight high-cardinality features.
| Rank | Feature | Mean |SHAP| | Interpretation |
|---|---|---|---|
| 1 | days_to_nearest_election | 0.41 | Proximity to election day is the single strongest predictor across all country-class combinations |
| 2 | shutdown_history_3y | 0.38 | Recidivism: prior shutdowns are the best indicator of future ones |
| 3 | protest_intensity_7d | 0.29 | Active protest conditions raise probability sharply, especially when combined with election proximity |
| 4 | probe_measurement_rate_delta | 0.24 | The only real-time telemetry feature in the top 5; a drop below 0.6 is a strong precursor |
| 5 | ofac_designation_30d | 0.21 | OFAC surge often precedes the political crisis that triggers the shutdown |
| 6 | election_competitiveness_score | 0.20 | Hybrid regimes with contested elections drive this feature's contribution |
| 7 | bgp_withdrawal_rate_7d | 0.18 | BGP telemetry is informative but less so than probe rate delta — it arrives later in the sequence |
| 8 | diplomatic_isolation_score | 0.16 | Structural isolation correlates with shutdown propensity beyond any single event |
| 9 | days_since_last_shutdown | 0.14 | Recency decay on last event; most useful for ongoing-crisis detection |
| 10 | throttling_incident_count_7d | 0.12 | Throttling-to-shutdown escalation pattern, most predictive for Iran and Russia |
The most surprising finding in the SHAP analysis is the relative ranking of ofac_designation_30d (SHAP 0.21) versus the BGP withdrawal features (SHAP 0.18). The intuition is that OFAC sanctions often precede, rather than accompany, the political crisis that leads to a shutdown: sanctions create economic pressure, which creates political crisis, which creates conditions for a connectivity event. BGP withdrawal, by contrast, is a concurrent or lagging signal — by the time routes are being withdrawn, the shutdown decision has already been made. The forecasting utility of BGP is higher for real-time detection than for 7-day prediction, which is why probe_measurement_rate_delta — which can drop hours before an announcement — ranks higher among the telemetry features.
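To make the ranking metric concrete, the following sketch computes mean absolute SHAP per feature from an already-computed matrix of SHAP values (one row per country-day pair, one column per feature). It deliberately does not compute the SHAP values themselves; in practice those would come from a tree explainer run against the trained model. The function name and input layout are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(
    feature_names: Sequence[str],
    shap_rows: Sequence[Sequence[float]],
) -> List[Tuple[str, float]]:
    """
    Returns (feature, mean |SHAP|) pairs sorted from most to least important.
    shap_rows[i][j] is the SHAP value of feature j for sample i.
    """
    if not shap_rows:
        raise ValueError("need at least one row of SHAP values")

    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row width does not match feature_names")
        for j, value in enumerate(row):
            # Absolute value: pushes toward and away from a shutdown
            # prediction both count as influence.
            totals[j] += abs(value)

    means = [total / len(shap_rows) for total in totals]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)
```

Taking the absolute value before averaging matters: a feature that raises the prediction for some country-days and lowers it for others would otherwise average out near zero and look unimportant.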
Feature availability per country
Not all 47 features are available for all 200 countries. Coverage varies substantially by feature group, and the missingness pattern is not random — it is correlated with internet infrastructure maturity and geopolitical visibility, which are themselves correlated with shutdown risk.
| Feature group | Coverage (% of countries) | Primary gap |
|---|---|---|
| Political calendar | 97% | Three Pacific microstates with no consistent election calendar data |
| GDELT protest signal | 100% | GDELT covers all countries; very low news volume acts as a zero rather than a gap |
| OFAC sanctions | 100% | SDN list covers all countries; zero designations is a valid feature value |
| BGP telemetry (IODA) | 73% | Pacific island nations and several Caribbean states have insufficient RIPE NCC route-collector coverage |
| Voidly probe rate | 81% | Countries with fewer than 2 active probes produce unreliable rate deltas |
Countries missing BGP telemetry (27% of the 200-country set) are mostly Pacific island nations — Tuvalu, Kiribati, Marshall Islands, Nauru — where RIPE NCC route-collector coverage is limited or absent. XGBoost handles missing values natively via learned default splits: during training, the model learns which direction to send samples with a missing feature based on which direction produces better predictions for that feature. For BGP missingness, the learned default consistently sends those samples toward the "low BGP signal" side of splits, which is appropriate given that the missing countries are also the ones where BGP-based shutdowns are least historically observed.
What the feature set does not capture
The features described above are designed to predict probability, not to explain cause. A country with a high 7-day shutdown probability could be heading toward a deliberate government-ordered connectivity cutoff, or it could be facing a severe infrastructure event that happens to share statistical properties with intentional shutdowns. The model does not distinguish between these cases — it predicts the outcome, not the mechanism.
Several important signals are structurally absent from the feature set:
- Internal government communications. The most direct leading indicator of a shutdown — a cabinet order, a ministry directive — is classified. No open-source feature engineering can substitute for it.
- Commercial infrastructure failures. Large-scale BGP outages caused by misconfiguration (the Facebook October 2021 event is the canonical case) can produce feature vectors that resemble pre-shutdown patterns. We apply a post-hoc outage classifier that uses BGP withdrawal topology (self-originated vs. transit-withdrawn) to flag likely infrastructure events, but it is not a solved problem.
- Planned maintenance windows. Some countries' telecommunications regulators publish planned maintenance schedules, and we filter those using telecoms regulatory filings where they are available. Coverage is patchy: the feature is filtered for roughly 40% of countries and simply absent for the rest.
- Social media platform self-regulation. A platform voluntarily reducing service in a country — which has happened in a small number of cases — looks similar to externally forced throttling from a probe measurement perspective. The feature set has no signal that distinguishes platform-initiated from government-mandated reduction.
These limitations are surfaced in the API response via the data_quality_flags field, which lists which feature groups had missing or flagged inputs for a given country-date pair. The per-country reliability score visible on every Voidly dashboard reflects, in part, how many of these gaps affect the forecast for that country.
For the model that uses these features (the 7-day ARIMA + XGBoost ensemble architecture, per-country calibration, and reliability scoring), see: Seven-day internet shutdown forecasting: how Voidly predicts connectivity outages
For how OFAC SDN designations are ingested (the daily conditional GET, alias explosion, and the change feed that produces the ofac_designation_30d feature), see: OFAC SDN integration in the Federal Regulatory Data Hub: conditional GET, entity normalization, and sub-second screening
For the BGP routing signals that produce the bgp_withdrawal_rate_7d feature, see: BGP routing signals and internet shutdown detection: how Voidly uses IODA data
For the country-level censorship score (the 90-day rolling aggregation that feeds the blocking_rate_trend feature), see: Voidly's country-level censorship score: aggregating 2.2B probe measurements into the global index