Six years

Six years observing the open internet.

AI Analytics has been operating since 2020. Below is the public-facing trajectory: when we started measuring, when the probe network expanded, when we shipped what. Internal milestones aren't listed.

2027

53 milestones
  1. Q1Writing

    BLS OEWS occupational wage statistics database deep-dive published

    Long-form analysis of BLS OEWS — 1.1M establishment panel survey, 830 SOC detailed occupations, 23 major groups, national/state/MSA/nonmetro geographies, 10th/25th/75th/90th wage percentile distributions, industry-occupation matrices (same job wildly different pay across sectors), BLS API v2 OEWS series ID construction (OEU prefix), and May-publication lag caveat.
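
The OEU series ID construction mentioned in this entry can be sketched as follows. This is a minimal sketch that assumes the 25-character layout from the BLS series ID format help pages (prefix OE, seasonal code U, one-character area type, 7-digit area, 6-digit industry, 6-digit occupation, 2-digit data type). The function name and the example codes are illustrative and not taken from the article; verify them against current BLS documentation before use.

```python
def oews_series_id(area_type, area, industry, occupation, datatype, seasonal="U"):
    """Assemble an OEWS series ID from its component codes.

    Assumed layout (25 characters): OE + seasonal(1) + area type(1)
    + area(7) + industry(6) + occupation(6) + data type(2).
    """
    parts = [
        ("area_type", area_type, 1),
        ("area", area, 7),
        ("industry", industry, 6),
        ("occupation", occupation, 6),
        ("datatype", datatype, 2),
    ]
    # Reject any component whose width does not match the assumed layout
    for name, value, width in parts:
        if len(value) != width:
            raise ValueError(f"{name} must be {width} characters, got {value!r}")
    return "OE" + seasonal + "".join(value for _, value, _ in parts)


# Illustrative codes only: national (N), all areas, all industries,
# SOC 15-1252 written without the hyphen, data type "04".
series_id = oews_series_id("N", "0000000", "000000", "151252", "04")
print(series_id, len(series_id))
```

Validating the width of every component up front matters because a fixed-width ID that is one character short is still a syntactically plausible string and would otherwise only fail at request time.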

  2. Q1Writing

    CFPB consumer complaint database and financial product enforcement deep-dive published

    Long-form analysis of CFPB complaint database — Dodd-Frank 2010 origin/2012 launch/3M+ complaints, 12-product taxonomy (mortgage/credit card/student loan/debt collection/credit reporting), credit bureau surge (Equifax/Experian/TransUnion 200K+ each 2020-2023), 15-day company response window, ~60% narrative rate, CARES Act forbearance errors, and CFPB public REST API Python workflow.

  3. Q1Writing

    FARA foreign agent registration database and DOJ enforcement deep-dive published

    Long-form analysis of DOJ FARA database: 1938/1966 statutory history, RA-1/NSD-3 registration structure, LDA exemption gap under 22 USC 613(h), 2017-2022 enforcement surge (2 prosecutions 1966-2015 vs. 7 indictments post-Mueller), Manafort/Podesta/Flynn/Barrack/Skadden cases, Saudi Arabia $450M+ since 2016, eFARA 2021 launch, and efile.fara.gov API Python workflow.

  4. Q1Writing

    SEC Form 4 insider trading EDGAR database deep-dive published

    Long-form analysis of SEC Form 4 — Section 16(a) 1934 Exchange Act/SOX 2002 2-business-day rule, officer/director/10%+ owner filers, P/S/A/D/M/F/G transaction codes, Rule 10b5-1 safe harbor mechanics and 2023 SEC amendments (90-day cooling-off, single-trade plan cap), open-market purchase predictive signal research, Elon Musk Twitter filing delays/SEC investigation, STOCK Act congressional trading, and EDGAR submissions API Python workflow.

  5. Q1Writing

    NLRB union election and unfair labor practice database deep-dive published

    Long-form analysis of NLRB elections and ULP enforcement: NLRA Sections 7/8/9, RC/RD/RM petition types, 2014/2023 quickie election rules, 47% union win rate (2022), ULP CA/CB/CC charge types, Gissel bargaining orders, Amazon JFK8 ALU victory (2022), Starbucks 400+ petitions and Cemex order, UAW Volkswagen Chattanooga, WGA/SAG-AFTRA AI bargaining, and NLRB case management CSV/JSON API Python workflow.

  6. Q1Writing

    NTSB aviation accident database and probable cause investigation deep-dive published

    Long-form analysis of NTSB aviation accident database: NTSB vs. FAA jurisdictional split, accident/incident thresholds, go-team deployment, CAROL database 90K+ records, probable cause coding schema, GA (1,200+ accidents/year) vs. commercial rates, Colgan Air 3407 CRM/fatigue rule, Southwest 1380 CFM56 fan blade fatigue, Boeing 737 MAX MCAS/ACAR design, Alaska Airlines 737 MAX 9 door plug ASB (Jan 2024), and NTSB CSV download Python workflow.

  7. Q1Writing

    NOAA Storm Events Database 50-year weather disaster record deep-dive published

    Long-form analysis of NOAA NCEI Storm Events Database — 48 event types (NWS Directive 10-1605), records back to 1950 (tornado-only) and 1996 (all events), FIPS-coded county records, damage estimation K/M/B methodology, Joplin EF5 2011/April 2011 Super Outbreak (360 tornadoes/321 deaths), Hurricane Harvey 2017 ($125B), Hurricane Ian 2022 ($112B), NOAA Billion-Dollar Disaster threshold ($1B CPI-adjusted), NWS local office → NCEI pipeline, and Storm Events CSV download Python workflow.
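
The K/M/B damage notation mentioned in this entry needs converting to dollar amounts before any aggregation. The sketch below assumes damage fields arrive as strings such as `25.00K` or `1.5M`. The function name is hypothetical, and treating a blank value or a bare suffix as missing is an assumption rather than documented NCEI behavior.

```python
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_damage(value):
    """Convert a damage string such as '25.00K' into dollars.

    Returns None for missing values. Treating a bare suffix ('K')
    as missing is an assumption made for this sketch.
    """
    if value is None:
        return None
    text = str(value).strip().upper()
    if not text:
        return None
    suffix = text[-1]
    if suffix in MULTIPLIERS:
        number = text[:-1]
        if not number:
            return None
        return float(number) * MULTIPLIERS[suffix]
    # No suffix: interpret the value as plain dollars
    return float(text)


sample = ["25.00K", "1.5M", "2B", "", "300"]
print([parse_damage(v) for v in sample])
```

Returning `None` instead of zero keeps "no estimate recorded" distinct from "zero damage" when the values are later summed or averaged.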

  8. Q1Writing

    USAID ForeignAssistance.gov foreign aid disbursement database deep-dive published

    Long-form analysis of ForeignAssistance.gov — Foreign Assistance Act 1961, 12 disbursing agencies, Economic/Humanitarian/Security/Democracy/Health categories, top recipients (Israel/Egypt/Ukraine/Jordan), PEPFAR $7B+ HIV/AIDS programming, MCC compact structure (threshold/scorecard/compact phases), ODA vs. OOF OECD DAC classification, DOGE-era funding freezes, and ForeignAssistance.gov API Python workflow.

  9. Q1Writing

    NHTSA vehicle safety complaints and ODI defect investigation database deep-dive published

    Long-form analysis of NHTSA complaints — ODI early warning reporting (EWR threshold: 25 deaths/250 injuries), 3M+ VOQS database complaints, preliminary evaluation (PE)/engineering analysis (EA) investigation pipeline, Takata ammonium nitrate airbag rupture (67M recall/largest US auto recall), Toyota SUA ECM/floor mat recall (9M vehicles), GM ignition switch 2.6M recall/13 deaths, TREAD Act 2000, and NHTSA complaints API Python workflow.

  10. Q1Writing

    EPA RCRA hazardous waste database and RCRAInfo compliance tracking deep-dive published

    Long-form analysis of EPA RCRA — RCRA 1976/HSWA 1984 statutory basis, generator classification (LQG ≥1,000 kg/SQG 100-999 kg/VSQG <100 kg per month), cradle-to-grave manifest tracking, ~1,500 permitted TSDFs, F/K listed wastes vs. D characteristic wastes (TCLP), corrective action 3,700-site universe, GE Hudson River PCB Superfund and Honeywell cases, and ECHO API RCRA facility search Python workflow.
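
The generator classification in this entry reduces to a threshold check on monthly hazardous waste generation. This is a minimal sketch using the cut-offs as quoted above (LQG at 1,000 kg or more, SQG from 100 kg, VSQG below that). The function name is hypothetical, and the sketch ignores the separate limits for acute hazardous waste and the exact regulatory handling of the 100 kg boundary.

```python
def generator_category(kg_per_month):
    """Classify a hazardous waste generator by monthly quantity in kg.

    Thresholds follow the summary in the entry above; acute hazardous
    waste limits are deliberately left out of this sketch.
    """
    if kg_per_month < 0:
        raise ValueError("quantity cannot be negative")
    if kg_per_month >= 1000:
        return "LQG"
    if kg_per_month >= 100:
        return "SQG"
    return "VSQG"


for quantity in (40, 100, 999, 1000):
    print(quantity, generator_category(quantity))
```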

  11. Q1Writing

    EIA Form 860 US power plant and generator inventory database deep-dive published

    Long-form analysis of EIA Annual Electric Generator Report — mandatory survey all utility-scale plants, 25,000+ generating units/8,000+ plants, NERC interconnection regions, coal retirements (102 GW retired 2010-2023), solar additions (130 GW utility-scale 2024), wind additions (140 GW 2024), ownership concentration (NextEra/Duke/Southern), interconnection queue 2,000 GW backlog, and EIA Open Data API v2 Python workflow.

  12. Q1Writing

    NCES IPEDS higher education statistics database deep-dive published

    Long-form analysis of NCES IPEDS — 12 survey components (IC/HD/EF/C/GR/F/SFA/SAL/HR/EAP/OM/ADM), 6,000 Title IV institutions, HEA mandatory reporting, graduation rate 150%/200% time cohorts, CIP code 6-digit classification, Delta Cost Project financial analysis, HBCU/MSI classifications, College Scorecard cross-reference, and Urban Institute Education Data Portal API Python workflow.

  13. Q1Writing

    OFAC sanctions civil penalty enforcement database deep-dive published

    Long-form analysis of OFAC civil penalty settlements: IEEPA/TWEA/UNPA authorities, SDN List XML structure (10K+ entries), BNP Paribas $963M/Standard Chartered $639M/UniCredit $611M/Deutsche Bank $629M enforcement actions, voluntary self-disclosure 50% reduction, CAATSA secondary sanctions, and SDN XML Python workflow with program-type analysis.

  14. Q1Writing

    ORI federal research misconduct database deep-dive published

    Long-form analysis of HHS Office of Research Integrity — 42 CFR Part 93 FFP definition, inquiry/investigation/ORI review/DAB appeal process, Eric Poehlman prison sentence/Dipak Das 145-count finding/image manipulation trend, ori.hhs.gov/case_summary database, Retraction Watch 47K retractions cross-reference, PubPeer, and annual ORI report data Python workflow.

  15. Q1Writing

    USASpending subaward federal funding flow database deep-dive published

    Long-form analysis of USASpending subaward transparency — FFATA 2006/DATA Act 2014 legal basis, FSRS reporting threshold ($30K), sub-grant flow (CDC/NIH/HUD/FEMA state-to-local), sub-contract supply chain (DOD prime-to-small business), FAIN/PIID cross-reference, OMB Uniform Guidance 2 CFR Part 200, and USASpending API v2 subaward search Python workflow.

  16. Q1Writing

    FEC Super PAC and dark money outside spending database deep-dive published

    Long-form analysis of FEC independent expenditure data — Citizens United 2010/SpeechNow.org Super PAC creation/McCutcheon 2014, Schedule E/Form 5/Form 9 filing structure, 501(c)(4) dark money (Crossroads GPS $70M 2012, Arabella Advisors, AFP), electioneering communications 30/60-day windows, OpenSecrets IRS-990 cross-reference methodology, and FEC API Super PAC query Python workflow.

  17. Q1Writing

    Congressional roll call vote database and DW-NOMINATE ideology scoring deep-dive published

    Long-form analysis of VoteView/Congress.gov voting data — Poole-Rosenthal DW-NOMINATE two-dimensional scoring, 1st-118th Congress coverage, Dimension 1 liberal-conservative axis, polarization trend (party overlap collapse post-1980), AUMF 2001/ACA/TCJA landmark votes, party unity scores, ProPublica Congress API, and Congress.gov API Python workflow.

  18. Q1Writing

    Grants.gov federal grant opportunity database deep-dive published

    Long-form analysis of Grants.gov and federal competitive grants: E-Government Act 2002, 26 grant-making agencies, 30K opportunities/year, competitive vs. formula distinction, CFDA→SAM.gov Assistance Listings, IIJA 2021/IRA 2022 funding surge ($1.2T/$369B), SF-424 application mechanics, indirect cost rate negotiation, and Grants.gov v1 search API Python workflow.

  19. Q1Writing

    EPA Safe Drinking Water Act violations and water system database deep-dive published

    Long-form analysis of EPA SDWIS: 150,000 PWSs (community/TNCWS/NTNCWS), 49-state primacy system, MCL/MRDL/TT/M&R violation types, PFAS 2024 MCL 4 ppt final rule (66K-80K systems affected), Flint Michigan lead crisis, private wells (42M Americans/no SDWA coverage), environmental justice hotspots, ECHO/SNC database, and EPA ECHO API Python workflow.

  20. Q1Writing

    Regulations.gov rulemaking dockets and public comment database deep-dive published

    Long-form analysis of Regulations.gov — E-Government Act 2002, 170+ agencies, 25M+ comments, APA notice-and-comment rulemaking framework (ANPRM→NPRM→Final Rule), OIRA/EO 12866 significant rules, mass comment campaigns (net neutrality 22M/fake identity fraud, EPA Clean Power Plan 4.3M), v4 REST API with api.data.gov key, and Regulations.gov API docket search Python workflow.

  21. Q1Writing

    FHWA Highway Performance Monitoring System road condition database deep-dive published

    Long-form analysis of HPMS — 4.1M miles US public roads, IRI pavement condition thresholds (Good <95/Fair 95-170/Poor >170 in/mi), NHPP performance management, 2022 C&P Report (43% Good / 6% Poor Interstate), IIJA $110B road funding, ASCE C- infrastructure grade, and FHWA Highway Statistics published-table Python workflow.
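
The IRI thresholds quoted in this entry translate directly into a classifier. The sketch below assumes each road segment is given as an (IRI in inches per mile, length in miles) pair; that input shape and both function names are hypothetical and are not the HPMS submission format.

```python
def iri_condition(iri_in_per_mi):
    """Map an IRI value (in/mi) to Good (<95), Fair (95-170) or Poor (>170)."""
    if iri_in_per_mi < 95:
        return "Good"
    if iri_in_per_mi <= 170:
        return "Fair"
    return "Poor"


def condition_shares(segments):
    """Return the share of total mileage in each condition category.

    segments: iterable of (iri_in_per_mi, miles) pairs (assumed shape).
    """
    totals = {"Good": 0.0, "Fair": 0.0, "Poor": 0.0}
    for iri, miles in segments:
        totals[iri_condition(iri)] += miles
    total_miles = sum(totals.values())
    if total_miles == 0:
        raise ValueError("no mileage supplied")
    return {name: miles / total_miles for name, miles in totals.items()}


segments = [(60, 5.0), (120, 3.0), (200, 2.0)]
print(condition_shares(segments))
```

Weighting by segment length rather than counting segments keeps a handful of short, rough segments from dominating the reported shares.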

  22. Q1Writing

    FAA Civil Aviation Registry airmen and aircraft database deep-dive published

    Long-form analysis of FAA Registry — Airmen Certification (700K active pilots, ATP/CPL/PPL/CFI/Part 107 certificate types, BasicMed 2017, Class 1/2/3 medicals) and Aircraft Registration (300K aircraft, N-number, make/model, airworthiness class, experimental/kit-built), FBI/DEA surveillance aircraft journalism, and FAA releasable airmen ZIP CSV Python workflow.

  23. Q1Writing

    DOE AFDC EV charging station and alternative fuel infrastructure database deep-dive published

    Long-form analysis of DOE Alternative Fuels Station Locator — 180K+ stations (EVSE ports, hydrogen, CNG, E85), NEVI $5B formula program (50-mile spacing, 150kW DCFC, 97% uptime, NACS mandate), connector standards evolution (J1772/CHAdeMO/CCS/NACS-J3400), Tesla network integration, coverage deserts, and developer.nrel.gov AFDC API Python workflow.

  24. Q1Writing

    USGS US Wind Turbine and Solar PV energy infrastructure database deep-dive published

    Long-form analysis of USWTDB and USPVDB — 72K+ wind turbines (USGS/LBNL/AWEA joint database, GPS coordinates, hub height/rotor diameter/capacity attributes, quarterly updates), utility-scale solar PV (EIA 860 enriched with satellite imagery), 140 GW US wind + 100 GW solar installed, IRA 2022 tax credit extension, and eersc.usgs.gov USWTDB API Python workflow.

  25. Q1Writing

    SBA 7(a) and 504 small business loan programs database deep-dive published

    Long-form analysis of SBA loan disclosure data — 7(a) guarantee mechanics (75-85%), 504 three-party structure (bank/CDC/borrower), SBA Express/CAPLines/Export programs, PPP $800B+ forgivable loan disclosure and fraud ($100B+ improper payments), size standards (13 CFR Part 121), top lenders (Live Oak/JPMorgan/Newtek), and data.sba.gov loan-level CSV Python workflow.

  26. Q1Writing

    US Attorney federal prosecution database deep-dive published

    Long-form analysis of USAO press releases and statistical reports — 94 districts, ~5,700 AUSAs, ~79,600 FY2022 defendants, 90% conviction rate, DOJ press release JSON API, PACER case cross-reference, SDNY/EDVA/CDCA notable district profiles, drug/corruption/fraud/civil rights/national security offense categories, and CourtListener RECAP free docket Python workflow.

  27. Q1Writing

    SAMHSA substance abuse and mental health treatment database deep-dive published

    Long-form analysis of SAMHSA CBHSQ data: NSDUH (67,500 respondents, 48.7M SUD, 57.8M mental illness, 1-in-8 treatment rate), TEDS-A/TEDS-D 2M admissions, N-MHSS 12,000 facilities, X-waiver/MATE Act buprenorphine reform, 988 Lifeline 5M+ contacts, SABG $2B block grant, and BHSIS API/findtreatment.gov Python workflow.

  28. Q1Writing

    PHMSA gas and liquid pipeline safety incident database deep-dive published

    Long-form analysis of PHMSA significant incident data — 2.7M miles US pipeline network, OPSWEB Forms 7100.1/7100.2/7000.1, San Bruno 2010 ($1.6B)/Colonial Pipeline 2016/Enbridge Kalamazoo 2010 (843K gallons)/Aliso Canyon 2015 incidents, $266,015/day civil penalty maximum, MAP-21/PIPES Act enforcement, and PHMSA CSV bulk download Python workflow.

  29. Q1Writing

    CDC foodborne disease outbreak surveillance database deep-dive published

    Long-form analysis of the CDC FDOSS — NORS reporting system, pathogen distribution (norovirus/Salmonella/Listeria/E.coli O157), single vs. multi-state outbreak definition, FSMA preventive controls, 2011 Jensen Farms Listeria/2018 romaine E.coli/2023 Salmonella cantaloupe cases, and data.cdc.gov Socrata API outbreak Python workflow.

  30. Q1Writing

    OSHA 300A establishment injury rate database deep-dive published

    Long-form analysis of OSHA Injury Tracking Application 300A summary data — TRC and DART rate formulas, recordable vs. reportable distinction, electronic submission mandate, industry NAICS benchmarks, Amazon warehouse injury controversy, COVID-19 recordability ruling, and data.osha.gov Socrata API industry rate comparison Python workflow.
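
The TRC and DART rate formulas named in this entry both normalize case counts to 200,000 hours, the equivalent of 100 full-time workers for a year. A minimal sketch, assuming the inputs are the Form 300A case-count totals (deaths, days-away cases, job transfer or restriction cases, other recordable cases) and total hours worked; the function and parameter names are illustrative.

```python
HOURS_BASE = 200_000  # 100 workers x 40 hours/week x 50 weeks


def incidence_rate(cases, hours_worked):
    """Cases per 100 full-time-equivalent workers."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return cases * HOURS_BASE / hours_worked


def trc_rate(deaths, days_away, transfer_or_restriction, other, hours_worked):
    """Total Recordable Case rate: all recordable cases."""
    total = deaths + days_away + transfer_or_restriction + other
    return incidence_rate(total, hours_worked)


def dart_rate(days_away, transfer_or_restriction, hours_worked):
    """Days Away, Restricted or Transferred rate."""
    return incidence_rate(days_away + transfer_or_restriction, hours_worked)


# Hypothetical establishment: 12 recordable cases over 500,000 hours
print(trc_rate(0, 4, 3, 5, 500_000))
print(dart_rate(4, 3, 500_000))
```

Because both rates share one denominator, DART can never exceed TRC for the same establishment, which makes a simple sanity check when screening submitted data.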

  31. Q1Writing

    DOJ Civil Rights Division police reform and enforcement database deep-dive published

    Long-form analysis of DOJ CRT enforcement — Special Litigation Section pattern-or-practice authority (34 USC 12601), consent decree mechanics, Ferguson/Chicago/Minneapolis/Louisville agreements, voting rights Section 2 enforcement, PREA/ADA Title II/fair housing enforcement, and DOJ press release API Python workflow.

  32. Q1Writing

    USDA ERS food and agricultural economics database deep-dive published

    Long-form analysis of USDA Economic Research Service data — farm income and wealth statistics (net farm income), Food Expenditure Series, COLI food price indices, RUCC rural-urban continuum codes, food security annual supplement (CPS), commodity supply-and-use tables, and USDA ERS API Python workflow.

  33. Q1Writing

    CMS Medicare Part D prescriber drug spending database deep-dive published

    Long-form analysis of CMS Part D prescriber data — MMA 2003 Part D structure, ~25M-row prescriber-by-drug dataset, ProPublica Dollars for Docs cross-reference, opioid outlier and pill mill detection, IRA 2022 drug negotiation, preclusion list, and data.cms.gov Socrata API opioid prescribing Python workflow.

  34. Q1Writing

    DEA controlled substance registrant enforcement database deep-dive published

    Long-form analysis of DEA registrant revocation enforcement — 1.8M registrant system, OTSC/ISO/final order authorities, 21 USC 824 grounds, Florida pill mill enforcement, Walgreens $80M penalty, PDMP interconnect, and Federal Register API Python workflow.

  35. Q1Writing

    NRC Reactor Oversight Process nuclear safety database deep-dive published

    Long-form analysis of the NRC ROP — seven safety cornerstones, Green/White/Yellow/Red performance indicator thresholds, Significance Determination Process, four-column action matrix, ~93-reactor fleet, Diablo Canyon reversal, SMR status, and NRC PI data Python workflow.

  36. Q1Writing

    CFTC commodity market enforcement database deep-dive published

    Long-form analysis of CFTC enforcement actions — Dodd-Frank OTC swap expansion, LIBOR $9B+/FX manipulation/spoofing/FTX $12.7B/Binance $2.7B cases, consent order process, $350M+ whistleblower program, and embedded 30-case historical penalty dataset Python workflow.

  37. Q1Writing

    HMDA mortgage lending disparities database deep-dive published

    Long-form analysis of the Home Mortgage Disclosure Act LAR data — post-2018 expanded fields, HOLC historical redlining to City National $31M modern settlement, denial rate disparities by race, FFIEC API bulk access, and census-tract denial rate Python analysis.

  38. Q1Writing

    CMS hospital cost report financial database deep-dive published

    Long-form analysis of HCRIS Form CMS-2552 — Worksheets S/A/B/C/D/E/G/S-10, cost-to-charge ratios, DSH/GME/capital reimbursement implications, critical access hospital cost-based payment, and data.cms.gov Socrata API Python workflow for operating margin and charity care analysis.

  39. Q1Writing

    FEC campaign finance Matters Under Review enforcement database deep-dive published

    Long-form analysis of FEC MUR enforcement: bipartisan deadlock structure, five FECA violation types, Sam Bankman-Fried $93M straw donor scheme, Citizens United/McCutcheon precedents, MUR process timeline, and FEC legal API Python workflow.

  40. Q1Writing

    IRS Criminal Investigation tax fraud prosecution database deep-dive published

    Long-form analysis of IRS CI enforcement — ~2,000 special agents, 90%+ conviction rate, tax evasion/TFRD/FATCA/IDTRF/narcotics case categories, Capone/Snipes/Manafort/Bankman-Fried cases, John Doe summonses, cryptocurrency seizures including $4.5B Bitfinex hack, and DOJ Tax Division API Python workflow.

  41. Q1Writing

    CDC NNDSS notifiable disease surveillance database deep-dive published

    Long-form analysis of the National Notifiable Diseases Surveillance System — 120+ reportable conditions, CSTE/CDC disease lists, PulseNet/GenomeTrakr foodborne surveillance, STI/HIV/TB/Lyme/measles case counts, COVID-19 surveillance evolution, and data.cdc.gov Socrata API Python workflow.

  42. Q1Writing

    OSHA workplace safety violations database deep-dive published

    Long-form analysis of OSHA enforcement citation data — serious/willful/repeated/failure-to-abate violation types, top 10 most-cited CFR standards, penalty reduction factors, Amazon/Tesla/heat illness high-profile cases, and data.osha.gov Socrata API Python workflow.

  43. Q1Writing

    GAO federal audit reports database deep-dive published

    Long-form analysis of Government Accountability Office reports database — product types, audit categories, DOD financial audit failure, COVID relief improper payments, High Risk List, open recommendations, and GAO/EFTS API Python workflow.

  44. Q1Writing

    FCC Universal Licensing System spectrum database deep-dive published

    Long-form analysis of the FCC ULS — 10M+ active licenses, service code taxonomy, spectrum auction history ($81B C-band), AM/FM/TV broadcast class table, AT&T/Verizon/T-Mobile/Dish licensee profiles, FCC License View API, and state AM/FM query Python workflow.

  45. Q1Writing

    UFLPA entity list and forced labor supply chain enforcement deep-dive published

    Long-form analysis of the Uyghur Forced Labor Prevention Act Entity List — FLETF administration, rebuttable presumption standard, Xinjiang polysilicon/cotton/tomato supply chains, CBP detention mechanics, XPCC sanctions, and cross-reference with OFAC CSL Python workflow.

  46. Q1Writing

    FinCEN BSA anti-money laundering enforcement database deep-dive published

    Long-form analysis of FinCEN Bank Secrecy Act civil enforcement — AML four pillars, CTR/SAR requirements, Travel Rule, HSBC $1.9B/BNP $8.9B/Deutsche Bank $630M/Binance $4.3B cases, CTA beneficial ownership database, and embedded historical penalty dataset Python workflow.

  47. Q1Writing

    SAM.gov government contractor debarment database deep-dive published

    Long-form analysis of SAM.gov exclusions: FAR 9.4 debarment/suspension/voluntary exclusion types, grounds under FAR 9.406-2, Boeing $615M/KBR/Theranos cases, OIG LEIE cross-listing, bulk CSV extract, and active exclusion Python workflow.

  48. Q1Writing

    FHWA National Bridge Inventory infrastructure database deep-dive published

    Long-form analysis of the NBI inspection program — 1967 Silver Bridge NBIS origin, 0-9 condition rating scale, Sufficiency Rating, I-35W/Fern Hollow/Key Bridge case studies, IIJA Bridge Formula Program $27.5B, and poor-condition bridge Python analysis.

  49. Q1Writing

    NIH research grants RePORTER database deep-dive published

    Long-form analysis of the NIH Research Portfolio Online Reporting Tools — 27 institutes and centers, R01/K/F/P/U/SBIR grant mechanisms, RCDC disease categorization, payline and peer review, indirect cost rates, COVID funding surge, ARPA-H creation, and RePORTER API v2 Python workflow.

  50. Q1Writing

    USDA SNAP program database deep-dive published

    Long-form analysis of the Supplemental Nutrition Assistance Program: Thrifty Food Plan reform, ABAWD work requirements (Fiscal Responsibility Act 2023), emergency allotment end, automatic stabilizer mechanics, payment error rates, retailer trafficking, and FRED API participation trend Python workflow.

  51. Q1Writing

    FEMA disaster declarations database deep-dive published

    Long-form analysis of the Stafford Act disaster declaration system — DR/EM/FMAG types, Public Assistance categories A-G, Individual Assistance IHP, HMGP/BRIC/FMA hazard mitigation, OpenFEMA API, COVID DR-4480 nationwide declaration, and historical decade/incident-type/state trend Python workflow.

  52. Q1Writing

    CMS Hospital Compare database deep-dive published

    Long-form analysis of the CMS Hospital Compare quality reporting program — Overall Star Rating methodology, HCAHPS 10 domains, HRRP readmission penalties, PSI-90 safety indicators, HAC Reduction Program, value-based purchasing, and data.cms.gov Socrata API Python workflow.

  53. Q1Writing

    DOL OFLC H-1B visa wage disclosure database deep-dive published

    Long-form analysis of the Office of Foreign Labor Certification LCA disclosure data — H-1B, H-2A, H-2B, and PERM programs, employer wage attestation mechanics, top body-shop employers, AEWR rates, and quarterly Excel workflow.

2026

242 milestones
  1. Q4Writing

    PACER federal court database deep-dive published

    Long-form analysis of PACER/CM/ECF court document system — 1 billion documents, per-page fee controversy, RECAP mirror, CourtListener API, Carl Malamud public.resource.org, and federal opinion keyword search Python workflow.

  2. Q4Writing

    BIS export enforcement database deep-dive published

    Long-form analysis of Bureau of Industry and Security export control enforcement — EAR, Entity List, Denied Persons List, Unverified List, ZTE $1.19B penalty, Huawei foreign direct product rule, Russia 2022 controls, October 2022 semiconductor controls, and Consolidated Screening List Python workflow.

  3. Q4Writing

    Treasury Daily Treasury Statement database deep-dive published

    Long-form analysis of the BFS Daily Treasury Statement — Tables I/II/III structure, real-time tax receipt tracking, debt ceiling X-date estimation, Social Security/Medicare payment day spikes, FiscalData API, and 30-day operating balance Python workflow.

  4. Q4Writing

    Census SAIPE county-level poverty and income estimates deep-dive published

    Long-form analysis of the Small Area Income and Poverty Estimates program, Title I-A funding allocation formula, Census Bureau API, and county-level poverty Python workflow.

  5. Q4Writing

    EEOC workplace discrimination charge database deep-dive published

    Long-form analysis of EEOC charge data across all six federal employment statutes (Title VII, ADA, ADEA, EPA, GINA, and the Pregnancy Discrimination Act), covering charge process, resolution outcomes, and FY2010-2023 trend data.

  6. Q4Writing

    FDA FAERS adverse event reporting system analysis published

    Long-form analysis of the FDA Adverse Event Reporting System — MedDRA hierarchy, EBGM signal detection algorithm, 26 million case submissions, and openFDA API Python workflow for pharmacovigilance research.

  7. Q4Writing

    NHTSA FARS traffic fatality database analysis published

    Long-form analysis of the Fatality Analysis Reporting System — 50 years of crash data, 2 million fatalities, NHTSA API access patterns, and state-level fatality rate Python analysis.

  8. Q4Writing

    CPSC consumer product recall database deep-dive published

    Long-form article on the CPSC (Consumer Product Safety Commission) recall database: CPSC established by Consumer Product Safety Act 1972 (CPSA), independent federal agency regulating ~15,000 types of consumer products; CPSC jurisdiction includes household products, furniture, children's products, toys, electronics, clothing, recreational equipment; excluded from CPSC jurisdiction: food and drugs (FDA), automobiles (NHTSA), firearms (ATF), aircraft (FAA), medical devices (FDA); ~9,800 recalls in the database since 1973.

    Recall mechanics: Section 15 voluntary recall (negotiated with CPSC, most common) vs. Section 9 mandatory recall (rare, requires formal rulemaking); CPSA Section 15(b) imposes a 24-hour reporting obligation when a manufacturer/importer/retailer obtains information that a product creates a substantial product hazard; CPSIA 2008 (Consumer Product Safety Improvement Act, triggered by Chinese toy lead paint scandal 2007 -- 1.5M+ units recalled): mandatory third-party testing for children's products, Children's Product Certificate (CPC) and General Conformity Certificate (GCC), total lead limit 100 ppm (surface 90 ppm), phthalate limits, permanent tracking label requirement.

    Incident reporting and surveillance: SaferProducts.gov is a public database of consumer incident reports submitted by consumers, healthcare professionals, medical examiners, and child fatality review panels; manufacturers are given a 10-day comment window before publication; NEISS-AIP (National Electronic Injury Surveillance System) serves as the CPSC injury surveillance network in ~100 hospital emergency departments; recall delays average 12-18 months from first incident to recall; Safe Sleep for Babies Act 2022 banned inclined sleepers and crib bumpers after multiple infant deaths; furniture stability mandatory standard 2023 targets dresser tip-overs.

    Notable recalls: Fisher-Price Rock 'n Play infant sleeper (4.7M units, 32 infant deaths, recalled April 2019), IKEA MALM dresser tip-over (29M units North America, 2016 and 2022 recalls), Peloton Tread+ (125,000 units after child death 2021), Samsung Galaxy Note 7 (2.5M units, battery fires 2016), Takata airbags (67M+ airbags, 19+ deaths from metal shrapnel, 2014-2019, NHTSA-led).

    Data access: recalls.gov/api with product_type_id/date_from/date_to parameters and recallID/recallDate/title/hazard/remedy/units/productCategory/injuries/deaths fields; cpsc.gov/data bulk XML. Python analysis: recalls.gov API fetch, hazard-type aggregation, product category units recalled, 2015-2024 annual trend, fatal recall identification.

  9. Q4Writing

    ClinicalTrials.gov research registry deep-dive published

    Long-form article on ClinicalTrials.gov (National Library of Medicine, launched February 2000 per FDA Modernization Act 1997): 500,000+ registered studies as of 2024; FDAAA 801 (Food and Drug Administration Amendments Act 2007) made registration mandatory for applicable clinical trials (ACTs) -- Phase 2+ interventional studies of FDA-regulated drugs/biologics/devices -- within 21 days of first patient enrollment; results reporting required within 12 months of primary completion date; failure penalties: FDA civil monetary penalties up to $10,000/day, NIH grant funding withholding; 2015 NEJM study found only 13% of applicable trials reported results on time -- enforcement widely criticized as inadequate.

    Study phases: Phase 0 (sub-therapeutic microdosing, pharmacokinetics), Phase 1 (first-in-human safety, dose-ranging, 20-80 participants), Phase 2 (efficacy signal and safety, 100-300 participants), Phase 3 (pivotal randomized controlled trials comparing to standard of care, basis for FDA approval, hundreds to thousands of participants), Phase 4 (post-marketing surveillance, real-world effectiveness).

    Key data fields: NCT number (unique identifier e.g. NCT04368728), official title, brief title, brief summary, sponsor type (industry ~50%, NIH/federal ~20%, academic/other ~30%), study type (interventional/observational), phase, allocation (randomized/non-randomized), intervention model (parallel/crossover/factorial/sequential), masking (none/single/double/triple/quadruple), primary completion date, enrollment count, primary outcome measures, secondary outcomes, eligibility criteria, MeSH condition terms, intervention types (drug/device/behavioral/procedure/dietary supplement/genetic).

    Registry composition and bias: oncology ~35% of all trials, with diabetes/endocrinology, cardiology, psychiatry, and infectious disease following; COVID-19 surge: ~11,000 COVID trials registered 2020-2021; publication bias (file drawer problem): trials with negative results less likely published; AllTrials campaign 2013, Ben Goldacre Bad Pharma book, COMPARE project (outcome switching), RIAT (restoring invisible and abandoned trials) initiative.

    Data access: ClinicalTrials.gov API v2 at clinicaltrials.gov/api/v2/studies: no API key required, pagination by pageSize/pageToken, protocolSection/resultsSection/statusModule/conditionsModule/designModule/eligibilityModule/outcomesModule/sponsorCollaboratorsModule structure; aggregate registry statistics: ~40% completed, ~25% recruiting, ~15% terminated; ICTRP (WHO International Clinical Trials Registry Platform) cross-registry; TrialsTracker automated compliance monitoring. Python API query: recruiting Phase 3 oncology trials ranked by enrollment (top 10) plus phase distribution for all cancer trials.
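
The pageSize/pageToken pagination described in this entry can be sketched as a generator that follows the continuation token until none is returned. This is a minimal sketch, not the article's code: it assumes each response carries a `studies` list and an optional `nextPageToken`, and that enrollment sits under `protocolSection.designModule.enrollmentInfo.count`. The helper names are hypothetical, and the fetch function is injectable so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def fetch_page(params):
    """Fetch one page of results as a dict (performs a network request)."""
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_studies(params, fetch=fetch_page, page_size=100, max_pages=None):
    """Yield study records, following nextPageToken until it runs out."""
    query = dict(params)
    query["pageSize"] = page_size
    pages = 0
    while True:
        page = fetch(query)
        yield from page.get("studies", [])
        pages += 1
        token = page.get("nextPageToken")
        if not token or (max_pages is not None and pages >= max_pages):
            return
        query["pageToken"] = token


def enrollment(study):
    """Read the enrollment count from the assumed v2 record layout."""
    design = study.get("protocolSection", {}).get("designModule", {})
    return design.get("enrollmentInfo", {}).get("count") or 0


def top_by_enrollment(studies, n=10):
    """Return the n studies with the largest enrollment counts."""
    return sorted(studies, key=enrollment, reverse=True)[:n]
```

A call such as `top_by_enrollment(iter_studies({"query.cond": "cancer", "filter.overallStatus": "RECRUITING"}, max_pages=5))` would rank recruiting cancer trials; those two parameter names are assumptions about the v2 query syntax and should be checked against the API documentation. Capping `max_pages` keeps an exploratory query from walking the whole registry.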

  10. Q4Writing

    Census CPS household survey deep-dive published

    Long-form article on the Census Bureau Current Population Survey (CPS): joint Census Bureau and BLS monthly household survey conducted since 1940; ~60,000 housing units per month representing ~110,000 individuals; 4-8-4 rotation group design -- households interviewed 4 consecutive months, removed for 8 months, then re-interviewed for 4 months (16-month total contact period); reference week = week containing the 12th of each month; labor force status classification: employed (worked at least 1 hour for pay or profit during reference week), unemployed (without work AND actively sought work during past 4 weeks AND currently available for work), not in labor force (NILF = everyone else including discouraged workers, retired, students, caregivers); official unemployment rate = U-3 = unemployed divided by labor force (employed plus unemployed); supplemental measures U-1 through U-6: U-1 (persons unemployed 15+ weeks), U-2 (job losers and persons who completed temporary jobs), U-3 (official total unemployment), U-4 (discouraged workers added), U-5 (all marginally attached workers added), U-6 (total underemployment = U-5 plus part-time-for-economic-reasons); COVID-19 peak April 2020: U-3 14.7%, U-6 22.9%, labor force participation rate fell to 60.2%; Annual Social and Economic Supplement (ASEC, March CPS expanded to ~100,000 households): official US poverty rate using 48 Orshansky thresholds by family size and composition (Mollie Orshansky 1963 -- food budget multiplied by 3, updated annually for CPI-U); 2023 thresholds: ~$15,500 single person, ~$30,900 family of 4; 2023 official poverty rate ~11.1% (~36M people); critiques: food-share multiplier outdated (food is now 10-15% of budget not 33%), no geographic cost-of-living variation, excludes SNAP/EITC/housing subsidies from income; Supplemental Poverty Measure (SPM, developed from 1995 NAS panel, first published 2011): counts non-cash government benefits (SNAP, housing vouchers, LIHEAP), subtracts taxes paid and work 
expenses, adjusts for geographic housing costs; SPM shows lower poverty for children (non-cash benefits counted) but higher poverty for elderly (medical out-of-pocket costs); CPS vs. CES/QCEW: CPS is residence-based (where people live, includes self-employed, agricultural, domestic workers), CES/QCEW are establishment-based (where jobs are, payroll employment only); microdata fields: PWSSWGT person weight, PRTAGE age, PESEX sex, PRDTRACE race, PEHSPNON Hispanic, PEEDUCA education, PEMLR monthly labor force recode (1=employed at work through 7=not in labor force), PRERNWA weekly earnings, OFFPOV official poverty indicator, POVLL poverty level ratio; IPUMS-CPS harmonized microdata back to 1962 at ipums.org/cps; raw files at census.gov; FRED: UNRATE/U6RATE/CIVPART; BLS LAUS (Local Area Unemployment Statistics) for state and county unemployment; Python: FRED API UNRATE history + BLS LAUS API state unemployment 2024 and 2023 annual average + Census ACS poverty rates + comparative state table with YoY change.
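
    A minimal sketch of the U-3 and participation-rate arithmetic described above, assuming weighted person records keyed by PEMLR. Grouping codes 1-2 as employed, 3-4 as unemployed, and 5-7 as not in labor force follows the usual CPS data dictionary and is an assumption here (the summary names only the endpoints). The `Person` class and the sample weights are invented for illustration.

```python
from dataclasses import dataclass

# Assumed PEMLR groupings (only codes 1 and 7 are named in the summary).
EMPLOYED = {1, 2}      # at work; with a job but absent
UNEMPLOYED = {3, 4}    # on layoff; looking for work
ALL_STATUSES = set(range(1, 8))  # 5-7 are not-in-labor-force categories


@dataclass
class Person:
    pemlr: int      # monthly labor force recode
    weight: float   # person weight (hypothetical values below)


def labor_force_rates(people):
    """Return weighted U-3 and labor force participation rate, in percent."""
    employed = sum(p.weight for p in people if p.pemlr in EMPLOYED)
    unemployed = sum(p.weight for p in people if p.pemlr in UNEMPLOYED)
    population = sum(p.weight for p in people if p.pemlr in ALL_STATUSES)
    labor_force = employed + unemployed
    if labor_force == 0 or population == 0:
        raise ValueError("no weighted labor force or population in sample")
    return {
        "u3": 100 * unemployed / labor_force,        # unemployed / labor force
        "lfpr": 100 * labor_force / population,      # labor force / population
    }


# Invented sample: six weighted respondents.
sample = [
    Person(1, 1500.0), Person(2, 500.0),
    Person(3, 50.0), Person(4, 200.0),
    Person(5, 1000.0), Person(7, 750.0),
]
rates = labor_force_rates(sample)
print(f"U-3 {rates['u3']:.1f}%  LFPR {rates['lfpr']:.2f}%")
```

    Records with a PEMLR outside 1-7 (for example children or out-of-universe rows) are simply left out of every sum in this sketch.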

  11. Q4Writing

    DEA ARCOS opioid distribution database deep-dive published

    Long-form article on the DEA ARCOS (Automation of Reports and Consolidated Orders System) opioid distribution database: mandatory reporting system under 21 USC 827 and 21 CFR 1304.33 requiring manufacturers/distributors/importers of Schedule I and II controlled substances to report every transaction; opioid coverage: hydrocodone, oxycodone, fentanyl, morphine, codeine, hydromorphone, methadone, oxymorphone, buprenorphine; 380 million individual opioid transaction records from 2006-2014; transaction fields: reporter DEA number, buyer DEA number, drug code, drug name, dosage unit, quantity, transaction date, transaction type (S=sale, P=purchase, T=theft or loss, R=return); ARCOS data was secret until MDL 2804 (In re: National Prescription Opiate Litigation, Judge Dan Polster, US District Court Northern District of Ohio): July 2019 court order released ARCOS transaction-level data to Washington Post and HD Media -- first public release of transaction-level opioid shipment data; key findings: 76 billion oxycodone and hydrocodone pills shipped 2006-2014; West Virginia ~780 pills per person per year; Mingo County WV received 3.3M hydrocodone pills over 2 years for a population of 25,000; McKesson, Cardinal Health, AmerisourceBergen (Big Three distributors) distributed 44% of all opioids nationally; suspicious order monitoring failure: 21 CFR 1301.74(b) requires distributors to identify and report suspicious orders (unusual size, frequency, or pattern) -- Big Three failed to report thousands of red-flag orders; DEA enforcement actions: McKesson $150M and registration surrenders 2017, AmerisourceBergen $150M 2017, Cardinal Health $44M; Purdue Pharma: OxyContin launched 1996, 2007 federal guilty plea $634M, 2020 DOJ settlement $8.34B, Sackler family $6B separate settlement, Harrington v. 
Purdue Pharma (Supreme Court, decided June 2024) rejecting the bankruptcy plan's liability releases for the Sacklers; Mallinckrodt largest generic opioid manufacturer $1.6B settlement; civil settlements: Big Three $21B (2022 nationwide), J&J $5B, Walgreens $5.7B, CVS $5B, Walmart $3.1B; total opioid settlements $55B+; Washington Post published searchable ARCOS database at washingtonpost.com (pills to any US pharmacy 2006-2014); arcos R package for analysis; DEA does not currently publish transaction-level ARCOS data; Python: downloads WaPo ARCOS bulk TSV for selected state, aggregates by county and drug, computes pills-per-capita using Census population estimates, prints top-10 counties by oxycodone and hydrocodone per-capita, top-10 distributors by pill volume.

  12. Q4Writing

    DOL UI Claims weekly unemployment database deep-dive published

    Long-form article on the DOL ETA (Employment and Training Administration) weekly unemployment insurance (UI) claims series: published every Thursday 8:30am ET for the prior week; two headline series -- Initial Claims (IC, new UI applications) and Continuing Claims (CC, ongoing recipients); 53 reporting jurisdictions -- 50 states plus DC, Puerto Rico, US Virgin Islands; DOL aggregates ETA-539 (initial) and ETA-5159 (continued) forms from state workforce agencies; seasonal adjustment applied to smooth structural patterns (summer/winter construction layoffs, retail holiday cycles); 4-week moving average reduces single-week noise; historical data back to January 1967; COVID-19 surge: 6.9M initial claims week ending April 4 2020 (prior record 695,000 October 1982, set during Reagan-era recession), continuing claims peaked at ~24.9M May 2020; CARES Act PUA (Pandemic Unemployment Assistance) expanded eligibility to gig workers, self-employed, and independent contractors not normally covered by state UI; state benefit variation: Mississippi maximum $235/week vs. 
Massachusetts $1,050/week (2024); replacement rate typically 40-50% of prior weekly wage; regular state UI 26 weeks in most states (cut to 12-16 weeks in Georgia and Missouri 2011); Federal-State Extended Benefits (EB) triggered when state insured unemployment rate reaches 6.5% or 8%; Emergency Unemployment Compensation (EUC) during 2008-2014 provided up to 99 weeks total; recipiency rate ~27% of BLS-defined unemployed workers in normal times due to voluntary quit ineligibility, benefit exhaustion, insufficient work history; distinction from BLS unemployment rate -- UI continuing claims = administrative benefit recipients, CPS-based unemployment rate = household survey estimate; FRED series: ICSA (initial SA weekly), ICNSA (not SA), CCSA (continuing SA), CC4WSA (4-week moving average), IURSA (insured unemployment rate); Python FRED API analysis: downloads ICSA 2019-present, detects COVID peak week, computes 52-week rolling average, prints latest 12 weeks with year-over-year comparison.
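
    The smoothing and comparison steps mentioned above (4-week moving average, year-over-year change) can be sketched in plain Python. The weekly figures below are invented placeholders, not DOL data, and the function names are hypothetical.

```python
def moving_average(values, window=4):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def percent_change(values, lag=52):
    """Percent change versus the observation `lag` periods earlier."""
    out = []
    for i, current in enumerate(values):
        if i < lag or values[i - lag] == 0:
            out.append(None)
        else:
            base = values[i - lag]
            out.append(100 * (current - base) / base)
    return out


# Invented weekly initial-claims counts, oldest first.
weekly_claims = [210_000, 220_000, 215_000, 225_000, 230_000]
smoothed = moving_average(weekly_claims, window=4)
print(smoothed)
```

    With weekly data, `lag=52` gives an approximate year-over-year comparison; a shorter lag is used in testing only to keep the sample small.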

  13. Q4Writing

    CMS Nursing Home Compare database deep-dive published

    Long-form article on the CMS Nursing Home Compare dataset and Five-Star Quality Rating System: ~15,000 Medicare/Medicaid-certified nursing facilities, ~1.35M residents at any given time, ~$90,000-105,000/year private-pay cost; Five-Star system introduced 2008 with three component domains -- Health Inspections (based on annual standard surveys plus complaint investigations; state survey agencies conduct annual unannounced inspections of 1-2 days using multidisciplinary teams), Staffing (Payroll-Based Journal PBJ data: actual payroll records submitted quarterly since 2017, replacing self-reported data; measures: RN hours/resident day, total nurse hours/resident day including LPN/CNA, weekend staffing tracked separately), and Quality Measures (15 measures derived from Minimum Data Set 3.0 resident assessments: long-stay high-risk pressure ulcers, falls with major injury, UTI, physical restraint use, antipsychotic use in dementia -- focus since 2012 National Partnership to Improve Dementia Care -- depression, and 7 short-stay measures); F-tag deficiency system: F600-F999 range, scope and severity matrix (Isolated/Pattern/Widespread x Harm level A-L, immediate jeopardy J-L requires immediate correction), Form CMS-2567 as official deficiency citation document; Special Focus Facilities (SFF): ~90 facilities with persistent serious deficiency history on public CMS list updated monthly, ~400 on SFF Candidate list; decertification (loss of Medicare/Medicaid certification) is ultimate enforcement tool; civil monetary penalties (CMPs) per day or per instance; ownership transparency via Form CMS-855A; academic literature: private equity ownership associated with lower nurse staffing ratios and higher deficiency rates (Braun et al. 2021 Health Affairs, Harrington et al. 
2020); data.cms.gov datasets include Provider Information, Health Deficiencies (F-tag level), Quality Measures (MDS-derived), Staffing (PBJ), and Penalties; Socrata API requires no key; Python: downloads Provider Information CSV, computes star-rating distribution, identifies SFF facilities, computes average staffing hours by star tier, ranks top-10 states by 1-star share, identifies abuse-flagged facilities.

  14. Q4Writing

    BLS QCEW payroll database deep-dive published

    Long-form article on the BLS Quarterly Census of Employment and Wages (QCEW): joint BLS-state partnership using UI administrative tax records from 53 jurisdictions; ~11 million establishment records per quarter representing ~95% of all US civilian employment; coverage exclusions: self-employed (no payroll), military personnel, elected officials (some states), railroad workers (covered by Railroad Retirement Board), some agricultural workers, unpaid family workers; key data fields -- area_fips (2-digit state, 5-digit county, 5-digit MSA, or US for national), industry_code (NAICS 2-6 digit; 10 or 00 for all industries), own_code (0=total, 1=federal government, 2=state government, 3=local government, 5=private), disclosure_code (N = suppressed cell: fewer than 3 establishments OR single employer accounting for 80%+ of wages), avg_weekly_wage, month1/2/3_emplvl (monthly employment for reference week including the 12th), total_qtrly_wages, taxable_qtrly_wages, annual_avg_emplvl; geographic granularity: national, 50 states plus DC, 3,200+ counties, 380+ MSAs, workforce investment areas, congressional districts; QCEW vs. 
CES: QCEW is administrative census (~5-month publication lag) providing definitive employer universe; CES is a sample survey of ~140,000 worksites (~1-month lag) providing timely monthly signal; March annual benchmark revision realigns CES to QCEW administrative universe -- the 2024 revision showed CES overstated payrolls by 818,000 jobs (largest downward revision since 2009); Location Quotient (LQ) = county industry employment share divided by national industry share: LQ > 1.0 indicates local specialization above national average (Midland TX oil/gas LQ ~30+, Manhattan securities LQ ~4+, Las Vegas hospitality LQ ~3+); disclosure suppression creates analysis challenges in small counties and narrow NAICS codes, handled by geographic or industry aggregation; three data access paths: BLS API with QCEW series IDs (ENU + FIPS + ownership + NAICS + data type), QCEW cross-sectional API at data.bls.gov/cew/api/data/v1/area/, bulk flat-file downloads at blsdownload.bls.gov (~500MB per quarter compressed); QCEW also provides the sampling frame for BLS OEWS (Occupational Employment and Wage Statistics) and CES surveys; Python: QCEW cross-sectional API fetches two years of private-sector 2-digit NAICS data, prints employment/wage ranking table, computes year-over-year wage growth by supersector, demonstrates LQ calculation vs. national benchmark.
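
    The Location Quotient definition above reduces to a ratio of two employment shares. A minimal sketch follows; the function name and the employment counts are invented for illustration and are not QCEW figures.

```python
def location_quotient(local_industry, local_total, national_industry, national_total):
    """LQ = (local industry share of local jobs) / (national industry share of national jobs)."""
    if local_total <= 0 or national_total <= 0 or national_industry <= 0:
        raise ValueError("totals and national industry employment must be positive")
    local_share = local_industry / local_total
    national_share = national_industry / national_total
    return local_share / national_share


# Invented example: 12% of local jobs in an industry that holds 0.4% of national jobs.
lq = location_quotient(
    local_industry=12_000, local_total=100_000,
    national_industry=600_000, national_total=150_000_000,
)
print(f"LQ = {lq:.1f}")  # values above 1.0 indicate local specialization
```

    Suppressed cells (disclosure_code N) carry no usable employment value, so they would need to be dropped or aggregated before calling a function like this.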

  15. Q4Writing

    BLS CES monthly jobs report database deep-dive published

    Long-form article on the BLS Current Employment Statistics (CES) program -- the source of the monthly Jobs Report (Jobs Friday): monthly establishment payroll survey of ~140,000 businesses covering ~440,000 individual worksites representing ~34% of all nonfarm payroll employees; companion to the CPS household survey (which produces the unemployment rate); reference period = pay period including the 12th of each month; employers report employment and payroll for that pay period via mail, fax, or electronic submission; response rate ~70%; released first Friday of following month at 8:30am ET under strict BLS embargo; three revision rounds: preliminary (T+30 days), first revision (T+60), second revision (T+90); March annual benchmark revision aligns all monthly CES estimates to QCEW administrative universe (2024 preliminary benchmark estimate: -818,000, signaling QCEW grew slower than CES had estimated, largest downward revision since 2009 Great Recession); headline numbers: total nonfarm payroll employment change (most-watched), private nonfarm change, average hourly earnings (AHE) change year-over-year (Fed-watched wage inflation indicator), average weekly hours; NAICS supersectors: Mining and Logging, Construction, Manufacturing (durable goods and nondurable goods), Trade/Transportation/Utilities, Information, Financial Activities, Professional and Business Services (includes temp staffing as leading indicator), Education and Health Services, Leisure and Hospitality, Other Services, Government; BLS series ID format: CEU + 2-digit supersector code + 6-digit industry code + 2-digit data type (01=all employees, 02=average weekly hours, 03=average hourly earnings, 11=average weekly earnings); examples: CEU0000000001 total nonfarm, CEU0500000001 total private, CEU3000000001 manufacturing, CEU7000000001 leisure/hospitality, CEU0500000003 private AHE; historical context: Great Recession peak-to-trough January 2008 to February 2010 lost 8.7M jobs; COVID-19 April 2020 single-month loss of 20.5M jobs (worst month ever recorded, prior worst was 1,961,000 in September 1945 post-WWII demobilization); COVID recovery: 22M+ jobs recovered by June 2022; AHE all private workers ~$35.40/hour 2024 (unadjusted); real wage growth = AHE nominal growth minus CPI-U; compositional bias in AHE: when lower-wage workers exit workforce, average wage rises even with no individual pay increases (COVID-era artifact); BLS API: api.bls.gov/publicAPI/v2/timeseries/data/ with a free registration key allowing up to 50 series per query, 20 years of data per request, and 500 queries per day; ADP National Employment Report (private payrolls, published 2 days before Jobs Friday) serves as preview/consensus anchor; Python: fetches 20 CES series via BLS API, prints supersector employment table with year-over-year change, reports AHE and AWH, computes COVID-era recovery percentage for each supersector.
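
    A small sketch of how the CES series ID layout above (prefix + 2-digit supersector + 6-digit industry + 2-digit data type) can be assembled and checked. The helper names are hypothetical, and the seasonal flag ("U" unadjusted, "S" adjusted) is an assumption beyond the CEU examples listed. The second helper applies the subtraction approximation for real wage growth given above.

```python
def ces_series_id(supersector, industry="000000", data_type="01", seasonal="U"):
    """Build a 13-character CES series ID from its fixed-width parts."""
    parts = {"supersector": (supersector, 2), "industry": (industry, 6), "data_type": (data_type, 2)}
    for name, (value, width) in parts.items():
        if len(value) != width or not value.isdigit():
            raise ValueError(f"{name} must be exactly {width} digits, got {value!r}")
    if seasonal not in ("U", "S"):
        raise ValueError("seasonal must be 'U' or 'S'")
    return f"CE{seasonal}{supersector}{industry}{data_type}"


def real_wage_growth(nominal_pct, cpi_pct):
    """Approximate real wage growth as nominal growth minus CPI inflation."""
    return nominal_pct - cpi_pct


# Employment (data type 01) series for the supersectors named in the summary.
ids = {
    "total nonfarm": ces_series_id("00"),
    "total private": ces_series_id("05"),
    "manufacturing": ces_series_id("30"),
    "leisure/hospitality": ces_series_id("70"),
}
for label, series_id in ids.items():
    print(f"{label:22s} {series_id}")
```

    Validating widths up front catches the most common mistake with these IDs, which is dropping a leading zero from the supersector or industry code.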

  16. Q4Writing

    BOP federal prison population database deep-dive published

    Long-form article on BOP (Federal Bureau of Prisons, established 1930 under DOJ) inmate population and facility data: BOP administers 121 federal facilities -- United States Penitentiary (USP, high security), Federal Correctional Institution (FCI, medium and low security), Federal Prison Camp (FPC, minimum/satellite camps adjacent to larger facilities), Federal Detention Center (FDC), Metropolitan Detention Center (MDC), Federal Medical Center (FMC), Residential Reentry Center (RRC, contract halfway house); ~148,000 federal inmates as of 2024 (down from peak ~219,000 in 2013 following Fair Sentencing Act retroactivity and FIRST STEP Act implementation); BOP weekly population statistics at bop.gov/about/statistics/: breakdown by offense type, demographics (race/gender/age/citizenship), facility security level, sentence length; offense composition of sentenced population: drug offenses ~43% (methamphetamine ~22%, cocaine and crack ~12%, heroin and opioids ~6%, marijuana declining post-state legalization), weapons/firearms ~18%, sex offenses ~15%, immigration offenses ~8%, robbery ~5%, financial and fraud offenses ~4%; demographic breakdown: ~93% male, ~7% female; race -- White 57%, Black 38% (Black Americans incarcerated at ~3.8x the per-capita rate of White Americans in the federal system), Hispanic 30% (overlap with racial categories in BOP classification scheme); age: largest cohort 31-40, median ~38; federal sentencing framework: Sentencing Reform Act 1984 abolished federal parole and established US Sentencing Commission (USSC) advisory guidelines grid (offense level 1-43 x criminal history category I-VI = guideline range in months); mandatory minimums: 21 USC 841 drug trafficking thresholds (5yr/10yr); 18 USC 924(c) consecutive firearms enhancement; FIRST STEP Act 2018: good time credit increase from 47 to 54 days per year, risk-needs assessment PATTERN tool, expanded compassionate release under 18 USC 3582(c)(1)(A), retroactive application of Fair 
Sentencing Act crack/powder ratio reduction (100:1 to 18:1); COVID-19 home confinement: 7,000+ referrals under CARES Act authority 2020-2021; USSC Monitoring of Federal Criminal Sentences at ussc.gov/research/datafiles (sentencing trends by district, offense, demographics); article includes Python script parsing BOP weekly statistics for population by facility security level and analyzing offense-type distribution trends 2013-2024.

  17. Q4Writing

    CDC drug overdose mortality database deep-dive published

    Long-form article on CDC NCHS drug overdose mortality data: NCHS (National Center for Health Statistics) Multiple Cause of Death (MCOD) database compiled from death certificates filed with state vital statistics offices and transmitted to CDC via the National Vital Statistics System (NVSS); underlying cause of death coded to ICD-10: X40-X44 (accidental poisoning by drugs, medicaments, and biological substances), X60-X64 (intentional self-poisoning), X85 (assault by drugs), Y10-Y14 (undetermined intent); drug-specific T-codes as contributing causes: T40.0 (opium), T40.1 (heroin), T40.2 (natural and semi-synthetic opioids including oxycodone and hydrocodone), T40.3 (methadone), T40.4 (synthetic opioids excluding methadone -- fentanyl and tramadol), T40.5 (cocaine), T43.6 (psychostimulants including methamphetamine); 2023 provisional estimate ~107,500 drug overdose deaths; peak: 2022 ~109,000; fentanyl (T40.4) involved in 73%+ of opioid-involved deaths by 2022; three-wave opioid epidemic framework: Wave 1 (mid-1990s -- prescription opioid surge, OxyContin launched 1996 by Purdue Pharma, company criminal plea 2020 $8.3B settlement); Wave 2 (circa 2010 -- heroin surge as prescription access tightened via Prescription Drug Monitoring Programs and pill mill enforcement); Wave 3 (2013+ -- illicitly manufactured fentanyl and analogs at 50-100x morphine potency, pill press operations producing counterfeit M30 tablets mimicking oxycodone); death certificate underspecification: ~25% of overdose records lack a specific drug T-code, especially in jurisdictions with limited medical examiner resources and inconsistent toxicology testing; naloxone (Narcan) OTC FDA approval March 2023; harm reduction infrastructure: syringe services programs in 38+ states, NYC Overdose Prevention Centers opened November 2021 as first sanctioned supervised consumption sites in US; CDC WONDER query tool at wonder.cdc.gov/mcd.html for custom MCOD extracts; provisional monthly counts at 
cdc.gov/nchs/nvss/vsrr/drug-overdose-data.htm by state and drug category; VSRR (Vital Statistics Rapid Release) program publishes 12-month rolling estimates with ~6-month lag; SAMHSA NSDUH (National Survey on Drug Use and Health) provides complementary prevalence data; article includes Python script querying CDC WONDER API for state-level opioid overdose mortality 2010-2023 and computing fentanyl-era acceleration factor by state relative to pre-2013 baseline.
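
    The ICD-10 selection logic above (an underlying-cause code in one of the overdose ranges, plus T-codes among the contributing causes) can be sketched as a small classifier. The function name, the category labels, and the record layout are hypothetical; real MCOD extracts may format codes differently, so dots are stripped before matching.

```python
import re

# Underlying-cause ranges and intent labels, as listed in the summary.
UCOD_RANGES = [
    ("X40", "X44", "accidental"),
    ("X60", "X64", "intentional self-poisoning"),
    ("X85", "X85", "assault"),
    ("Y10", "Y14", "undetermined"),
]

# Contributing-cause T-codes (dot removed) mapped to short drug labels.
DRUG_T_CODES = {
    "T400": "opium",
    "T401": "heroin",
    "T402": "natural_semisynthetic_opioids",
    "T403": "methadone",
    "T404": "synthetic_opioids",
    "T405": "cocaine",
    "T436": "psychostimulants",
}


def classify_death(underlying, contributing):
    """Return (intent, drug labels) for an overdose death, else None."""
    code = underlying.strip().upper().replace(".", "")[:3]
    if not re.fullmatch(r"[A-Z]\d{2}", code):
        return None
    intent = next((label for lo, hi, label in UCOD_RANGES if lo <= code <= hi), None)
    if intent is None:
        return None
    drugs = []
    for raw in contributing:
        label = DRUG_T_CODES.get(raw.strip().upper().replace(".", ""))
        if label and label not in drugs:
            drugs.append(label)
    return intent, drugs  # an empty drug list means no specific T-code was recorded


example = classify_death("X42", ["T40.4", "T40.5", "T50.9"])
print(example)
```

    Deaths that come back with an empty drug list correspond to the underspecified certificates noted above.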

  18. Q4Writing

    DOL Form 5500 pension plan database deep-dive published

    Long-form article on the DOL Form 5500 filing system for private-sector employee benefit plans: ERISA (Employee Retirement Income Security Act) 1974 created joint DOL-IRS-PBGC reporting framework; ~800,000 plan filings per year via EFAST2 (ERISA Filing Acceptance System 2) at efast.dol.gov; filing thresholds: Form 5500 for plans with 100+ participants, Form 5500-SF for plans with fewer than 100 participants, Form 5500-EZ for one-participant owner-only plans exempt from ERISA; key data fields: plan sponsor EIN and name, plan year end date, plan type codes (1=defined benefit, 2=money purchase, 3=profit sharing, 4=stock bonus, 6=401k, 7=ESOP, 9=403b), number of participants at beginning/end of year, total plan assets, employer and employee contributions, benefits paid, plan funding arrangement codes (1=insurance, 2=trust, 3=general assets, 4=both); defined benefit (DB) plan trajectory: 114,000 single-employer DB plans in 1983, ~22,000 remaining 2022 as employers shifted to defined contribution; PBGC (Pension Benefit Guaranty Corporation) single-employer maximum guaranteed benefit $81,000/year (2023, for retirement at age 65); multiemployer plans (~1,400 plans, 10M+ union workers): Central States Teamsters ($17B liability), PBGC Special Financial Assistance program (Butch Lewis Emergency Pension Relief Act 2021) allocated $94B to rescue 200+ critically underfunded multiemployer plans; defined contribution dominance: 600,000+ DC plans (401k/profit sharing/ESOP) covering 85M+ participants; $12T+ total private pension assets (2022); vesting schedules: cliff vesting (3 years for DC, 5 years for DB) vs. graded vesting (2-6 years for DC, 3-7 years for DB); SECURE Act 2019 and SECURE 2.0 Act 2022: raised RMD age from 70.5 to 72, then to 73 (75 from 2033), expanded automatic enrollment mandates, Roth catch-up contributions for high earners, increased catch-up limit to $10,000 for ages 60-63; EFAST2 full-text search at efts.dol.gov and bulk XML download at dol.gov/agencies/ebsa; article includes Python script downloading annual Form 5500 index from EFAST2 API, computing DB vs. DC asset distribution by plan size cohort, and identifying top 50 largest plans by total assets across filing years.
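
    The cliff and graded vesting ranges above translate into a simple lookup. This sketch assumes the graded schedules step up by 20 percentage points per completed year across the stated ranges (the summary gives only the start and end years); the function name is hypothetical, and actual plans may vest faster than these minimums.

```python
CLIFF_YEARS = {"DC": 3, "DB": 5}             # fully vested at this many years
GRADED_YEARS = {"DC": (2, 6), "DB": (3, 7)}  # (first vested year, fully vested year)


def vested_percent(years_of_service, plan_type="DC", schedule="graded"):
    """Vested percentage (0-100) of employer contributions or accrued benefit."""
    years = int(years_of_service)  # only completed years count in this sketch
    if plan_type not in CLIFF_YEARS:
        raise ValueError("plan_type must be 'DC' or 'DB'")
    if schedule == "cliff":
        return 100 if years >= CLIFF_YEARS[plan_type] else 0
    if schedule != "graded":
        raise ValueError("schedule must be 'cliff' or 'graded'")
    start, full = GRADED_YEARS[plan_type]
    if years < start:
        return 0
    if years >= full:
        return 100
    # Assumed linear steps: 20% at the first vested year, +20 points per year after.
    return 20 * (years - start + 1)


for y in range(1, 8):
    print(y, vested_percent(y, "DC", "graded"), vested_percent(y, "DB", "graded"))
```

    Working in integer percentages avoids the floating-point noise that 0.2-per-year multiplication would introduce.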

  19. Q4Writing

    BLS OEWS wage statistics database deep-dive published

    Long-form article on the BLS Occupational Employment and Wage Statistics (OEWS) program: annual mail survey of ~1.1 million employer establishments across 800+ industries in 50 states plus DC, Puerto Rico, Guam, and the US Virgin Islands; 830 occupational categories under the Standard Occupational Classification (SOC) system; published each May for the prior reference year; geographic coverage: national, 50 states and DC, 590 metropolitan and nonmetropolitan area estimates; key fields: OCC_CODE (6-digit SOC, e.g., 15-1252 Software Developers), OCC_TITLE, TOT_EMP (employment estimate), EMP_PRSE (percent relative standard error), A_MEAN (annual mean wage), A_MEDIAN (annual median wage), H_MEAN (hourly mean), hourly percentiles H_PCT10/H_PCT25/H_PCTMED/H_PCT75/H_PCT90; national median hourly wage $22.61 all occupations (May 2023); highest-paying occupations: surgeons ($267,020 annual mean), anesthesiologists ($239,200), oral and maxillofacial surgeons ($237,570), obstetricians and gynecologists ($237,340), psychiatrists ($237,000); lowest-paying: fast food and counter workers ($28,960), dishwashers ($29,140), laundry/dry cleaning workers ($29,460); OEWS wage data underpins H-1B and PERM prevailing wage determinations (DOL FLAG/iCERT system uses OEWS-derived wage levels I-IV corresponding to 17th/34th/50th/67th percentiles), Davis-Bacon Act construction wage determinations, Service Contract Act (SCA) wage rates, OMB pay comparability studies for federal pay scales; API: api.bls.gov/publicAPI/v2/timeseries/data/ (OEWS series IDs: OE prefix + U seasonal code + 1-character area type + 7-digit area + 6-digit industry + 6-digit occupation + 2-digit data type; free registration key, up to 50 series per query and 500 queries per day); FTP bulk files at bls.gov/oes/special.requests/ as national and state CSV/Excel archives; OEWS draws its employer sample from the BLS Quarterly Census of Employment and Wages (QCEW) frame; SOC major group structure: 22 of the 23 SOC major groups are published (Military Specific 55-0000 is out of scope), plus an all-occupations total (00-0000 All Occupations, 11-0000 Management, 13-0000 Business and Financial Operations, 15-0000 Computer and Mathematical, 17-0000 Architecture and Engineering, 19-0000 Life/Physical/Social Science, 21-0000 Community and Social Service, 23-0000 Legal, 25-0000 Educational Instruction, 27-0000 Arts/Design/Entertainment, 29-0000 Healthcare Practitioners, 31-0000 Healthcare Support, 33-0000 Protective Service, 35-0000 Food Preparation, 37-0000 Building/Grounds Cleaning, 39-0000 Personal Care/Service, 41-0000 Sales, 43-0000 Office/Administrative Support, 45-0000 Farming/Fishing/Forestry, 47-0000 Construction/Extraction, 49-0000 Installation/Maintenance/Repair, 51-0000 Production, 53-0000 Transportation/Material Moving); article includes Python script calling BLS OEWS API for software developer wages across all 50 states and computing geographic wage premium index relative to national median.

  20. Q4Writing

    FRA railroad accident database deep-dive published

    Long-form article on the FRA (Federal Railroad Administration) accident reporting system: covers every US rail incident since 1975 under 49 CFR Part 225; ~224,000 records total; three main form types -- Form FRA F 6180.54 train accidents (reportable threshold: $11,200 property damage or death/injury/evacuation/hazmat release); Form FRA F 6180.57 highway-rail grade crossing incidents; Form FRA F 6180.55 employee injuries and illnesses; cause code taxonomy: Track (geometry/cross-level/surface/alignment/joint), Equipment (wheels/axles/couplings/trucks), Human Factors (train handling/speed/signal violation/switches), Miscellaneous -- each coded with 5-character cause codes; accident types: Derailment (most common), Collision, Fire/Explosion, Other; East Palestine OH February 3 2023 (Norfolk Southern 32N): 38 cars derailed including 11 hazmat tank cars carrying vinyl chloride and butyl acrylate; controlled burn decision; NTSB issued 37 safety recommendations; FRA Emergency Order requiring hot bearing detector protocol fixes; grade crossing: ~2,000-2,200 vehicle-train collisions annually, ~270-290 deaths, 128,000 public grade crossings; Operation Lifesaver education program; FRA Grade Crossing Inventory; Quiet Zones; Positive Train Control (PTC): mandated by Rail Safety Improvement Act 2008 after Chatsworth CA 2008 collision (25 dead, engineer texting); Class I railroad implementation completed 2020; FRA enforcement: ~140,000 inspections per year across 5 safety disciplines (track, equipment, signal, operating practices, hazmat), ~28,000 violations cited, civil penalties up to $27,904 per violation; CRISI grants $1B+ from IIJA 2021; FRA Safety Data API at safetydata.fra.dot.gov/OfficeofSafety/publicsite/api/ with GetAccidents, GetGradeCrossing, GetEmployeeRailroadInjuries endpoints; article includes Python script querying derailments by state and hazmat releases by commodity type.

  21. Q4Writing

    OPM FedScope federal workforce database deep-dive published

    Long-form article on the OPM (Office of Personnel Management) Central Personnel Data File (CPDF) and FedScope workforce database: ~2.1-2.3M federal civilian employees tracked quarterly (excludes military, USPS, legislative branch, judiciary); largest employers: DOD civilian (~750k), VA (~400k), DHS (~250k), HHS (~90k), DOJ (~120k), Treasury (~90k); General Schedule (GS) pay system covers ~70% of federal workers: GS-1 through GS-15, each with 10 steps; GS-1 Step 1 base $22,270/year, GS-15 Step 10 $163,964 plus locality pay (34 designated areas; DC area +33.26% = ~$217,562); Senior Executive Service (SES): ~9,000 positions, $155,000-$235,000 (2024); Law enforcement officer pay (LEO table) and LEAP (+25% for criminal investigators); FedScope at fedscope.opm.gov provides public cube analysis across 15 dimensions: agency (3-digit code), occupation series (4-digit OPM codes: 0343=Management and Program Analysis, 1811=Criminal Investigation, 2210=IT Management), location, pay plan, grade, work schedule, appointment type (permanent/term/temporary/SES), education, age group, length of service, race/national origin, gender, veterans status, supervisory status; data is aggregate not individual-level (Privacy Act protected); FERS retirement system (1987): 1.1% per year of high-3 salary times years of service (at 62+ with 20 years) plus TSP (Thrift Savings Plan with 5% agency match; G/F/C/S/I funds; L target-date funds) plus Social Security; CSRS pre-1984 (~5% of workforce, no Social Security, higher pension formula); DOGE 2025 workforce reduction: fork-in-the-road email February 2025, ~75,000 acceptances; USAID ~10,000 terminated; HHS ~20,000 (CDC/FDA/NIH); DOE ~1,500; EPA ~1,500; federal union lawsuits (AFGE, NTEU); federal workforce demographics: ~27% veterans (vs ~6% private sector), 44% female, ~65% bachelor+ degree, African Americans overrepresented vs. 
private sector; data access: fedscope.opm.gov cube explorer downloadable as CSV; opm.gov/data bulk quarterly files; no formal public REST API; FEVS (Federal Employee Viewpoint Survey) annual.
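
    The FERS basic annuity formula quoted above can be written out directly. This is a sketch only: the 1.0% multiplier for retirees who do not meet the age-62-with-20-years condition is an assumption not stated in the summary, and eligibility rules, early-retirement reductions, and supplements are ignored. The function name and salary figures are invented.

```python
def fers_basic_annuity(high3_salary, years_of_service, age_at_retirement):
    """Annual FERS basic benefit: multiplier x high-3 average salary x years of service."""
    if high3_salary < 0 or years_of_service < 0:
        raise ValueError("salary and service must be non-negative")
    # 1.1% applies at age 62+ with 20+ years; 1.0% otherwise (assumed default).
    enhanced = age_at_retirement >= 62 and years_of_service >= 20
    multiplier = 0.011 if enhanced else 0.010
    return high3_salary * years_of_service * multiplier


# Invented example: $100,000 high-3 average and 30 years of service.
at_62 = fers_basic_annuity(100_000, 30, 62)
at_60 = fers_basic_annuity(100_000, 30, 60)
print(f"retire at 62: ${at_62:,.0f}/yr, retire at 60: ${at_60:,.0f}/yr")
```

    TSP balances and Social Security, the other two legs of FERS, would be added on top of this figure.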

  22. Q4Writing

    NIFC wildfire data database deep-dive published

    Long-form article on the NIFC (National Interagency Fire Center, Boise ID) wildfire data: NIFC coordinates USFS, NPS, BLM, BIA, FWS, and state forestry agencies; publishes authoritative annual wildfire statistics since 1926; 2023: ~56,580 fires burning ~2.7M acres nationally (10-year average ~7M acres/year); record years: 2015 (10.1M acres), 2020 (10.1M acres including CA 4.2M alone), 2017 (10.0M), 2012 (9.3M); fire suppression paradox: Smokey Bear campaign since 1944 + 20th century total suppression policy led to massive fuel accumulation driving larger, more intense 21st century fires; fire suppression costs: USFS alone spent $2.5B in 2017; fire borrowing from non-fire accounts ended by 2018 consolidated appropriations; USFS Fire Occurrence Database (FOD, Karen Short): ~2.3M individual fire records 1992-present in SQLite format at USFS Research Data Archive (RDS-2013-0009); key fields: FIRE_YEAR, DISCOVERY_DATE, NWCG_CAUSE_CLASSIFICATION (Human/Lightning/Unknown), FIRE_SIZE (acres), FIRE_SIZE_CLASS (A through G: A=<0.25 acres through G=5,000+), LATITUDE, LONGITUDE, OWNER_DESCR (federal agency/state/private), COUNTY, STATE; MTBS (Monitoring Trends in Burn Severity): joint USGS-USFS program using pre/post-fire Landsat imagery to compute dNBR (differenced Normalized Burn Ratio) and classify burn severity (unburned/low/moderate/high); perimeters as shapefiles at mtbs.gov; Camp Fire 2018 (Paradise CA, 153,336 acres, 85 deaths, most destructive CA fire history); Lahaina 2023 (Maui HI, 2,200 structures destroyed, 100+ deaths, deadliest US wildfire in 100+ years); ICS-209 incident status reports at famweb.nwcg.gov for extended attack fires; WUI (Wildland-Urban Interface): 43M homes in WUI per Radeloff 2018 PNAS, fastest-growing US land-use type; active fire data: NIFC ArcGIS Feature Service GeoJSON, NASA FIRMS MODIS/VIIRS; climate signal: Williams et al. 2019 PNAS (VPD increasing, burned area correlated), Westerling et al. 
2006 Science (fire frequency/duration increase since mid-1980s linked to earlier snowmelt); article includes Python script for NIFC decade-by-decade acreage analysis and active fire GeoJSON query.
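
    A sketch of deriving FIRE_SIZE_CLASS from FIRE_SIZE. The summary gives only the endpoints (A up to 0.25 acres, G at 5,000+); the intermediate thresholds used here (10, 100, 300, 1,000, and 5,000 acres) are the commonly cited NWCG class boundaries and should be treated as an assumption, as should the function name.

```python
from collections import Counter

# (exclusive upper bound in acres, class letter) for classes B through F.
_UPPER_BOUNDS = [(10, "B"), (100, "C"), (300, "D"), (1000, "E"), (5000, "F")]


def fire_size_class(acres):
    """Map a fire size in acres to a size class letter A-G."""
    if acres < 0:
        raise ValueError("fire size cannot be negative")
    if acres <= 0.25:
        return "A"
    for upper, label in _UPPER_BOUNDS:
        if acres < upper:
            return label
    return "G"


# Invented sizes, plus the Camp Fire acreage quoted above.
sizes = [0.1, 0.25, 5, 10, 250, 4999.9, 5000, 153_336]
classes = [fire_size_class(a) for a in sizes]
print(Counter(classes))
```

    Counting classes this way is a quick consistency check against the FIRE_SIZE_CLASS column that ships with the database.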

  23. Q4Writing

    CFPB Consumer Complaint Database deep-dive published

    Long-form article on the CFPB (Consumer Financial Protection Bureau) Consumer Complaint Database: launched March 2012 under Dodd-Frank 2010; 7M+ complaints as of 2024 growing ~1M+/year; products: credit reporting/repair services (~60% of all complaints -- driven by pandemic-era errors, identity theft freezes, dispute process failures); debt collection (~10%); credit card/prepaid (~8%); mortgage (~7%); checking/savings account (~5%); Equifax/Experian/TransUnion collectively receive 50%+ of all database complaints; key data fields: complaint_id, date_received, product, sub_product, issue, sub_issue, consumer_complaint_narrative (voluntary submission, ~20% of complaints include text, CFPB scrubs PII including names/account numbers), company_public_response (optional company statement), company name, state, zip_code (3-digit partial for privacy), tags (Older American/Servicemember), consumer_consent_provided, submitted_via, date_sent_to_company, company_response_to_consumer (closed with monetary relief/non-monetary relief/explanation/in progress), timely (Yes/No), consumer_disputed (Yes/No/N/A); company name normalization challenge: same entity appears as multiple string variations; major events: COVID-19 2020-2021 mortgage forbearance surge; Biden loan forgiveness announcement 2022-2024 drove 2-3x increase in student loan complaints; BNPL (Buy Now Pay Later) complaints surged 2023-2024; major enforcement actions: Navient $1.85B settlement January 2022 ($1.7B student loan cancellation); Wells Fargo $3.7B December 2022 (largest-ever CFPB action -- fake accounts, auto repossessions, mortgage failures); CFPB total consumer relief ordered: $20B+ since 2011; data API: api.consumerfinance.gov/data-research/consumer-complaints/search (Elasticsearch, no API key, max 10,000 per query, frm offset for pagination); bulk download ~1.5GB+ CSV or JSON; article includes Python urllib script analyzing mortgage complaint distribution by company and response type.
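
    One way to approach the company name normalization challenge noted above is to collapse case, punctuation, and corporate suffixes before grouping. This is a rough sketch: the suffix list is an illustrative guess, the spelling variants below are hypothetical strings, and a real pipeline would likely need a hand-maintained alias table as well.

```python
import re
from collections import Counter

# Illustrative suffix list (an assumption, not a CFPB-provided vocabulary).
_SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "company", "ltd", "na"}


def normalize_company(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    tokens = [t for t in cleaned.split() if t not in _SUFFIXES]
    return " ".join(tokens)


# Hypothetical spelling variants of a single entity.
raw_names = ["Equifax, Inc.", "EQUIFAX INC", "Equifax Inc.", "Experian LLC"]
counts = Counter(normalize_company(n) for n in raw_names)
print(counts.most_common())
```

    Stripping suffixes can merge distinct legal entities that share a brand name, so the grouped counts are best read as brand-level totals.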

  24. Q4Writing

    NOAA Storm Events database deep-dive published

    Long-form article on the NOAA NCEI Storm Events Database: ~2.1M weather event records since 1950; 48 standardized event types (tornado, hurricane, flash flood, hail, winter storm, wildfire, and more); county-level property damage, crop damage, injuries, and fatalities; bulk download at ncei.noaa.gov/pub/data/swdi/stormevents/csvfiles/ as annual gzip CSVs; DAMAGE_PROPERTY K/M/B suffix encoding requiring custom parsing; CDO REST API at www.ncei.noaa.gov/cdo-web/api/v2/ with free key registration; NOAA Billion-Dollar Disasters tracker (376 events since 1980, $2.6T CPI-adjusted damages; 2023: 28 events exceeding $1B -- record); tornado climatology (EF0-EF5 Enhanced Fujita scale, 2011 Super Outbreak 362 tornadoes/3 days, Dixie Alley shift from traditional Tornado Alley into MS/AL/TN due to climate pattern changes, Moore OK 2013 EF5); hurricane county-level damage records (Harvey $125B/89k claims, Ian $112B/FL, ACE index for season intensity); flood events as deadliest weather type most years (~88 average annual fatalities); climate change signal in increasing damage frequency and extreme precipitation; article includes Python script for downloading annual gzip CSVs, parsing DAMAGE_PROPERTY suffix notation, and ranking top-10 event types by total damage and deaths 2019-2023.
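
    The DAMAGE_PROPERTY suffix parsing mentioned above might look something like this. A minimal sketch, assuming values arrive as strings such as "25.00K" or "1.5M"; treating blanks and bare suffixes as zero is an assumption, not documented NOAA behavior.

```python
def parse_damage(value) -> float:
    """Convert a K/M/B-suffixed damage string to a dollar amount."""
    if value is None:
        return 0.0
    s = str(value).strip().upper()
    if not s:
        return 0.0
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9}
    suffix = s[-1]
    if suffix in multipliers:
        number = s[:-1]
        # Assumption: a bare suffix with no digits carries no usable amount.
        return float(number) * multipliers[suffix] if number else 0.0
    return float(s)  # plain number with no suffix
```

    Applied to a column of raw strings, this yields floats that can be summed by event type.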

  25. Q4Writing

    FBI NIBRS crime database deep-dive published

    Long-form article on the FBI National Incident-Based Reporting System (NIBRS): replaced summary-level UCR in January 2021 as the FBI's national crime data collection standard; ~15,724 agencies reporting in 2022 covering ~79% of US population; incident-level records with 52 Group A offense categories and 11 Group B citation-only offenses; segment structure -- Incident (date/time/location), Offense (type/method/bias motivation), Victim (age/sex/race/ethnicity/victim-offender relationship), Offender (age/sex/race), Arrestee (disposition/statute), Property (type/value) -- linked by incident number and agency ORI; 88 hate crime bias motivation codes covering race/ethnicity/religion/sexual orientation/disability/gender identity; NYC (NYPD, 8M+ population) began NIBRS submission only in 2023; Crime Data Explorer API at cde.ucr.cjis.gov -- /api/nibrs/{offense}/offense/agencies and /api/nibrs/{offense}/victim/count endpoints; free API key, 1,000 req/day; annual bulk downloads of all segments available; Supplemental Homicide Reports (SHR) since 1976: victim-offender-weapon-circumstance at case level, 40-50% unknown offender in unsolved cases; NIBRS captures only ~43% of violent crimes (complement: NCVS captures unreported victimization); TRAC-NIBRS coverage analysis; article includes Python CDE API script computing violent crime rates per 100k population by state with Census population denominator.
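
    The per-100k rate calculation referred to above is simple arithmetic. The sketch below assumes incident counts and population figures have already been fetched into plain dictionaries keyed by state; the function and variable names are illustrative only.

```python
def rate_per_100k(incidents: int, population: int) -> float:
    """Incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents * 100_000 / population


def state_rates(counts: dict, populations: dict) -> dict:
    """Compute rates for every state present in both inputs."""
    return {
        state: rate_per_100k(counts[state], populations[state])
        for state in counts
        if state in populations
    }
```

    Because NIBRS agency coverage is incomplete, the denominator should arguably be the population served by reporting agencies rather than the full state population; which one to use is a judgment call the sketch leaves to the caller.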

  26. Q4Writing

    SSA Social Security OASDI database deep-dive published

    Long-form article on SSA (Social Security Administration) Old-Age, Survivors, and Disability Insurance data: OASDI is the largest federal program at ~$1.4T annually to ~70M beneficiaries; two components -- OASI (Old-Age and Survivors Insurance, ~58M beneficiaries, ~$1.2T) and DI (Disability Insurance, ~8.8M beneficiaries, ~$160B); SSA also administers the separate SSI program (Supplemental Security Income, ~7.5M, ~$65B, means-tested, funded from general revenue), which is not part of OASDI; established Social Security Act 1935 (FDR); DI added 1956; FICA tax 6.2%+6.2% employee+employer on wages up to $168,600 taxable maximum (2024); OASI Trust Fund ~$2.75T in special-issue Treasury securities; 2034 projected OASI depletion per Trustees Report (77% of scheduled benefits payable without legislation); AIME/PIA benefit formula: Average Indexed Monthly Earnings from 35 highest-earning years indexed to AWI; Primary Insurance Amount = 90% of AIME to first bend point ($1,174) + 32% between bend points ($1,174-$7,078) + 15% above second bend point; FRA 67 for born 1960+; early retirement at 62 with ~30% permanent reduction; delayed retirement credits +8%/year to age 70; SSA public data at data.ssa.gov -- Monthly Statistical Snapshot, Annual Statistical Supplement (Tables 5.A state beneficiaries, 4.B DI allowance rates by state, 6.C SSI by state/category), state/county OASDI CSV; FRED series SSASSHDI and SSARECEIPTSDISABILITY; DI determination: SGA test, severe impairment, Blue Book Listing of Impairments, RFC assessment, vocational grids; ALJ hearing backlog ~1M pending; Social Security Fairness Act January 2025 eliminated WEP/GPO for 3.2M workers; article includes Python script downloading SSA state data and Census 65+ population API to compute retired-worker benefit penetration by state.
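
    The PIA formula above can be worked through in a few lines. This sketch uses the 2024 bend points quoted in the entry and skips SSA's rounding rules and cost-of-living adjustments, so treat the output as approximate.

```python
BEND_1 = 1174  # first bend point, 2024 (from the entry above)
BEND_2 = 7078  # second bend point, 2024 (from the entry above)


def pia(aime: float) -> float:
    """Primary Insurance Amount from AIME, before rounding or COLAs."""
    tier1 = 0.90 * min(aime, BEND_1)
    tier2 = 0.32 * max(0.0, min(aime, BEND_2) - BEND_1)
    tier3 = 0.15 * max(0.0, aime - BEND_2)
    return tier1 + tier2 + tier3
```

    For an AIME of $5,000 this gives 0.90 x 1,174 + 0.32 x 3,826, or about $2,280.92 a month, which shows how the formula replaces a larger share of earnings for lower earners.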

  27. Q4Writing

    IRS Exempt Organizations nonprofit database deep-dive published

    Long-form article on the IRS Exempt Organizations Business Master File (BMF) and Form 990 public data: BMF registers 1.26M active tax-exempt organizations -- 501(c)(3) public charities and private foundations (~1M), 501(c)(4) social welfare orgs (~80k), 501(c)(6) trade associations, 501(c)(7) social clubs, 527 political organizations, and 25 other IRC subsection categories; $2.8T in annual revenues (~5.5% of US GDP), ~12M nonprofit employees; BMF published monthly at IRS.gov as comma-delimited CSV files (regional files eo1.csv through eo4.csv plus national); key fields: EIN, subsection code (03=501c3), NTEE code (26 major categories A-Z: A=Arts, B=Education, E=Health, I=Crime, J=Employment, P=Human Services, Q=International, R=Civil Rights, T=Philanthropy, X=Religion), ruling date, deductibility code (1=deductible), foundation code (04=private non-operating foundation, 15=publicly supported charity under 170(b)(1)(A)(vi), 16=509(a)(2) public charity), asset and income range codes; Form 990 e-file XML at AWS S3 (s3://irs-form-990/) since 2013 with annual index files at s3://irs-form-990/index_{year}.json; key Form 990 schedules: Part VII compensation (top 5 officers/directors), Schedule A (public support test -- 33.3% threshold for public charity classification), Schedule B (donor list, confidential -- not publicly released), Schedule C (political campaign and lobbying activity), Form 990-PF for private foundations (1.39% net investment income excise tax, 5% minimum distribution requirement, IRC 4941-4945 self-dealing/jeopardizing investment/excess business holding/taxable expenditure rules); Citizens United 2010 + 501(c)(4) anonymous political spending; Lois Lerner/BOLO scandal 2013; Form 8976 notice required for new 501(c)(4)s; ProPublica Nonprofit Explorer API at api.propublica.org/nonprofits/v2/organizations/{ein}.json; major private foundations: Gates ($70B), Ford ($16B), Robert Wood Johnson ($13B); church filing exemption -- largest data gap in EO database; article includes Python script downloading BMF national file, parsing NTEE major codes, and fetching top charities from ProPublica API.
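
    Parsing NTEE major codes, as described above, comes down to reading the first letter of each code. The sketch below is illustrative: the `NTEE_CD` column name is an assumption about the BMF header, and only the major categories named in the entry are mapped.

```python
import csv
import io
from collections import Counter

# Major categories named in the entry above; the other letters are left unmapped.
NTEE_MAJOR = {
    "A": "Arts", "B": "Education", "E": "Health", "I": "Crime",
    "J": "Employment", "P": "Human Services", "Q": "International",
    "R": "Civil Rights", "T": "Philanthropy", "X": "Religion",
}


def ntee_major(code):
    """Return the major-group letter of an NTEE code, or None if blank or malformed."""
    code = (code or "").strip().upper()
    return code[0] if code and code[0].isalpha() else None


def count_major_groups(csv_text: str, column: str = "NTEE_CD") -> Counter:
    """Tally organizations by NTEE major group (column name is an assumption)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = NTEE_MAJOR.get(ntee_major(row.get(column)), "Other/unknown")
        counts[label] += 1
    return counts
```

    Many BMF rows carry no NTEE code at all, so the "Other/unknown" bucket is likely to be large in practice.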

  28. Q4Writing

    USAID foreign assistance data and ForeignAssistance.gov deep-dive published

    Long-form article on USAID (United States Agency for International Development) foreign assistance data: USAID created 1961 by JFK Executive Order 10973 and Foreign Assistance Act; part of US foreign affairs apparatus alongside State Department; ~$40B annual budget (FY2022); implements foreign assistance in 100+ countries; six mission bureaus (Africa, Asia, Europe and Eurasia, Latin America and the Caribbean, Middle East, Democracy/Conflict/Humanitarian Assistance); ForeignAssistance.gov transparency portal (launched 2010 under International Aid Transparency Initiative (IATI) mandate; publishes all US foreign assistance spending from multiple agencies including USAID, State Department, Millennium Challenge Corporation, Peace Corps, PEPFAR coordination office, DOD security assistance; four-dimensional data: by agency, by country, by sector (using OECD Development Assistance Committee sector codes), by implementing partner; covers obligations (legally committed funds) and disbursements (actual cash transferred); API at foreignassistance.gov/api/v1/resources.json with limit/offset pagination, fiscal_year filter, country_code filter, category filter); PEPFAR (President's Emergency Plan for AIDS Relief -- Bush 2003 State of the Union; $15B initial authorization, now $110B+ cumulative investment; 20M+ people on antiretroviral therapy through PEPFAR support; largest bilateral global health initiative in history; operates in 50+ countries primarily Sub-Saharan Africa plus Caribbean; Country Operational Plans (COPs) submitted annually to PEPFAR headquarters for approval; PEPFAR Expenditure Analysis at pepfar.gov; PEPFAR implementing partners: Jhpiego, Elizabeth Glaser Pediatric AIDS Foundation, Management Sciences for Health, FHI 360, Population Services International, local faith-based organizations; PEPFAR data at pepfar.gov/data and in ForeignAssistance.gov); DATA Act and USASpending integration (DATA Act 2014 requires USAID to report contracts and grants to 
USASpending.gov; USAID appears as civilian agency in federal spending; contracts go through FPDS-NG; IATI XML published at iatiregistry.org -- USAID publishes activity-level XML with 40+ IATI data elements; difference in framing: ForeignAssistance.gov focuses on development outcomes/sectors, USASpending focuses on domestic procurement rules); implementing partner concentration (top 10 contractors/grantees receive majority of USAID funding; Chemonics International (Washington DC): ~$1-2B annually for economic growth, food security, health programs across Africa/Asia/MENA; DAI Global (Bethesda MD): democracy/governance, economic growth; Management Systems International (MSI)/Tetra Tech DPK: monitoring and evaluation; AECOM International Development: infrastructure; RTI International (NC): education, health research; Catholic Relief Services: food security, emergency response; Save the Children: child development, education, health; World Vision: food security, WASH; International Rescue Committee: humanitarian; Concern Worldwide; John Snow Inc.; criticism of Beltway Bandit concentration -- top contractors receive 60-70% of USAID funding with limited competition; local partner policy: USAID committed to 25% local procurement by 2025 under Journey to Self-Reliance framework); geographic distribution (top recipient countries FY2022: Ukraine ($1.9B including humanitarian; surged post-Russian invasion February 2022); Ethiopia ($1.1B including drought/conflict humanitarian); DR Congo ($850M); Nigeria ($700M); Jordan ($600M); South Africa ($450M primarily PEPFAR); Mozambique ($400M); Kenya ($350M); Afghanistan ($300M even post-withdrawal); Tanzania ($300M primarily PEPFAR); by region: Sub-Saharan Africa ~35% of total USAID obligations; Near East/South Asia ~20%; Asia ~15%; Latin America/Caribbean ~10%; Europe grew dramatically 2022-2023 with Ukraine); IATI and international transparency infrastructure (IATI standard: machine-readable XML with activity identifier, 
reporting-org ref, title, description, sector (5-digit OECD DAC code: 12220 = basic health care, 11220 = primary education, 15150 = democratic participation), recipient-country, transaction elements (type: 1=incoming funds, 2=commitment, 3=disbursement, 4=expenditure); D-Portal.org aggregates all IATI publishers globally; OECD CRS/QWIDS: official ODA statistics; AidData.org: research platform tracking global development finance; Panorama.solutions: USAID-specific data visualization; OpenAid.com aggregates IATI); DOGE and 2025 context (January 2025 executive order froze most US foreign assistance; USAID dismantlement proposed by DOGE; AidData/Devex tracking which programs were cut; USASpending data on contract terminations; ForeignAssistance.gov data availability questioned post-DOGE; importance of data as accountability mechanism during political transitions; Congressional notification requirements for major reprogramming); Python script calling ForeignAssistance.gov API with fiscal_year=2023, paginating through all records, computing top-10 recipient countries by obligation_amount, sector distribution for Sub-Saharan Africa vs. Near East (health/democracy/food security/infrastructure composition), top-10 implementing partners by total awards across all countries and sectors, and year-over-year trends for Ukraine, Ethiopia, Afghanistan.
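
    The limit/offset pagination and top-10 ranking described above could be structured roughly like this. The sketch deliberately takes the page fetcher as a parameter instead of calling the live API, since the response shape is not documented here; the `country` key is a hypothetical field name, while `obligation_amount` comes from the entry.

```python
from collections import defaultdict


def paginate(fetch_page, limit=100):
    """Yield records from a limit/offset API until a short or empty page arrives.

    fetch_page is any callable accepting offset= and limit= and returning a list.
    """
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        if not page:
            break
        yield from page
        if len(page) < limit:
            break
        offset += limit


def top_recipients(records, n=10):
    """Sum obligation_amount per country ('country' key is assumed) and rank."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["country"]] += float(rec["obligation_amount"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

    Keeping the fetcher separate means the aggregation can be tested against saved responses, which matters if the live portal's availability is in doubt.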

  29. Q4Writing

    PCAOB auditor inspection and audit oversight database deep-dive published

    Long-form article on PCAOB (Public Company Accounting Oversight Board) inspection data: PCAOB created by Sarbanes-Oxley Act 2002 Section 101 after collapse of Enron (October 2001) and WorldCom (July 2002) accounting scandals and dissolution of Arthur Andersen following SEC investigation; predecessor audit oversight was AICPA (American Institute of Certified Public Accountants) self-regulation deemed inadequate -- AICPA peer review program had repeatedly given Arthur Andersen clean reviews despite known audit failures; PCAOB is nonprofit corporation overseen by SEC; 5 Board members appointed by SEC (Chair plus four); SEC provides primary oversight and can approve/reject PCAOB rules; ~$325M annual budget funded entirely by accounting support fees assessed proportionally on SEC-registered issuers based on float; PCAOB registers all accounting firms that prepare or issue audit reports for SEC-registered public companies -- mandatory since 2003; currently ~1,700 registered firms globally across 80+ countries; annual registration fees; global network structure (Big Four: Deloitte Touche Tohmatsu, Ernst & Young, KPMG, PricewaterhouseCoopers are each global networks of legally separate country partnerships -- US firms (Deloitte & Touche LLP, Ernst & Young LLP, KPMG LLP, PricewaterhouseCoopers LLP) are separate registered entities from KPMG China, PwC China, etc.; second-tier firms: Grant Thornton LLP, BDO USA, RSM US, Crowe LLP; regional and local firms: hundreds of small firms auditing smaller SEC registrants, blank check companies, SPACs); inspection program mechanics (PCAOB Section 104 requires annual inspection for firms issuing audit reports for >100 distinct SEC-registered issuers in any calendar year; triennial for firms issuing reports for ≤100 issuers; inspection focuses on: (1) evaluating whether firm performed selected audit engagements in accordance with PCAOB standards, securities laws, and firm QC policies; (2) evaluating adequacy of firm quality control 
systems; inspectors select engagements using risk-based criteria (unusual financial results, prior inspection findings, complex accounting areas) plus random selection; inspectors have Section 104(d) authority to compel production of workpapers, communications, other audit documentation; inspectors may interview engagement personnel); inspection report structure (Part I.A: audit deficiencies -- published immediately upon report issuance; described as engagement-specific audit failures including: auditor did not perform sufficient procedures to test management claims, auditor failed to evaluate contrary evidence, auditor did not appropriately evaluate going concern risk, auditor failed to test ICFR controls over financial reporting areas; Part I.B: additional deficiencies less severe than Part I.A -- also public; Part II: quality control criticisms -- systemic failures in firm-wide QC systems rather than engagement-specific; published 12 months after firm receives Part II text IF firm has not taken appropriate remedial steps -- if firm remediates satisfactorily, Part II remains confidential; critics argue Part II confidentiality reduces accountability); Big Four deficiency trends (2004-2010: deficiency rates rising across Big Four as inspection program gained rigor; 2012-2016: rates stabilized 35-50%; 2018-2020: PCAOB Chair Doty era stricter standards, rates rising; 2022: KPMG 44% deficiency rate (highest among Big Four), EY 43%, Deloitte 31%, PwC 31%; 2023: improvement, PCAOB Chair Erica Williams announced aggregate decrease; common deficiency areas: revenue recognition (ASC 606 adoption), ICFR/SOX 404(b) (internal control over financial reporting -- auditors must separately opine on ICFR for accelerated filers), goodwill impairment testing (DCF assumptions), CECL credit loss model auditing, going concern assessments, related party transactions, cybersecurity disclosures); HFCAA and Chinese audit access resolution (Holding Foreign Companies Accountable Act 2020: 
SEC must identify foreign companies using audit firms that PCAOB cannot fully inspect; companies remain on identification list for 3 consecutive years = delisted; China Investment Corp, Alibaba, JD.com, Baidu, NIO and 273 other Chinese-listed US companies were at risk; PCAOB previously received no access to Chinese audit workpapers due to China's state secrecy and national security laws; August 2022: PCAOB, SEC, China Securities Regulatory Commission (CSRC), and Ministry of Finance signed Statement of Protocol governing PCAOB inspection of Chinese accounting firms; October-November 2022: PCAOB team traveled to Hong Kong, conducted first-ever full inspection of KPMG Huazhen LLP and PricewaterhouseCoopers Zhong Tian LLP; PCAOB Board voted 4-1 December 2022 that China provided complete access; 2023-2024 inspections continued; HFCAA delisting threat was primary driver of Chinese company compliance); enforcement (PCAOB Section 105 enforcement authority; adjudication before Board with right to SEC appeal; sanctions: monetary penalties up to $15M per firm, $750,000 per individual; suspension or revocation of firm registration; permanent or temporary bar from association with registered firm; notable enforcement actions: KPMG 2019 $50M SEC/PCAOB settlement for stealing PCAOB inspection plans through employee hired from PCAOB -- systematically used confidential PCAOB information to improve scores on inspected engagements; Deloitte & Touche LLP Brazil $8M 2016; multiple small firm practice bars for independence violations; PCAOB enforcement orders published at pcaobus.org/Enforcement); Critical Audit Matters (CAMs: AS 3101 effective 2019 for large accelerated filers, 2020 for others; CAMs are audit matters arising from the current period audit that involve especially challenging subjective or complex auditor judgment AND communicated to audit committee; examples: goodwill impairment testing, revenue recognition timing, income tax valuation allowances, warranty reserves, 
business combination purchase price allocation; CAM must describe principal considerations, judgments, audit response -- disclosure expands standard auditor boilerplate); Python script fetching PCAOB registered firm list via undocumented /api endpoint, filtering to US firms and Chinese network firms, fetching inspection report index for 2004-2024, parsing Part I deficiency counts and inspected engagement counts from report HTML, computing year-by-year deficiency rates for Big Four vs. non-Big Four, running keyword analysis on deficiency descriptions to identify most common audit areas, outputting trend table and exporting CSV.
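
    The inspection-cycle rule and the deficiency-rate calculation described above reduce to a few lines. This is a sketch under stated assumptions: the rows are presumed already parsed into (year, firm, deficient, inspected) tuples, and the short firm labels are hypothetical.

```python
from collections import defaultdict

BIG_FOUR = {"Deloitte", "EY", "KPMG", "PwC"}  # hypothetical short labels


def inspection_cycle(issuer_audit_reports: int) -> str:
    """Annual inspection above 100 issuer audit reports, otherwise triennial."""
    return "annual" if issuer_audit_reports > 100 else "triennial"


def deficiency_rates(rows):
    """Pooled Part I deficiency rate per (year, group).

    rows: iterable of (year, firm, deficient_engagements, inspected_engagements).
    """
    totals = defaultdict(lambda: [0, 0])
    for year, firm, deficient, inspected in rows:
        group = "Big Four" if firm in BIG_FOUR else "Other"
        totals[(year, group)][0] += deficient
        totals[(year, group)][1] += inspected
    return {key: d / n for key, (d, n) in totals.items() if n > 0}
```

    Pooling engagements before dividing weights each firm by how many audits were inspected; averaging per-firm rates instead would give small firms more influence.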

  30. Q4Writing

    Medicare Part D drug spending database deep-dive published

    Long-form article on Medicare Part D prescription drug spending data: Part D created by Medicare Prescription Drug, Improvement, and Modernization Act 2003 (MMA); signed by Bush December 2003; implemented January 2006; voluntary outpatient prescription drug benefit for Medicare beneficiaries administered through private Prescription Drug Plans (PDPs, standalone) and Medicare Advantage with Prescription Drug (MA-PD, integrated) plans; ~50M enrolled beneficiaries (2022); ~$225-230B in total drug spending (2022 CMS Drug Spending Dashboard); federal government net cost approximately $100B/year after plan payments and beneficiary premiums; benefit structure (2024 standard benefit: $545 deductible; initial coverage phase: 25% beneficiary coinsurance for formulary drugs up to initial coverage limit; coverage gap (donut hole): ACA 2010 phased in manufacturer 70% discount for brand drugs and government subsidy for generics, effectively closing gap by 2020; catastrophic coverage phase: after beneficiary spends $8,000 OOP (2024), plan pays 80%, catastrophic subsidy pays 15%, beneficiary pays 5%; Inflation Reduction Act 2022 eliminated catastrophic copay for brand drugs; IRA 2022 $2,000 OOP cap effective January 2025 -- fundamental change eliminating risk of catastrophic drug costs for beneficiaries); IRA drug pricing provisions (Section 1 Part B and D: CMS negotiates Maximum Fair Price (MFP) for drugs that meet criteria -- small molecule drugs after 9 years on market, biologics after 13 years; first 10 drugs announced August 2023: apixaban (Eliquis/BMS-Pfizer), sitagliptin (Januvia/Merck), rivaroxaban (Xarelto/J&J), empagliflozin (Jardiance/BI-Lilly), dapagliflozin (Farxiga/AstraZeneca), sacubitril/valsartan (Entresto/Novartis), etanercept (Enbrel/Amgen), ibrutinib (Imbruvica/AbbVie), ustekinumab (Stelara/J&J), insulin aspart (Fiasp/NovoLog/Novo Nordisk); maximum fair prices announced August 2024, effective January 2026; pharma companies filed multiple lawsuits 
challenging constitutionality; Section 2: if drug price increases faster than inflation, manufacturer pays CMS inflation rebate -- applies retroactively for price increases above CPI since 2021 baseline; Section 3: $2,000 OOP cap starting 2025); CMS Part D public data (CMS Part D Prescriber Public Use File at data.cms.gov: NPI (National Provider Identifier), provider specialty, drug name, number of beneficiaries receiving drug, number of claims, total day supply, total drug cost -- privacy suppression for <11 claims per prescriber-drug pair; enables: which opioid prescribers have highest claims per beneficiary? Which oncologists prescribe most expensive biologics?; CMS Drug Spending Dashboard: drug-level data including total beneficiaries, total claims, total spending, average spending per claim; formulary database); top drugs by spending (apixaban (Eliquis): ~$14B -- atrial fibrillation anticoagulant; adalimumab (Humira): ~$6B 2022 before biosimilar wave; pembrolizumab (Keytruda): ~$5B and growing -- oncology PD-1 checkpoint inhibitor; rivaroxaban (Xarelto): ~$4-5B; lenalidomide (Revlimid): ~$4B -- multiple myeloma; semaglutide (Ozempic/Wegovy): rapidly growing from ~$1B 2021 to ~$6B+ 2023 as GLP-1 agonist prescribing exploded; specialty drugs (biologics, oncology, gene therapies) now represent ~50% of total drug spending despite <1% of prescription volume -- driven by unit costs of $15,000-$600,000/year per patient); PBM and formulary mechanics (Part D plans negotiate with manufacturers through PBMs (Pharmacy Benefit Managers) for rebates; three dominant PBMs: CVS Caremark, Express Scripts (Cigna), OptumRx (UnitedHealth); PBM rebates reduce net cost to plan but do not reduce patient coinsurance (which is % of list price); insulin controversy: list price $300/vial; manufacturer rebate to PBM often >80%; net cost to plan ~$30-60/vial; patient pays 25% coinsurance of $300 list = $75/vial; IRA 2022 insulin cap $35/month for Medicare beneficiaries; formulary tiers: 
Tier 1 preferred generic ($0-5 copay), Tier 2 generic, Tier 3 preferred brand, Tier 4 non-preferred brand, Tier 5 specialty ($100+ copay or 25-33% coinsurance); prior authorization, step therapy, quantity limits as utilization management tools; LIS/Extra Help: ~13M Medicare beneficiaries receive full or partial subsidy -- no premium, $0-10 copays for full LIS; dual eligibles (Medicare + Medicaid, ~12M) auto-enrolled in benchmark plans; Low-Income Subsidy pays plan premiums and cost-sharing); biosimilar dynamics (Humira/adalimumab: AbbVie patent fortress 130+ patents; multi-biosimilar launch January 2023 -- Hadlima, Hyrimoz, Cyltezo, Yusimry, Simlandi, Hulio, Idacio all approved; first multi-biosimilar simultaneous launch in history; AbbVie pre-negotiated exclusive deals with PBMs delaying formulary inclusion -- formulary exclusions challenged as anticompetitive; 2024: biosimilar market share growing to 20-40% depending on plan; Part D formulary data shows differential biosimilar adoption; Stelara/ustekinumab biosimilars expected 2025; IRA promotes biosimilar adoption by counting biosimilar savings toward negotiation criteria delay); Python script loading CMS Part D Prescriber PUF via data.cms.gov download, filtering to opioid drug names (buprenorphine, fentanyl, hydrocodone, hydromorphone, methadone, morphine, oxycodone, oxymorphone, tapentadol, tramadol), grouping by provider NPI and specialty, computing opioid claims per beneficiary by specialty (flagging outliers above 3x specialty median), joining to NPPES registry for provider address/ZIP code, ranking top-20 ZIP codes by opioid prescribing intensity, computing state-level opioid claim rate per Part D beneficiary.
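
    The outlier step described above (claims per beneficiary, flagged above 3x the specialty median) might be sketched as follows. The record keys `npi`, `specialty`, `claims`, and `beneficiaries` are hypothetical stand-ins, not the actual PUF column names.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(rows, factor=3.0):
    """Return NPIs whose claims-per-beneficiary exceeds factor x specialty median.

    rows: dicts with 'npi', 'specialty', 'claims', 'beneficiaries' (assumed keys).
    """
    by_specialty = defaultdict(list)
    for r in rows:
        if r["beneficiaries"] > 0:  # skip suppressed or empty rows
            rate = r["claims"] / r["beneficiaries"]
            by_specialty[r["specialty"]].append((r["npi"], rate))

    flagged = []
    for pairs in by_specialty.values():
        med = median(rate for _, rate in pairs)
        flagged.extend(npi for npi, rate in pairs if rate > factor * med)
    return flagged
```

    Comparing within specialty matters here: pain management and hospice prescribers have legitimately higher opioid volumes than, say, dermatologists, so a single national threshold would mostly flag specialty mix.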

  31. Q4Writing

    NFIP flood insurance and National Flood Hazard Layer data deep-dive published

    Long-form article on NFIP (National Flood Insurance Program) data: NFIP created by National Flood Insurance Act 1968 after private insurance market failed for flood coverage (moral hazard: flood damage is localized and correlated rather than dispersed; adverse selection: only high-risk properties buy coverage; private insurers withdrew from flood market); FEMA (Federal Emergency Management Agency) administers; mandatory participation: federal law requires any community receiving federal financial assistance to participate in NFIP and maintain flood damage prevention regulations; ~22,000 communities participate nationwide; ~5 million active flood insurance policies (2022); ~$1.3 trillion in total insurance coverage in force; mandatory purchase requirement: Flood Disaster Protection Act 1973 requires federally-backed mortgages (FHA, VA, Fannie Mae, Freddie Mac, USDA) in Special Flood Hazard Areas to maintain NFIP flood insurance for loan duration; gaps: renters, property owners without mortgages, properties outside SFHAs not required to carry coverage; coverage limits: $250,000 building/$100,000 contents residential; $500,000/$500,000 commercial -- amounts not increased since 1994, limiting protection given inflation; Write-Your-Own (WYO) program: private insurers sell and service NFIP policies but FEMA bears all risk -- insurers receive ~30% expense allowance; large WYO carriers: Allstate, USAA, Travelers, Selective Insurance; flood zones (FEMA Flood Insurance Rate Maps/FIRMs divide all US territory into flood zones: Zone AE (detailed-studied 1% annual chance with BFE); Zone A (approximate 1% annual chance, no BFE); Zone AH (1% annual chance shallow flooding); Zone AO (1% annual chance sheet flow); Zone V (coastal with wave action, highest-risk zone); Zone VE (detailed-studied coastal with BFE); Zone X (areas of 0.2% annual chance flood -- 500-year; or areas of 1% annual chance with low depths/drainage areas, or areas protected by levees); Zone X (unshaded) 
outside 0.2% chance); major claims history (Hurricane Katrina 2005: ~$16.3B paid on ~267,000 claims -- exceeded NFIP reserves by $15B, requiring Treasury borrowing; Superstorm Sandy 2012: ~$8.6B on ~144,000 claims; Hurricane Harvey 2017 (TX): ~$8.9B on ~89,000 claims -- catastrophic flooding in Houston metro; Hurricane Florence 2018 (NC): ~$1.5B; Hurricane Dorian 2019; Hurricane Ida 2021: ~$3.3B; Hurricane Ian 2022: ~$3.6B (FL); cumulative deficit: NFIP borrowed ~$36B from Treasury post-Katrina; Biggert-Waters Flood Insurance Reform Act 2012 mandated actuarial rate increases and eliminating subsidized rates for primary residences; massive political backlash from coastal communities; Homeowner Flood Insurance Affordability Act 2014 rolled back most increases; repeated temporary reauthorizations as Congress struggled with fiscal and affordability conflicts); Risk Rating 2.0 (FEMA Risk Rating 2.0: Equity in Action -- effective October 1 2021 for new policies, October 2022 for renewal policies; fundamentally changed pricing methodology: replaced flat zone-based rates (all Zone AE properties paying same base rate regardless of flood frequency) with property-specific risk assessment using: flood frequency (how often property floods), types of flooding affecting property (riverine vs. coastal/storm surge vs. surface water/pluvial vs. 
coastal erosion), distance to water bodies, first floor elevation relative to BFE, foundation type, replacement cost value of structure -- removed arbitrary zone boundaries as primary pricing factor; results: many coastal vacation homes and high-value properties saw sharp increases (some 200-400%); some inland moderately flood-prone areas saw decreases; moderate-value primary residences in high-risk areas often saw increases; 18% annual increase cap protects existing policyholders from shock; 1.2M policies non-renewed or canceled since RR 2.0 due to cost concerns as of 2023; First Street Foundation published alternative risk model showing 14.6M properties at substantial flood risk vs. NFIP's 8.7M; Congressional hearings on RR 2.0 affordability impacts); National Flood Hazard Layer (NFHL: FEMA's comprehensive GIS database of flood zones, base flood elevations, floodways, regulatory floodplains, and cross-sections for all mapped US areas; updated continuously as new FIRMs are issued via Flood Map Status Information Service; access: msc.fema.gov (FEMA Map Service Center) for map lookups by address; NFHL WFS (Web Feature Service) at hazards.fema.gov/femaportal/wps/knowledgebase/NFHL for programmatic GIS access; FEMA Flood Map Change Viewer for monitoring map revision status; BFE data in NFHL; LOMA (Letter of Map Amendment): property owner process to remove single lot or structure from SFHA if natural ground elevation is above BFE; LOMR (Letter of Map Revision): community process to revise map based on new hydrologic analysis; CLOMR (Conditional LOMR) for development projects); OpenFEMA datasets (OpenFEMA at openFEMA.gov provides programmatic access to NFIP data: FimaNfipClaims (claims since 1976; ~2.5M records; fields: reportedZipCode, occupancyType, buildingDamageAmount, contentsDamageAmount, causeOfLoss (flooding types), numberOfFloorsInInsuredBuilding, originalConstructionDate, floodZone, disasterNumber; privacy masking on some geographic fields); FimaNfipPolicies 
(active and historical policies; ~80M+ records; policy premium, coverage amounts, occupancy type, flood zone, community name, county, state, original construction date)); repetitive loss properties (FEMA identifies repetitive loss structures as those filing ≥2 claims of >$1,000 in 10-year period; severe repetitive loss: ≥4 claims exceeding $5,000 or ≥2 claims exceeding building value; ~25,000 severe repetitive loss structures as of 2022; these ~1% of NFIP properties have accounted for ~25-30% of all NFIP claims dollars historically; FEMA Hazard Mitigation Grant Program (HMGP): federal funds available post-disaster to buy out repetitive loss properties (buyouts) or elevate structures above BFE (mitigation); some Midwest properties flooded 5-7 times and paid out more in claims than original property value; political difficulty removing structures: local opposition, property rights concerns, slow buyout process); climate change trajectory (NOAA 2022 sea level rise scenarios: 0.3m to 1.0m by 2100 depending on emissions pathway; FEMA RR 2.0 incorporates 30-year forward-looking risk projections; growing actuarial gap: rising flood risk vs. affordable premiums vs. Treasury fiscal exposure; Congressional Budget Office analysis: NFIP faces persistent multi-billion-dollar annual actuarial shortfall; managed retreat policy debate -- organized relocation of communities from high-risk flood zones; Harris County TX buyouts post-Harvey; state-level flood risk disclosure requirements (NY, NJ, FL seller disclosure laws)); Python script calling OpenFEMA API FimaNfipClaims endpoint with disasterNumber filter for Harvey disasters (DR-4332, DR-4337), pulling claim records with buildingDamageAmount and reportedZipCode fields, aggregating by county (using FIPS from ZIP-county crosswalk), computing average payout per claim by county, ranking top-20 counties by total claims paid, computing proportion in Zone AE vs. Zone X (flooding of mapped vs. 
unmapped properties -- Harvey was notable for high claims outside SFHA).
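
    The county aggregation and flood-zone split described above could be sketched roughly as follows. This is a minimal illustration on already-downloaded claim records, not the published script: the field names are the ones listed in the summary, while `summarize_claims` and the `zip_to_county` crosswalk are hypothetical names introduced here, and the live API's field names should be checked before use.

```python
from collections import defaultdict


def summarize_claims(claims, zip_to_county):
    """Aggregate claim records by county and compute flood-zone shares.

    claims: list of dicts with reportedZipCode, buildingDamageAmount, floodZone
    zip_to_county: hypothetical ZIP -> county FIPS crosswalk (plain dict)
    """
    totals = defaultdict(lambda: {"claims": 0, "paid": 0.0})
    zone_counts = defaultdict(int)
    for claim in claims:
        county = zip_to_county.get(claim.get("reportedZipCode"), "UNKNOWN")
        paid = float(claim.get("buildingDamageAmount") or 0)  # missing -> 0
        totals[county]["claims"] += 1
        totals[county]["paid"] += paid
        zone_counts[(claim.get("floodZone") or "").strip().upper()] += 1

    # (county, total paid, average paid per claim), largest total first
    ranked = sorted(
        ((c, t["paid"], t["paid"] / t["claims"]) for c, t in totals.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    n = len(claims)
    zone_share = {z: k / n for z, k in zone_counts.items()} if n else {}
    return ranked, zone_share
```

    Taking `ranked[:20]` gives the top-20 counties, and comparing `zone_share.get("AE")` with `zone_share.get("X")` gives the mapped versus unmapped split.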

  32. Q4Writing

    FARA Foreign Agents Registration Act database deep-dive published

    Long-form article on FARA (Foreign Agents Registration Act) registration database: FARA enacted 1938 as the Foreign Agents Registration Act (22 U.S.C. §§ 611-621) targeting Nazi propaganda distribution in the US by agents of foreign governments; administered by DOJ National Security Division FARA Unit; requires agents of foreign principals (foreign governments, foreign political parties, foreign persons outside the US who direct, supervise, or control activities of the agent) who engage in covered US activities to register and file semi-annual disclosure statements; covered activities: political activities, lobbying US officials, public relations on behalf of foreign principal, fundraising, political consulting, media placement; registration: Form RA-1 (initial registration within 10 days of commencing covered activities) lists foreign principal identity, terms of engagement, compensation, activities; Form NSD-3 (semi-annual supplement) discloses disbursements exceeding $10, activities undertaken, political contributions, political contacts, materials disseminated; LDA exemption problem (22 U.S.C. 
§ 613(h)): agents who register under the Lobbying Disclosure Act 1995 and whose principal is not a foreign government or foreign political party may use LDA disclosure instead of FARA -- creates gap where agents of state-owned enterprises (sovereign wealth funds, national oil companies, state media) use LDA rather than FARA; DOJ Inspector General 2016 report sharply criticized FARA enforcement as lax, noting DOJ had brought only seven criminal FARA cases between 1966 and 2015; scale (~500-600 active FARA registrations at any time; compared to 12,000+ LDA registrations; top registering countries: Saudi Arabia, UAE, Turkey, South Korea, Qatar, Japan, China); Mueller investigation FARA surge 2018-2022 (Paul Manafort: convicted August 2018 on 8 tax and bank fraud counts in Virginia, then pleaded guilty September 2018 in D.C. to conspiracy charges covering unregistered Ukraine lobbying for Viktor Yanukovych government through Mercury/Podesta Group; Michael Flynn: pleaded guilty December 2017 to lying to FBI about contacts with the Russian ambassador, with false statements in his FARA filings also acknowledged; retroactively registered as foreign agent for Inovo BV (Turkey-linked) in March 2017, after leaving the White House; Rick Gates cooperation; Tony Podesta and Mercury LLC retroactively registered as foreign agents for Ukraine work; Bijan Kian indicted for Turkey work; Tom Barrack: charged 2021 with acting as UAE foreign agent, acquitted November 2022); Saudi Arabia post-Khashoggi ($14M+ per year in FARA-reported lobbying fees; $450M+ cumulative since 2016; lobbying firms retained: Squire Patton Boggs (longest relationship), Akin Gump Strauss Hauer & Feld, BGR Group, Harbor Group, Padilla & Associates; activities reported: congressional outreach, op-ed placement, think tank research grants, grassroots advocacy coordination; FARA filings show specific members of Congress contacted, dates, subjects discussed, materials distributed; post-Khashoggi assassination October 2018: contracts maintained despite congressional pressure; FARA filings accessible to journalists); China enforcement (DOJ China Initiative 2018-2022 used 
FARA as enforcement tool; CGTN (China Global Television Network) registered as foreign agent 2019; Xinhua News Agency registered 2021; academic researchers at universities targeted for non-disclosure of Chinese government ties; Chinese state-owned enterprise US subsidiaries subject to FARA analysis; China-linked influence operations (United Front Work Department); DOJ terminated China Initiative 2022 amid civil liberties criticism but FARA enforcement continued); criminal enforcement (22 U.S.C. § 618: willful failure to register or file supplements is felony punishable up to 5 years imprisonment + fines; 1966-2015: only 7 criminal cases; post-2016 surge driven by Mueller investigation and DOJ leadership priority; DOJ pattern: use FARA threat to pressure cooperation in broader national security investigations; civil enforcement: injunctions and retroactive registration (Manafort, Flynn, Podesta) more common than criminal prosecution); eFARA electronic system (mandatory electronic filing since 2016; eFARA portal at efile.fara.gov; Electronic Reading Room at justice.gov/nsd-fara: all registrations, amendments, supplements, exhibit A/B documents since 1942; bulk CSV download at efile.fara.gov/bulk/ (registrations, supplements, foreign principals, contacts, disbursements CSVs); OpenSecrets FARA database compilation; POGO project on FARA enforcement; ProPublica and National Security Archive maintain secondary archives); Python script fetching eFARA bulk registrants CSV and supplements CSV, joining on registration number, computing total disbursements by foreign principal country, identifying top-10 registrant law/PR firms by number of foreign principals, charting annual new registration count 2000-2024 with Mueller-era spike annotated.
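
    The annual new-registration count mentioned above might look something like the sketch below. It works on CSV text that has already been downloaded; the column name `Registration Date`, its `MM/DD/YYYY` format, and the function name are assumptions made for illustration, since the bulk file's actual header is not given here.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def registrations_per_year(csv_text, date_column="Registration Date",
                           date_format="%m/%d/%Y"):
    """Count new registrations per calendar year from registrant CSV text.

    date_column and date_format are assumed values; adjust them to match
    the real bulk-file header before relying on the result.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(date_column) or "").strip()
        if not raw:
            continue  # skip rows with no registration date
        counts[datetime.strptime(raw, date_format).year] += 1
    return dict(sorted(counts.items()))
```

    The resulting year-to-count mapping can be passed straight to a bar chart, with the 2017-2019 bars annotated by hand.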

  33. Q4Writing

    CMS Open Payments pharma payments to physicians deep-dive published

    Long-form article on CMS Open Payments (Physician Payments Sunshine Act database): Physician Payments Sunshine Act enacted as Section 6002 of the Affordable Care Act 2010 (42 U.S.C. § 1320a-7h); CMS administers; requires applicable manufacturers (pharmaceutical manufacturers, medical device manufacturers, biologics companies, medical supply companies with >$100M in annual US sales of covered products) and applicable GPOs to report all payments and transfers of value to covered recipients; covered recipients: physicians (all licensed physicians), teaching hospitals, and since 2022 (CY2021 data) physician assistants, nurse practitioners, clinical nurse specialists, certified registered nurse anesthetists, and certified nurse-midwives; three payment categories (General Payments: consulting fees, speaker fees, food and beverage, travel/lodging, research payments, education, gifts, entertainment, grants, royalties/licenses, current/prospective ownership/investment interests, charitable contributions on behalf of covered recipient; Research Payments: payments for research activities including principal investigator fees, clinical trial payments; Ownership/Investment Interests: stocks, stock options, partnership shares, LLC interests); reporting thresholds ($10 minimum per payment; $100 aggregate per year per recipient; food/beverage aggregated regardless of amount; nature of payment determines category); scale (2022 CY dataset: $12.7B total reported; research payments ~$4B (largest by aggregate but covers smaller number of transactions); general payments ~$2.5B (consulting/speaking/food/travel); ownership/investment interests ~$6.2B (royalties from device inventors); ~2,700 applicable manufacturers reporting; ~900,000 covered recipients receiving payments; top paying companies by general payments: Amgen, Pfizer, Johnson & Johnson, AbbVie, Medtronic; by research payments: large pharma with active clinical trial pipelines; by royalties: Intuitive Surgical (da Vinci 
robotic surgery device royalties), major orthopedic implant companies (Stryker, Zimmer Biomet, DePuy Synthes) paying royalties to orthopedic surgeons who designed implants); dataset structure (CMS publishes as three separate Socrata datasets at openpaymentsdata.cms.gov and data.cms.gov/open-payments: General Payments dataset fields include applicable_manufacturer_or_gpo_making_payment_name, covered_recipient_npi (National Provider Identifier), covered_recipient_first_name, covered_recipient_last_name, covered_recipient_primary_type_1 (physician specialty from NPI registry), total_amount_of_payment_usdollars, date_of_payment, nature_of_payment_or_transfer_of_value (from 16 standardized categories), product_category_or_therapeutic_area, name_of_drug_or_biological_or_device_or_medical_supply_1, indicate_drug_or_biological_or_device_or_medical_supply_1, physician_primary_type, recipient_city, recipient_state, dispute_status_for_publication; Ownership dataset adds ownership_type (stock/stock option/partnership interest/LLC interest/other equity interest), investment_type; Research dataset adds related_product_indicator, context_of_research); NPI cross-referencing (NPI linkage to NPPES National Plan and Provider Enumeration System enables physician specialty, practice location, group practice affiliation cross-reference; enables analysis: do cardiologists in Group X receive more payments from Device Company Y than Group Z?; NPPES bulk download at download.cms.gov); journalism and research history (ProPublica launched Dollars for Docs 2010 -- four years before the CMS database launched in 2014 -- using voluntary disclosures from seven pharma companies that settled DOJ whistleblower suits; ProPublica Dollars for Docs continues tracking; investigative journalism use: Wall Street Journal orthopedic implant royalties; NYT opioid payments to pain specialists; STAT News oncology payments; NPR/Marketplace pharma marketing spending; academic journals including JAMA, NEJM, BMJ have published 
extensively); research findings on prescribing (Carey et al. 2021 JAMA Internal Medicine: receipt of any pharma meal or payment associated with increased prescribing of sponsor drug relative to therapeutic alternatives; DeJong et al. 2016 JAMA Internal Medicine: physicians who received meals from pharma reps had higher prescription rates for brand drugs; Yeh et al. 2016 BMJ: receipt of industry payments associated with branded drug prescribing for cardiologists; Ornstein et al. 2016 JAMA: Open Payments analysis finding psychiatrists receiving antipsychotic payments more likely to prescribe those drugs; causality debate: selection (companies target high prescribers) vs. causal (payments influence behavior) -- Carey et al. used matched controls; ongoing academic debate); drug-specific linkage (Open Payments links payments to specific drug/biologic/device names; semaglutide (Ozempic/Wegovy) payments to endocrinologists and PCPs 2022-2024: sharp spike tracking GLP-1 agonist commercial launch; adalimumab (Humira) AbbVie payments to rheumatologists/dermatologists before and after biosimilar entry 2023; PCSK9 inhibitor payments (evolocumab/alirocumab) to cardiologists from Amgen/Sanofi-Regeneron; insulin manufacturer (Eli Lilly, Novo Nordisk, Sanofi) payments to endocrinologists/PCPs amid insulin pricing controversy; da Vinci robotic surgery royalties from Intuitive Surgical to surgeons who hold surgeon-specific instruments); Python script querying data.cms.gov Socrata API for 2022 general payments, filtering by nature_of_payment_or_transfer_of_value == Consulting Fee and total_amount > 10000, grouping by covered_recipient_primary_type_1 (specialty), computing top-10 specialties by total consulting fee payment, then separately pulling all payments linked to GLP-1 agonist drugs (semaglutide/liraglutide/dulaglutide/tirzepatide) and computing total by manufacturer and year.
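
    The consulting-fee filter and specialty grouping described above can be sketched as below. It is a hedged illustration over rows already fetched as dicts; the field names are those listed in the dataset-structure summary, and `top_specialties` is a hypothetical helper, not part of any CMS client.

```python
from collections import defaultdict


def top_specialties(rows, nature="Consulting Fee", min_amount=10_000, n=10):
    """Total payments of one nature above a threshold, grouped by specialty.

    rows: iterable of dicts using the General Payments field names.
    Returns up to n (specialty, total) pairs, largest total first.
    """
    totals = defaultdict(float)
    for row in rows:
        amount = float(row["total_amount_of_payment_usdollars"])
        if (row["nature_of_payment_or_transfer_of_value"] == nature
                and amount > min_amount):
            totals[row["covered_recipient_primary_type_1"]] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

    Amounts arrive as strings from most JSON APIs, which is why each one is converted with `float` before comparison.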

  34. Q4Writing

    NLRB union elections and unfair labor practice data deep-dive published

    Long-form article on NLRB (National Labor Relations Board) union election and ULP data: NLRB created by the National Labor Relations Act 1935 (Wagner Act); independent federal agency; 5 Board members appointed by President and confirmed by Senate (partisan balance by tradition -- no more than 3 from same party); General Counsel (GC) is separately appointed and independently prosecutes cases -- creates frequent GC/Board tension when different party controls each; 26 regional offices process cases; NLRA covers most private-sector employees; exclusions: agricultural laborers (Taft-Hartley confirmed), domestic service workers, independent contractors (constantly litigated including SuperShuttle 2019, Atlanta Opera 2023), supervisors (Taft-Hartley 1947; supervisor definition also litigated -- Oakwood Healthcare 2006), employees of employers subject to Railway Labor Act (airlines and railroads), managers, and confidential employees; representation election mechanics (petition types: RC = employees or union petition for certification election; RM = employer petition when union claims majority or seeks election to test support; RD = decertification petition by employees to remove incumbent union; UC = unit clarification to add/remove positions; AC = amendment of certification; RC requires a 30% showing of interest -- union submits authorization cards signed by at least 30% of proposed unit; regional director investigates appropriate unit applying community-of-interest factors: job functions, skills, working conditions, supervision, integration of operations, bargaining history; pre-election hearing on contested issues; consent or directed election; NLRA Section 9(c)(1) election -- secret ballot; winning requires majority of valid ballots cast (not majority of unit); election bar: once election held, no new election for 12 months; contract bar: during term of collective bargaining agreement up to 3-year max); ambush election rule history (2011: Obama NLRB proposed streamlined 
election rules; 2012: U.S. District Court for D.C. struck down on quorum grounds; 2014: final rule dramatically shortened pre-election timeline from average 38 days to ~23 days; eliminated pre-election litigation of voter eligibility, to be resolved post-election if determinative; required employer to provide employee list (Excelsior list) with names, addresses, phone numbers within 2 business days of direction of election; 2015: Republican Congress passed CRA resolution to overturn, Obama vetoed; 2019: Trump NLRB reinstated blocking charge rule, voluntary recognition bar; 2023: Biden NLRB GC McLaren Macomb memo + final election rule restored 2014 ambush rule provisions including blocking charges and voluntary recognition bar; current average petition-to-election time: ~23 days for consent elections); election statistics and trends (1950s peak: ~7,500 RC petitions/year; 1970s peak: ~8,000 petitions/year; steady decline to ~2,000-2,500 petitions 2000s-2010s; FY2022: ~2,510 petitions -- 53% increase from prior year, largely Amazon/Starbucks effect; FY2023: ~2,590, a further ~3% increase; union win rate: ~52-55% 1990s-2010s; recent years ~65-70% partly due to smaller unit elections where unions have stronger pre-vote organization; average unit size declining as unions file for smaller, more winnable bargaining units); Amazon and Starbucks organizing (Amazon Labor Union Staten Island JFK8 warehouse: April 1 2022 election -- 2,654 in favor, 2,131 against -- first successful Amazon union in US; Christian Smalls-led organizing committee; spontaneous worker organizing without established union backing; ALU also filed petition for neighboring LDJ5 sort center (lost election 618-380 May 2022) and additional facilities; Starbucks Workers United (SBWU, affiliated with Workers United/SEIU): first store unionized December 2021 Buffalo NY; wave spread to 400+ store petitions by mid-2023; National Labor Relations Board prosecuted Starbucks for 8(a)(3) violations -- closing pro-union stores, firing union 
organizers -- and won multiple reinstatement orders; Section 10(j) injunctions sought against Starbucks); ULP charges (15,000-20,000 charges filed annually; CA charges = unfair labor practices by employers (8(a)(1): interfere/restrain/coerce; 8(a)(2): dominate/assist company union; 8(a)(3): discriminate to discourage union membership; 8(a)(4): retaliate for filing NLRB charge; 8(a)(5): refuse to bargain collectively in good faith); CB charges = unfair labor practices by unions (8(b)(1)(A): restrain/coerce employees; 8(b)(3): refuse to bargain; 8(b)(7): picketing without petition); charge → NLRB regional investigation → complaint or dismissal → pre-trial settlement or ALJ (administrative law judge) hearing → NLRB Board decision → circuit court of appeals review; Gissel bargaining orders (NLRB v. Gissel Packing 1969): when employer ULPs made fair election impossible, Board can order recognition without election; Biden-era surge in Gissel order requests for Amazon/Starbucks); key doctrines and precedents (Joy Silk doctrine 1949: employer required to recognize union unless good-faith doubt about majority; overruled in Linden Lumber 1974 requiring employer to only accept NLRB election; McLaren Macomb 2023: separation/severance agreements containing broad confidentiality or non-disparagement clauses violate NLRA Section 7 rights; Cemex Construction Materials 2023: employer facing a majority-support recognition demand must recognize the union or promptly file an RM petition, and election-period ULPs serious enough to set the election aside lead to a bargaining order rather than a rerun; Atlanta Opera 2023: makeup artists, wig artists, and hairstylists are employees not independent contractors, overruling SuperShuttle test); public data and access (nlrb.gov case search at nlrb.gov/cases-decisions; election case search; NLRB decisions database back to 1935; annual election results spreadsheet download at nlrb.gov/reports/graphs-data/recent-election-results (CSV format with case number, employer, union, region, unit size, votes for/against, election date); NLRB Data Dashboard at nlrb.gov showing petition filings by month; Annual Report with detailed statistics; 
press releases for significant decisions; FOIA requests for regional case files); Python script downloading NLRB FY2024 election CSV, normalizing column names, computing win rates by union (top 10 unions), by NAICS industry sector (petition volume and win rate), average days from petition to election by regional office, monthly petition filing trend 2019-2024 overlaid with Amazon/Starbucks spike annotation.
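
    The win-rate computation above depends on the rule stated earlier: a union wins only with a majority of valid ballots cast, so a tie counts as a loss. A minimal sketch under that rule follows; the keys `union`, `votes_for`, and `votes_against` are hypothetical normalized column names, and elections with more than one union on the ballot are ignored for simplicity.

```python
from collections import defaultdict


def win_rates_by_union(elections):
    """Return {union: share of elections won}.

    A union wins only when votes_for exceeds votes_against; a tie is a
    loss because a majority of valid ballots cast is required.
    """
    tally = defaultdict(lambda: [0, 0])  # union -> [wins, elections held]
    for election in elections:
        won = election["votes_for"] > election["votes_against"]
        tally[election["union"]][0] += int(won)
        tally[election["union"]][1] += 1
    return {union: wins / held for union, (wins, held) in tally.items()}
```

    Sorting the result by election count before taking the top 10 avoids ranking a union with a single win above one with hundreds of elections.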

  35. Q4Writing

    ATF firearm trace data and crime gun database deep-dive published

    Long-form article on ATF eTrace firearm trace data and NIBIN ballistic intelligence network: ATF (Bureau of Alcohol, Tobacco, Firearms and Explosives) within DOJ administers eTrace (Electronic Tracing System) -- the federal system for tracing crime guns from point-of-manufacture to first retail purchaser; approximately 350,000-400,000 trace requests processed annually from law enforcement agencies; trace mechanics (law enforcement agency recovers firearm at crime scene → submits trace request via eTrace to ATF → ATF queries manufacturer/importer using firearm serial number, make, model, caliber → manufacturer identifies wholesale distributor of first sale → ATF queries distributor → distributor identifies FFL who purchased at wholesale → ATF queries that FFL → FFL checks its Form 4473 records → identifies first retail purchaser name, address, DOB, ID type; if FFL subsequently sold to another FFL, trace continues until first retail purchaser found; average trace takes hours to a few days; time-to-crime (TTC) = interval between first retail sale date and recovery date; national average TTC ~7-8 years; short TTC -- especially under 3 years -- is primary trafficking indicator; 21% of handguns traced nationwide had TTC under 3 years; multiple traces to same purchaser flags straw purchase pattern); Tiahrt Amendments (Rep. Todd Tiahrt (R-KS) first attached rider to FY2004 Commerce/Justice/State appropriations bill; three main prohibitions: (1) ATF prohibited from releasing firearm trace data to the public or in civil litigation against gun manufacturers/dealers; (2) ATF prohibited from requiring FFLs to conduct physical inventory audits; (3) trace data only available to law enforcement for specific criminal investigations; Chicago v. 
Beretta (2004) and other municipal gun litigation was direct motivation -- cities sought trace data to prove dealers knowingly supplied illegal traffickers; ATF aggregate data (not individual traces) published annually in state trace data tables; ATF Firearms Commerce in the United States annual statistical report); ATF annual state trace data (ATF publishes annual PDF reports at atf.gov/resource-center/firearms-trace-data by state; each state report includes: top source states for traced firearms recovered in that state; top counties of recovery; firearm types traced; time-to-crime distribution; top 10 FFL dealer sources for traced firearms; researchers use these aggregate tables to study Iron Pipeline and interstate trafficking); Iron Pipeline (term coined by BATFE/media for southeast-to-northeast trafficking corridor; regulatory arbitrage: Georgia, South Carolina, Virginia, North Carolina, Florida have no waiting periods, no universal background check requirements for private sales, no handgun permit requirements, one-gun-per-month limits repealed; New York, New Jersey, Maryland, Massachusetts, Connecticut have stricter laws; ATF traces show GA, SC, VA, FL account for disproportionate share of crime guns recovered in NYC, NJ, MD; Indiana supplies Chicago; Mississippi-to-Chicago pipeline; straw purchasing rings exploit weak-law states; traffickers drive I-95 corridor (the Iron Pipeline highway) with multiple firearms purchased from licensed dealers using straw purchasers); FFL compliance infrastructure (nine FFL license types: Type 1 dealer/gunsmith; Type 2 pawnbroker; Type 3 collector; Type 6 ammunition manufacturer; Type 7 firearms manufacturer; Type 8 importer; Type 9/10/11 FFL/SOT for destructive devices/machineguns; ~130,000 active FFL licenses (Type 1+2+7+8); ATF field offices conduct compliance inspections -- only 5-7% of FFLs inspected per year due to resource constraints; significant violations: dealing without a license, failing to conduct NICS 
background check, falsifying Form 4473, allowing straw purchases; GAO found FFLs with significant trace concentrations rarely lose licenses); Out-of-Business Records Center (OBRC at Martinsburg WV: when an FFL goes out of business, all Form 4473 records transferred to OBRC; 920M+ paper and digital records as of 2024; searchable by ATF for trace purposes; FOPA 1986 Firearm Owners Protection Act Section 926(a) prohibits ATF from creating computerized national registry of non-NFA firearms -- OBRC searches remain manual/semi-manual for non-NFA traces); NIBIN (National Integrated Ballistic Information Network: ballistic imaging network operated by ATF; when law enforcement recovers spent cartridge cases or bullets, they are imaged and entered into NIBIN; automated algorithm compares new entry against all prior entries -- generates candidate leads for human examiner review; manufactured by Agiletek (formerly Forensic Technology) and Cadillac Forensics; 300+ NIBIN sites at law enforcement agencies; approximately 7,300 correlation leads per week FY2023; links: fired cartridge case at aggravated assault to same gun used in subsequent murder; links serial shooter across multiple crime scenes; particularly valuable for illegal firearms without traceable serial numbers; no public access -- law enforcement only; AFTE (Association of Firearm and Tool Mark Examiners) methodology; federal courts accept NIBIN evidence under Daubert); data access and proxies (because Tiahrt restricts trace data, researchers rely on: GVA (Gun Violence Archive) -- nonprofit tracking US gun violence incidents 2013+; Mother Jones database (mass shootings 1982+); FBI UCR/NIBRS Supplemental Homicide Reports (offender/victim relationship, weapon type, circumstances, back to 1976); CDC WISQARS fatal injury data (ICD-10 external cause codes); ATF annual Firearms Commerce in the United States; Violence Policy Center and Everytown for Gun Safety analysis of ATF annual trace reports); Python script using 
publicly available ATF state trace PDF tables (parsed to CSV): compute TTC distribution by state for handguns vs. long guns, calculate percentage of guns with TTC under 3 years (trafficking indicator) by state, rank top source states for crime guns recovered in highest-burden states (NY, IL, CA, MD), plot Iron Pipeline corridor visualization using matplotlib/geopandas, annotate top FFL source counties per ATF aggregate data.
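
    Time-to-crime as defined above is just the interval between first retail sale and recovery. The sketch below computes it and the under-3-years share for a list of date pairs. Individual trace records are not public, so the input here is purely illustrative; with the published state tables, the same share would be read off the pre-binned TTC distribution instead. Both function names are hypothetical.

```python
from datetime import date


def time_to_crime_years(first_sale, recovered):
    """Years between first retail sale date and recovery date."""
    return (recovered - first_sale).days / 365.25


def share_short_ttc(traces, threshold_years=3.0):
    """Share of (first_sale, recovered) pairs with TTC under the threshold."""
    ttcs = [time_to_crime_years(sale, rec) for sale, rec in traces]
    if not ttcs:
        return 0.0
    return sum(t < threshold_years for t in ttcs) / len(ttcs)
```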

  36. Q4Writing

    FDIC institution database and banking system overview deep-dive published

    Long-form article on FDIC BankFind Suite institution database: FDIC (Federal Deposit Insurance Corporation) established 1933 post-Depression; BankFind Suite (banks.data.fdic.gov) provides institution profiles for all ~4,600 currently active FDIC-insured banks and thrifts plus 10,000+ historical institutions dating to 1934; charter type taxonomy (N = national bank, OCC-chartered; SM = state member bank, regulated by Federal Reserve; NM = state nonmember bank, FDIC-regulated; SA = state savings association, OCC; SB = state savings bank, FDIC; OI = other insured institution; charter class determines primary federal regulator and examination schedule); dual banking system (US unique two-tier structure: state charter = state banking department primary regulator with either Fed (state member) or FDIC (state nonmember) as federal regulator; national charter = OCC primary; regulatory competition hypothesis: states compete to attract bank charters, leading to innovation but also concerns about race-to-the-bottom in regulation; Marquette National Bank v. 
First Omaha 1978 enabled national bank credit card rate exportation -- Delaware and South Dakota became credit card hub states); banking consolidation (14,000+ FDIC-insured institutions in 1984 to ~4,600 today -- 67% reduction; drivers: S&L/thrift crisis 1986-1995 (1,000+ S&L failures, $160B RTC cost); Riegle-Neal Interstate Banking and Branching Efficiency Act 1994 eliminated state-by-state branching barriers; Gramm-Leach-Bliley 1999 repealed Glass-Steagall barriers between banking/securities/insurance; technology reducing economies of scale advantages of small community banks; post-GFC wave 2008-2012 (465+ FDIC-assisted transactions); ongoing community bank M&A 200-300/year; biggest acquirers: JPMorgan Chase, Bank of America, Wells Fargo built through acquisition chains); community bank definition (under $10B assets per Dodd-Frank qualified mortgage provisions; ~4,400 community banks (95% of institutions) hold ~15% of banking assets; mega-banks JPMorgan $3.4T, BofA $3.2T, WF $1.9T, Citi $2.4T hold ~45% of assets; FDIC SPECGRP specialization codes: 1=agricultural, 2=credit card, 3=international, 4=mortgage, 5=consumer, 6=commercial lending); Summary of Deposits (annual survey June 30 reference date; branch-level deposit data for all FDIC-insured offices; banking desert metric: FDIC identifies census tracts with no bank or credit union within 10 miles as banking deserts -- 4.5M unbanked households 2021 FDIC survey; data at banks.data.fdic.gov/api/summary); CRA (Community Reinvestment Act 1977: banks have affirmative obligation to serve credit needs of low-to-moderate income communities; four exam ratings: Outstanding, Satisfactory, Needs to Improve, Substantial Noncompliance; OCC/Fed/FDIC conduct CRA exams on regular schedule; 2023 joint CRA reform: new asset-based tests, updated assessment areas for digital banking; public file at bank and online; CRA ratings public database); BankFind API mechanics (banks.data.fdic.gov/api/institutions endpoint; CERT = 5-digit 
FDIC certificate number (unique institution identifier stable across acquisitions until charter termination); ACTIVE filter; ASSET in thousands of dollars; CLASSP charter class; STALP state abbreviation; ESTYMD established date; SPECGRP specialization; HCTMULT holding company indicator; 10,000 record page limit with offset pagination; no API key required; also /summary, /history (charter changes/mergers), /financials (Call Report data), /locations (branch-level) endpoints); Python script fetching all active institutions, classifying by asset tier (community <$1B, regional $1B-$50B, large >$50B), ranking top-20 states by institution count, computing average asset size by charter class, plotting asset tier composition bar chart.
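
    Two details above are easy to get wrong in practice: ASSET is reported in thousands of dollars, and results come back in pages of at most 10,000 records. A rough sketch of both follows, using the asset tiers named in the script description; `asset_tier` and `page_offsets` are hypothetical helper names and no request is made here.

```python
def asset_tier(asset_thousands):
    """Classify an institution by total assets.

    asset_thousands: the ASSET field, which is in thousands of dollars.
    Tiers: community < $1B, regional $1B-$50B, large > $50B.
    """
    dollars = asset_thousands * 1_000
    if dollars < 1_000_000_000:
        return "community"
    if dollars <= 50_000_000_000:
        return "regional"
    return "large"


def page_offsets(total_records, page_size=10_000):
    """Offsets needed to page through total_records rows."""
    return list(range(0, total_records, page_size))
```

    With roughly 4,600 active institutions a single page suffices, but the historical set needs the offset loop.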

  37. Q4Writing

    FMCSA crash data and commercial vehicle safety deep-dive published

    Long-form article on FMCSA crash data: MCMIS (Motor Carrier Management Information System) as the federal repository for all reportable CMV crashes -- trucks >10,001 lbs GVWR and passenger vehicles with 9+ occupants; reportable if fatality, injury, or towed vehicle; ~500,000 reportable CMV crashes/year; MCMIS also tracks 3.5M annual roadside inspections, carrier SMS scores, driver medical qualification, carrier registration; fatality statistics (5,837 large truck fatalities in 2022 -- highest since 2005; 875 truck occupants, ~4,700 passenger vehicle occupants and other road users killed; asymmetric crash physics; bus fatalities ~275/year; long-term trend: declined from 4,429 (1985) to 3,321 (2009) with safety regulations and economic slowdown, then rose with e-commerce boom to new highs; leading location: daytime rural highways; leading factors: speeding, driver fatigue, improper lookout); LTCCS Large Truck Crash Causation Study (FMCSA 963-crash in-depth study; critical event vs. critical reason framework; 55% truck driver critical reason; among truck-assigned: 87% driver error -- decision errors (speeding, misjudging gap), recognition errors (inattention, distraction), performance errors (overcompensation, panic); fatigue in ~13% of truck-critical crashes; HOS violations implicated in fatigue subset); HOS regulations 49 CFR Part 395 (property-carrying: 11-hr driving/day, 14-hr on-duty window, 30-min break after 8 hrs driving, 60/70-hr weekly limit in 7/8 consecutive days, 34-hr restart; ELD mandate December 2017 -- GPS-integrated electronic logging replacing paper; ELD mandate reduced HOS violations detectable at roadside inspections; fatigue still #1 systemic risk factor); CSA Safety Measurement System (7 BASICs: Unsafe Driving, HOS Compliance, Driver Fitness, Controlled Substances/Alcohol, Vehicle Maintenance, HazMat Compliance, Crash Indicator; monthly percentile scoring vs. peer carriers; intervention thresholds 65th-80th percentile; ATA v. 
FMCSA 2019: D.C. Circuit sided with trucking industry, BASIC percentile scores removed from public-facing SMS website); roadside inspections (3.5M/year; Level I full inspection, Level II walk-around, Level III driver only; vehicle OOS ~20% -- brakes/tires/lights/coupling; driver OOS ~5% -- HOS, expired medical certificate, suspended CDL; inspection data to MCMIS within 24 hours; 24-month rolling window in SMS); data access (SAFER at safer.fmcsa.dot.gov for carrier search; A&I at ai.fmcsa.dot.gov for state-level statistics; FMCSA public SMS API at mobile.fmcsa.dot.gov/qc/services/carriers/{dot_number}/crashes; MCMIS research snapshots via data request; NHTSA FARS as fatal crash complement); industry context (3.5M drivers, ~750,000 carriers, 350,000 owner-operators; Amazon DSP and gig delivery light commercial vehicles below 10,001 lb threshold -- regulatory gap; ELD mandate created behavioral telematics data infrastructure; underride guard standards under IIHS pressure -- 49 CFR Part 393.86); Python script computing state fatality rates per 100M VMT, time-of-day and road-type breakdowns, critical reason attribution from LTCCS, carrier-level API crash lookup.
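
    The state fatality rate in the script description is fatalities divided by vehicle miles traveled, scaled to 100 million miles. A small sketch of that arithmetic and the resulting ranking is below; the function names and the input shape (state mapped to a fatalities/VMT pair) are assumptions for illustration.

```python
def fatality_rate_per_100m_vmt(fatalities, vmt_miles):
    """Fatalities per 100 million vehicle miles traveled."""
    if vmt_miles <= 0:
        raise ValueError("vmt_miles must be positive")
    return fatalities / vmt_miles * 100_000_000


def rank_states(state_data):
    """state_data: {state: (fatalities, vmt_miles)} -> states by rate, highest first."""
    rates = {
        state: fatality_rate_per_100m_vmt(deaths, vmt)
        for state, (deaths, vmt) in state_data.items()
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```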

  38. Q4Writing

    CRS congressional research reports and public policy analysis deep-dive published

    Long-form article on Congressional Research Service (CRS) reports database: CRS established 1914 as the Legislative Reference Service, renamed CRS 1970; housed within Library of Congress; mission: provide comprehensive, authoritative, objective research and analysis to Congress on legislative, oversight, and policy questions; 700 analysts and information professionals across 7 analytical divisions (Government and Finance Division, American Law Division, Domestic Social Policy Division, Foreign Affairs, Defense and Trade Division, Resources, Science and Industry Division, Knowledge Services Group, Office of the Director); 6 CRS product types (Reports: comprehensive authoritative analyses typically 20-80 pages, cited in committee hearings and floor debates; Insight: 2-4 page analysis of current issues and recent developments; In Focus: 2-page quick overview reference; Legal Sidebar: 2-4 page legal analysis of judicial decisions and legislative developments; Report Updates: periodic updates to Reports when statutes or case law change; Congressional Testimonies: formal CRS testimony at committee hearings); 25+ policy subject areas (Agriculture, Appropriations, Budget and Tax Policy, CRS Products on Congress, Defense, Education, Elections, Energy, Environment and Natural Resources, Financial Services, Foreign Affairs and National Defense, Government Operations, Health, Homeland Security, Housing, Immigration, Information Technology, Labor and Employment, Law, Natural Resources, Science and Technology, Social Policy, Trade, Transportation and Infrastructure, Veterans Affairs and Military Personnel); public access mandate and history (historically CRS products made available exclusively to members of Congress, congressional staff, and official congressional users -- not released publicly; advocates for transparency including Federation of American Scientists (FAS) and OpenCRS.com filed under FOIA (reports classified as congressional documents exempt from FOIA); the 
2012 Coburn incident: CRS produced report in September 2012 finding no statistically significant correlation between top marginal income tax rates and economic growth over 1945-2010; then-Senator Tom Coburn (R-OK) pressured CRS to withdraw the report; CRS complied October 2012 but released publicly November 2012 after public pressure -- became landmark case for public access; Consolidated Appropriations Act FY2018 (PL 115-141, Section 104): first statutory public release mandate for non-confidential CRS reports; crsreports.congress.gov launched 2018 as official portal; as of 2024 over 9,000 reports publicly available); EveryCRSReport.com (maintained by FAS + Demand Progress; 16,000+ reports including pre-2018 collection; API at everycrsreport.com/reports.json returns full index array; each report object: id (R12345/IF11234 format), title, topics (array of subject area strings), date (most recent version), versions (array of revision objects newest-first, each with date/formats/id/retrieved); individual report JSON at everycrsreport.com/reports/{id}.json; no API key required; bulk access enables topic trend analysis); comparison to GAO and CBO (GAO Government Accountability Office: investigative/audit arm, formally requested by committee chairs and ranking members, public reports with agency responses, longer production timeline 6-18 months; CBO Congressional Budget Office: score legislation for budgetary impact, produce economic forecasts, produce mandated reports including Medicare/Social Security trust fund analyses, CBO baseline budget projections; CRS differs: faster turnaround (days to weeks), confidential until released, covers legal/policy/factual questions beyond budget scoring); Python script downloading EveryCRSReport.com index JSON, computing report counts by policy area topic, publication frequency by year (2010-present), identifying most frequently updated reports by version count, analyzing report types distribution (R/IF/IN/LSB prefix patterns), 
ranking top policy areas by new report output in 2024.
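
The index analysis described in this entry can be sketched without touching the network. This is a minimal sketch, assuming each index record has the fields listed above (id, topics, date, versions); the sample records and helper names are hypothetical and stand in for the downloaded JSON.

```python
import re
from collections import Counter

# Hypothetical records shaped like the index entries described above.
# Ids, titles, and dates are made up for illustration.
SAMPLE_INDEX = [
    {"id": "R12345", "title": "Example report", "topics": ["Health"],
     "date": "2024-03-01",
     "versions": [{"date": "2024-03-01"}, {"date": "2023-01-15"}]},
    {"id": "IF11234", "title": "Example In Focus", "topics": ["Defense", "Health"],
     "date": "2024-06-10", "versions": [{"date": "2024-06-10"}]},
    {"id": "LSB10001", "title": "Example Legal Sidebar", "topics": ["Law"],
     "date": "2023-11-02", "versions": [{"date": "2023-11-02"}]},
]

def report_type(report_id):
    """Return the alphabetic prefix of a report id, e.g. 'IF' for 'IF11234'."""
    match = re.match(r"[A-Za-z]+", report_id)
    return match.group(0).upper() if match else "UNKNOWN"

def type_counts(index):
    """Count reports by id prefix (R, IF, IN, LSB, ...)."""
    return Counter(report_type(r["id"]) for r in index)

def topic_counts(index):
    """Count reports per topic; a report with several topics counts once for each."""
    return Counter(t for r in index for t in r.get("topics", []))

def reports_per_year(index):
    """Count reports by year, assuming 'date' is an ISO string (YYYY-MM-DD)."""
    return Counter(r["date"][:4] for r in index if r.get("date"))

def most_revised(index, n=10):
    """Rank reports by number of recorded versions, most revised first."""
    ranked = sorted(index, key=lambda r: len(r.get("versions", [])), reverse=True)
    return [(r["id"], len(r.get("versions", []))) for r in ranked[:n]]
```

Swapping `SAMPLE_INDEX` for the parsed index JSON should be the only change needed, provided the real records carry the same field names.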

  39. Q4Writing

    NIST NVD cybersecurity vulnerability database deep-dive published

    Long-form article on NIST National Vulnerability Database (NVD): US government repository of standards-based vulnerability management data; NIST enriches CVE records from MITRE/CVE.org (CISA-funded) with CVSS scores, CWE classifications, CPE affected product lists, references; 250,000+ CVE entries as of 2024; published since 1999, significantly expanded 2005; CVE assignment system (400+ CNAs including Microsoft, Google, Apple, Red Hat, Cisco; MITRE as root CNA; ~28,000 new CVEs assigned in 2023; 2024 NVD backlog crisis: 10,000+ CVEs pending CVSS enrichment); CVSS v3.1 base score components (Attack Vector N/A/L/P, Attack Complexity L/H, Privileges Required N/L/H, User Interaction N/R, Scope U/C, CIA impact N/L/H); score ranges Critical 9.0-10.0, High 7.0-8.9, Medium 4.0-6.9, Low 0.1-3.9; Temporal Score (exploit maturity, remediation availability) and Environmental Score (organizational context adjustments); CISA KEV catalog: 1,000+ confirmed-exploited CVEs, EO 14028 and BOD 22-01 require federal civilian agencies to patch KEV vulnerabilities within 14 days for critical; CWE taxonomy (CWE-787 Out-of-Bounds Write, CWE-79 XSS, CWE-89 SQL Injection, CWE-416 Use After Free, CWE-476 NULL Pointer Dereference -- NSA/CISA memory safety campaign); landmark CVEs (Log4Shell CVE-2021-44228 CVSS 10.0 JNDI injection in Log4j2; EternalBlue CVE-2017-0144 CVSS 9.3 WannaCry/NotPetya SMBv1; Heartbleed CVE-2014-0160 CVSS 7.5 OpenSSL buffer over-read; HTTP/2 Rapid Reset CVE-2023-44487 first DDoS-class CVE 201M req/sec); compliance applications (FedRAMP 30-day Critical/90-day Medium patching; PCI DSS High/Critical scan remediation; FISMA SP 800-53 RA-5; SOC 2 audits); NVD REST API v2.0 /rest/json/cves/2.0 with cvssV3Severity/cweId/cpeMatchString/hasKev parameters, 2000 results/page, 120-day date window, API key for 50 req/30s; Python analysis of 2024 Critical CVEs: weekly publication volumes, top-10 CWE distribution, CISA KEV overlap percentage.
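
Two mechanics from this entry lend themselves to a short sketch: the CVSS v3.1 severity bands and the 120-day date window that forces a year-long query to be split into several requests. The sketch below assumes the date filters are named `pubStartDate` and `pubEndDate` (those names are not given above), and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_WINDOW_DAYS = 120  # per-request date range limit noted in the entry above

def severity(score):
    """Map a CVSS v3.1 base score to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "NONE"
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    if score < 9.0:
        return "HIGH"
    return "CRITICAL"

def date_windows(start, stop, max_days=MAX_WINDOW_DAYS):
    """Split [start, stop) into consecutive windows of at most max_days each."""
    windows = []
    cursor = start
    while cursor < stop:
        end = min(cursor + timedelta(days=max_days), stop)
        windows.append((cursor, end))
        cursor = end
    return windows

def query_params(window, severity_level="CRITICAL", per_page=2000):
    """Build one request's query parameters (parameter names partly assumed)."""
    start, end = window
    return {
        "cvssV3Severity": severity_level,
        "pubStartDate": start.isoformat(),  # assumed parameter name
        "pubEndDate": end.isoformat(),      # assumed parameter name
        "resultsPerPage": per_page,
        "startIndex": 0,
    }
```

Each window would then be paged through by advancing `startIndex` until the reported total is exhausted, pausing between requests to stay under the rate limit.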

  40. Q4Writing

    EPA Greenhouse Gas Reporting Program deep-dive published

    Long-form article on EPA GHGRP: mandatory facility-level GHG emissions reporting established under CAA Section 114, 40 CFR Part 98 (Mandatory Reporting of Greenhouse Gases Rule, promulgated 2009); reporting threshold 25,000 metric tons CO2-equivalent per year from stationary sources; ~8,000 facilities report annually, covering ~85-90% of US stationary source GHG emissions; 18-month reporting lag (CY 2023 emissions published late 2024); 41 subpart source categories including Subpart D (electricity generation), Subpart W (petroleum and natural gas systems -- largest by facility count, covering gathering/boosting/processing/transmission/storage/distribution/LNG), Subpart Y (petroleum refining), Subpart AA (pulp and paper), Subpart C (general stationary fuel combustion), Subpart HH (municipal solid waste landfills), Subpart FF (underground coal mines); not covered: agriculture (EPA does not regulate ag GHG under GHGRP), mobile sources (transportation), residential/commercial building operations; six covered GHGs with IPCC AR5 GWP100 values used (CO2 = 1 by definition; CH4 = 28-34x CO2 depending on fossil vs. biogenic; N2O = 265-298x; HFCs: HFC-23 = 12,400x, HFC-134a = 1,300x, HFC-32 = 677x; PFCs: CF4 = 6,630x, C2F6 = 11,100x; SF6 = 23,500x -- used in high-voltage electrical switchgear); reporting calculated as metric tons CO2-equivalent (CO2e) = mass x GWP100; key findings from GHGRP data (power sector: electric generation historically ~30-35% of total US GHG but declining as coal retires and gas/renewables expand; GHGRP shows plant-level trajectory -- coal plant James H. Miller Jr. 
(Alabama Power/Southern Co) among highest single-site emitters historically; ExxonMobil Baytown TX complex among highest industrial emitters; 2022 total power sector: ~1.5B tCO2e reported to GHGRP); Scope 1 only (GHGRP covers direct facility emissions; Scope 2 = purchased electricity not in GHGRP; Scope 3 = value chain excluded; separate voluntary disclosure via CDP and Science Based Targets initiative); FLIGHT tool (Facility Level Information on GreenHouse Gases Tool at ghgdata.epa.gov/ghgp/main.do: maps all reporting facilities, time-series charts by gas and subpart, comparison across facilities, downloadable CSV export by year and sector); data access (ECHO Enforcement and Compliance History Online at echo.epa.gov: annual GHGRP bulk download CSV with facility name, parent company, city/state, latitude/longitude, industry type, GHG quantities by gas and subpart; Envirofacts envirofacts.epa.gov/enviro/ with multi-system query; EPA ENVIRO API for programmatic access); policy applications (EPA power plant CO2 standards under Section 111(d); state carbon pricing -- CA CARB uses separate mandatory reporting similar to GHGRP; academic methane vs. satellite studies; TCFD climate-related financial disclosures; CDP corporate reporting cross-reference); satellite validation controversy (TROPOMI Sentinel-5P, GHGSat commercial satellites, Carbon Mapper, MethaneSAT -- repeatedly detect higher methane plumes from Permian Basin TX/NM and other oil/gas basins than bottom-up GHGRP Subpart W calculations; discrepancy 50-100% in some basin-level estimates; EPA responded with 2024 Subpart W final rule significantly revising calculation methodologies, adding equipment-specific emission factors and requiring more frequent monitoring); Python script loading ECHO GHGRP bulk ZIP, grouping by state and industry sector, computing top-10 emitting states, average emissions per facility by sector type, top-20 individual facilities by CO2e, sector composition pie chart.
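
The CO2e conversion (mass x GWP100) and the state-level rollup can be sketched as follows. The GWP values are the ones quoted in this entry, taking the lower bound where a range is given; the facility records and field names are hypothetical and do not reflect the real column names in the bulk download.

```python
from collections import defaultdict

# GWP100 multipliers quoted in the entry above (lower bound where a range is given).
GWP100 = {"CO2": 1, "CH4": 28, "N2O": 265, "HFC-134a": 1300, "CF4": 6630, "SF6": 23500}

def co2e(tonnes_by_gas):
    """Convert per-gas metric tons into metric tons CO2-equivalent."""
    return sum(GWP100[gas] * tonnes for gas, tonnes in tonnes_by_gas.items())

def totals_by_state(facilities):
    """Sum facility CO2e per state."""
    totals = defaultdict(float)
    for facility in facilities:
        totals[facility["state"]] += co2e(facility["emissions"])
    return dict(totals)

def top_states(facilities, n=10):
    """Return the n highest-emitting states as (state, CO2e) pairs."""
    ranked = sorted(totals_by_state(facilities).items(),
                    key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

# Hypothetical facilities, made up for illustration.
SAMPLE = [
    {"name": "Plant A", "state": "TX", "emissions": {"CO2": 1000.0, "CH4": 10.0}},
    {"name": "Plant B", "state": "TX", "emissions": {"CO2": 500.0}},
    {"name": "Substation C", "state": "AL", "emissions": {"SF6": 0.5}},
]
```

The sample shows why high-GWP gases matter: half a ton of SF6 outweighs 1,500 tons of CO2 in CO2e terms.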

  41. Q4Writing

    DOJ Antitrust Division enforcement data deep-dive published

    Long-form article on DOJ Antitrust Division: created 1933 as part of DOJ under Attorney General; enforces Sherman Antitrust Act (1890: Section 1 prohibits contracts/combinations/conspiracies in restraint of trade -- per se illegal for horizontal price-fixing, bid-rigging, market allocation; Section 2 prohibits monopolization and attempted monopolization -- rule of reason standard), Clayton Act (1914: Section 7 prohibits mergers/acquisitions that substantially lessen competition or tend to create monopoly; Section 8 prohibits interlocking directorates among competitors), and Hart-Scott-Rodino Antitrust Improvements Act (1976: pre-merger notification and waiting period); DOJ has criminal enforcement authority under Sherman Act (grand jury investigation, indictment, felony conviction up to 10 years prison + $1M fine for individuals, $100M fine for corporations) -- FTC does not have criminal authority; both DOJ and FTC have civil merger review authority; budget ~$400M, ~800 attorneys; HSR pre-merger notification mechanics (parties to transactions meeting HSR thresholds must file Forms with DOJ and FTC and observe 30-day initial waiting period; 2024 HSR size-of-transaction threshold: $119.5M adjusted annually for GNP changes; size-of-person threshold: $23.9M; Hart-Scott-Rodino annual report publishes aggregate statistics on filings, requests, enforcement; ~1,500-2,000 HSR filings per year; ~3% receive Second Request for additional documents; Second Request starts new 30-day waiting period after full compliance -- can take 12-18+ months for complex deals; civil penalty for failure to file $51,744/day per violation; DOJ Antitrust can seek preliminary injunction in federal district court to block merger during review); merger review process and 2023 guidelines (2010 Horizontal Merger Guidelines replaced December 2023 by Biden DOJ/FTC Merger Guidelines -- more aggressive standards: highly concentrated threshold lowered to HHI > 1,800 (2010 bands: unconcentrated < 1,500; moderately concentrated 1,500-2,500; highly concentrated > 2,500);
structural presumption of harm at delta HHI > 100 in highly concentrated markets, versus delta > 200 under the 2010 guidelines; new vertical merger concern framework; labor market concentration -- merger that increases monopsony power in labor markets scrutinized; platform/digital market guidance; high-profile merger challenges: AT&T-Time Warner (2018) -- DOJ challenged vertical merger but lost at trial; JetBlue-Spirit (2024) -- DOJ blocked, structural remedies insufficient; UnitedHealth-Change Healthcare (2022) -- DOJ challenged but lost at trial; Microsoft-Activision (2022-2023) -- FTC challenged, not DOJ; Amazon-iRobot (2024) -- abandoned amid scrutiny); criminal cartel enforcement and leniency program (Sherman Act criminal enforcement: price-fixing among competitors per se illegal; grand jury secrecy, corporate leniency since 1993 -- first company to self-report cartel gets automatic leniency (no criminal fine, no jail for cooperating employees); subsequent leniency applicants (Leniency Plus) negotiate reduced fines and cooperation credit; leniency program credited as most powerful cartel detection tool globally; major cartel cases: auto parts cartel (2011-2016) -- over 50 companies, 60+ individuals, $2.9B in fines (largest criminal antitrust enforcement action); LCD/TFT panel cartel (2008-2012) -- Samsung/LG/Sharp $1.39B; air cargo price-fixing (2008-2014) $1.8B; shipping container lines; chicken industry bid-rigging (ongoing Pilgrim's Pride/Claxton Poultry since 2019)); public data sources (DOJ ATR press releases at justice.gov/atr/press-releases -- all enforcement actions, consent decrees, criminal charges/pleas; RSS feed justice.gov/rss/atr/press-releases.xml; PACER federal court system for civil merger complaints (US v.
Defendant); Federal Register Tunney Act notices for proposed consent decrees with 60-day public comment period; DOJ HSR annual report (aggregate statistics only, individual filings confidential); GAO antitrust enforcement reports; FTC Annual Highlights; Stanford Computational Antitrust; OECD competition policy statistics); Python script fetching DOJ ATR press release RSS, parsing XML, classifying by keyword pattern (criminal/merger/consent decree/civil), computing annual enforcement action counts by type 2018-2024, identifying most common industry targets in criminal enforcement.
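
The keyword classification step of the script can be sketched as an ordered list of patterns where the first match wins. The keyword lists, labels, and function names below are illustrative assumptions, not the patterns used in the article; fetching and parsing the RSS feed is left out.

```python
import re
from collections import Counter

# Ordered rules: the first matching pattern decides the label.
# Keyword lists are illustrative assumptions, not an authoritative taxonomy.
RULES = [
    ("criminal", re.compile(
        r"indict|pleads? guilty|sentenced|price[- ]fixing|bid[- ]rigging", re.I)),
    ("consent decree", re.compile(
        r"consent decree|proposed settlement|divest", re.I)),
    ("merger", re.compile(r"merger|acquisition|acquire", re.I)),
]

def classify(title):
    """Label a press-release title with the first rule that matches."""
    for label, pattern in RULES:
        if pattern.search(title):
            return label
    return "civil/other"

def counts_by_year(items):
    """items: iterable of (year, title) pairs -> Counter keyed by (year, label)."""
    return Counter((year, classify(title)) for year, title in items)
```

Rule order is the main design choice here: putting criminal keywords first means a release about a guilty plea in a merger-related conspiracy is counted as criminal rather than merger.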

  42. Q4Writing

    CDC WISQARS injury and violence mortality data deep-dive published

    Long-form article on CDC WISQARS (Web-based Injury Statistics Query and Reporting System): primary federal source for injury-related deaths and nonfatal injuries; fatal data from NCHS National Vital Statistics System -- death certificates coded to ICD-10 underlying cause, with V01-Y89 external cause of morbidity codes classifying how injury occurred; back to 1981 for comparable data; nonfatal data from NEISS-AIP (National Electronic Injury Surveillance System All Injury Program) -- stratified probability sample of ~100 hospital emergency departments producing national estimates; leading causes of injury death (2022 provisional data: unintentional injury ~230,000 deaths = #1 cause of death for ages 1-44; drug poisoning/overdose 109,680 (74,702 involving synthetic opioids/fentanyl, 24,486 methamphetamine, 9,173 heroin -- categories overlap for polydrug deaths); motor vehicle traffic 46,027 (including pedestrian 7,508, motorcyclist 6,084, pedal cyclist 1,105); unintentional falls 44,686 (predominantly elderly: 78% of fall deaths age 65+, hip fractures leading to pneumonia/DVT); suicide 49,449 (firearms 26,993 = 54.6% of suicides; suffocation/hanging 13,250 = 26.8%; poisoning 4,522 = 9.1%; other methods); homicide 24,849 (firearms 19,592 = 78.8%; knives/cutting instruments 2,120 = 8.5%; hands/feet 820 = 3.3%); total firearm deaths across intents: 48,204 = 14.6/100,000 rate (54.0% suicide, 40.8% homicide, 3.0% undetermined, 2.1% unintentional)); ICD-10 external cause coding (V01-V99: transport accidents -- V20-V29 motorcycle occupant, V40-V49 car occupant, V80-V89 other land transport, V90-V95 water/air transport; W00-X59: other external causes of unintentional injury -- W54 bitten/struck by dog, W65-W74 drowning, X00-X09 exposure to smoke/fire, X40-X49 accidental poisoning; X60-X84: intentional self-harm -- X72 intentional self-harm by handgun discharge, X78 sharp object; X85-Y09: assault -- X93 assault by handgun, X99 assault by sharp object; Y10-Y34: events of 
undetermined intent; two-letter mechanism/nature codes in WISQARS query interface); firearm violence data (WISQARS provides most comprehensive public data on firearm deaths by state/year/age/sex/race-ethnicity/intent; 2022 geographic variation: MS 28.6/100k, LA 26.3, WY 24.5 highest total firearm death rates vs. MA 3.9, HI 4.3, RI 5.0 lowest; firearm suicide: AK 19.0/100k, WY 16.0, MT 14.8 vs. NJ 1.0, NY 0.8, HI 1.3 -- correlates with household gun ownership rates; firearm homicide: DC 18.4, MS 13.2, LA 12.7 vs. NH 0.5, ID 0.8, UT 0.8 -- concentrated in urban areas, disproportionately affecting Black men ages 15-34; WISQARS is primary data source for peer-reviewed gun violence research and policy analysis); NEISS and nonfatal injuries (~40 million injury-related ED visits/year; ~3 million hospitalizations; falls most common in elderly; sports/recreation 8-10M ED visits; motor vehicle occupant 2.5M ED visits; NEISS consumer product module tracks specific products -- playgrounds, bicycles, ATVs -- managed by CPSC; NEISS-AIP for all-cause injury surveillance managed by CDC); suicide surveillance and policy (2022 suicide rate: 14.3/100,000 = highest since 1938; male suicide rate 22.8 vs. female 6.1; American Indian/Alaska Native rate highest among racial groups; rural rates 20+ vs. 
urban 11-12 per 100,000; means restriction evidence: ecological studies show firearm storage laws reduce firearm suicide rates without equivalent substitution -- method substitution estimated at 10-30% offset; 988 Suicide and Crisis Lifeline launched July 2022 replacing 1-800-273-8255; Zero Suicide initiative in healthcare systems targeting zero preventable suicide deaths among patients; ERPO/red flag laws in 21 states: temporarily remove firearms from individuals in crisis); opioid three-wave epidemic (Wave 1 1990s-2010: prescription opioid deaths rise, OxyContin 1996 Purdue Pharma; Wave 2 2010-2013: heroin increase as prescription opioids cracked down; Wave 3 2013-present: illicit fentanyl synthesized in China/Mexico, 50-100x potency of morphine, test strips and naloxone distribution key interventions; 109,680 deaths 2022 -- fentanyl/synthetics 73,800 (67%), stimulants often co-involved 32,537 (polydrug)); data access (WISQARS interactive query at cdc.gov/wisqars/; fatal injury: select mechanism/intent/year/state/age/sex/race, output rates or counts, CSV download; WONDER at wonder.cdc.gov for underlying cause of death with demographic detail; NVDRS (National Violent Death Reporting System) 50-state case-level data with death certificate + coroner/ME report + law enforcement report linkage -- circumstance data (mental health history, crisis in past 2 weeks, substance use, previous attempts, precipitating circumstances); AtlasPlus for STI/HIV/hepatitis/TB county-level mapping); Python script pulling WONDER or WISQARS API for firearm death rates by state 2022, separating suicide vs. homicide rates, computing correlation with proxy gun ownership measure, ranking top-10 states by each intent category, noting divergent geographic patterns.
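
The ranking and correlation steps of the script can be sketched in plain Python. The firearm suicide rates below are the ones quoted in this entry; the ownership proxy values are made up purely to show the call shape and should not be read as data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

def rank_states(rates, n=10):
    """Return the n states with the highest rate as (state, rate) pairs."""
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Firearm suicide rates per 100,000 quoted in the entry above.
SUICIDE_RATE = {"AK": 19.0, "WY": 16.0, "MT": 14.8, "HI": 1.3, "NJ": 1.0, "NY": 0.8}

# Hypothetical household-ownership proxy, invented for illustration only.
OWNERSHIP_PROXY = {"AK": 0.60, "WY": 0.62, "MT": 0.58, "HI": 0.12, "NJ": 0.10, "NY": 0.15}

states = sorted(SUICIDE_RATE)
r = pearson([OWNERSHIP_PROXY[s] for s in states], [SUICIDE_RATE[s] for s in states])
```

Running the same two functions on homicide rates is what would surface the divergent geographic pattern the entry mentions.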

  43. Q4Writing

    USASpending.gov federal contracts and grants data deep-dive published

    Long-form article on USASpending.gov: official federal government spending database; statutory foundation (Federal Funding Accountability and Transparency Act (FFATA) 2006 signed by President Bush -- required single searchable website for federal grants and contracts exceeding $25,000; Digital Accountability and Transparency Act (DATA Act) 2014 -- expanded to link financial system data (budget/obligations/outlays) to award-level records, required governmentwide data standards (Treasury data element standards), and mandated agency financial system submissions to Treasury DATA Act Broker; administered by Bureau of the Fiscal Service, Department of the Treasury; total federal spending FY2023 ~$6.1T: mandatory/direct programs ~$3.5T (Social Security $1.3T, Medicare $0.9T, Medicaid $0.6T, other), discretionary grants ~$0.8T, contracts ~$0.7T, loans/loan guarantees ~$0.35T, other financial assistance); award type breakdown and source systems (contracts: from FPDS-NG (Federal Procurement Data System - Next Generation) -- real-time reporting by contracting officers within 3 business days of award; DoD ~$412B FY2023, civilian agencies ~$300B; largest civilian contract agencies: DHS ~$30B, HHS ~$15B, VA ~$25B, NASA ~$22B, DOE ~$25B; top 10 defense prime contractors received ~45% of all DoD prime contract dollars; grants: from GrantSolutions (HHS grants), Payment Management System (PMS), G5 (ED grants), ASSIST (NIH), eRA Commons, NSF FastLane/Research.gov, multiple agency grant management systems -- USASpending aggregates via financial system submissions; HHS largest grant-making agency including Medicare/Medicaid pass-throughs to states; NIH ~$40B research project grants; NSF ~$9B; DOE Office of Science ~$8B R&D; subawards: FSRS (FFATA Subaward Reporting System) -- prime awardees with federal contracts >$30,000 or grants >$30,000 must report first-tier subawards within 30 days; shows contracting supply chain); FPDS contract data structure (approximately 40 data elements 
per transaction: Unique Entity Identifier (UEI replacing DUNS since April 2022), CAGE Code (Commercial and Government Entity code, DoD-assigned), Recipient Name, Recipient Street/City/State/Zip, Recipient Congressional District, Awarding Agency, Awarding Sub-Agency, Awarding Office, Funding Agency, Contract/Delivery Order Number, Date Signed, Period of Performance Start/End, Ultimate Completion Date, Total Obligated Amount, Total Base and Exercised Options, Total Base and All Options, PSC (Product Service Code from 4-character code list), NAICS Code, Type of Contract (firm-fixed-price FFP, fixed-price-incentive FPIF, cost-plus-fixed-fee CPFF, cost-plus-incentive-fee CPIF, cost-plus-award-fee CPAF, time-and-materials T&M), Extent of Competition (full and open competition, competitive with exclusions, not competed, follow-on to competed action), Reason Not Competed (FAR Part 6 exceptions: J&A required for sole source above simplified acquisition threshold), Type of Set-Aside (no set-aside, small business total, partial, 8(a) program, HUBZone, service-disabled veteran-owned SDVOSB, women-owned WOSB, economically disadvantaged WOSB), Place of Performance); defense spending patterns (DoD FY2023 $412B prime contracts; Lockheed Martin ~$73B: F-35 JSF (~$15B/yr), C-130J Super Hercules, PAC-3 Patriot missiles, AEGIS combat system, satellite systems; RTX (Raytheon Technologies post-merger) ~$42B: Patriot/PAC-3 production, Stinger MANPADS, Javelin ATGM (joint venture with Lockheed Martin), Pratt & Whitney F135 engines, Raytheon intelligence/space systems; Boeing ~$28B: KC-46A tanker, F-15EX Eagle II, F/A-18 E/F Super Hornet, P-8 Poseidon MPA, satellites; General Dynamics ~$25B: Abrams M1A2 tanks, Virginia-class submarines (Electric Boat), Stryker, Gulfstream executive aircraft; Northrop Grumman ~$22B: B-21 Raider stealth bomber (LRS-B contract ~$33B ceiling), Sentinel/GBSD ICBM replacement, E-2D Hawkeye, ground-based midcourse defense; cost-plus contracts common for R&D
and complex systems development; firm-fixed-price for production after engineering development); competition and set-aside programs (full and open competition ~57% of contract dollars FY2023; not competed ~30% -- justified by: only one responsible source, unusual/compelling urgency, national security concerns, authorized by statute, follow-on for standard commercial items; Section 8(a) Program (SBA): socially/economically disadvantaged individuals, 9-year program, no-competition threshold $25M; HUBZone: historically underutilized business zones; SDVOSB: verified by SBA, VA hospitals required to use SDVOSB first; WOSB: women-owned and economically disadvantaged; federal small business contracting goal 23% of prime contracts); DATA Act financial linkage (TAS (Treasury Account Symbol) links award obligations to specific appropriations; agency financial systems submit monthly SF-133 obligations; USASpending shows appropriation source → obligation date → outlays (actual cash payments); helps track congressional appropriation to actual spending outcomes; agency submission quality varies -- some agencies have significant data gaps); data access (api.usaspending.gov REST API: /api/v2/search/spending_by_award -- full-text search with award type/agency/date/NAICS/PSC filters, returns award records with all FPDS fields; /api/v2/bulk_download -- asynchronous CSV generation for any filter set, max 500k records per file; /api/v2/recipient -- recipient profiles with parent-child entity relationships; /api/v2/federal_accounts -- appropriation-level spending summary; FPDS.gov direct query going back to FY2000 for contract data only; SAM.gov for entity registration data including UEI, NAICS codes, small business certifications, active/inactive status); Python script pulling DoD FY2023 top-20 prime contractors via USASpending API /spending_by_category endpoint filtered to DoD, then separately analyzing small business set-aside awards by recipient census region.
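
The top-contractor query can be sketched as a request body plus a small concentration measure. The filter keys, award type codes, and response field name (`amount`) are assumptions about the API's JSON shape and should be checked against the live documentation; only the federal fiscal year boundaries (October 1 to September 30) are certain.

```python
def fiscal_year_range(fy):
    """Federal fiscal year fy runs from Oct 1 of the prior year to Sep 30 of fy."""
    return f"{fy - 1}-10-01", f"{fy}-09-30"

def top_recipients_payload(fy, agency_name, limit=20):
    """Build a POST body for a spending-by-recipient search (field names assumed)."""
    start, end = fiscal_year_range(fy)
    return {
        "filters": {
            "time_period": [{"start_date": start, "end_date": end}],
            "agencies": [
                {"type": "awarding", "tier": "toptier", "name": agency_name}
            ],
            # Assumed codes for contract award types.
            "award_type_codes": ["A", "B", "C", "D"],
        },
        "limit": limit,
        "page": 1,
    }

def concentration(results, top_n=10):
    """Share of total dollars that goes to the top_n recipients."""
    amounts = sorted((row["amount"] for row in results), reverse=True)
    total = sum(amounts)
    return sum(amounts[:top_n]) / total if total else 0.0
```

A call such as `top_recipients_payload(2023, "Department of Defense")` would produce the body for the FY2023 DoD query, and `concentration` applied to the returned rows gives the kind of top-10 share figure quoted earlier in the entry.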

  44. Q4Writing

    Federal Register rulemaking database deep-dive published

    Long-form article on the Federal Register: official daily journal of the US federal government since March 14, 1936; four document categories (proposed rules/NPRMs, final rules, presidential documents including executive orders/proclamations/memoranda, and notices including grant announcements/meeting notices/corrections); Code of Federal Regulations (CFR) as the annual codification of all effective final rules organized into 50 titles by subject area (Title 1 General, Title 26 IRS tax regulations, Title 40 EPA, Title 21 FDA, Title 47 FCC, Title 29 OSHA/DOL); volume statistics (~85,000-95,000 pages/year; Obama second term record highs; Trump first term record lows from deregulation push); Administrative Procedure Act (APA) 1946 framework (notice-and-comment required for legislative rules: NPRM published with preamble and proposed regulatory text, minimum 30-day comment period -- usually 60-90 days for significant rules, agency must substantively address all significant comments in final rule preamble, final rule published with 30-day implementation delay unless good cause; informal rulemaking vs. formal (trial-type) rulemaking; arbitrary-and-capricious review under APA Section 706 -- courts review agency reasoning in the rulemaking record; Chevron doctrine (1984) gave deference to reasonable agency statutory interpretations; overruled by Loper Bright Enterprises v. Raimondo (June 2024) -- courts now must independently interpret ambiguous statutes without deferring to agencies);
OIRA review process (Office of Information and Regulatory Affairs within OMB; EO 12866 (Clinton 1993): OIRA reviews significant rules defined as >$100M annual economic impact or novel legal issues; Regulatory Impact Analysis required for major rules; EO 13563 (Obama 2011): retrospective review of existing regulations; EO 13771 (Trump 2017): two-for-one repeal requirement for significant rules; EO 13992 (Biden 2021) revoked EO 13771 and related Trump deregulation EOs; ~100-150 significant rules under OIRA review at any time); Unified Regulatory Agenda (published twice yearly spring/fall via Reginfo.gov; lists all agency rules in four stages: pre-rule/ANPRM, proposed rule, final rule, long-term > 12 months; Regulatory Plan = most significant priorities; predecessor agencies use separate agenda entries); Congressional Review Act mechanics (1996; Congress may overturn recent major rules via joint resolution within 60 legislative days of transmission to Congress; President can sign or veto; used successfully only once before 2017 (OSHA ergonomics rule, 2001); Trump/Republican Congress used CRA 14 times in early 2017 (16 times across the 115th Congress) to overturn late Obama-era rules including OSHA injury-recordkeeping (Volks) rule, Stream Protection Rule, teacher preparation reporting, ISP broadband privacy; rules overturned under CRA cannot be reissued in substantially similar form without new legislation); Federal Register API (federalregister.gov/api/v1/: /documents.json endpoint for full-text search with conditions[term], conditions[agencies][], conditions[type][], conditions[publication_date][gte/lte], per_page up to 1000; /documents/{document_number}.json endpoint for specific document detail with CFR citations; bulk XML download from govinfo.gov for years back to 1994; eCFR continuously updated at ecfr.gov vs.
annual printed CFR); Regulations.gov (public comment portal at regulations.gov; all open and closed dockets with submitted comment PDFs; record comment volumes: FCC net neutrality 2014: 3.7M comments (filed through the FCC's own ECFS system rather than Regulations.gov); CFPB payday lending 2016: 1.4M; Regulations.gov API v4 requires API key for programmatic docket and document access); Python script using Federal Register API to fetch all EPA proposed rules from 2024, extract title/abstract/comment deadline/CFR citations, compute average comment period length, sort by deadline, identify dockets linked to most significant regulatory actions.
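
The query construction and the comment-period arithmetic can be sketched as below. The search path `documents.json`, the `PRORULE` type code, the agency slug, and the response field names `publication_date` and `comments_close_on` are assumptions about the API; the sample documents are invented.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed search path for the v1 API.
BASE = "https://www.federalregister.gov/api/v1/documents.json"

def search_url(agency_slug, doc_type, start, end, per_page=1000):
    """Build a search URL using the conditions[...] filters named in the entry."""
    params = [
        ("conditions[agencies][]", agency_slug),
        ("conditions[type][]", doc_type),
        ("conditions[publication_date][gte]", start),
        ("conditions[publication_date][lte]", end),
        ("per_page", per_page),
    ]
    return BASE + "?" + urlencode(params)

def comment_days(doc):
    """Days between publication and comment deadline, or None if no deadline."""
    close = doc.get("comments_close_on")
    if not close:
        return None
    opened = date.fromisoformat(doc["publication_date"])
    return (date.fromisoformat(close) - opened).days

def average_comment_days(docs):
    """Mean comment period over documents that have a deadline."""
    days = [d for d in map(comment_days, docs) if d is not None]
    return sum(days) / len(days) if days else None

# Hypothetical documents, made up for illustration.
SAMPLE_DOCS = [
    {"publication_date": "2024-03-01", "comments_close_on": "2024-04-30"},
    {"publication_date": "2024-05-01", "comments_close_on": "2024-05-31"},
    {"publication_date": "2024-06-01", "comments_close_on": None},
]
```

Documents without a deadline are skipped rather than counted as zero, so the average reflects only rules that actually opened a comment period.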

  45. Q4Writing

    FEC committee filings and campaign finance data deep-dive published

    Long-form article on FEC committee filings: FEC created by Federal Election Campaign Act 1971 (amended post-Watergate 1974); six commissioners bipartisan (3D + 3R), four votes needed for enforcement -- frequent deadlock on controversial matters; federal elections jurisdiction only (House/Senate/President) -- state elections handled by state agencies; committee types and registration (Form 1 Statement of Organization: Principal Campaign Committee (PCC) = candidate primary committee; authorized and unauthorized committees; national party committees (DNC/RNC/DSCC/NRSC/DCCC/NRCC); state/local party committees; Non-Connected PAC: independent from any corporation or union, accepts contributions up to $5,000/person/year, contributes up to $5,000/election to candidate committees; SSF (Separate Segregated Fund): corporate or labor-union connected PAC, solicits only restricted class; Super PAC (IEC: Independent Expenditure-only Committee): post-Citizens United v. FEC (2010) and SpeechNow.org v. 
FEC -- unlimited contributions from corporations/unions/individuals, must spend independently without candidate coordination; Hybrid PAC (Carey Committee): split bank account -- one for traditional PAC activity (contribution limits), one for Super PAC IEs (unlimited); Leadership PAC: separate committee maintained by federal officeholder for leadership/party activities); disclosure requirements (Form 3 for House/presidential: quarterly April/July/October/January plus 12-day pre-election and 30-day post-general; Form 3S for Senate: two annual + pre/post election; Form 3X for PAC/party: monthly if >$50k raised/spent or quarterly; Form 3L: lobbyist bundling disclosure; 48-hour notices for contributions >$1,000 received in last 20 days before election; 24-hour notices for IEs >$10,000 in last 20 days before election; 2024 federal election spending ~$14B total per OpenSecrets; presidential race $3.5B; Senate competitive seats average $20-50M per cycle; Citizens United: corporations have First Amendment right to make unlimited IEs from general treasury; McCutcheon v. 
FEC 2014 eliminated aggregate contribution limits; 2024 limits: individual to candidate $3,300/primary + $3,300/general = $6,600 total, to PAC $5,000/year, to national party committee $41,300/account); FEC bulk data structure (eight key files at fec.gov/data/browse-data/?tab=bulk-data: cm.zip committee master with name/type/treasurer/connected organization; cn.zip candidate master with office/party/state/district; ccl.zip candidate-committee linkage; pas2.zip PAC/party contributions to candidate committees; indiv.zip individual contributions >$200 with NAME/EMPLOYER/OCCUPATION/CITY/STATE/ZIP/AMOUNT/DATE/RECIPIENT -- primary analytical file; oth.zip committee-to-committee transfers; oppexp.zip operating expenditures/disbursements; weball.zip summary financial data by committee); OpenFEC API (api.open.fec.gov/v1/: /candidates, /committees, /filings, /schedules/schedule_a receipts, /schedules/schedule_b disbursements, /schedules/schedule_e independent expenditures, /schedules/schedule_f coordinated party expenditures; free API key via api.data.gov signup; 1,000 requests/hour; 20 results/page default, max 100 per_page); dark money dynamics (501(c)(4) social welfare organizations may make unlimited IEs without disclosing donors; Crossroads GPS Karl Rove; Priorities USA; IRS Form 990 public but donor Schedule B not required for 501(c)(4); DISCLOSE Act passed House 2010 but failed Senate cloture, later versions also stalled; FEC MURs (Matters Under Review) = enforcement cases -- released publicly when closed; FEC advisory opinions); geographic/employer analysis (indiv.zip enables zip-code level contribution mapping; employer/occupation fields allow industry analysis: finance/investment ~30-40% of total to both parties; defense/aerospace concentrated in Republican candidates from defense-district incumbents; tech sector gives disproportionately to Democratic candidates; law firms bipartisan depending on trial vs.
corporate orientation; real estate/construction more Republican; educators more Democratic); Python script loading cm.zip (committee master) and indiv.zip (individual contributions), filtering to 2024 cycle, aggregating total by committee type (Super PAC vs. PAC vs. party vs. candidate), then computing top-20 occupations by total contributed amount and partisan lean index (D proportion of D+R total).
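
The partisan lean index, defined above as the Democratic share of the D+R total, can be sketched on pre-joined rows. This assumes each contribution has already been linked to a recipient party through the committee and candidate files; the row shape, party labels, and function name are hypothetical.

```python
from collections import defaultdict

def lean_by_occupation(rows, top_n=20):
    """Rank occupations by total D+R dollars and compute the Democratic share.

    rows: iterable of (occupation, party, amount), party being 'DEM' or 'REP'.
    Rows for any other party are ignored. Returns (occupation, total, lean)
    tuples where lean = DEM / (DEM + REP).
    """
    totals = defaultdict(lambda: {"DEM": 0.0, "REP": 0.0})
    for occupation, party, amount in rows:
        if party in ("DEM", "REP"):
            # Normalize free-text occupations so 'attorney ' and 'Attorney' merge.
            totals[occupation.strip().upper()][party] += amount
    ranked = sorted(totals.items(),
                    key=lambda kv: kv[1]["DEM"] + kv[1]["REP"], reverse=True)
    result = []
    for occupation, by_party in ranked[:top_n]:
        total = by_party["DEM"] + by_party["REP"]
        lean = by_party["DEM"] / total if total else None
        result.append((occupation, total, lean))
    return result

# Hypothetical rows, made up for illustration.
SAMPLE_ROWS = [
    ("Attorney", "DEM", 300.0),
    ("attorney ", "REP", 100.0),
    ("Engineer", "REP", 50.0),
    ("Engineer", "IND", 999.0),
]
```

A lean of 0.5 means an even split; values near 1.0 or 0.0 indicate an occupation giving almost entirely to one party.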

  46. Q4Writing

    CDC NNDSS notifiable disease surveillance deep-dive published

    Long-form article on CDC NNDSS: National Notifiable Diseases Surveillance System; history (notifiable disease reporting dates to 1878 -- cholera/yellow fever/smallpox for quarantine at ports of entry; modern NNDSS formalized through Council of State and Territorial Epidemiologists (CSTE) annual review process; CSTE publishes annual list of nationally notifiable conditions -- currently 120+ conditions; state law defines mandatory reporting (physicians/labs/hospitals report to local/state health department); federal reporting is voluntary -- no federal statute requires states to report to CDC, but all 50 states and territories participate); disease category overview with statistics (vaccine-preventable: measles 962 cases 2019 before recovery, mumps outbreaks in college settings, pertussis 15,000-20,000/year with peaks to 50,000 in outbreak years, polio 1 case 2022 in Rockland County NY -- first US case since 1979, hepatitis A/B/C varying; sexually transmitted infections: gonorrhea ~700,000 cases/year at record highs with AMR concern (ceftriaxone last reliable treatment after fluoroquinolone/azithromycin resistance), chlamydia ~1.6M/year (most-reported notifiable condition), syphilis 176,713 total cases 2022 -- highest since 1950 pre-penicillin era -- with congenital syphilis 3,755 cases -- 755% increase from 491 cases in 2012, HIV ~35,000 new diagnoses/year; vector-borne: Lyme disease ~476,000 new clinical diagnoses/year estimated (IDSA clinical criteria) vs. ~35,000 confirmed/probable NNDSS-reported cases -- 13:1 estimated-to-confirmed ratio; West Nile virus seasonal peaks; Rocky Mountain spotted fever; foodborne: Salmonella ~1.35M estimated/year vs. ~50,000 confirmed NNDSS -- 27:1 ratio reflecting iceberg effect; E. 
coli O157:H7 HUS cases; Listeria ~1,600/year with high fatality rate ~16%; Campylobacter most common foodborne globally); data structure and reporting mechanics (NNDSS Table I weekly: number of cases by condition, current week, previous 52 weeks cumulative; Table II: provisional cases by condition and reporting area; MMWR published every Thursday; reporting delay 1-8 weeks depending on condition and lab confirmation time; Electronic Lab Reporting (ELR): labs transmit HL7 v2 messages to state health departments -- dramatically improved timeliness; National Electronic Disease Surveillance System (NEDSS) infrastructure; BioSense Platform: CDC near-real-time syndromic surveillance aggregating ED chief complaints from 5,000+ facilities -- not notifiable disease reports but complementary signal); STI surveillance case study (syphilis trajectory: 2000 near-elimination campaign, nadir 5,979 cases 2001, subsequent rise driven by MSM community, then broadening to heterosexual transmission and congenital syphilis; 2022: total 176,713 cases = 53.9/100,000 rate; primary/secondary syphilis 49,000 cases; congenital syphilis 3,755 newborns -- 22.7/100,000 live births; geographic concentration: South states account for ~40% of syphilis burden; Black and Hispanic communities disproportionately affected; gonorrhea AMR: ceftriaxone 500mg IM now standard after azithromycin resistance emerged 2018 and fluoroquinolone resistance established 2007; WHO global gonococcal AMR surveillance warning of possible untreatable gonorrhea); influenza surveillance (FluView system: four components -- virologic surveillance from WHO/NREVSS collaborating labs (positive flu tests/subtype); outpatient ILI (influenza-like illness) from ILINet 3,000+ sentinel providers (% visits with fever + cough/sore throat); FluSurv-NET laboratory-confirmed hospitalization rates (13 states catchment); mortality surveillance -- NCHS P&I (pneumonia/influenza) death data; RSV and COVID added to FluView Interactive for 
integrated respiratory surveillance -- RESP-NET 2024); COVID-19 surveillance evolution (added to NNDSS January 2020; case reporting system overwhelmed by Omicron January 2022 ~1M+ reported cases/day; CDC COVID COG governance; transition from individual case surveillance to aggregate indicators: NWSS (National Wastewater Surveillance System) 1,000+ sites sampling sewage for SARS-CoV-2 RNA -- 4-7 day leading indicator of case trends; genomic surveillance (Nowcast) for variant proportion estimates; RESP-NET: laboratory-confirmed hospitalization rates for COVID/flu/RSV; emergency department data); data access paths (data.cdc.gov NNDSS Table II Socrata dataset; WONDER (Wide-ranging Online Data for Epidemiologic Research): mortality, STI, TB, HIV, hepatitis, cancer data queryable by year/state/age/sex/race; AtlasPlus interactive maps for STI/TB/HIV/hepatitis by county; FluView Interactive at gis.cdc.gov/grasp/fluview/); Python script downloading NNDSS Lyme disease annual confirmed cases by state 2010-2023 via data.cdc.gov Socrata API, computing CAGR by state, identifying spreading geographic frontier (Midwest and Southeast states with fastest growth vs. traditional Northeast endemic states), plotting multi-line trend chart.
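
The growth-rate step of the Lyme analysis described above can be sketched as follows. This is a minimal illustration, not the published script: the function names are hypothetical and the counts are placeholders, not NNDSS figures.

```python
from typing import Dict, List, Tuple


def cagr(first: float, last: float, years: int) -> float:
    """Compound annual growth rate between two counts `years` apart."""
    if first <= 0 or years <= 0:
        raise ValueError("first and years must be positive")
    return (last / first) ** (1.0 / years) - 1.0


def rank_states_by_cagr(
    series: Dict[str, Tuple[float, float]], years: int
) -> List[Tuple[str, float]]:
    """Return (state, cagr) pairs sorted from fastest to slowest growth."""
    rates = {
        state: cagr(first, last, years)
        for state, (first, last) in series.items()
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Placeholder counts (NOT real NNDSS data): (cases in 2010, cases in 2023).
example = {"State A": (100.0, 400.0), "State B": (2000.0, 2600.0)}
ranking = rank_states_by_cagr(example, years=13)
```

States with a small base and fast growth rank first, which is how an expanding frontier would show up against high-count endemic states.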

  47. Q4Writing

    SEC Form D private placement database deep-dive published

    Long-form article on SEC Form D and the Regulation D exempt offering framework: Form D required within 15 days of first sale of securities in a Reg D exempt offering; also required for Reg A offerings and Reg S offshore offerings in some cases; ~300,000-350,000 Form D filings per year in recent years; legal framework (Securities Act of 1933 requires all securities offerings to be registered with the SEC unless an exemption applies; Regulation D under Securities Act provides safe harbor exemptions that avoid the costly and time-consuming registration process; Section 4(a)(2) statutory exemption for transactions not involving any public offering -- Reg D operationalizes this for certainty); exemption mechanics (Rule 504: offering up to $10M in 12 months; may use general solicitation if state-registered; state blue sky laws apply; often used by smaller businesses and Reg A+ issuers; Rule 506(b): no dollar limit on offering size; up to 35 non-accredited investors who must be sophisticated; unlimited accredited investors; no general solicitation or advertising permitted; most common exemption -- ~90% of all Form D filings by number and >95% by dollar value; Rule 506(c): added by JOBS Act Title II effective September 23, 2013; no dollar limit; general solicitation and advertising permitted -- TV, internet, print ads allowed; only accredited investors may purchase; issuer must take reasonable steps to verify accredited investor status -- acceptable verification: bank statements for net worth, CPA letter, tax returns for income, third-party accredited investor verification letter); accredited investor definition evolution (originally: net worth >$1M excluding primary residence OR income >$200,000 individually or $300,000 jointly for prior two years with expectation of same; 2020 SEC expansion: added knowledge-based category -- Series 7/65/82 license holders, knowledgeable employees of private funds, SEC/state-registered investment advisers, rural business investment 
companies; inflation critique: $200k threshold set in 1982 equivalent to ~$680k in 2024; Congressional Research Service estimates 16% of US households qualify vs. 2% originally intended); JOBS Act 2012 full impact (Title II: Rule 506(c) general solicitation -- AngelList launches public fundraising profiles; Title III: Regulation Crowdfunding up to $5M per 12 months via SEC-registered funding portals -- Republic, Wefunder, StartEngine; Title IV: Regulation A+: Tier 1 up to $20M with state blue sky; Tier 2 up to $75M with SEC qualification, annual/semi-annual reporting requirements but no state registration; used by regional issuers, real estate companies, some tech startups; broader retail investor reach than 506(b)); Form D data fields (entity/issuer name; date of first sale; total offering amount; total amount sold; number of investors who have purchased; exemption(s) relied upon; type of securities: equity/debt/option to acquire security/security to be acquired upon exercise/pooled investment fund interests/tenant-in-common/mineral property interests/other; federal exemption: 506(b)/506(c)/504/Reg A/Reg S/4(a)(5); industry group: 18 categories including banking/financial services/technology/healthcare/real estate; revenue range: no revenues/$1-1M/$1M-5M/$5M-25M/$25M-100M/$100M+ or declined to disclose; investment fund type: hedge fund/private equity fund/venture capital fund/real estate fund/other investment fund; state of incorporation; related persons/promoters names and addresses; yes/no: is issuer a pooled investment fund); private capital market scale (SEC DERA (Division of Economic and Risk Analysis) annual Regulation D report: Rule 506 offerings raised ~$2.5T in 2022, compared to ~$1.1T in 2009 when electronic filing began; total domestic public equity raised in IPOs/SEOs ~$300B/year -- private capital markets 8x+ larger; VC tracked through Form D: Series Seed ~$1-3M, Series A $10-15M, Series B $20-40M, Series C $50-100M+; hedge fund industry ~$4T AUM 
mostly under 506(b); PE buyout funds raise during capital formation period via Form D then close; real estate syndications: 506(b) for apartment complex/commercial property acquisitions); analytical applications (Form D as private market leading indicator: filing spike in Q1 2021 reflected SPAC boom and frothy private market; geographic VC concentration: SF Bay Area, NYC, Boston account for ~60% of VC-related Form D filings; industry sector: tech/software 30-35%, healthcare/biotech 20-25%, real estate 15%, financial services 10%; amendment filings track ongoing fundraises vs. closed funds; amendment trail: Form D amended when total offering amount updated -- can track fundraising progress); EDGAR access (EDGAR full-text search: efts.sec.gov/LATEST/search-index?q=%22form+D%22&dateRange=custom for JSON index; direct filing search at sec.gov/cgi-bin/browse-edgar; data.sec.gov/submissions/{CIK} API for company-specific filing history; electronic filing required since 2009 -- historical data back to 2009 for electronic, paper filings pre-2009 in separate archive; 10 requests/second rate limit); limitations and opacity (Form D is notice-only -- no financial statements, no audits, no ongoing periodic reporting; if offering never closes, no amendment required; shell companies and SPVs can file without revealing ultimate beneficial owners; many Cayman Islands master fund structures have Delaware feeder funds that file Form D while master fund is offshore; 506(c) issuer self-certifies accredited investor verification -- no independent SEC check); Python script fetching recent Form D filings from EDGAR full-text search API, parsing issuer name/industry/exemption type/offering amount/investment fund type, computing top-10 states by VC-related filings, average offering size by exemption type, monthly filing trend for 2024, proportion using 506(c) general solicitation vs. 506(b) traditional.
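
The exemption-type aggregation mentioned above (share of filings and average offering size per exemption) might look like this. The record layout and field names (`exemption`, `offering_amount`) are assumptions for illustration, not the EDGAR schema, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable


def summarize_by_exemption(filings: Iterable[dict]) -> Dict[str, dict]:
    """Share of filings and mean offering amount for each exemption."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for filing in filings:
        bucket = totals[filing["exemption"]]
        bucket["count"] += 1
        bucket["amount"] += filing["offering_amount"]
    n = sum(bucket["count"] for bucket in totals.values())
    return {
        exemption: {
            "share": bucket["count"] / n,
            "avg_offering": bucket["amount"] / bucket["count"],
        }
        for exemption, bucket in totals.items()
    }


# Made-up sample records, for illustration only.
sample = [
    {"exemption": "506(b)", "offering_amount": 1_000_000.0},
    {"exemption": "506(b)", "offering_amount": 3_000_000.0},
    {"exemption": "506(b)", "offering_amount": 2_000_000.0},
    {"exemption": "506(c)", "offering_amount": 5_000_000.0},
]
summary = summarize_by_exemption(sample)
```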

  48. Q4Writing

    CMS Medicare Inpatient DRG provider data deep-dive published

    Long-form article on CMS Medicare Inpatient Provider Charge Data: annual dataset covering ~3,000 Medicare-certified hospitals across ~760 DRGs (Diagnosis Related Groups); IPPS payment formula (Base Rate ~$6,000 in 2023 x Relative Weight x Adjustments = DRG payment); RW examples (DRG 001 Heart Transplant RW ~25.0, DRG 207 Respiratory/Chest Procedure with MCC RW ~5.4, DRG 292 Heart Failure with MCC RW ~1.7, DRG 470 Major Joint Replacement without MCC RW ~2.1, DRG 683 Renal Failure with CC RW ~0.9); geographic adjustments (Wage Index: local hospital labor market cost relative to national average; IME Indirect Medical Education: teaching hospital add-on of ~5.5% per 0.1 resident-to-bed ratio; DSH Disproportionate Share Hospital: safety-net add-on for Medicaid/low-income patient share; Rural adjustments; CAH Critical Access Hospital exempt from IPPS -- cost-plus 101% reimbursement); outlier payments (additional payment when case cost exceeds DRG amount + $40,000 fixed loss threshold + case-mix-adjusted outlier threshold); chargemaster gap (hospitals set list charges independently; average covered charges can be 5x-10x actual Medicare payments; chargemaster prices are negotiating artifacts unrelated to cost or actual payment); geographic variation (DRG 470 average Medicare payment ranges $12,000 to $35,000+ across hospitals; McAllen TX vs. 
El Paso TX from Atul Gawande 2009 New Yorker -- same Medicare demographics, 2x per-beneficiary spending; Dartmouth Atlas of Health Care documents regional variation); value-based adjustments (HVBP Hospital Value-Based Purchasing adjusts DRG payments +-2% based on quality/patient experience; HRRP Hospital Readmissions Reduction Program penalizes up to 3% for excess AMI/HF/pneumonia/COPD/CABG/TKA-THA readmissions; HACRP Hospital-Acquired Condition Reduction Program penalizes bottom quartile on HAC scores); dataset structure (Provider NPI, Provider Name, DRG Definition, Total Discharges, Average Covered Charges, Average Total Payments, Average Medicare Payments; provider address/state/zip/HRR Description); data access (data.cms.gov/provider-data; Socrata SODA API with field filtering; also CMS Provider Utilization and Payment Data page; historical back to 2011; NBER IRIS repository for cost report data); Python script loading CMS DRG CSV via Socrata API, filtering to DRG 470 (Major Joint Replacement), computing charge-to-payment ratio by hospital, identifying top-10 highest and lowest ratios, computing state-level median payment variation, visualizing payment distribution histogram.
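
As a worked example of the IPPS formula and the chargemaster gap described above, here is a minimal sketch. Collapsing all adjustments (wage index, IME, DSH) into a single multiplicative factor is a simplifying assumption, and the function names are hypothetical.

```python
def drg_payment(
    base_rate: float, relative_weight: float, adjustment_factor: float = 1.0
) -> float:
    """Base rate x relative weight x a combined adjustment factor."""
    return base_rate * relative_weight * adjustment_factor


def charge_to_payment_ratio(
    avg_covered_charges: float, avg_medicare_payment: float
) -> float:
    """How many times the list charge exceeds the Medicare payment."""
    if avg_medicare_payment <= 0:
        raise ValueError("avg_medicare_payment must be positive")
    return avg_covered_charges / avg_medicare_payment


# Approximate figures quoted in the text: base rate ~$6,000, DRG 470 RW ~2.1.
unadjusted = drg_payment(6000.0, 2.1)
```

With the quoted inputs the unadjusted payment is about $12,600, the low end of the DRG 470 range cited above. The adjustments account for the rest of the spread.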

  49. Q4Writing

    FDA Orange Book drug patent database deep-dive published

    Long-form article on the FDA Orange Book (Approved Drug Products with Therapeutic Equivalence Evaluations): published continuously since 1980 as the definitive reference for FDA-approved drugs and their substitutability; Hatch-Waxman Act (Drug Price Competition and Patent Term Restoration Act 1984) framework -- ANDA (Abbreviated New Drug Application) pathway: generic applicants show bioequivalence rather than repeating clinical trials, dramatically reducing generic entry costs; Paragraph IV certification mechanics (generic ANDA applicant certifies listed patent is invalid or not infringed; brand must sue within 45 days to trigger 30-month FDA stay; if brand sues, ANDA approval delayed 30 months or until court decision; 180-day exclusivity for first Paragraph IV filer); pay-for-delay / reverse-payment settlements (brand pays generic to delay market entry; FTC v. Actavis 2013 Supreme Court held these are subject to rule of reason antitrust scrutiny; FTC estimates pay-for-delay costs consumers $3.5B/year); therapeutic equivalence code taxonomy (two-letter codes: first letter A = therapeutically equivalent; B = not established bioequivalent; second letter: A=conventional dosage; B=solution/powder; N=aerosol; O=injection; P=powder; T=topical; X=do not substitute; AB-rated = most commercially significant, triggers state substitution laws); patent thicket dynamics (brand manufacturers list all patents within 30 days of approval; average brand drug has 71+ Orange Book-listed patents; continuation patents, method-of-use patents, formulation patents extend exclusivity; AbbVie Humira 130+ patents across multiple patent families); exclusivity types (NCE/NME 5-year: no ANDA until 4th anniversary for Paragraph I/II/III or 5th anniversary for Paragraph IV; new clinical investigation 3-year: when NDA relies on new studies; Orphan Drug 7-year exclusivity for rare diseases defined as <200,000 US patients; Pediatric 6-month add-on for studying drug in pediatric populations; QIDP 
Qualified Infectious Disease Product 5-year extension under GAIN Act); patent cliff economics (Lipitor/atorvastatin $10B/year peak, lost exclusivity November 2011, generic reached ~90% market share within 6 months; Crestor/rosuvastatin 2016 genericization; Plavix/clopidogrel 2012; Humira/adalimumab 2023 -- 7 biosimilar launches simultaneously under BPCIA biosimilar pathway, first multi-biosimilar same-day launch; metformin: near-universal generic penetration, $0.004/pill vs. $1+/brand); Purple Book for biologics (FDA Biosimilar Product Database -- separate from Orange Book; 12-year reference product exclusivity under BPCIA 351(k); interchangeability designation under BPCIA higher standard than biosimilar; Humira biosimilars: Hadlima, Hyrimoz, Cyltezo, Yusimry, Simlandi, Hulio); data structure and access (Products.txt tilde-delimited: Appl_Type, Appl_No, Product_No, Ingredient, DF/Route, Trade_Name, Applicant, Strength, Appl_Public_Notes, Approval_Date, TE_Code; Patent.txt: Appl_No, Product_No, Patent_No, Patent_Expire_Date_Text, Drug_Substance_Flag, Drug_Product_Flag, Patent_Use_Code, Delist_Requested; Exclusivity.txt: Appl_No, Product_No, Exclusivity_Code, Exclusivity_Date; download at FDA.gov/drugs/drug-approvals-and-databases/orange-book annual cumulative supplement + monthly addenda); Python script loading all three Orange Book flat files, identifying patents expiring within 24 months, computing top-20 upcoming patent cliffs by number of affected dosage forms, identifying deepest patent thickets by unique patent count per NDA.
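
The patent-expiry and thicket-depth steps above can be sketched against a tilde-delimited Patent.txt. The sample rows are synthetic, and the date format (`Jun 30, 2025`) is an assumption about how `Patent_Expire_Date_Text` is written. Check it against the real file before relying on it.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

# Synthetic rows using a subset of the Patent.txt columns named above.
SAMPLE_PATENT_TXT = """Appl_No~Product_No~Patent_No~Patent_Expire_Date_Text
000001~001~1111111~Jun 30, 2025
000001~001~2222222~Jan 15, 2030
000001~002~1111111~Jun 30, 2025
000002~001~3333333~Mar 01, 2026
"""


def load_patents(text: str) -> list:
    """Parse tilde-delimited patent rows and attach a parsed expiry date."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="~"):
        # Assumed date format; adjust if the real file differs.
        row["expires"] = datetime.strptime(
            row["Patent_Expire_Date_Text"], "%b %d, %Y"
        ).date()
        rows.append(row)
    return rows


def expiring_within(rows: list, as_of: date, months: int = 24) -> list:
    """Rows whose patent expires between `as_of` and `as_of` + `months`."""
    total = as_of.year * 12 + (as_of.month - 1) + months
    year, month0 = divmod(total, 12)
    # Clamp the day to 28 so the cutoff is valid in every month.
    cutoff = date(year, month0 + 1, min(as_of.day, 28))
    return [r for r in rows if as_of <= r["expires"] <= cutoff]


def thicket_depth(rows: list) -> dict:
    """Count unique patents per application number."""
    patents = defaultdict(set)
    for r in rows:
        patents[r["Appl_No"]].add(r["Patent_No"])
    return {appl: len(nums) for appl, nums in patents.items()}


rows = load_patents(SAMPLE_PATENT_TXT)
soon = expiring_within(rows, as_of=date(2025, 1, 1))
depth = thicket_depth(rows)
```

Counting unique `Patent_No` values per `Appl_No` avoids double-counting a patent that is listed against several products of the same application.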

  50. Q4Writing

    CDC PLACES small area health estimates deep-dive published

    Long-form article on CDC PLACES (Population Level Analysis and Community Estimates): model-based small area health estimates for all 3,100+ US counties, 29,000+ census tracts, and 28,000+ ZIP code tabulation areas (ZCTAs); history (500 Cities Project 2016 covered only the largest 500 cities; expanded to PLACES 2020 covering all US geographies); the BRFSS problem (Behavioral Risk Factor Surveillance System surveys ~400,000 adults annually but sample sizes too small for reliable county or sub-county estimates; state-level reliable but county estimates have wide confidence intervals; rural counties may have only 50-100 respondents); multilevel regression and poststratification (MRP) methodology (Step 1: fit multilevel logistic regression on BRFSS data -- individual response ~ age + sex + race/ethnicity + education + income + marital status + state random effect; Step 2: post-stratify -- apply estimated cell probabilities to Census ACS population demographic cells at target geography; Step 3: aggregate cell estimates to county/tract/ZCTA with uncertainty; very small tracts under 500 people have wide credible intervals); 36+ health measures across 5 domains (Health Outcomes: coronary heart disease, stroke, high blood pressure, COPD, asthma, cancer excluding skin, chronic kidney disease (CKD), diabetes, obesity, arthritis, depression, high cholesterol, poor mental health, poor physical health, poor general health, fair/poor self-rated health; Prevention: colorectal cancer screening, mammography, cervical cancer screening, dental visit, annual checkup, health insurance coverage, cholesterol screening, taking BP medication, core preventive services; Unhealthy behaviors: current smoking, binge drinking, physical inactivity, sleep less than 7 hours; Disabilities: cognitive disability, hearing disability, vision disability, mobility disability, self-care disability, independent living disability; Social determinants: housing insecurity, food insecurity, lack of reliable 
transportation -- newer additions 2022+); geographic disparities (obesity prevalence >40% in McDowell County WV, Knox County KY vs. <20% in Summit County CO, Boulder County CO; diabetes 15%+ in Tunica County MS, Leflore County MS vs. <7% in Pitkin County CO; smoking 25%+ in Leslie County KY, Pike County KY vs. <8% in San Jose neighborhoods; 500 Cities data revealed Chicago North Shore neighborhoods diabetes 4% vs. South Side neighborhoods 12% same city); policy applications (CDC/NCHHSTP uses PLACES for HIV/STI prevention resource allocation weighting; HRSA uses for federally qualified health center (FQHC) need assessment and grant scoring; local health departments required to produce Community Health Improvement Plans (CHIP) using PLACES for gap analysis; health equity research: neighborhood determinants of health, social vulnerability index correlation); PLACES vs. County Health Rankings (Robert Wood Johnson Foundation CHR: composite score 0-100 across four components -- health behaviors, clinical care, social/economic factors, physical environment; CHR uses multiple underlying sources including PLACES; CHR county-only, PLACES has sub-county; CHR uses measured outcomes like mortality rates, PLACES focuses on prevalence of behaviors and conditions; both update annually); data access (data.cdc.gov/500-Cities-Places/ → multiple datasets by geography; Socrata SODA API with SoQL query filtering; GeoJSON API endpoint with geographic filtering; CSV bulk download for county, tract, and ZCTA levels; ArcGIS REST API for GIS integration; sodapy Python library for Socrata); Python script calling CDC PLACES Socrata API for all Mississippi counties, computing correlation matrix between obesity/diabetes/physical inactivity, identifying 10 counties with highest combined burden composite score, plotting diabetes vs. obesity scatter with state average reference lines.
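
Steps 2 and 3 of the MRP procedure described above reduce to a population-weighted average of modeled cell probabilities. This sketch assumes the Step 1 regression has already produced a probability per demographic cell. The cell labels and numbers are illustrative only.

```python
from typing import Dict


def poststratify(
    cell_probabilities: Dict[str, float], cell_populations: Dict[str, float]
) -> float:
    """Population-weighted prevalence for one geography.

    cell_probabilities: modeled prevalence per demographic cell (Step 1 output)
    cell_populations:   population count per cell in the target geography
    """
    total = sum(cell_populations.values())
    if total <= 0:
        raise ValueError("target geography has no population")
    weighted = sum(
        cell_probabilities[cell] * count
        for cell, count in cell_populations.items()
    )
    return weighted / total


# Illustrative cells only: two age groups in one hypothetical tract.
probs = {"age_18_44": 0.2, "age_45_plus": 0.4}
pops = {"age_18_44": 300, "age_45_plus": 100}
tract_estimate = poststratify(probs, pops)
```

The same cell probabilities give different estimates in geographies with different demographic mixes, which is what lets a state-level survey produce tract-level numbers.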

  51. Q4Writing

    BSEE offshore oil and gas safety data deep-dive published

    Long-form article on Bureau of Safety and Environmental Enforcement (BSEE) offshore incident data: BSEE created October 2011 by Secretary Salazar after Deepwater Horizon/Macondo blowout (April 20, 2010 -- Transocean Deepwater Horizon MODU drilling BP Macondo well Mississippi Canyon 252; blowout preventer failure; 87 days flowing; 4.9 million barrels discharged; largest accidental marine oil spill in history; 11 workers killed; $20B+ cleanup costs; $65B total liability for BP); MMS (Minerals Management Service) was predecessor agency with structural conflict -- same office managed revenue collection (shared with industry), resource development promotion, and safety regulation; 2010 reorganization split MMS into BSEE (safety/environmental), BOEM (Bureau of Ocean Energy Management for leasing/resource management), and ONRR (Office of Natural Resources Revenue for royalty collection); BSEE regulatory scope (~2,000 offshore facilities on US Outer Continental Shelf -- platforms, MODUs/mobile offshore drilling units, floating production systems, subsea wellheads; 15,000+ wells; ~30,000 miles offshore pipeline; Gulf of Mexico dominates ~98% of OCS production; Pacific OCS California; Alaska OCS including Beaufort and Chukchi seas); inspection program (4,000+ facility inspections annually; most are unannounced; four Gulf Districts: Lafayette, Lake Charles, New Orleans, Houma; Pacific District: Camarillo CA; Alaska District: Anchorage; Incident of Noncompliance (INC) = formal violation document; ~2,000+ INCs issued annually; INC categories: Safety (failure/degradation of safety system), Environmental (unauthorized discharge or spill), Operations (regulatory non-compliance), Structural (facility integrity); civil penalties up to $40,000/day per violation per Federal OCS Lands Act); incident reporting (BSEE incident database: blowouts, losses of well control, fires and explosions, structural failures, collisions, fatalities, injuries, near-miss Safety Alerts, 
spills/pollution events; recordable fatalities ~10-15/year normalized post-Macondo vs. 11 deaths in single DWH event; recordable injuries 100+/year; SEMS (Safety and Environmental Management System) rule: SEMS I (30 CFR Part 250 Subpart S) effective November 2011; SEMS II (2013): added stop-work authority for any employee, ultimate work authority designation, job safety analysis requirements, third-party ASP (Audit Service Provider) audits every 3 years); Well Control Rule (30 CFR Part 250 Subpart G updated 2016): API Standard 53 BOP equipment requirements; blind/shear ram capability for largest drillpipe; BOP testing intervals (surface: every 14 days; subsea: every 21 days); real-time monitoring of well control operations transmitted to shore; deepwater dual BOP/casing hanger requirements; HWCG (Helix Well Containment Group) and MWCC (Marine Well Containment Company) industry-funded subsea well containment systems -- capping stacks available in Gulf within 24 hours; 2019 rollback of some 2016 provisions (deepwater real-time monitoring) was controversial; production safety and environmental data (BSEE collects monthly production by facility and well: oil barrels, gas MCF, condensate, produced water; OCS deepwater Gulf produces ~15-17% of total US crude oil, ~5% of gas; Baker Hughes rig count tracks active MODU count; offshore production data available at bsee.gov/data-statistics going back to MMS era); data access (bsee.gov/data-statistics → downloadable CSV datasets: BSEE INCIDENTS (incident details, incident type, date, operator, OCS area, fatalities, injuries); INC database (violations, district, area, facility, INC category); inspection database (inspection date, type, facility, findings); production database (monthly production by facility); well data (API well number, spud date, TD, status); ESRI ArcGIS REST services for OCS facility infrastructure visualization); Python script loading BSEE incident CSV, classifying incident types by keyword, computing 
annual fatality and injury counts, normalizing by active facility count, ranking operators by incident frequency, computing pre-post-2011 trend comparison.
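
The keyword classification and facility-count normalization mentioned above could be sketched like this. The keyword map and category labels are hypothetical choices for illustration, not BSEE's own taxonomy.

```python
# Hypothetical keyword map; real incident descriptions may need a richer list.
KEYWORDS = {
    "fire_explosion": ("fire", "explosion"),
    "loss_of_well_control": ("blowout", "well control"),
    "spill": ("spill", "discharge"),
}


def classify_incident(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for label, words in KEYWORDS.items():
        if any(word in text for word in words):
            return label
    return "other"


def rate_per_1000_facilities(count: int, active_facilities: int) -> float:
    """Normalize an annual count by the number of active facilities."""
    if active_facilities <= 0:
        raise ValueError("active_facilities must be positive")
    return 1000.0 * count / active_facilities
```

Normalizing by active facilities matters for the pre/post-2011 comparison, since a falling raw count can reflect fewer platforms and not safer ones.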

  52. Q4Writing

    Treasury Daily Treasury Statement data deep-dive published

    Long-form article on the DTS: published each federal business day at 4 PM ET by Bureau of the Fiscal Service; Table I (Operating Cash Balance: opening balance, deposits, withdrawals, closing balance at Federal Reserve and Tax and Loan accounts); Table II (Deposits and Withdrawals of Operating Cash: Federal Reserve Account receipts by category -- individual income taxes, FICA, corporate income taxes, excise taxes, estate/gift taxes, customs, miscellaneous; outlays by department/agency -- Defense, HHS, Social Security, Agriculture/SNAP, Treasury interest, Education); Table III-A through III-C (Public Debt Transactions: bills/notes/bonds issued/matured, TIPS, FRNs, savings bonds); Table IV (Federal Tax Deposits: withheld taxes, FUTA); Table V (Short-Term Cash Investments); Table VI (Income Tax Refunds Issued); Table VII (Treasury Borrowing from Federal Financing Bank); TGA balance mechanics (Treasury holds master account at NY Fed; zero-balance sweep to/from commercial TT&L accounts); debt ceiling X-date tracking (TGA drawdown + extraordinary measures extending borrowing authority); seasonal receipt patterns (April individual/corporate estimated payments, September corporate fiscal-year-end, January/April FICA spikes); fiscal year deficit calculation (running sum of daily net outflows from Oct 1; $1.695T deficit FY2023, $1.833T FY2024); Fiscal Data API (fiscaldata.treasury.gov/api/v1/accounting/dts/ endpoint, JSON, date-range filtering, field selection); comparison with Monthly Treasury Statement (MTS: broader accrual-basis vs. DTS cash-basis); Python script querying Fiscal Data API for 90-day window, parsing Table II withdrawals by department, computing daily outflow z-scores to flag anomalous spending days, plotting 7-day rolling outflows with FY reference line.
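
The z-score flagging step described above can be illustrated offline. This sketch assumes daily outflow totals have already been pulled from the Fiscal Data API and summed per day. The threshold of 2.0 is an arbitrary illustrative choice.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def outflow_zscores(outflows: Sequence[float]) -> List[float]:
    """Z-score of each day's outflow against the window mean and std dev."""
    mu = mean(outflows)
    sigma = pstdev(outflows)
    if sigma == 0:
        return [0.0] * len(outflows)
    return [(x - mu) / sigma for x in outflows]


def flag_anomalies(
    dates: Sequence[str], outflows: Sequence[float], threshold: float = 2.0
) -> List[str]:
    """Dates whose outflow is at least `threshold` std devs from the mean."""
    scores = outflow_zscores(outflows)
    return [d for d, z in zip(dates, scores) if abs(z) >= threshold]


# Placeholder values (NOT real DTS figures): nine ordinary days, one spike.
days = [f"day-{i}" for i in range(1, 11)]
amounts = [10.0] * 9 + [100.0]
flagged = flag_anomalies(days, amounts)
```

Large scheduled payments, such as the Social Security outlays listed above, will trip a plain z-score every month, so a real analysis would likely compare against same-day-of-month history.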

  53. Q4Writing

    Federal Reserve H.15 interest rates deep-dive published

    Long-form article on the H.15 Selected Interest Rates release: published by Federal Reserve Board on each business day; EFFR (effective federal funds rate) vs. IOER/IORB (interest on reserve balances as policy corridor floor); CMT (Constant Maturity Treasury) yields constructed daily by Treasury from on-the-run/off-the-run curve fitting -- 1-month, 3-month, 6-month, 1-year, 2-year, 3-year, 5-year, 7-year, 10-year, 20-year, 30-year; prime rate (300bps above EFFR target by convention, moves with FOMC decisions, used for HELOC/credit card/auto loan pricing); discount rate (primary credit rate at Fed discount window, 50bps above EFFR target); SOFR (Secured Overnight Financing Rate: daily volume-weighted median of Treasury repo transactions, ARRC-selected LIBOR replacement; publication began April 2018; LIBOR formal cessation June 30, 2023); yield curve mechanics (expectations hypothesis: long yields = geometric mean of expected short rates + term premium; TIPS breakeven = nominal CMT minus real TIPS yield = market-implied inflation expectation); 2-10 spread history (inverted July 2022 to ~Sep 2024; deepest inversion about -108bps July 2023, deepest since 1981; historical inversion-to-recession lag 6-18 months); FRED series guide (DFF EFFR daily; FEDFUNDS monthly; DGS10 10yr CMT daily; GS10 monthly; T10Y2Y 10yr-2yr spread; SOFR; DFII10 10yr TIPS real yield; T10YFFM 10yr-EFFR term premium proxy); real rate vs. nominal rate distinction (Fisher equation: real ≈ nominal - inflation expectations; negative real rates 2021-2022 stimulus period, turned positive 2023); Python script using fredapi library to fetch DFF, DGS2, DGS10 since 2000, computing daily 2-10 spread, identifying contiguous inversion episodes, overlaying NBER recession shading (USREC), annotating GFC/COVID/2022-2023 episodes, plotting dual-axis chart with spread histogram.
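
Identifying contiguous inversion episodes, as described above, is a run-detection problem over the daily 2-10 spread. This sketch works on plain lists so it runs without a FRED API key. The function names are hypothetical and the sample yields are placeholders.

```python
from typing import List, Sequence, Tuple


def two_ten_spread(dgs10: Sequence[float], dgs2: Sequence[float]) -> List[float]:
    """Daily 10-year minus 2-year yield, in percentage points."""
    return [ten - two for ten, two in zip(dgs10, dgs2)]


def inversion_episodes(
    dates: Sequence[str], spreads: Sequence[float]
) -> List[Tuple[str, str]]:
    """Return (first_day, last_day) for each unbroken run of negative spread."""
    episodes = []
    start = None
    previous = None
    for day, spread in zip(dates, spreads):
        if spread < 0 and start is None:
            start = day  # a new inversion run begins
        elif spread >= 0 and start is not None:
            episodes.append((start, previous))  # the run ended on the prior day
            start = None
        previous = day
    if start is not None:
        episodes.append((start, previous))  # still inverted at end of data
    return episodes


# Placeholder yields, not real H.15 data.
sample_dates = ["d1", "d2", "d3", "d4", "d5", "d6"]
sample_spreads = two_ten_spread(
    [4.0, 3.9, 3.8, 4.1, 3.7, 3.6], [3.5, 4.0, 4.0, 4.0, 4.0, 4.0]
)
episodes = inversion_episodes(sample_dates, sample_spreads)
```

A one-day return to a positive spread splits an episode in two, so a production version might merge runs separated by short gaps.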

  54. Q4Writing

    Census Population Estimates Program data deep-dive published

    Long-form article on Census PEP: annual county and state population estimates produced each spring for prior July 1 reference date; cohort-component methodology (base = decennial census count + postcensal adjustments; annual change = births [from NCHS vital statistics] - deaths [from NCHS] + net domestic migration [from IRS SOI address-change files + ACS] + net international migration [DHS + DOS estimates]); 2020 base challenges (COVID delayed enumeration, differential undercounting, 2020 accuracy controversies); population estimates vs. decennial census (estimates interpolate between censuses; vintage year concept: each new vintage supersedes prior estimates back to census base; discrepancies resolved at next decennial); major 2020-2023 trends (Florida net domestic migration +2.1M, driven by remote work and retirees; Texas +2.4M from domestic migration + international; NYC -500K from 2020 peak, driven by net domestic outmigration offsetting international immigration; California net domestic outmigration to Nevada/Texas/Arizona; Montana/Idaho/Arizona fast-growing small states; Puerto Rico continued population decline); components data utility (birth/death natural increase separable from migration; counties with high natural increase but net domestic outmigration; sunbelt counties with low natural increase but high in-migration); TIGER geography linkage (FIPS county codes stable across vintages; CBSA/metro-micro updates complicate time series); Census API pep/population endpoint (vintage parameter, COUNTY + STATE FIPS, components D001-D015 variables including births, deaths, domestic migration in/out, international migration in/out); Python script calling Census API for all 3,144 counties for 2020-2023 vintage, computing net migration = total change - natural increase, ranking by growth rate, separating top quintile by migration-driven vs. natural-increase-driven, outputting bar chart of component decomposition for top-20 growing counties.
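
The component decomposition above (net migration = total change - natural increase) is simple enough to show directly. The function names are hypothetical and the figures in the usage lines are placeholders, not PEP data.

```python
def natural_increase(births: int, deaths: int) -> int:
    """Births minus deaths over the period."""
    return births - deaths


def net_migration(total_change: int, births: int, deaths: int) -> int:
    """Total population change not explained by natural increase."""
    return total_change - natural_increase(births, deaths)


def growth_driver(total_change: int, births: int, deaths: int) -> str:
    """Label a county by whichever component contributed more."""
    migration = net_migration(total_change, births, deaths)
    natural = natural_increase(births, deaths)
    return "migration" if migration > natural else "natural increase"


# Placeholder county: grew by 1,000 with 600 births and 400 deaths.
example_migration = net_migration(1000, 600, 400)
example_driver = growth_driver(1000, 600, 400)
```

Computed this way, net migration combines domestic and international flows. It also absorbs any residual the estimates carry, so it may not match the published migration components exactly.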

  55. Q4Writing

    USDA FSIS food safety data deep-dive published

    Long-form article on USDA FSIS food safety data: FSIS jurisdiction (Meat, Poultry, and Egg Products Inspection Acts -- FMIA 1906 / PPIA 1957 / EPIA 1970; FSIS regulates beef, pork, lamb, poultry, catfish, egg products; FDA regulates most other food including seafood except catfish); establishment coverage (6,500+ federally inspected establishments; mandatory continuous inspection for slaughter -- inspector present every shift; processing establishments inspection frequency risk-based); three-class recall system (Class I: reasonable probability of adverse health consequence or death -- E. coli O157:H7, Listeria, Salmonella in ready-to-eat; Class II: remote probability of adverse health consequence -- temperature abuse; Class III: no adverse health consequence -- labeling errors, undeclared allergens not posing allergy risk); major recall cases (Hallmark/Westland 2008 143M lbs -- largest recall in history, downer cattle processed, USDA school lunch supply, $7.9B company bankruptcy; Jack in the Box 1993 E. coli O157:H7 hamburgers -- 732 illnesses centered on Washington State, 4 deaths, established critical control point HACCP era; ConAgra Peter Pan peanut butter 2007 Salmonella -- FDA jurisdiction example; Boar's Head 2024 Listeria deli meat outbreak -- 57 cases, 9 deaths); zero-tolerance adulterants (E. coli O157:H7 declared adulterant in ground beef 1994 after Jack in the Box; Salmonella in ready-to-eat not adulterant -- performance standards only; Listeria monocytogenes zero-tolerance in RTE); Establishments.csv schema (EST_NUMBER, ESTABLISHMENT_NAME, STREET, CITY, STATE, ZIP, PHONE, ACTIVITIES codes, GRANT_DATE, DUNS_NUMBER); PHIS (Public Health Information System: electronic inspection records replacing paper NRTE/HACCP forms, plant profile, NRs, corrective action tracking); FSIS recall database API (fsis.usda.gov/api/recall -- JSON, filterable by class/date/product); GenomeTrakr WGS (whole genome sequencing network: 37+ labs, 400k+ sequences of Listeria/Salmonella/E.coli/Campylobacter; allows outbreak attribution within 48hrs; free access via NCBI); HACCP plan requirements (7 principles: hazard analysis, CCPs, critical limits, monitoring, corrective actions, verification, record-keeping; mandatory since 1997 for large plants, 2000 for small, 2001 for very small); Python script querying FSIS recall API for 2015-2024, parsing Class I vs. II vs. III by year, computing pounds recalled per year, identifying commodity categories with highest Class I frequency, plotting stacked bar chart of recall class distribution over time.
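
The per-year recall class tally described above can be sketched on already-parsed records. The field names (`date`, `recall_class`) are assumptions, not the FSIS API's actual JSON keys, and the sample rows are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def recalls_by_year_and_class(recalls: Iterable[dict]) -> Dict[int, Dict[str, int]]:
    """Count recalls per calendar year, split by recall class."""
    table = defaultdict(Counter)
    for recall in recalls:
        year = int(recall["date"][:4])  # assumes ISO dates such as 2021-05-17
        table[year][recall["recall_class"]] += 1
    return {year: dict(counts) for year, counts in sorted(table.items())}


# Made-up records for illustration only.
sample_recalls = [
    {"date": "2021-05-17", "recall_class": "Class I"},
    {"date": "2021-09-02", "recall_class": "Class II"},
    {"date": "2021-11-30", "recall_class": "Class I"},
    {"date": "2022-01-11", "recall_class": "Class III"},
]
recall_table = recalls_by_year_and_class(sample_recalls)
```

Each year's dictionary maps directly onto one stacked bar, with one segment per class.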

  56. Q4Writing

    Census SAIPE small area poverty estimates deep-dive published

    Long-form article on Census SAIPE: SAIPE overview (Small Area Income and Poverty Estimates program; annual county-level, state-level, and school-district-level estimates covering 3,100+ counties and 13,000+ school districts; primary uses Title I-A education funding ~$17B/year and CDBG Community Development Block Grant $3.5B/year formula; unlike CPS which requires 3-year averages for state estimates and cannot produce county-level directly, SAIPE produces single-year county estimates using model-based small area estimation; published approximately December each year for prior calendar year data); methodology (model-based small area estimation combining administrative data sources: ACS 1-year or 5-year microdata as primary geographic input, IRS Statistics of Income individual tax return file counting persons below filing threshold/claiming EITC as income proxy, SNAP participation counts from USDA state agencies, Census intercensal population estimates for denominator, CPS state-level poverty as national constraint; state-space models calibrated to CPS state estimates; estimates come with associated standard errors enabling statistical testing; more current than ACS 5-year rolling period for rapidly changing areas; school district estimates allocated from county estimates using ACS school-district data for children 5-17); key metrics (poverty rate and count for total population and children under 18 by county and state; median household income by county and state; school-district poverty: number and percent of school-age children 5-17 in poverty as primary Title I driver; 2022 SAIPE: national child poverty rate 14.7%, 52.7M children under 18; county range less than 5% in wealthy DC and NYC suburbs to greater than 45% in Mississippi Delta and Native American reservation counties in South Dakota and Arizona); Title I-A education funding (ESEA Elementary and Secondary Education Act Title I-A distributes approximately $17B annually to school districts; formula 
components: basic grants, concentration grants, targeted grants, education finance incentive grants; all four formulas use SAIPE school-age poverty counts as primary driver; holds-harmless provisions protect districts from large year-over-year swings; state receives allocation then sub-allocates to districts and schools; 1,000-child SAIPE count change can shift several million dollars in Title I funding for a district; SAIPE is the only official single-year source enabling this level of geographic specificity); CDBG formula (HUD CDBG $3.5B annually to 1,200+ entitlement communities — cities over 50,000 and urban counties over 200,000; Formula A weighted: population 25%, poverty 50%, housing overcrowding 25%; Formula B weighted: growth lag 20%, poverty 30%, pre-1940 housing 50%; larger of two formula amounts allocated to each community; SAIPE poverty count is key input to both formulas); Census API access (api.census.gov/data/timeseries/poverty/saipe; key variables: SAEPOVRTALL_PT poverty rate all ages, SAEPOVALL_PT poverty count all ages, SAEPOVRT17_PT poverty rate under 18, SAEPOV17_PT poverty count under 18, SAEMHI_PT median household income; geographic levels: state 040, county 050, school district 970 with LEAID code; time series 1989-present with annual estimates from 2003; Python using requests to Census API returning JSON); comparison to ACS and CPS (ACS 5-year appropriate for 5-year average poverty; CPS appropriate for national and large-state annual estimates; SAIPE appropriate for single-year county and school-district poverty; three programs serve different purposes not substitutable; SAIPE more current than ACS 5-year for rapidly changing areas; SAIPE lacks demographic detail available in ACS microdata); Python script fetching all county SAIPE for 2023 via Census API, computing child poverty rate ranking, identifying bottom 50 counties by child poverty rate, grouping by state to show geographic clusters, comparing 2023 rates to 2019 pre-pandemic 
baseline.
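    The county fetch-and-rank step described above can be sketched as follows. This is a minimal illustration, not the article's script: the `time` and `for=county:*` query parameters, the helper names, and the sample rows are assumptions, and no network call is made.

```python
from urllib.parse import urlencode

# Endpoint as named in the entry above; the query parameter layout is an assumption.
SAIPE_ENDPOINT = "https://api.census.gov/data/timeseries/poverty/saipe"


def build_saipe_url(year, variables=("NAME", "SAEPOVRTALL_PT", "SAEMHI_PT")):
    """Build a county-level SAIPE request URL for one year."""
    params = {"get": ",".join(variables), "for": "county:*", "time": str(year)}
    # Keep commas, colons and the wildcard readable instead of percent-encoding them.
    return SAIPE_ENDPOINT + "?" + urlencode(params, safe=",:*")


def rows_to_records(payload):
    """Census API JSON is a list of lists whose first row is the header."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def rank_by_rate(records, field="SAEPOVRTALL_PT"):
    """Sort records from highest to lowest poverty rate, skipping blank values."""
    usable = [r for r in records if r.get(field) not in (None, "")]
    return sorted(usable, key=lambda r: float(r[field]), reverse=True)


# Illustrative payload only; these are not real SAIPE estimates.
sample = [
    ["NAME", "SAEPOVRTALL_PT", "time", "state", "county"],
    ["County A", "31.5", "2023", "00", "001"],
    ["County B", "8.2", "2023", "00", "003"],
    ["County C", "19.0", "2023", "00", "005"],
]
ranked = rank_by_rate(rows_to_records(sample))
print(build_saipe_url(2023))
print([r["NAME"] for r in ranked])
```

    Values arrive as strings in the JSON response, so the ranking converts them to floats before sorting.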

  57. Q4Writing

    DOT National Transit Database deep-dive published

    Long-form article on NTD: NTD overview (National Transit Database; FTA Federal Transit Administration program; ~800 transit agencies reporting annually as condition of receiving FTA grants under 49 USC 5335; modes: heavy rail/subway, commuter rail, light rail/streetcar, bus rapid transit, local bus, demand-response paratransit, vanpool, ferry boat, cable car; annual reports due October 31 for prior fiscal year; primary data source for Section 5307 formula grants and DOT Public Transportation Fact Book); key metrics (Unlinked Passenger Trips UPT each boarding counted separately even for single journey requiring transfer; Vehicle Revenue Miles VRM miles operated in revenue service carrying paying passengers; Vehicle Revenue Hours VRH; Operating Expenses total and broken into labor/fuel/maintenance/administrative; Capital Expenses; Fare Revenue; Fare Recovery Ratio = fare revenue/operating expenses; 2023 US total: 10.4B UPT recovered from COVID but below 2019 peak 15.7B; heavy rail 3B, bus 4B, commuter rail 400M; NYC MTA subway alone 1.1B UPT in 2019 fell to 600M 2020 partially recovered to 950M 2023; operating expense per UPT: heavy rail $2-4, bus $4-8, demand-response $30-60); COVID impact and recovery (2020 ridership collapsed 60%+ systemwide; MTA subway 1.8B to 600M; MBTA Boston 45% recovery 2020; LA Metro bus less severe decline as essential worker ridership sustained; CARES Act March 2020 $25B transit emergency relief, CRRSAA December 2020 $14B, ARP March 2021 $30B — total $69B; agencies maintained service despite near-zero revenue; post-COVID recovery uneven: suburban commuter rail slowest recovery as office commuting did not return to pre-COVID levels; urban heavy rail recovering faster than commuter rail; bus recovering better than rail overall; 2023 mode-level recovery ratios vs 2019: subway 85%, bus 75%, commuter rail 70%, light rail 80%); agency profiles (major agencies by 2023 annual UPT: NYC MTA 2.5B including subway/bus/LIRR/Metro-North, LA Metro 
250M, CTA Chicago 320M, WMATA DC 250M, SEPTA Philadelphia 230M, MBTA Boston 160M, BART Bay Area 90M, MARTA Atlanta 65M, TriMet Portland 55M, DART Dallas 55M; operating budgets range from NYC MTA $18B+ to small rural systems under $1M; fleet vehicle count and average age by mode from Vehicle Inventory spreadsheet); formula grants (Section 5307 Urbanized Area Formula distributes approximately $5B annually; formula uses NTD data directly: 50% allocated by bus UPT and bus VRM, 50% by rail UPT and rail VRM within urbanized area; rural Formula Section 5311 uses NTD rural transit reporting; Capital Investment Grant CIG New Starts competitive process for new light rail and BRT projects; FAST Act 2015 and Infrastructure Investment and Jobs Act 2021 — $91B in transit funding over 5 years in IIJA); data access (transit.dot.gov/ntd/ntd-data annual database Excel files: Revenue Vehicle Inventory, Service, Fare Revenue, Expenses, Capital, Funding, Safety; monthly ridership reports released 6 weeks after each month; NTD agency profiles: UACE urbanized area code, service area population, modes operated, total fleet; no full RESTful API but structured Excel files enable programmatic analysis); Python script downloading NTD monthly ridership Excel for 2019 and 2023, parsing UPT by agency and mode, computing 2023/2019 recovery ratio, ranking top 20 agencies by absolute UPT shortfall from 2019 baseline.
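    The recovery-ratio and shortfall ranking described above reduces to a few lines once UPT totals are parsed per agency. The sketch below assumes the Excel parsing has already produced two plain dictionaries; the function names and the sample figures are hypothetical.

```python
def recovery_table(upt_2019, upt_2023):
    """Return per-agency recovery ratio and shortfall, largest shortfall first."""
    rows = []
    for agency, base in upt_2019.items():
        if agency not in upt_2023 or base <= 0:
            continue  # skip agencies missing from either year
        current = upt_2023[agency]
        rows.append({
            "agency": agency,
            "ratio": current / base,       # 2023 UPT as a share of 2019 UPT
            "shortfall": base - current,   # absolute trips not yet recovered
        })
    return sorted(rows, key=lambda r: r["shortfall"], reverse=True)


def fare_recovery_ratio(fare_revenue, operating_expenses):
    """Fare Recovery Ratio = fare revenue / operating expenses."""
    return fare_revenue / operating_expenses


# Illustrative numbers only (millions of unlinked passenger trips).
upt_2019 = {"Agency A": 1000, "Agency B": 400, "Agency C": 50}
upt_2023 = {"Agency A": 800, "Agency B": 300, "Agency C": 55}
table = recovery_table(upt_2019, upt_2023)
for row in table:
    print(row["agency"], round(row["ratio"], 2), row["shortfall"])
```

    Ranking by absolute shortfall rather than by ratio keeps the largest systems at the top, which matches the "top 20 agencies by absolute UPT shortfall" framing.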

  58. Q4Writing

    USPTO trademark database deep-dive published

    Long-form article on USPTO trademark data: trademark overview (Lanham Act 1946; USPTO manages federal trademark registration; approximately 3 million active registered trademarks; approximately 650,000 new applications per year at recent peak; trademark protection: common law rights exist from use in commerce but federal registration provides nationwide constructive notice, right to use registered trademark symbol, federal court jurisdiction, US Customs Service recordation to block infringing imports, incontestability after 5 continuous years of use; trademark lasts indefinitely as long as in commercial use and renewed; distinct from patents 20-year term and copyright life+70 years); application and registration (TEAS Trademark Electronic Application System; use-in-commerce basis for marks already in use in interstate commerce; intent-to-use ITU basis for marks not yet in use but with bona fide intent; 45 Nice Classification classes: 1-34 goods (class 9 computers/electronics, 25 clothing, 29-33 food/beverages, 5 pharmaceuticals), 35-45 services (class 42 technology services, 36 finance/banking, 41 entertainment/education); examination by USPTO examining attorney 3-6 months; Official Gazette 30-day opposition publication period; Statement of Use SOU for ITU applicants after use begins with maximum 36-month extension period; Section 8 continued use declaration due 5-6 years after registration; Section 15 incontestability declaration after 5 years of continuous use; 10-year renewal cycle; mark types: standard character mark word/phrase, design mark/logo, sound mark, color mark, trade dress overall commercial image); TESS and examination (Trademark Electronic Search System at tmsearch.uspto.gov; free-form keyword search and Boolean field search; Leavitt design search codes classify visual elements of logos; mark statuses: Live Active/Pending, Dead Abandoned/Cancelled/Expired/Surrendered; likelihood of confusion standard: DuPont 13-factor test with similarity of 
marks and similarity of goods/services as most important factors; dilution protection for famous marks regardless of goods/services relatedness — Nike against non-competing use); TTAB proceedings (Trademark Trial and Appeal Board; inter partes proceedings: opposition filed within 30-day OG publication window, cancellation filed any time after registration; ttabvue.uspto.gov for docket search; First Amendment considerations post Matal v. Tam 2017 invalidated disparagement bar; craft brewery oppositions as major TTAB category with 500+ pending at peak; Federal Circuit as appellate court over TTAB); bulk data and API (bulkdata.uspto.gov; annual full files and daily XML incremental updates; XML fields: serial number, filing date, registration number, mark text, owner name/address, attorney/agent, Nice Classification, goods/services description, disclaimer, drawing type code 1-5, specimen URL; Python lxml or iterparse for large XML files; USPTO Trademark JSON API at api.USPTO.gov/trademark with 25 results/request limit and rate limits; search by mark text, owner, goods/services text, classification); economic signals (trademark filings correlate with small business formation and IP-intensive industry activity; filings peaked 2021-2022 during e-commerce boom; China accounted for ~25% of foreign USPTO trademark filings by 2022 with IPAS International Patent/Trademark Application System; tech sector concentrated in class 42 SaaS and class 9 hardware; fashion class 25/18; food/beverage 29-33; finance class 36; entertainment class 41); Python script using USPTO Trademark API to fetch Class 42 technology service registrations 2020-2024, aggregate by filing year/quarter, compute year-over-year growth rate, analyze goods/services descriptions for keyword frequency to identify fastest-growing technology subcategories.

  59. Q4Writing

    Federal Reserve SLOOS credit conditions data deep-dive published

    Long-form article on Fed Senior Loan Officer Opinion Survey: SLOOS overview (quarterly survey of approximately 80 large domestic commercial banks and 24 US branches of foreign banks; conducted four times a year, in January, April, July, and October; results published approximately 2 weeks after end-of-quarter FOMC meeting; questions ask about changes in lending standards and loan demand in current quarter compared to prior quarter, so the survey is retrospective, not forecasting; key tool for Federal Reserve understanding of credit cycle conditions and monetary policy transmission); net percentage concept (net percentage = percent of respondents reporting tightening minus percent reporting easing; positive net indicates net tightening, negative indicates net easing; C&I commercial and industrial loans: standards for large/medium firms and small firms separately reported; terms in addition to standards: spreads over cost of funds, non-price terms such as covenants and collateral requirements; CRE commercial real estate: construction/land development, multifamily, nonfarm nonresidential; residential mortgage: GSE-eligible conforming, jumbo non-conforming, subprime, HELOCs; consumer credit: credit card, auto, other consumer; demand questions parallel supply questions for each loan category); historical episodes (GFC 2007-2009: net tightening on C&I standards for large firms reached +80% in Q4 2008; CRE standards reached +95% in Q1 2009; subprime mortgage standards tightened from +30% in Q2 2006 accelerating to +75% by Q2 2007; Fed used SLOOS alongside CAMELS bank examination ratings; 2020 COVID: Q2 2020 survey showed fastest tightening since GFC, with C&I large firms +68%, CRE construction +74%, credit card +55%; 2022-2023 rate hike cycle: tightening began Q3 2022 as Fed raised rates 75 bps per meeting; small firm C&I tightening led large firm tightening by 1-2 quarters; SVB Silicon Valley Bank collapse March 2023 
caused temporary tightening spike for banks under $50B assets; 2024: gradual easing as Fed cut rates from 5.25-5.5% toward neutral); special topical questions (each quarterly survey includes 4-8 special questions on current policy-relevant topics; Q1 2023 special: how have deposit flow changes affected lending capacity; Q2 2023: what is your outlook for CRE loan quality next 12 months — 70% expected deterioration; Q2 2020: mid-quarter supplemental survey on COVID impact on loan demand and standards; 2019: special questions on leveraged lending and CLO exposures; special questions give SLOOS forward-looking texture beyond standard retrospective questions); macro relationship (SLOOS net tightening for C&I loans leads business investment spending by 2-4 quarters; net tightening above +50% historically coincides with or shortly precedes NBER-dated recessions; Fed uses SLOOS in Beige Book narrative, Monetary Policy Report to Congress, and FOMC meeting minutes with explicit numerical references; transmission mechanism: bank tightening restricts credit to small-medium businesses that lack capital markets access, reducing investment/hiring; SLOOS combined with JOLTS/ECI/CPI as four key Fed monitoring inputs; Fed VP Governors cite SLOOS in academic speeches distinguishing credit supply from credit demand shifts); data access (federalreserve.gov/data/sloos annual/quarterly tables in Excel and PDF; historical from 1990; FRED series: DRTSCILM C&I large/medium firms quarterly percent net tightening, DRTSCIS small firms, DRTSCLCC credit cards, DRTSSP residential prime conforming, DRTSLM large/medium residential; Python via fredapi library FRED API); Python script fetching FRED series DRTSCILM large/medium C&I and DRTSCIS small firms since 2000, computing correlation of two series, identifying quarters where small firm tightening led large firm tightening by more than 10 percentage points, overlaying NBER recession indicator USREC from FRED, annotating GFC/COVID/SVB events, 
plotting dual-line chart with +50% threshold horizontal reference line.
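    The net percentage definition and the small-firm lead test described above can be written out directly. This is a sketch under stated assumptions: the function names, the quarter-label format, and the sample readings are hypothetical, and the series would in practice come from FRED rather than from literals.

```python
def net_percentage(tightened, eased, respondents):
    """Net percentage = share reporting tightening minus share reporting easing."""
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    return 100.0 * (tightened - eased) / respondents


def small_firm_lead_quarters(small, large, gap=10.0):
    """Quarters where small-firm net tightening exceeds large-firm by more than `gap` points."""
    return [q for q in sorted(small) if q in large and small[q] - large[q] > gap]


# 50 of 80 banks tightened and 10 eased: net +50%.
print(net_percentage(tightened=50, eased=10, respondents=80))

# Illustrative readings only, keyed by quarter label.
small = {"2022Q2": 5.0, "2022Q3": 30.0, "2022Q4": 45.0}
large = {"2022Q2": 0.0, "2022Q3": 15.0, "2022Q4": 40.0}
lead = small_firm_lead_quarters(small, large)
print(lead)
```

    Banks reporting "basically unchanged" standards count in the denominator but in neither numerator term, which is why the net figure can sit near zero even when many banks are moving in opposite directions.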

  60. Q4Writing

    FCC Universal Licensing System deep-dive published

    Long-form article on FCC spectrum data: ULS overview (Universal Licensing System created 1998 consolidating prior paper license systems; 25M+ active wireless licenses; covers amateur radio 750,000+ licensed operators, commercial mobile wireless AT&T/Verizon/T-Mobile spectrum, public safety P25 networks, broadcast stations, point-to-point microwave, experimental, satellite earth stations; FCC Registration Number FRN and call sign as license identifiers; searchable at wireless.fcc.gov/uls); national frequency allocation (National Table of Frequency Allocations 47 CFR Part 2; NTIA manages federal government spectrum DoD/NASA/FAA; FCC manages commercial and non-government; low-band 600/700 MHz long range and building penetration, e.g. T-Mobile 600 MHz and AT&T FirstNet 700 MHz Band 14; mid-band 2.5 GHz T-Mobile legacy Sprint, 3.5 GHz CBRS Citizens Broadband Radio Service three-tier sharing, C-band 3.7-4.2 GHz major 5G capacity; mmWave 24/28/37/39/47 GHz ultra-high capacity short range); spectrum auctions (FCC auction authority from 1993 Omnibus Budget Reconciliation Act; SMRA simultaneous multi-round ascending auction and combinatorial clock auction CCA formats; Auction 107 C-band 2021 $81B largest spectrum auction ever, includes $13B incumbent satellite operator reimbursement; Auction 110 3.45 GHz 2021-2022 $22B; Incentive Auction 1000 600 MHz 2016-2017 $19.8B raised repackaging broadcast TV; AWS-3 Auction 97 2015 $41B; C-band winners Verizon $45B, AT&T $23B, T-Mobile $9B; license geographic units Economic Areas and counties); ULS data and API (bulk data at ftp.fcc.gov/pub/Bureaus/Wireless/Databases/; key tables: EN entity with licensee name/address/FRN, HD header with license status/service/grant/expiration, LO location with latitude/longitude, FR frequency with frequency range/emission designator, AN antenna with height/gain, HS history; license status codes A Active E Expired C Cancelled T Terminated; joining on unique_system_identifier across tables; amateur radio bulk table l_amat.zip 
updated weekly); broadcast licensing (CDBS Consolidated Database System transitioning to LMS Licensing Management System; AM/FM/TV stations searchable by call sign at fcc.gov/media/radio; engineering parameters: ERP effective radiated power in kW, HAAT height above average terrain in meters, contour maps; 12,000 commercial and 4,200 noncommercial FM stations; ATSC 3.0 NextGen TV transition); Python script downloading ULS amateur license bulk data, joining EN and HD tables, filtering to active licenses, aggregating by state, normalizing by 2020 Census state population, ranking states by amateur radio operators per 100,000 residents.

  61. Q4Writing

    HUD Housing Choice Voucher data deep-dive published

    Long-form article on HUD Section 8 Housing Choice Vouchers: program mechanics (~2.3 million voucher households 2024, ~$30B annual federal outlays, administered by ~2,200 local Public Housing Authorities PHAs; tenant-based subsidy — voucher holder finds private-market unit meeting Housing Quality Standards HQS, PHA pays difference between tenant contribution of 30% of adjusted income and payment standard; portability allows vouchers to move across jurisdictions; utilization floors require PHAs to lease 95% of budget authority or face recapture); Fair Market Rents (HUD publishes FMRs annually for ~2,600 FMR areas metro and non-metro at 40th percentile of gross rent from recently-moved renters; 2024 examples: New York City 2BR $2,765, San Francisco 2BR $3,268, Boston 2BR $2,450, Dallas 2BR $1,450, rural Mississippi 2BR $725; Small Area FMRs SAFMRs in 30+ designated high-cost metros use zip-code level FMRs rather than metro-wide averages to reduce geographic concentration; payment standard set by PHA at 90-110% of FMR with HUD exception payment standards up to 120% in rare cases); waitlists and program gaps (demand far exceeds supply; waitlists 1-10 years in high-cost metros; Los Angeles Housing Authority waitlist had 600,000+ applicants when briefly opened in 2017; only approximately 25% of eligible extremely low-income households receive any rental assistance nationally; Harvard JCHS estimates 17M+ extremely low-income households competing for approximately 5M total subsidized units; Emergency Housing Vouchers 70,000 targeted for persons experiencing homelessness or fleeing domestic violence funded by ARP Act 2021); PASH/PIC administrative data (HUD PIC Public and Indian Housing Information Center as underlying data system; HUD publishes aggregated Picture of Subsidized Households PASH annually at huduser.gov; tract-level variables: number of units, average adjusted income, percent elderly, percent disabled, percent female-headed household, months on waitlist, 
race/ethnicity breakdown by PHA and census tract; enables analysis of voucher geographic concentration vs. opportunity areas); Project-Based Rental Assistance PBRA (HUD contracts directly with private landlords to subsidize specific units; approximately 1.2M units in project-based programs; Multi-Family Housing database at huduser.gov; program types: Section 8 New Construction/Substantial Rehabilitation, Section 236 interest subsidy, Section 515 rural rental housing, Section 202 elderly; expiring-use problem as contracts expire and owners opt out; National Housing Preservation Database NHPD tracks all federally subsidized properties); AFFH and CHAS (Affirmatively Furthering Fair Housing rule requires PHAs to map opportunity; CHAS Comprehensive Housing Affordability Strategy data from Census ACS tabulated by income/tenure/affordability/substandard conditions by CDBG jurisdiction; cost burden greater than 30% income; severe cost burden greater than 50%; 72%+ of extremely low-income renters cost-burdened); Python script fetching HUD FY2024 FMR data, joining to ACS median renter household income by metro CBSA, computing FMR as percentage of median renter income, ranking metros by affordability gap identifying where FMR exceeds 50% of renter income.
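    The subsidy arithmetic and the affordability-gap metric described above can be illustrated with a short worked example. It follows the simplified rule stated in the entry (PHA pays the payment standard minus 30% of adjusted income) and ignores gross-rent caps and minimum rents; the function names and the income figures are hypothetical, while the Dallas FMR is the one quoted above.

```python
def tenant_contribution(adjusted_monthly_income, share=0.30):
    """Tenant pays a fixed share (30% by default) of adjusted monthly income."""
    return share * adjusted_monthly_income


def housing_assistance_payment(payment_standard, adjusted_monthly_income):
    """PHA pays the gap between the payment standard and the tenant contribution."""
    gap = payment_standard - tenant_contribution(adjusted_monthly_income)
    return max(0.0, gap)  # no negative subsidy when income is high enough


def fmr_share_of_income(monthly_fmr, annual_renter_income):
    """Annualized FMR as a percentage of annual renter household income."""
    return 100.0 * (12 * monthly_fmr) / annual_renter_income


# Dallas 2BR FMR of $1,450 from the entry; the incomes are illustrative.
hap = housing_assistance_payment(payment_standard=1450, adjusted_monthly_income=2000)
share = fmr_share_of_income(monthly_fmr=1450, annual_renter_income=43500)
print(round(hap, 2), round(share, 1))
```

    A metro would be flagged in the ranking when `fmr_share_of_income` exceeds 50, the severe cost burden threshold used in the entry.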

  62. Q4Writing

    Census American Housing Survey deep-dive published

    Long-form article on Census AHS: survey design (joint Census Bureau/HUD biennial panel survey; odd years national core, 4-year rotation for ~25 large metropolitan oversamples; ~60,000 housing units longitudinally tracked — same units surveyed each wave unlike cross-sectional ACS; Computer-Assisted Personal Interviewing CAPI and CATI; proxy allowed; national sample uses address-based sampling from USPS Delivery Sequence File); structural characteristics (year built by decade cohort, structure type single-family detached/attached/2-4 unit/5+ unit/mobile home/other, rooms count, bedrooms, bathrooms, square footage estimate, plumbing complete/lacking/incomplete, kitchen facilities complete/lacking, heating fuel natural gas/electricity/oil/wood/solar/none, foundation type, exterior material); condition problems (roof leaks past 12 months, water leaks from inside, holes in floors/walls/ceilings, broken plaster or peeling paint more than 1 sq ft, electrical wire exposed or fuses blown 3+ times, evidence of rats/mice past 90 days, heating equipment breakdown more than 6 hours, sewage disposal breakdowns; adequacy classification: adequate, moderately inadequate with 1-3 deficiencies, severely inadequate with 4+ or major deficiency); historical record (AHS national since 1973 as longest continuous US housing quality time series; complete plumbing from 95.5% 1973 to 99.7%+ 2021; mobile home share stable 6-7%; owner-occupancy peaked 69.0% 2004, declined to 63.1% 2016 post-financial-crisis, recovered to ~65% 2021; median new single-family homes from approximately 1,500 sq ft 1973 to 2,300+ sq ft by 2015; severe housing inadequacy from 8% to 3%; multi-unit housing cost burden grew as rents outpaced income growth); metropolitan oversamples (NYC AHS 2021, Los Angeles 2019, Chicago 2021, Boston, Philadelphia, Houston, Atlanta; metro surveys enable comparison of housing quality and cost burden across metros; AHS metro tabulation at census.gov/programs-surveys/ahs); HUD Worst 
Case Housing Needs (biennial report to Congress using AHS microdata; defines worst case needs as very low income renters below 50% AMI with no housing assistance paying more than 50% income or in severely inadequate housing; 2023 report 8.5M worst case households continuous increase from 6.4M in 2013; primary driver is ratio of HCV budget to eligible households — program funding has not kept pace with rent growth); public use microdata (PUF download at census.gov; key variables TENURE 1=owner 2=renter 3=vacant, BUILT decade code, ROOMS, BEDRMS, BATHROOMS, UNITSIZE, RENT, VALUE, ZINC household income before taxes, ADEQUACY adequacy classification, WEIGHT replicate weights for standard error estimation; no Census API for AHS — microdata download only; DataFerrett for online cross-tabulations); Python script loading 2021 national PUF CSV, filtering to occupied renter units (TENURE=2, VACANCY not applicable), computing annual rent burden ratio annual_rent/ZINC2, classifying buildings by pre-1940/1940-1969/1970-1999/2000+ construction decade, comparing cost burden rates across cohorts.

  63. Q4Writing

    USDA ERS agricultural economics data deep-dive published

    Long-form article on USDA Economic Research Service: ERS overview (~400 economists/statisticians, all data free at ers.usda.gov, distinct from NASS production surveys — ERS focuses on economic analysis of food and agriculture; principal topics: farm income and wealth, food prices and expenditures, commodity markets and trade, rural economy, food security and access, natural resources and environment, agricultural productivity); farm income (Farm Income and Wealth Data System FADS: net farm income $116B in 2023 historically high from elevated commodity prices, down from $183B peak in 2022; 2.0M farms in US; 90% are small family farms with less than $350,000 gross cash farm income; 4% of farms with more than $1M gross produce approximately 70% of total output; farm household income includes both on-farm income and off-farm wages/salaries/retirement; government payments: USDA commodity programs ARC Agriculture Risk Coverage pays when county revenue falls 14% below benchmark, PLC Price Loss Coverage pays when market price falls below reference price; total government payments fluctuate from $5B to $20B+ in disaster years; crop insurance indemnities separate from direct payments); food prices (Food Price Outlook updated monthly: ERS uses BLS CPI food components and produces 12-month forecasts; 2022 grocery price surge +11.4% highest since 1979 driven by supply chain disruptions/pandemic labor shortages/Ukraine war impact on wheat/sunflower oil/fertilizer; total US food expenditures approximately $2.4T annually: $1.5T food away from home/restaurants, $0.9T food at home/grocery; Commodity Costs and Returns: production costs per acre by commodity enabling profitability analysis at farm level); food security (Annual Household Food Security Survey: December CPS supplement ~50,000 households; 18-item Household Food Security Scale measuring frequency and severity of food access problems; four categories High Food Security, Marginal Food Security, Low Food Security food 
insecure without hunger, Very Low Food Security food insecure with hunger; 2023: 13.5% food insecure = 17.9M households = 47M people including 13M children; child food insecurity 7.2% of households with children; state-level estimates require 3-year pooling for reliability; SNAP participation inversely correlated with food insecurity reducing rates ~1 percentage point per 10% participation increase; Food Access Research Atlas food desert mapping: low-income census tracts more than 1 mile urban or 10 miles rural from supermarket); commodity markets (Monthly Commodity Outlook for corn soybeans wheat cotton rice livestock dairy poultry eggs; WASDE-comparable supply/demand tables; ARC/PLC Farm Bill reference prices: corn $3.70/bushel, soybeans $8.40/bushel, wheat $5.50/bushel, rice $14.00/cwt; 10-year Farm Bill CBO baseline scoring for commodity program costs; ethanol demand for corn approximately 5 billion bushels annually or 40% of US corn use); rural America (Rural-Urban Continuum Beale Codes 1-9: 1 metro 1M+, 9 most rural isolated; all 3,141 counties classified; Rural-Urban Commuting Area RUCA codes at census tract level for finer geography; rural poverty rate consistently 15%+ vs metro 11%; 180+ rural hospital closures since 2010 particularly in states without Medicaid expansion; broadband: 17% rural households lack access vs 1% urban per FCC Form 477; Atlas of Rural and Small-Town America county-level data on employment/population/income/education); Python script fetching ERS farm income CSV, parsing net farm income/government payments/production expenses 2000-2023, computing government payments as percentage of net farm income by year, identifying years exceeding 50% threshold (2000 droughts, 2019-2020 MFP trade war payments and COVID ad hoc assistance), producing stacked bar chart of income components.

  64. Q4Writing

    BLS Employment Cost Index deep-dive published

    Long-form article on BLS ECI: fixed-weight quarterly measure of employer compensation costs (wages + salaries + benefits) covering civilian workers, private industry, and state/local government; National Compensation Survey design (~18,000 establishments, probability sampling with certainty for large firms, matched establishment-occupation panel); fixed employment-weight basket eliminates industry-mix composition shifts that distort Average Hourly Earnings from CES; quarterly release approximately 30 days after quarter end; private-industry wages+salaries peaked at ~5.7% year-over-year in mid-2022 (highest since 1980s), decelerated to ~4.2% by end-2023; Fed comfort level ~3.5% consistent with 2% PCE inflation target given normal 1% productivity growth; Q1 2024 upside ECI surprise of 1.2% quarterly directly delayed FOMC rate cut timeline by multiple quarters; Fed Chair Powell cited ECI explicitly in press conferences; Employer Costs for Employee Compensation ECEC companion release: health insurance ~$3.50-4.00/hour per worker (highest single benefit), legally required FICA/workers comp/UI ~$3.50/hour, paid leave, retirement, supplemental pay; total benefits ~31% of compensation; state/local government ECI consistently lower than private reflecting union multi-year contracts and budget constraints; BLS API series CIU2020000000000A private industry wages quarterly not-seasonally-adjusted, CIU2010000000000A civilian workers all items; FRED mirrors: ECIWAG (wages/salaries quarterly), ECIALLCIV (total compensation civilian); unit labor cost = ECI growth minus productivity growth — non-inflationary when productivity offsets wages; Python script fetching both ECI and AHE series from BLS API, computing 4-quarter rolling average, plotting with 3.5% Fed target reference line.
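    The growth-rate steps described above can be sketched without any API access. The example assumes quarterly index levels have already been fetched; the function names and the sample levels are hypothetical.

```python
def yoy_percent(levels):
    """Year-over-year percent change for a quarterly index (compare with 4 quarters earlier)."""
    return [100.0 * (levels[i] / levels[i - 4] - 1.0) for i in range(4, len(levels))]


def rolling_mean(values, window=4):
    """Trailing rolling average over `window` observations."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def unit_labor_cost_growth(eci_growth, productivity_growth):
    """Unit labor cost growth = ECI growth minus productivity growth."""
    return eci_growth - productivity_growth


# Illustrative index levels only, not published ECI values.
levels = [100.0, 101.0, 102.0, 103.0, 104.0, 105.5]
growth = yoy_percent(levels)
print([round(g, 2) for g in growth])
print(round(unit_labor_cost_growth(4.2, 1.0), 1))
```

    With the ~4.2% end-2023 wage growth and 1% productivity growth quoted above, implied unit labor cost growth is about 3.2%.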

  65. Q4Writing

    DOL UI weekly claims data deep-dive published

    Long-form article on DOL unemployment insurance weekly claims: Thursday 8:30 AM ET release by Employment and Training Administration; covers 53 jurisdictions (50 states + DC + Puerto Rico + Virgin Islands); initial claims = first-time UI filers that week; continuing claims = persons still collecting from prior weeks; program eligibility (involuntary separation required — not quit not fired for cause, sufficient base period wages varying by state, actively seeking work, able and available); benefit amounts (state minimum typically $50-150/week, maximum from $235 Mississippi to $1,015 Washington, maximum duration 26 weeks state-level plus extended benefits triggers); historical record: COVID peak 6.87 million initial claims week ending March 28 2020 dwarfing all previous records; prior record 695,000 in 1982 recession; Great Recession 2009 peak ~665,000; pre-COVID historic lows ~200,000 (2018-2019) lowest since 1969; COVID CARES Act additions: $600/week Federal Pandemic Unemployment Compensation FPUC, Pandemic Emergency Unemployment Compensation PEUC 13 additional weeks, Pandemic Unemployment Assistance PUA for gig workers and self-employed not previously UI-eligible; ETA Form 539 advance release format; seasonally adjusted vs. not seasonally adjusted; state breakdown published with 2-week lag vs. national same-week; 4-week moving average standard smoothing method removes seasonal noise from auto plant model-year retooling in July-August, holiday manufacturing layoffs, weather events; Hurricane Katrina 2005 caused Louisiana claims spike; Hurricane Harvey 2017 Texas spike then abrupt reversal as recovery employment surged; FRED series ICSA initial claims seasonally adjusted (1967-present), ICNSA not seasonally adjusted, CC4WSA 4-week moving average continuing claims, IC4WSA 4-week moving average initial claims; market reaction: Treasury yields and equity futures move immediately at 8:30 AM on strong/weak prints; claims as leading indicator vs. 
monthly BLS unemployment rate as lagging indicator; Python script fetching ICSA from FRED API past 5 years, computing rolling 52-week mean and standard deviation, flagging weeks exceeding +2 standard deviations as shock weeks, annotating with event labels.
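
    The shock-week step described above can be sketched as follows. This is a minimal illustration, assuming the weekly ICSA values are already loaded into a list (the FRED fetch is omitted); the function name flag_shock_weeks and the choice to exclude the current week from its own trailing window are assumptions, not details taken from the article.

```python
from statistics import mean, stdev

def flag_shock_weeks(claims, window=52, n_sigma=2.0):
    """Return indices of weeks exceeding the trailing mean by n_sigma SDs.

    claims: weekly initial-claims values, oldest first.
    The trailing window excludes the current week (an assumption made here
    so that a spike does not inflate its own baseline).
    """
    shocks = []
    for i in range(window, len(claims)):
        history = claims[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma > 0 and claims[i] > mu + n_sigma * sigma:
            shocks.append(i)
    return shocks

# Synthetic series (not real data): 60 quiet weeks, then one large spike.
series = [210_000 if i % 2 == 0 else 220_000 for i in range(60)] + [3_300_000]
print(flag_shock_weeks(series))
```

    Event labels can then be attached by mapping each flagged index back to its week-ending date.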

  66. Q4Writing

    Census Foreign Trade Statistics deep-dive published

    Long-form article on Census Bureau foreign trade data: Census Foreign Trade Division compiles monthly US import and export goods statistics from administrative data — Automated Export System AES electronic export declarations filed by shippers and CBP Automated Commercial Environment ACE import entry records from customs brokers and importers; FT-900 monthly release (joint Census/BEA): advance goods trade balance released approximately 30 days after month end, full goods plus services approximately 60 days; all Census trade data at usatrade.census.gov USA Trade Online; Harmonized System HS product classification: 2-digit chapters (01 live animals through 98 special provisions), 4-digit headings, 6-digit internationally standardized subheadings, US extension to 10-digit Schedule B (exports) and Harmonized Tariff Schedule HTS (imports); Schedule B code lookup at scheduleB.census.gov; Section 301 tariffs by USTR applied at HTS-10 level with specific lists for 7.5% and 25% China tariff tiers; 2023 aggregates: goods exports $2.02T, goods imports $3.08T, goods deficit $1.06T; top export categories: mineral fuels/petroleum $190B (shale revolution made US net petroleum exporter in some months), civilian aircraft/parts $131B, soybeans/grains, industrial machinery, semiconductors; top import categories: petroleum $297B, vehicles $215B, smartphones/computers $180B, pharmaceutical preparations $140B; bilateral country balances: US-China goods deficit $279B 2023 down from $419B peak 2018 pre-tariff, US-Mexico deficit $152B, US-Canada deficit $64B, US-EU deficit $213B; tariff shift effects: Section 301 reduced China share of US imports from 21% to 14% as sourcing shifted to Vietnam ($50B imports in 2017 to $114B in 2023), Mexico, Taiwan; USA Trade Online features: HS-10 level data by country and port, domestic exports (US-origin) vs. foreign exports (re-exports), general imports (all entering) vs. 
imports for consumption (cleared through customs), 420 port districts; Census API endpoint api.census.gov/data/timeseries/intltrade/ with COMM_LVL for HS aggregation level, CTY_CODE for country, GEN_VAL_MO for import value; Python script pulling semiconductor HS 8542 imports by source country 2018-2024 showing China-to-Taiwan-Vietnam substitution as stacked area chart.

  67. Q4Writing

    NIFC wildfire data deep-dive published

    Long-form article on National Interagency Fire Center wildfire data: NIFC headquartered Boise Idaho 1965, joint operation of USDA Forest Service, DOI Bureau of Land Management, National Park Service, Bureau of Indian Affairs, Fish and Wildlife Service plus state agencies; coordinates national firefighting resource dispatch through 11 Geographic Area Coordination Centers GACCs; National Interagency Coordination Center NICC tracks active fires; statistical history from 1926 with pre-1983 reporting caveats; modern era 1983-present: 2006 9.87M acres, 2012 9.32M acres, 2015 10.13M (modern-era record), 2017 10.03M, 2020 10.12M with California alone burning 4.1M acres (first year California exceeded 4M); 10-year rolling average acreage roughly doubled from 1980s to 2020s; fire season extending from traditional summer 5-month window to year-round in California; landmark fires: Camp Fire November 2018 Paradise CA — 153,336 acres, 85 deaths, 18,804 structures destroyed, approximately $16.5B insured losses (record at time), town of Paradise almost entirely destroyed; August Complex Fire 2020 — first California gigafire exceeding 1 million acres; Dixie Fire 2021 — 963,310 acres, first fire to cross Sierra Nevada crest; Caldor Fire 2021 forced South Lake Tahoe evacuation; Maui Lahaina fire 2023 — 99 deaths, 2,200+ structures, deadliest US fire since 1918 Cloquet; USFS Fire Occurrence Database FOD: individual fire records 1992-present, approximately 2.3M fires, variables include state, cause code (lightning, equipment use, smoking, campfire, debris burning, railroad, arson, children, miscellaneous), size class A through G (A less than 0.25 acres, G greater than 5,000 acres), latitude/longitude, start/containment/control dates; Monitoring Trends in Burn Severity MTBS: Landsat-based dNBR burn severity mapping for fires larger than 1,000 acres West or 500 acres East from 1984-present at 30-meter resolution, 25,000+ fires mapped, available at mtbs.gov as GeoTIFF and shapefile; fire weather: 
NWS Red Flag Warning criteria (relative humidity below 15%, sustained winds above 25 mph, fine fuel moisture below critical threshold), Keetch-Byram Drought Index KBDI 0-800 scale measuring soil moisture deficit (California regularly exceeds 700 in drought years), Vapor Pressure Deficit VPD as key driver (warmer air demands more moisture from vegetation increasing flammability); suppression costs: USFS FY2022 $2.6B in suppression, 2018 Consolidated Appropriations Act fire suppression cap replacing previous transfer-from-other-programs mechanism; NIFC data access: annual statistics CSV at nifc.gov/fire-information/statistics, NIFC ArcGIS REST services for active fire perimeters, GeoJSON perimeter archive at ftp.wildfire.gov; Python script downloading NIFC historical CSV, parsing 1983-2023, computing 10-year rolling averages, fitting linear regression to acreage trend, reporting additional acres-per-decade from trend.
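
    The trend step at the end of this entry (10-year rolling averages plus a linear fit reported as acres per decade) might look like the sketch below. It uses only the standard library and a synthetic placeholder series instead of the NIFC CSV; the helper names rolling_mean and ols_slope are hypothetical, and the numbers are not real acreage data.

```python
def rolling_mean(values, window=10):
    """Trailing rolling mean; the first value covers the first full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Placeholder series: a straight line, NOT NIFC data.
years = list(range(1983, 2024))
acres = [3.0e6 + 1.5e5 * (y - 1983) for y in years]

smoothed = rolling_mean(acres)           # one value per year from 1992 on
slope_per_year = ols_slope(years, acres)
print(f"{slope_per_year * 10:,.0f} additional acres per decade")
```

    Multiplying the per-year slope by 10 gives the acres-per-decade figure; with real data the fit would be run on the parsed 1983-2023 annual totals.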

  68. Q4Writing

    Social Security OASDI data deep-dive published

    Long-form article on SSA OASDI program: program structure (Title II Social Security Act 1935; FICA 6.2% employee + 6.2% employer on wages up to $168,600 wage base 2024; two trust funds OASI Trust Fund and DI Trust Fund; Board of Trustees annual report; ~$1.4T paid in benefits to ~70M beneficiaries in 2024; SSA administers through 1,200+ field offices); benefit categories (retired workers ~57M; disabled workers ~8M Title II DI — distinct from SSI Title XVI means-tested; spouses, dependent children, survivors; Full Retirement Age 67 for born 1960+; early claiming at 62 permanently reduces benefits 25-30%; delayed claiming to 70 increases 8% per year; average retirement benefit $1,907/month 2024; Special Minimum Benefit for low-wage long-career workers; student and young survivor benefits); benefit formula (AIME Average Indexed Monthly Earnings from 35 highest earning years indexed to national average wage growth; PIA Primary Insurance Amount = 90% of first $1,174 AIME + 32% of AIME between $1,174 and $7,078 + 15% above $7,078 — 2024 bend points; worked example: $4,000 AIME yields PIA = 90%×$1,174 + 32%×$2,826 = $1,057 + $904 = $1,961); COLA (Consumer Price Index for Urban Wage Earners and Clerical Workers CPI-W third-quarter average year-over-year; 3.2% in 2024 effective January; 8.7% in 2023 largest since 1981; CPI-E alternative index weighted toward elderly consumption not currently used; COLA applies to all benefit categories simultaneously); WEP and GPO (Windfall Elimination Provision reduces PIA for workers with pension from non-covered employment — maximum reduction $587/month 2024; Government Pension Offset reduces spousal/survivor benefit by 2/3 of government pension; affects approximately 2.2M beneficiaries; repeal efforts in Congress recurring but unfunded); SSA data ecosystem (data.ssa.gov monthly statistical snapshot — total beneficiaries by type, total expenditures, average benefit; Annual Statistical Supplement 700+ tables covering decades of 
beneficiary counts, benefit amounts, worker ratios, state breakdowns; Social Security Statement at my.ssa.gov shows individual earnings history and projected benefits; OACT Office of the Chief Actuary publishes Trustees Report with 75-year actuarial projections and stochastic uncertainty analysis; disability program publishes Disability Statistics from SSA Annual Statistical Report); fiscal outlook (2024 Trustees Report: OASI trust fund depletion projected 2033, DI trust fund projected beyond 75 years; at OASI depletion, incoming revenues cover approximately 77% of scheduled benefits; long-range actuarial deficit = 3.33% of taxable payroll as percentage of GDP roughly 1.2%; worker-to-beneficiary ratio declined from 3.3 in 1980 to 2.7 in 2024 projected to fall to 2.0 by 2050 as baby boomers age through retirement; policy options: raise payroll tax rate 6.2% to 7.2%, raise wage base to $350k+, raise FRA to 68-69, reduce COLA formula, means-test benefits, introduce investment accounts); SSI distinction (Supplemental Security Income Title XVI means-tested federal income support for aged/blind/disabled — not funded by FICA, funded by general revenue; 2024 federal benefit rate $943/month individual/$1,415/couple; state supplements vary $0-$700; approximately 7.5M SSI recipients; most dual-eligibles receive reduced SSI and Medicare/Medicaid simultaneously; SNAP, housing assistance, Medicaid all interact with SSI income and resource tests); state variation (Annual Statistical Supplement Table 5.J beneficiaries by state; Florida highest absolute retired worker count reflecting retiree migration; West Virginia, Alabama, Kentucky, Mississippi highest disability rates per capita reflecting older workforce demographics, coal mining legacy conditions, limited alternative employment; California highest absolute SSI count; states with higher early claiming rates correlate with lower-wage, physically demanding occupations); Python script fetching two consecutive SSA Monthly 
Statistical Snapshots from SSA.gov statistical data pages, parsing beneficiary counts by program type (retired workers, disabled workers, survivors, SSI), computing year-over-year percentage change for each category, identifying fastest-growing benefit type and largest absolute dollar expenditure change.
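
    The PIA worked example in this entry can be checked with a few lines of code. This is a sketch using the 2024 bend points quoted above; the function name pia is illustrative, and the result is left unrounded (as far as we know SSA rounds the PIA down to the next lower dime, which is not applied here).

```python
BEND_POINTS_2024 = (1174, 7078)  # dollars of AIME, as quoted in the entry

def pia(aime, bend_points=BEND_POINTS_2024):
    """Primary Insurance Amount from AIME using the 90/32/15 percent tiers."""
    b1, b2 = bend_points
    tier1 = 0.90 * min(aime, b1)
    tier2 = 0.32 * max(0, min(aime, b2) - b1)
    tier3 = 0.15 * max(0, aime - b2)
    return tier1 + tier2 + tier3

# Worked example from the entry: $4,000 AIME.
print(round(pia(4000), 2))  # 1056.60 + 904.32
```

    For a $4,000 AIME only the first two tiers apply, which matches the $1,057 + $904 = $1,961 arithmetic in the summary once rounded to whole dollars.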

  69. Q4Writing

    Census CPS poverty and income data deep-dive published

    Long-form article on Census Current Population Survey: survey design (joint Census Bureau and BLS program since 1940; ~60,000 housing units monthly from Master Address File derived from decennial census; 4-8-4 rotating panel — 4 consecutive months in sample, 8 months out, 4 months back in, then exit; Computer-Assisted Telephone Interviewing CATI primary with Computer-Assisted Personal Interviewing CAPI for refusals and special populations; proxy interviews allowed for absent household members; CPS ASEC Annual Social and Economic Supplement collected February-April each year with ~95,000 household expanded sample, covers prior calendar year income from 15+ sources); monthly labor force data (BLS uses CPS as the household survey producing unemployment rate, labor force participation rate, employment-population ratio; U-1 through U-6 measures: U-3 official unemployment rate requires actively looking past 4 weeks; U-4 adds discouraged workers; U-5 adds marginally attached; U-6 adds part-time for economic reasons; household survey diverges from establishment CES payroll survey — CPS counts a multiple jobholder once, CES counts each job; January seasonal adjustment revisions reconcile; CPS captures self-employment, agriculture, and private household workers not in CES, while both surveys exclude active-duty military; pandemic remote-work classification issues 2020); official poverty measure (Mollie Orshansky 1963 thresholds derived from USDA Economy Food Plan budget × 3 multiplier based on 1955 survey showing families spend 1/3 income on food; thresholds vary by family size and composition but not geography; adjusted annually by CPI-U not updated for changed consumption patterns or geographic costs; 2023 poverty threshold family of 4 = $30,900; 2023 official poverty rate 11.1% = 36.8M people; limitations: excludes in-kind benefits SNAP/Medicaid/housing, excludes payroll taxes and work expenses, uses pre-tax cash income, no geographic price adjustment); supplemental poverty measure (SPM since 2011 
following NAS National Academy of Sciences 1995 Measuring Poverty panel recommendations; thresholds based on Consumer Expenditure Survey spending on food, clothing, shelter, utilities FCSU bottom third; updated for geographic differences in housing costs using ACS owner-occupied costs; resource measure adds SNAP, school lunch, housing subsidies, EITC, child tax credit and subtracts payroll taxes, income taxes, work expenses, medical out-of-pocket; 2023 SPM 12.9% = 42.5M people; 2021 Child Tax Credit expansion reduced SPM child poverty to 5.2%, expiration raised it back to 12.4%; SPM more responsive to policy changes making it preferred for program evaluation); income and inequality (median household income $80,610 in 2023; income quintile upper bounds: bottom quintile ~$34k, second ~$60k, middle ~$100k, fourth ~$152k; top 5% share ~23% of aggregate income; Gini coefficient ~0.482 in 2023 — higher than all peer OECD nations; EITC lifts approximately 5.6M out of poverty under SPM; SNAP lifts approximately 3.2M; Social Security lifts approximately 15M including elderly); IPUMS CPS (University of Minnesota IPUMS harmonizes all CPS basic monthly and ASEC waves back to 1962; key variables FTOTVAL total family income, INCTOT personal income, POVERTY ratio of income to poverty threshold, OFFPOV official poverty flag, POVSPMU SPM unit poverty flag; ipumsr R package and ipumspy Python package provide structured microdata access; enables 60-year income inequality time series); data access (api.census.gov CPS ASEC table API; key variables DP03 economic characteristics from ACS vs CPS differences; DataFerrett at dataferrett.census.gov for custom tabulations; Census FTP for basic monthly CPS microdata in fixed-width text; FRED series UNRATE unemployment rate, MEHOINUSA672N real median household income, SIPOVTHRS poverty threshold); Python script using Census API to fetch state-level poverty rates from CPS ASEC, two most recent years, computing year-over-year change, ranking 
states by change, flagging states with statistically significant shifts.

  70. Q4Writing

    BEA International Transactions balance of payments deep-dive published

    Long-form article on BEA International Transactions Accounts: BOP framework (comprehensive double-entry accounting of all economic transactions between US residents and rest of world; BEA publishes quarterly International Transactions Accounts under IMF Balance of Payments and International Investment Position Manual 6th edition BPM6 standards; three accounts: Current Account recording trade in goods and services, primary income investment returns, secondary income transfers; Capital Account negligible for US; Financial Account recording FDI, portfolio investment, other investment, reserve assets; by accounting identity Current Account + Capital Account = Financial Account + statistical discrepancy; quarterly release approximately 90 days after quarter end); current account components (2023: goods exports $2.02T, goods imports $3.08T, goods balance -$1.06T; services exports $998B, services imports $705B, services balance +$293B; primary income receipts $1.22T, payments $1.02T, net +$196B; secondary income receipts $125B, payments $347B, net -$222B; total current account -$905B, 3.3% of GDP; goods deficit driven by industrial supplies/materials and consumer goods; petroleum goods balance improved dramatically from shale production — US became net petroleum exporter in some months; services surplus reflects US comparative advantage in financial services, intellectual property royalties, education, tourism, business services); bilateral balances (US-China goods deficit $279B in 2023, reduced from $419B peak 2018 pre-tariff; US-Mexico goods deficit $152B; US-Canada goods deficit $64B; US-EU goods deficit $213B; bilateral balances mislead because of global value chains — Chinese iPhone assembly using Korean/Japanese/US components attributed to China; Trade in Value Added TiVA framework more accurate but less timely); financial account (FDI definition: 10%+ equity ownership; US FDI abroad outflows ~$500B; foreign FDI into US inflows ~$350B; direct investment income 
reinvested earnings largest component; portfolio investment: Treasury securities, agency securities, equities; Treasury TIC data supplementary monthly source for foreign holdings of US securities; reserve assets changes: Federal Reserve foreign exchange reserves ~$35B plus gold valued at $42.22/troy oz historic convention plus IMF SDRs; statistical discrepancy reflects data collection lags and coverage gaps); international investment position (IIP = stock complement to BOP flows; US net IIP = -$20.6T at end 2023, meaning foreigners hold $20.6T more in US assets than Americans hold abroad; negative NIIP does not imply future crisis because US earns positive net primary income — US external assets earn 5-6% average return while US liabilities pay 3-4% average, generating annual net income surplus of $200B+; this return differential reflects dollar reserve currency status, VC/equity risk premium, and composition of US foreign assets toward higher-return FDI vs. US liabilities toward lower-yield Treasuries — Barry Eichengreen exorbitant privilege concept); policy context (trade deficit = goods+services deficit only; full current account includes primary income net positive; common misconception: trade deficit does not directly cause job loss — driven by macroeconomic savings-investment imbalance S-I=NX; US trade deficit reflects US consumer spending exceeding domestic production and US being preferred destination for global savings; Section 232 steel/aluminum tariffs 2018 reduced steel imports but shifted to other sectors; Section 301 China tariffs 2018-2019 reduced US-China deficit but shifted to Vietnam/Mexico/EU without reducing total; exchange rate appreciation mechanism in long run but slow adjustment); BEA API (apps.bea.gov/api; dataset ITA; TableName ITA; Indicators: BalCurrentAcct, DefGoods, SurServices, IncomeReceipts, IncomePayments; Frequency Q and A; Year LAST10 or specific range; Python requests example fetching all five current account components 
quarterly); Census FT-900 (monthly advance goods trade release 30 days after month end; goods exports/imports by end-use category capital goods, consumer goods, industrial supplies/materials, automotive, foods/feeds/beverages; FRED BOPGSTB goods trade balance monthly, BOPBCA quarterly current account, NETFI net financial investment quarterly); Python script fetching quarterly current account balance and all component series from BEA ITA API, computing 4-quarter rolling average total and by component, identifying which component drove largest swing in most recent 5-year period, outputting summary statistics for goods/services/income breakdown.

  71. Q4Writing

    NOAA NCEI climate data deep-dive published

    Long-form article on NOAA National Centers for Environmental Information: NCEI overview (created 2015 from merger of NCDC National Climatic Data Center, NGDC National Geophysical Data Center, NODC National Oceanographic Data Center; headquarters Asheville NC; 150+ petabytes of atmospheric, coastal, geophysical, oceanic data; 25+ billion online data requests per year; funds state climatologist network; principal US climate archive used by IPCC, academic researchers, insurance actuaries, FEMA, utilities); GHCN datasets (Global Historical Climatology Network Daily GHCN-Daily: ~120,000 stations worldwide, daily maximum/minimum temperature TMAX/TMIN, precipitation PRCP, snowfall SNOW, snow depth SNWD; quality control flags for 19 failure tests; homogenization algorithm corrects for station relocations, instrument changes, time-of-observation bias shifts using pairwise comparison; GHCN-Monthly: ~7,280 stations monthly averages, Central England Temperature record from 1659 oldest continuous, some US stations back to 1895; Berkeley Earth, NASA GISS, UK Met Office HadCRUT all use GHCN as primary input with independent homogenization producing similar but not identical temperature trends); US Climate Normals (30-year averaging periods updated each decade per WMO standard; current period 1991-2020; variables: monthly and annual temperature means/extremes, precipitation, snowfall, snow depth, heating degree days HDD, cooling degree days CDD, frost dates, wind, humidity; published for 15,000+ COOP Cooperative Observer Program stations; Normals used in energy utility load forecasting, agricultural planting calendars, insurance actuarial tables, building energy codes; 1901-2000 century-long normals also published for long-record stations); NOAAGlobalTemp and temperature trend (NOAAGlobalTemp combines GHCN-Monthly land surface with ERSST v5 Extended Reconstructed Sea Surface Temperature; 2023 global average surface temperature 1.45°C above pre-industrial 1850-1900 baseline — 
warmest year on record surpassing 2016 El Nino year; contiguous US average temperature +3.2°F since 1901; El Nino years consistently rank warmest: 2023, 2016, 2020, 2019, 2015; nClimDiv dataset provides divisional temperature and precipitation data for 344 US climate divisions back to 1895 — used in Climate at a Glance CAAG interactive tool); climate extremes (US Climate Extremes Index CEI measures fraction of contiguous US experiencing much-above or much-below normal conditions in six indicator categories including temperature extremes, heavy precipitation, drought; Billion-Dollar Weather and Climate Disasters database: events exceeding $1B in CPI-adjusted losses from 1980 to present; 2023: 28 events totaling $94B, each event independently verified through insurance industry data, federal disaster declarations, academic post-event assessments; 2021 record 22 events/$148B; high-temperature record-to-low-temperature record ratio approximately 2:1 nationally indicating warming; Heat dome June 2021 Pacific Northwest: 116°F Portland, 121°F Lytton BC, hundreds of excess deaths); related datasets (HURDAT2 Atlantic and East Pacific hurricane best-track database from 1851 — NHC National Hurricane Center position/intensity at 6-hour intervals; extended best-track adding wind radii from 1988; NEXRAD Next Generation Radar Level-II and Level-III archive at AWS Open Data — 1991 to present, 162 WSR-88D sites, 5-10 minute polar-coordinate reflectivity/velocity/spectrum width; NOAA Tides and Currents tidesandcurrents.noaa.gov: 200+ long-term water level stations, sea level trends from linear regression; US average sea level rise 3.6mm/year, Virginia/North Carolina 6-7mm/year from combination of sea rise and land subsidence, Alaska 0-2mm/year; COOP Cooperative Observer Program 11,000 volunteer stations some with 100+ year continuous records — backbone of US precipitation climatology); CDO API (Climate Data Online REST API at www.ncdc.noaa.gov/cdo-web/webservices/v2; endpoints 
/datasets, /datacategories, /datatypes, /locationcategories, /locations, /stations, /data; free token registration at ncdc.noaa.gov/cdo-web/token; rate limit 10,000 requests/day and 5 requests/second per token; key parameters: datasetid GHCND daily or GSOM monthly summary or GHCNDMS; stationid GHCND:USW00094728 format; datatypeid TMAX TMIN PRCP; startdate/enddate YYYY-MM-DD; limit 1000 max per request; format json; chunking required for multi-year pulls due to 1000-record limit); Python script fetching monthly mean temperature from GHCND for Central Park New York station USW00094728 over 1900-2023 using CDO API with date chunking, computing baseline average for 1901-1960, calculating temperature anomaly for each year, fitting linear trend, plotting anomaly bar chart with 10-year rolling mean overlaid and El Nino record years annotated.
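
    The date chunking mentioned above could be sketched like this. It only builds the per-request parameters and makes no network calls; the helper names yearly_chunks and request_params are hypothetical, and splitting by calendar year is one assumed strategy (a single daily datatype yields at most 366 records per year, under the 1000-record limit).

```python
from datetime import date

def yearly_chunks(start, end):
    """Split [start, end] into calendar-year (startdate, enddate) ISO pairs."""
    chunks = []
    for year in range(start.year, end.year + 1):
        lo = max(start, date(year, 1, 1))
        hi = min(end, date(year, 12, 31))
        chunks.append((lo.isoformat(), hi.isoformat()))
    return chunks

def request_params(station_id, startdate, enddate, datatype="TMAX"):
    """Query parameters for one /data request, using names from the entry."""
    return {
        "datasetid": "GHCND",
        "stationid": station_id,
        "datatypeid": datatype,
        "startdate": startdate,
        "enddate": enddate,
        "limit": 1000,
    }

chunks = yearly_chunks(date(1900, 1, 1), date(1902, 6, 30))
params = [request_params("GHCND:USW00094728", lo, hi) for lo, hi in chunks]
print(len(params), params[-1]["startdate"], params[-1]["enddate"])
```

    Each dictionary would be sent as the query string of one request with the registered token in the request header, pausing between calls to stay inside the rate limit.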

  72. Q4Writing

    VA disability benefits and PACT Act data deep-dive published

    Long-form article on VA disability compensation system: VA structure (VBA Veterans Benefits Administration adjudicates claims, VHA Veterans Health Administration delivers healthcare, NCA National Cemetery Administration; combined $130B+ annual benefits spending covering 22M+ living veterans with 9M enrolled in VA healthcare); disability rating system (0-100% in 10% increments using whole-person combined ratings formula — second disability rated against remaining healthy percent not added directly, so 50% + 30% yields 65% not 80%; 2024 monthly compensation rates table: 10%=$171, 20%=$338, 30%=$524, 50%=$1,075, 70%=$1,716, 100%=$3,737; Special Monthly Compensation SMC for severely disabled veterans requiring regular aid and attendance or having catastrophic disabilities; COLA adjustments tied to Social Security CPI adjustments; total disability compensation recipients approximately 5.5M in 2024 up from 3.5M in 2010); claims processing (Compensation and Pension C&P exams ordered from VA medical centers or contract examiners QTC/LHI/Veterans Evaluation Services; historic backlog peaked at 884,000 in 2013 from Iraq/Afghanistan claims surge; VBMS Veterans Benefits Management System digitalized paper claims files; Appeals Reform Act 2017 created three review lanes: Supplemental Claim for new evidence, Higher-Level Review for de novo review by senior rater, and Board of Veterans Appeals direct or evidence submission; average processing time goals 125 days for rating decisions); PACT Act 2022 (Sergeant First Class Heath Robinson PACT Act; expanded presumptive conditions for toxic exposures: burn pit exposure from post-9/11 combat (23 new presumptive conditions including constrictive bronchiolitis, rare respiratory cancers, rare head/neck/GI cancers), Agent Orange Vietnam era (bladder cancer, hypothyroidism, Parkinsonism), radiation exposure, Camp Lejeune water contamination 1953-1987 (8 conditions including bladder cancer, non-Hodgkins lymphoma, Parkinson disease); 
approximately 3.5M additional veterans newly eligible for disability claims; VA received more than 2M PACT Act claims in first year after August 2022 enactment; Congressional Budget Office estimated $280B+ 10-year cost; significant VA staffing and IT system strain from claims volume surge); GI Bill education benefits (Post-9/11 GI Bill Chapter 33: up to 36 months educational benefits, pays actual tuition at public in-state rate or $27,120 annual cap at private institutions, monthly housing allowance calculated as BAH E-5 with dependents at school location ZIP code, $1,000 annual books and supplies stipend; 90%+ benefit level requires 36+ months aggregate active duty service; entitlement transferable to spouse or dependents after 10 years service commitment with 4 additional years service obligation; Yellow Ribbon Program supplements private school tuition gaps above the cap through 50/50 institution-VA cost sharing; Montgomery GI Bill Chapter 30 as older alternative requiring $1,200 contribution from service member; Vocational Rehabilitation and Employment Chapter 31 for service-connected disabled veterans to achieve employment; total GI Bill expenditure approximately $10-12B per year); VA Home Loan Guaranty (no down payment required, no private mortgage insurance PMI, competitive interest rates at or below conventional market; VA funding fee ranges from 1.25% to 3.3% of loan amount depending on service type/down payment/subsequent use — waived for veterans with service-connected disability ratings; more than 4M loans guaranteed in FY2022; maximum loan amounts match FHFA conforming limits ($766,550 standard in 2024); Blue Water Navy Veterans Act 2019 extended VA loan benefits to veterans who served in offshore waters of Vietnam; HMDA loan type code VA=3 (1 conventional, 2 FHA, 3 VA, 4 USDA) enables filtering VA-guaranteed loans in CFPB HMDA disclosure data); VA Open Data portal (data.va.gov with Socrata API; VBA Benefits Utilization state-level data: compensation recipients by state, average monthly 
compensation, total expenditures, diagnostic category breakdown; VHA utilization data by VISN Veterans Integrated Service Network; NCA national cemetery data; National Veteran Suicide Prevention Annual Report and supporting data; MISSION Act community care referral and utilization data; VA.gov/data as developer-facing data access point); Veterans Service Organizations (DAV Disabled American Veterans, American Legion, VFW Veterans of Foreign Wars, Vietnam Veterans of America act as VA-accredited claims agents free of charge; private attorneys charge maximum 20% of backdated award only after favorable decision; nexus letter from independent physician establishing medical nexus between current disability and military service often decisive; Individual Unemployability TDIU allows 100% compensation rate for veterans rated 60%+ single or 70%+ combined who cannot maintain substantially gainful employment — approximately 370,000 recipients; Total Disability Individual Unemployability distinct from 100% schedular rating); mental health and MST data (VA mental health program treats approximately 2M veterans annually; PTSD prevalence 11-20% among OIF/OEF veterans per VA National Center for PTSD; Military Sexual Trauma MST recognized as independent basis for PTSD service connection without in-service medical record — only requires credible statement; veteran suicide rate approximately 1.5× age-adjusted civilian rate — 2022 National Veteran Suicide Prevention Annual Report: 6,392 veteran suicides; MISSION Act expanded community mental health care access; Veterans Crisis Line 988 press 1); Python script fetching VA Benefits Utilization State Data from Socrata API (data.va.gov), joining to Census ACS B21001 veteran population estimates by state, computing disability compensation recipients per 10,000 veterans, ranking states by utilization rate and by average monthly payment.
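
    The combined ratings arithmetic in this entry (50% + 30% yields 65%, not 80%) can be reproduced with a short sketch. It applies each rating to the remaining non-disabled percentage, highest first, rounding each intermediate value to a whole number; the function names are illustrative, and the final step to the nearest 10% (with values ending in 5 rounded up) reflects our understanding of VA practice, not something stated in the entry. VA's published combined ratings table remains the authority.

```python
def combined_rating(ratings):
    """Combine percent ratings using the remaining-capacity method."""
    combined = 0
    for r in sorted(ratings, reverse=True):
        value = combined + (100 - combined) * r / 100
        combined = int(value + 0.5)  # round half up to a whole percent
    return combined

def round_to_ten(value):
    """Round to the nearest 10 percent, with values ending in 5 going up."""
    return int((value + 5) // 10 * 10)

raw = combined_rating([50, 30])
print(raw, round_to_ten(raw))
```

    The order of the input list does not matter because ratings are sorted before combining; the 65 in the entry is the combined value before this final rounding.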

  73. Q4Writing

    USGS National Water Information System deep-dive published

    Long-form article on USGS water resources data: USGS Water Resources mission (National Water Information System as federal hydrologic database; 8,000+ active streamflow gauging stations plus 4,000+ groundwater monitoring wells plus water quality sites; data from 1887 oldest active gauge; cooperative funding with states/cities/water districts — 65% of USGS water program budget from cooperators paying 50/50 with federal funds; USGS measures/monitors while EPA regulates/enforces distinction); streamflow gauging methodology (Acoustic Doppler Current Profiler ADCP uses Doppler effect to measure water velocity across cross-section without entering stream; traditional current meters still used at low flows; stage-discharge rating curves derived from periodic manual discharge measurements at each gauge — hydraulic cross-section and velocity at various stages creates the rating; stage measured continuously by pressure transducer or float recorder, discharge calculated from rating; provisional data published in near-real-time before quality control, approved data after annual review; WaterWatch real-time national map of streamflow percentiles colored by severity); flood prediction connection (USGS gauges transmit 15-minute data by satellite to NWS River Forecast Centers — 13 regional RFCs across US; NWS National Water Model launched 2016 covering 2.7 million stream reaches across continental US — hydraulic routing model forced by NWS precipitation analysis; short-range 15-minute forecasts, medium-range 10-day, long-range 30-day; flood stage thresholds at each gauge: Action (staff should monitor), Flood (minor lowland flooding), Moderate Flood (significant inundation of structures), Major Flood (extensive inundation requiring evacuations); USGS annual peak discharge record used in flood frequency analysis — 100-year flood equals 1% annual exceedance probability calculated from observed record; FEMA Flood Insurance Rate Maps and floodplain delineation use USGS discharge 
records as primary hydrologic input); groundwater monitoring (National Groundwater Monitoring Network NGWMN aggregating federal and state well data; aquifer water level monitoring at thousands of wells; major aquifer systems: Ogallala/High Plains Aquifer covering 174,000 square miles under 8 states from South Dakota to Texas, supplies irrigation for 30% of all US groundwater-irrigated cropland, water level declined 1-3 feet/year in Kansas and Texas portions as pumping exceeds recharge, some wells now 200+ feet lower than 1950 levels creating permanent loss of irrigation capacity; Floridan Aquifer System as largest confined aquifer in southeastern US covering 100,000 square miles, supplies 10M people, artesian pressure declining from over-pumping in coastal Georgia and Florida; Central Valley Aquifer California suffering land subsidence from agricultural pumping — Tulare/Kings counties subsided 28+ feet since 1900, USGS InSAR satellite interferometry measuring ongoing 1-2 foot/year subsidence); water quality monitoring (National Water-Quality Assessment NAWQA program monitoring 42 major study units across US; parameters at continuous sensors: specific conductance as salinity proxy, pH, dissolved oxygen, turbidity from suspended sediment, water temperature, nitrate where sensors deployed; pesticide and nutrient loads calculated as concentration × discharge for annual load estimates; PFAS per-fluoroalkyl substances emerging contaminant monitoring expansion 2020-2024; Water Quality Portal wqp.waterqualitydata.us aggregating EPA STORET/WQX, USGS NWIS, and state agency data into unified REST API — 300M+ result records); National Water Dashboard (waterdata.usgs.gov replacing legacy NWIS web; WaterWatch at waterwatch.usgs.gov for flood/drought monitoring maps with percentile-colored gauges; StreamStats at streamstats.usgs.gov for drainage basin delineation and regression-based flood frequency estimation at ungauged locations — used by engineers for bridge design and 
stormwater management); water use compilation (5-year national water use survey, most recent 2015 with 2020 underway; categories by sector: public supply 12%, domestic self-supplied 1%, irrigation 37% (largest consumptive use), thermoelectric power 41% (largest withdrawal but mostly non-consumptive return), industrial 5%, aquaculture 2%, mining 1%, livestock 1%; by source: surface water 73%, groundwater 27%; declining agricultural use from drip/micro-irrigation adoption; thermoelectric shifting to closed-loop recirculating cooling reducing gross withdrawals); drought and low-flow (WaterWatch daily streamflow percentile classification: 0-9th percentile = Much Below Normal, 10-24th = Below Normal; 7Q10 seven-day low flow with 10-year recurrence period used in NPDES wastewater discharge permits to ensure adequate dilution — state regulators require that wastewater discharge not cause violations during low-flow conditions; USGS StreamStats computes 7Q10 for any watershed; 2012 drought: Missouri/Mississippi Rivers fell to lowest levels in decades halting commercial barge navigation for weeks, costing approximately $300M in lost commerce); NWIS API (waterservices.usgs.gov/nwis REST services; instantaneous values iv/, daily values dv/, statistics stat/, site information site/, peak values peak/; key parameters: sites= as 8-digit USGS site number; statCd= 00003 mean/00001 max/00002 min; parameterCd= 00060 discharge in cubic feet per second/00065 gage height in feet/00010 water temperature Celsius/00300 dissolved oxygen mg/L; format= waterml or json or rdb tab-delimited; Python dataretrieval package wrapping NWIS API natively with pandas output); Python script downloading daily mean discharge for Missouri River at Hermann MO site 06934500 over 5 years, computing 30-day rolling average, identifying dates below 10th percentile drought threshold using historical statistics, plotting hydrograph with drought periods shaded and 2012 drought labeled.
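
    A minimal sketch of the drought-flagging step described above, assuming the daily mean discharge values are already in hand. The `startDT` and `endDT` query parameters and all helper names are assumptions not stated in the entry, and the 10th-percentile threshold is computed from the sample itself rather than from the NWIS statistics service.

```python
from statistics import quantiles
from urllib.parse import urlencode

BASE = "https://waterservices.usgs.gov/nwis/dv/"


def build_dv_url(site, start, end, parameter_cd="00060", stat_cd="00003"):
    """Build a daily-values request URL (startDT/endDT are assumed names)."""
    params = {
        "format": "json",
        "sites": site,               # 8-digit USGS site number
        "parameterCd": parameter_cd,  # 00060 = discharge, cubic feet per second
        "statCd": stat_cd,            # 00003 = daily mean
        "startDT": start,
        "endDT": end,
    }
    return BASE + "?" + urlencode(params)


def rolling_mean(values, window):
    """Trailing rolling mean; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def percentile_threshold(values, pct=10):
    """Return the pct-th percentile using inclusive interpolation."""
    return quantiles(values, n=100, method="inclusive")[pct - 1]


def drought_days(dates, values, threshold):
    """Dates whose discharge falls strictly below the threshold."""
    return [d for d, v in zip(dates, values) if v < threshold]


if __name__ == "__main__":
    print(build_dv_url("06934500", "2020-01-01", "2024-12-31"))
    flows = list(range(1, 101))  # synthetic values, not real gauge data
    days = [f"day-{i}" for i in flows]
    cutoff = percentile_threshold(flows)
    print(cutoff, drought_days(days, flows, cutoff))
```

    A production version would take the threshold from the long-term historical record for each calendar day rather than from the download window, since a five-year window that includes a drought biases its own percentile.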

  74. Q4Writing

    OPM federal workforce data deep-dive published

    Long-form article on OPM Central Personnel Data File and FedScope: OPM scope (manages HR for 2.1M+ federal civilian workforce excluding USPS 600k workers, uniformed military 1.3M active duty, intelligence community under separate pay authorities, legislative branch 30k, judicial branch 35k; Central Personnel Data File CPDF as longitudinal personnel record system; FedScope data tool at fedscope.opm.gov for interactive cube-style analysis; 2025 DOGE Department of Government Efficiency workforce reduction context — fork-in-the-road email, Schedule F executive order, senior career civil servant buyouts; OPM as central hub for hiring standards, classification, and benefits administration); FedScope data structure (cube-style exploration at fedscope.opm.gov; dimensions: agency, sub-agency, state, occupation OPM series code 4-digit, grade, pay plan GS/GM/SES/WG/WD/WT wage grade, appointment type permanent career/career-conditional/temporary/term/excepted/SES; September 30 reference date for annual snapshots; salary distribution by occupation and agency; attrition and separation rates; average length of service by agency); federal workforce demographics (geographic distribution: DC/MD/VA metro area houses approximately 18% of federal civilian workers but 50%+ of agency leadership; 82% of workers outside DC area serving programs nationwide; average employee age approximately 47 years; retirement eligibility wave — OPM projects 30%+ of workforce eligible to retire within 5 years creating succession risk; race/ethnicity composition more diverse than private sector: Black employees 19% vs. 12% private sector, Hispanic 9% vs. 
18% private sector showing underrepresentation; women represent 45% of federal workforce); General Schedule pay system (GS-1 through GS-15 grades, 10 steps per grade; 2024 base pay rates: GS-7 Step 1 = $46,696 entry-level professional, GS-9 Step 1 = $57,118 common master degree entry, GS-11 Step 1 = $69,107, GS-13 Step 1 = $99,508 journey level, GS-15 Step 10 = $191,900 maximum base; locality pay adjustments published annually by OPM: 48 locality areas ranging from RUS Rest of United States +16.82% to San Francisco-San Jose +44.15% to Washington DC +33.26% to New York +37.09%; GS pay tables and locality rate tables publicly available at opm.gov/policy-data-oversight/pay-leave/salaries-wages/); Senior Executive Service (approximately 8,000 career SES positions plus approximately 4,000 political appointees under Schedule C and PAS Presidential Appointment with Senate confirmation; ES pay range $148,016 to $221,900 in 2024; SES diversity challenge: approximately 54% male, 73% white in career SES; OPM SES Desk Guide and annual SES diversity report; revolving door: Lobbying Disclosure Act requires 1-year cooling off period for very senior officials, 2-year for Cabinet secretaries); federal hiring process (USAJOBS.gov as single job announcement portal with 20,000+ active announcements; Delegated Examining Authority DEA for open competitive hiring — external candidates; Merit Promotion for internal career advancement; Schedule A authority 5 CFR 213.3102(u) for persons with disabilities without competitive examining; Direct Hire Authority DHA for severe shortage occupations particularly IT, cybersecurity, healthcare nursing where 100-day standard process creates impossible time-to-hire problem; average federal time-to-hire approximately 100 days vs. 
45 days private sector; OPM Pathways Programs: Internship for current students, Recent Graduates 1-year development program, Presidential Management Fellows 2-year competitive fellowship for advanced degree holders); FERS retirement system (three-tier: FERS basic annuity 1.0-1.1% × high-3 salary × years of service with full benefit at MRA+30 or age 62+5 years, Social Security participation unlike CSRS, and Thrift Savings Plan; TSP at $800B+ assets under management is world's largest defined-contribution retirement plan by participant base; C Fund tracks S&P 500, S Fund tracks Dow Jones US Completion TSM, I Fund tracks MSCI EAFE international, F Fund tracks Bloomberg Aggregate bond index, G Fund invests in special Treasury securities guaranteed not to lose value; FERS employees receive automatic 1% agency contribution plus up to 4% matching for 5% total agency match; COLA: CSRS gets full CPI-W COLA, FERS gets CPI minus 1 percentage point above 2% creating smaller adjustment; FEHB Federal Employees Health Benefits covers 4M+ covered lives with government paying 72% of premiums); FedScope download (opm.gov/data/index.aspx with annual employment data files in CSV format; key fields: AGYSUB agency/subagency code, OCC occupation code, GSEGRD GS grade, PATCO professional/administrative/technical/clerical/other, LOS length of service, SALARY converted pay, TOA type of appointment, LOC state code; PYPLAN pay plan code GS/ES/WG/SL/ST/EX; Python script downloading September employment file, filtering to GS pay plan, computing median GS grade by major agency, plotting age distribution histogram, identifying agencies with fastest workforce growth 2019-2024 using multi-year files).
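
    The retirement arithmetic in this entry can be sketched in a few lines. This is an illustration under stated assumptions, not OPM's calculator: the 1.1% multiplier is assumed to apply at age 62 or older with 20 or more years of service, the TSP match is assumed to be dollar-for-dollar on the first 3% of pay and 50 cents per dollar on the next 2%, and the COLA bands (full CPI up to 2%, a flat 2% between 2% and 3%, CPI minus 1 point above 3%) are assumptions that go beyond what the entry states. Function names are hypothetical.

```python
def fers_basic_annuity(high3, years, age_at_retirement):
    """Annual FERS basic annuity: multiplier x high-3 salary x years."""
    # Assumed rule: 1.1% at age 62+ with 20+ years, otherwise 1.0%.
    multiplier = 0.011 if (age_at_retirement >= 62 and years >= 20) else 0.010
    return high3 * years * multiplier


def tsp_agency_contribution(salary, employee_pct):
    """Agency TSP dollars for a given employee contribution rate (0.05 = 5%)."""
    automatic = 0.01                                  # automatic 1% contribution
    full_match = min(employee_pct, 0.03)              # dollar-for-dollar tier
    half_match = min(max(employee_pct - 0.03, 0.0), 0.02) * 0.5
    return salary * (automatic + full_match + half_match)


def fers_cola(cpi_pct):
    """FERS cost-of-living adjustment in percent for a CPI-W change in percent."""
    if cpi_pct <= 2.0:
        return cpi_pct
    if cpi_pct <= 3.0:
        return 2.0
    return cpi_pct - 1.0


if __name__ == "__main__":
    print(fers_basic_annuity(100_000, 30, 62))
    print(tsp_agency_contribution(100_000, 0.05))
    print(fers_cola(4.0))
```

    The 5% total agency contribution quoted in the entry only appears when the employee contributes at least 5% of pay; at lower rates the second function returns proportionally less.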

  75. Q4Writing

    NSF research grant data deep-dive published

    Long-form article on National Science Foundation research funding: NSF overview ($9B+ FY2024 budget, independent federal agency established by NSF Act 1950, funds approximately 25% of all federally funded basic research at US universities excluding life sciences which go to NIH; seven science research directorates: BIO Biological Sciences, CISE Computer and Information Science and Engineering, ENG Engineering, GEO Geosciences, MPS Mathematical and Physical Sciences, SBE Social Behavioral and Economic Sciences, TIP Technology Innovation and Partnerships created 2022; Education and Human Resources EHR directorate for STEM education; Office of Polar Programs OPP for Arctic and Antarctic research); grant types and program structure (Research grants: standard fixed-budget grants, continuing grants with annual increments, renewal grants; CAREER Faculty Early Career Development Program most prestigious junior faculty award: $500k-$600k over 5 years requiring both research and education plans, signal of career trajectory; RAPID Grants for Rapid Response Research for time-sensitive events; EAGER Early-concept Grants for Exploratory Research for high-risk concepts; RUI Research at Undergraduate Institutions recognizing different faculty workloads; MRI Major Research Instrumentation for shared equipment; Research Coordination Networks; NSF Engineering Research Centers ERCs and Science and Technology Centers STCs as 10-year center-scale awards; NEON National Ecological Observatory Network as continental-scale ecological monitoring infrastructure); proposal submission and review (Research.gov portal replaced FastLane 2021 for proposal submission; Proposal and Award Policies and Procedures Guide PAPPG revised annually; dual merit review criteria mandatory since 1997: Intellectual Merit examining scientific quality and potential for discovery, and Broader Impacts examining societal benefit including STEM education, diversity, infrastructure; ad-hoc (mail) review plus panel 
review by program; typical funding rates: BIO approximately 25%, CISE approximately 20-22%, ENG approximately 25%, SBE approximately 17-20%; approximately 40,000-50,000 proposals per year with approximately 12,000-13,000 funded; NSF program officers have significant discretion within portfolio); NSF award database (research.gov Award Search allows public search of all NSF awards since 1960; key fields: Award Number unique identifier, Title, Principal Investigator and Co-PIs, Institution name and address, NSF Sponsor (program/division/directorate), Start Date, Expected End Date, Obligated Amount total, Abstract; bulk download via research.gov in XML or CSV; Awards API at api.nsf.gov/services/v1/awards.json with query parameters: keyword, principalInvestigatorName, institutionName, fundsObligatedAmtFrom/To, startDateStart/End, awardeeStateCode; 600,000+ awards in database; offset/rpp pagination for bulk retrieval); top institutional recipients (institutional concentration: top 100 universities receive approximately 75% of NSF research funding creating debate about geographic equity; consistently top-funded: MIT, Stanford, University of Michigan, Caltech, University of Illinois, Georgia Tech, UC San Diego, University of Washington; EPSCoR Established Program to Stimulate Competitive Research allocates set-aside funding to 28 states traditionally underrepresented including Alaska, Hawaii, Montana, Mississippi, Puerto Rico; HBCUs and MSIs have targeted programs but remain significantly underrepresented relative to research capacity); GRFP and STEM education (Graduate Research Fellowship Program GRFP: $37,000/year stipend for 3 years, $12,000/year cost-of-education allowance; approximately 2,000 awarded annually from approximately 12,000-16,000 applicants; acceptance rate approximately 13-16%; highly competitive career signal for PhD students; CAREER as the faculty-level equivalent fellowship signal; Research Experiences for Undergraduates REU for undergraduate research 
at universities; Research Experiences for Teachers RET for K-12 teachers in lab settings; NSF Scholarships in STEM S-STEM for financially disadvantaged undergrads; EHR directorate budget approximately $950M-$1.1B annually); AI and emerging technology (National AI Research Institutes: NSF and partners (NIH/DHS/USDA/DOD) funded 25 AI institutes 2019-2023 totaling more than $200M at universities across the US; TIP Technology Innovation and Partnerships directorate created 2022 as first new directorate in 30 years focused on use-inspired research, translation to practice, national competitiveness; NSF Engines Regional Innovation Engines: 10-year $160M awards to catalyze regional innovation ecosystems in 26 regions — Texas Semiconductor Manufacturing, Appalachian Energy Transition, NY Quantum Computing; CHIPS and Science Act 2022 authorized NSF budget doubling but appropriations have lagged authorization); international science (bilateral agreements with DFG Germany, UKRI United Kingdom, JST Japan, other counterpart agencies enabling collaborative grants; Office of International Science and Engineering OISE managing international programs; International Research Experiences for Students IRES funding US students for research at foreign institutions; FCOI Foreign Conflicts of Interest disclosure requirements: NSF requires disclosure of all foreign financial interests over $5,000 and all foreign affiliations with foreign entities; China Initiative Department of Justice prosecutions 2018-2022 chilled Chinese-American researchers disclosure practices; NSF now uses more systematic disclosure review process); open science requirements (January 2023 NSF Desirable Characteristics of Data Repositories; NSF effective January 25 2025 requires all NSF-funded publications be deposited in an approved public access repository with zero embargo upon publication — stricter than NIH which went to zero embargo in 2025 from 12-month embargo; Data Management Plans required in all proposals 
since 2011; NSF Award General Conditions require data sharing; Public Access Repository reporting through Research.gov compliance system; replication crisis intersection: NSF SBE directorate funded preregistration and open data initiatives, Psychological Science Accelerator for large-scale replication studies); Python script using NSF Awards API to search for awards whose titles contain CAREER over a 5-year window, paginating through results, mapping divisionCode to directorate label, computing average award size per directorate, ranking the top 20 institutions by CAREER award count, and printing the 10 most frequent keywords in award abstracts using word frequency.
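
    A rough sketch of the offset/rpp pagination and per-group averaging mentioned above. The 1-based offset, the default page size of 25, the `fundsObligatedAmt` field name, and the `fetch_page` callable are assumptions for illustration; the network call is left to the caller so the logic can be exercised offline.

```python
from collections import defaultdict
from urllib.parse import urlencode

API = "https://api.nsf.gov/services/v1/awards.json"


def page_url(keyword, offset, rpp=25):
    """URL for one page of results (offset assumed to be 1-based)."""
    return API + "?" + urlencode({"keyword": keyword, "offset": offset, "rpp": rpp})


def paginate(fetch_page, rpp=25, start=1):
    """Yield awards page by page until a short (or empty) page is returned.

    fetch_page is a hypothetical callable (offset, rpp) -> list of award dicts.
    """
    offset = start
    while True:
        awards = fetch_page(offset, rpp)
        yield from awards
        if len(awards) < rpp:
            break
        offset += rpp


def average_by(awards, key, amount_field="fundsObligatedAmt"):
    """Mean obligated amount per value of `key` (for example a directorate label)."""
    totals = defaultdict(lambda: [0.0, 0])
    for award in awards:
        bucket = totals[award[key]]
        bucket[0] += float(award[amount_field])
        bucket[1] += 1
    return {k: total / count for k, (total, count) in totals.items()}


if __name__ == "__main__":
    # Synthetic records standing in for API results.
    fake = [{"dir": "CISE" if i % 2 else "BIO", "fundsObligatedAmt": "500000"}
            for i in range(60)]
    fetch = lambda offset, rpp: fake[offset - 1 : offset - 1 + rpp]
    results = list(paginate(fetch))
    print(len(results), average_by(results, "dir"))
```

    Stopping on a short page avoids needing a total-count field from the response, at the cost of one extra request when the result count is an exact multiple of the page size.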

  76. Q4Writing

    BTS airline on-time performance deep-dive published

    Long-form article on BTS airline statistics: BTS overview (Bureau of Transportation Statistics within DOT, Transtats portal as gateway to aviation/freight/multimodal data, statutory mission as independent statistical agency); ATOP/ASQP Airline On-Time Performance data (mandatory reporting threshold of 1%+ domestic scheduled service, ~10 reporting carriers covering major US airlines, ~6 million flight records per year, fields including UNIQUE_CARRIER, FL_NUM, ORIGIN/DEST, CRS_DEP_TIME, ARR_DELAY_GROUP, all five cause-code delay columns in minutes, CANCELLED with cancellation code A/B/C/D for carrier/weather/NAS/security, DIVERTED indicator); five delay categories (Carrier delay: mechanical failures, crew scheduling, fueling, cabin cleaning, maintenance; NAS National Aviation System: ATC volume decisions, non-extreme weather to NAS standards, runway/airport construction, heavy traffic; Weather delay: severe weather beyond NAS standard including thunderstorms, winter storms, hurricanes; Late Aircraft: cascading delay from same aircraft arriving late from prior leg — largest cause category by minute contribution; Security: terminal evacuations, screening reprocessing); Consumer Air Travel Report (DOT monthly report using ATOP data, tracking on-time rate percentage within 15 minutes of scheduled arrival, cancellation rate, extreme delays over 3 hours domestic, tarmac delays, mishandled baggage per 1,000 passengers, involuntary denied boarding rate by carrier); T-100 domestic and international (Form T-100D for all domestic scheduled/charter service by all carriers, T-100I for international including foreign carriers operating US routes; monthly by carrier-origin-destination: available seat miles ASM as capacity measure, revenue passenger miles RPM as traffic measure, load factor RPM/ASM peaked at 85%+ in 2019, revenue passengers carried, aircraft departures; used for market share, hub concentration, international route analysis); Form 41 carrier financials (quarterly 
DOT mandatory financial filing; Schedule P-1.2 revenue items for RASM revenue per available seat mile calculation; Schedule T-2 aircraft operations cost data for CASM cost per available seat mile; Schedule F fuel consumption and cost — fuel averaged 20-30% of carrier operating costs 2019-2023, peaked during 2022 energy crisis; Schedule P-12 employment by work group — pilots, flight attendants, ground crew, management; Schedule B-1 balance sheet for leverage analysis; antitrust merger review by DOJ/DOT uses Form 41 for market overlap and competitive effects analysis); COVID airline crisis (April 2020: 96% RPM decline year-over-year, 369M total 2020 passengers vs. 926M in 2019, $35B combined losses for Big Four carriers; CARES Act Payroll Support Program PSP1/PSP2/PSP3 totaling $54B in grants and loans conditional on no involuntary furloughs through March 2021, no buybacks or dividends during restriction period; 2021-2022 recovery uneven with staffing shortages causing operational meltdowns; Southwest Airlines December 2022: schedule collapse when CREW scheduling software failed during winter storm Elliott, 17,000 flights cancelled over 10 days, $140M DOT civil penalty settlement, requirement to update crew scheduling technology); baggage and complaint data (Air Travel Consumer Report monthly: mishandled baggage rate per 1,000 enplaned passengers — peaked at 6-7 in 2007 declined to 2-3 with bag fees incentivizing fewer checked bags, then spiked 2021-2022 from staffing shortages; consumer complaints by category: flight problems 55-60%, baggage 15-20%, reservations/ticketing/boarding 10%; involuntary denied boarding IDB compensation DOT rules require 200-400% of one-way fare up to $1,550 depending on domestic/international and delay duration); tarmac delay rule (3-hour domestic/4-hour international maximum before carrier must allow passengers to deplane; $27,500 per passenger civil penalty for violations; effective December 2009 after JetBlue Valentine Day 2007 left 
passengers on tarmac 6+ hours in JFK ice storm with no food/water; pre-rule ~700 tarmac delays over 3 hours per year declined to under 30 per year post-rule; DOT enforcement has levied $1.6M+ in penalties against airlines for tarmac delay violations); data access (BTS Transtats portal at transtats.bts.gov with interactive tool and bulk ZIP downloads for ATOP by year; T-100 summary and market tables; Form 41 financial reports; BTS supports OData API for T-100 data; FRED carries aggregate BTS aviation statistics; Python script downloading 12 monthly ATOP ZIPs, filtering by carrier code, computing monthly on-time rate and average arrival delay, dual-axis matplotlib chart with cancellation rate overlay).
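
    As an illustration of the on-time and cancellation metrics described above, the sketch below computes them from a few inline rows. The `ARR_DELAY` column name, the rule that cancelled flights count against the on-time rate, and the sample values are assumptions made for the example; only `UNIQUE_CARRIER` and `CANCELLED` are named in the entry.

```python
import csv
import io

# Synthetic rows, not real BTS data. ARR_DELAY is minutes late (negative = early).
SAMPLE = """UNIQUE_CARRIER,ARR_DELAY,CANCELLED
WN,5,0
WN,20,0
WN,-3,0
WN,,1
WN,14,0
DL,30,0
"""


def carrier_stats(rows, carrier):
    """On-time rate, cancellation rate, and mean arrival delay for one carrier."""
    total = on_time = cancelled = 0
    delays = []
    for row in rows:
        if row["UNIQUE_CARRIER"] != carrier:
            continue
        total += 1
        if float(row["CANCELLED"]) == 1:
            cancelled += 1
            continue
        if row["ARR_DELAY"] == "":
            continue  # no arrival recorded (for example a diversion)
        delay = float(row["ARR_DELAY"])
        delays.append(delay)
        if delay < 15:  # on time = arrived less than 15 minutes late
            on_time += 1
    if total == 0:
        return None
    return {
        "on_time_rate": on_time / total,
        "cancel_rate": cancelled / total,
        "avg_arr_delay": sum(delays) / len(delays) if delays else None,
    }


def load_factor(rpm, asm):
    """Load factor as revenue passenger miles over available seat miles."""
    return rpm / asm


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(carrier_stats(rows, "WN"))
```

    Dividing by all scheduled flights rather than completed flights keeps a carrier from improving its on-time rate by cancelling late departures.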

  77. Q4Writing

    Federal Reserve Z.1 Financial Accounts deep-dive published

    Long-form article on the Federal Reserve Financial Accounts of the United States (Z.1 release): Z.1 overview (quarterly release by Federal Reserve Board, published ~10 weeks after quarter end, formerly called Flow of Funds Accounts of the United States, modeled as framework for BIS national financial accounts internationally; three table types: F tables for flows of funds, L tables for outstanding levels of financial assets and liabilities, B tables for balance sheets at market value; free download at federalreserve.gov/releases/z1/); sectoral breakdown (seven key sectors: Households and Nonprofit Organizations, Nonfinancial Corporate Business (the largest private borrower sector), Nonfinancial Noncorporate Business (proprietorships/partnerships/farms), Federal Government, State and Local Governments (with pension and OPEB liabilities), Domestic Financial Sectors (commercial banking/insurance/pension funds/money market funds/GSEs), Rest of World (foreign holdings of US assets minus US holdings abroad); each sector has assets and liabilities yielding a net financial position); household net worth (most widely cited Z.1 statistic, from Table B.101 household balance sheet: total assets minus total liabilities; peaked at approximately $156T in late 2021 as both stock market and real estate hit records; declined approximately $8T during 2022 as Federal Reserve rate hikes pushed down equity and bond prices at the same time, a historically unusual joint decline in both asset classes; recovered through 2023-2024 as equities rebounded despite higher rates; household net worth relative to disposable personal income as measure of wealth effect on consumption; ratios above 7x historically associated with elevated spending and financial fragility); Distributional Financial Accounts DFA (Federal Reserve extension of Z.1 adding distributional breakdowns by wealth percentile; quarterly series beginning Q3 1989, first published in 2019, built by interpolating Survey of Consumer Finances data between survey waves; top 1% hold
approximately 30-31% of total wealth, 90-99th percentile hold approximately 37%, 50-90th hold approximately 30%, bottom 50% hold approximately 3%; post-COVID distributional dynamics: all groups saw net worth gains but top holdings are concentrated in equities which gained faster initially; bottom 50% gained disproportionately in percentage terms from 401k matching and home equity appreciation); flow of funds mechanics (two-sided accounting: every financial asset for one sector is a liability for another; sectoral identity: household saving surplus + business retained earnings + government deficit + rest of world current account surplus must sum to zero; financing gap analysis used by Fed economists to assess when a sector is borrowing faster than saving; capital account flows vs. financial account flows distinction); corporate balance sheets (Nonfinancial Corporate Business Table L.103: equipment and software investment, net equity issuance negative from buybacks exceeding new issuance, corporate bond and bank loan borrowing; corporate debt-to-GDP peaked at approximately 50% in late 2020 from pandemic borrowing; leverage ratios visible in Z.1 before credit events); real estate in the Z.1 (Table B.101 household sector: residential real estate at market value estimated by Fed using CoreLogic and FHFA repeat-sales price indices applied to housing stock; household real estate grew from approximately $25T in 2019 to approximately $43T in 2024 — the largest two-source household wealth gain in the post-COVID era; mortgage debt outstanding approximately $12T+; home equity equals real estate value minus mortgage debt — home equity surged as prices rose faster than new mortgage debt); federal debt position (Federal Government sector L.106: Treasury securities outstanding approximately $26T+, student loan assets approximately $1.7T, GSE equity from conservatorship, net financial position deeply negative; State and Local Government pension fund assets and actuarial liabilities 
visible, with aggregate underfunding becoming acute in 2001-2003 and again in 2007-2009); Rest of World sector (Table L.133: US net international investment position, where US assets held by the Rest of World exceed foreign assets held by US residents by approximately $18T; foreign holdings include Treasury securities approximately $7.5T, equities approximately $12T, corporate bonds; US holdings of foreign assets concentrated in equities through multinational subsidiaries); data access (federalreserve.gov/releases/z1/; Data Download Program (DDP) files with annual and quarterly downloads; FRED carries Z.1 series under standardized mnemonics: TNWBSHNO for total household net worth, HNOREMQ027S for household real estate at market value, HHMSDODNS for household mortgage debt outstanding, TLBSNNCB for nonfinancial corporate business total liabilities, FGSDODNS for federal debt; Python script using fredapi library to download quarterly household net worth and household debt-to-disposable-income ratio, deflate by CPI, compute wealth-to-income ratio, plot with NBER recession shading).
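
    The deflation and ratio steps of that script reduce to a few lines of arithmetic, sketched below on plain dictionaries keyed by quarter. The helper names and the sample figures are hypothetical; the home-equity example reuses the approximate $43T real estate and $12T mortgage debt figures quoted in the entry.

```python
def deflate(nominal, cpi, base_period):
    """Convert nominal values to base-period dollars using a price index."""
    base = cpi[base_period]
    return {q: value * base / cpi[q] for q, value in nominal.items() if q in cpi}


def wealth_to_income(net_worth, disposable_income):
    """Ratio of household net worth to disposable personal income per quarter."""
    return {
        q: net_worth[q] / disposable_income[q]
        for q in net_worth
        if q in disposable_income and disposable_income[q]
    }


def home_equity(real_estate_value, mortgage_debt):
    """Home equity is real estate at market value minus mortgage debt."""
    return real_estate_value - mortgage_debt


if __name__ == "__main__":
    # Illustrative numbers only (trillions of dollars, index points).
    net_worth = {"2019Q4": 100.0, "2024Q4": 150.0}
    cpi = {"2019Q4": 100.0, "2024Q4": 125.0}
    income = {"2019Q4": 16.0, "2024Q4": 20.0}
    print(deflate(net_worth, cpi, "2019Q4"))
    print(wealth_to_income(net_worth, income))
    print(home_equity(43.0, 12.0))
```

    The wealth-to-income ratio needs no deflation when both series are nominal, because the price level cancels; deflating matters only for the level series plotted alongside it.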

  78. Q4Writing

    Census LEHD longitudinal employment deep-dive published

    Long-form article on Census Bureau Longitudinal Employer-Household Dynamics: LEHD overview (program created at Census Bureau from Unemployment Insurance wage records — quarterly employer-reported earnings for nearly all private workers and most state/local government workers — linked to employer records from QCEW and household records from decennial census/ACS/Medicare/vital statistics; covers approximately 95% of private employment across all 50 states plus DC; state data-sharing partnerships: each state UI agency provides wage records under Title 13 confidentiality protections to Census; distinguishes from QCEW by operating on individual worker-level records enabling demographic and mobility analysis not possible from establishment payroll aggregates); Quarterly Workforce Indicators QWI (Core LEHD published product: quarterly employment counts, beginning-of-quarter employment, full-quarter employment, net job creation, job creation from startups, job destruction from closings, hires, separations, and monthly payroll; simultaneously tabulated by state × county × NAICS industry × worker characteristic; four worker characteristics: sex, age group 14-18/19-21/22-24/25-34/35-44/45-54/55-64/65-99, education level 4-way less-than-HS/HS/some-college/BA+, and race/ethnicity 5-way WNH/BNH/ANH/AIANNH/Hispanic; publication lag approximately 12 months after reference quarter; LEHD Online Data Analysis System LEDAS web tool and QWI API at api.census.gov/data/timeseries/qwi); LODES Origin-Destination Employment Statistics (annual, block-level data released 18+ months after reference year; three file types: WAC Workplace Area Characteristics by census block of workplace, RAC Residence Area Characteristics by census block of residence, OD Origin-Destination matrix linking home census block to work census block with count of workers; OD file enables commuting pattern analysis at block level — identifies job-rich cores, suburb-to-core flows, reverse commuting, and suburban 
employment clusters; OnTheMap interactive visualization tool at onthemap.census.gov allows geographic selection and commuting distance analysis; block-level data aggregated to any geographic unit by user); Job-to-Job Flows J2J (LEHD product tracking quarter-to-quarter employer transitions; identifies workers who had earnings from employer A in quarter T and employer B in quarter T+1 with no employment gap; measures gross job-to-job mobility rate, destination industry after changing jobs, and earnings changes associated with switching; consistent finding: voluntary job switchers earn 7-10% higher wage growth in switch quarter than job stayers — annual magnitude varies; separates employer-to-employer from employer-to-nonemployment-to-employer using employment gap criterion; the great resignation visible in J2J data as mobility rate peaked 2021-2022 especially in service sectors); business dynamics in LEHD (firm age as primary predictor of net job creation per Haltiwanger, Jarmin, Miranda research using LEHD/LBD linked data; startups — firms age 0 — create all net jobs on average because their job creation exceeds their own mortality rate; high-growth firms gazelles defined as 20%+ annual growth for 3+ years contribute disproportionately to employment growth in any given year; geographic concentration of startups: tech clusters in SF Bay Area, Boston Route 128, NYC visible in LEHD startup employment); race/ethnicity wage structure (QWI breakdown by race/ethnicity allows county-level analysis of wage gaps between demographic groups; education × race × industry wage structure visible through cross-tabulated QWI; immigrant earnings trajectories trackable in cohort analysis using LEHD linked to visa records in experimental linked data); COVID labor market in LEHD (LODES 2021 vs. 
2019 OD matrix comparison shows dramatic reduction in long-commute flows reflecting remote work adoption — downtown office-district employment counts fell sharply while suburban residential area workplace counts increased; J2J mobility rate spiked 2021-2022 among younger workers and service sector; sector-level QWI shows leisure/hospitality hires rate peaked 2021-2022 then normalized); OnTheMap and LEHD Explorer tools (OnTheMap at onthemap.census.gov: polygon or point selection of area, OD analysis showing top-10 origin states/counties/cities of workers, commute distance distribution histogram, industry breakdown of jobs within area, longitudinal trend for selected area; LEHD Explorer for QWI time series by custom geography/industry/characteristic cross-tabulations; Census Data API for QWI at api.census.gov/data/timeseries/qwi with variables for all QWI metrics and facets); LEHD vs. other programs (QCEW: same UI wage record source but processed as establishment-level quarterly payroll aggregates without individual worker linkage or demographic characteristics; CES: monthly sample-based employment counts, earlier release but no demographic or mobility detail; ACS: survey-based labor force characteristics with demographic depth but no longitudinal tracking or job transitions; LEHD uniquely links individual worker to employer to geographic location enabling mobility, commuting, and demographic wage analysis impossible from other sources; confidentiality protection via noise infusion and suppression for small cells while preserving aggregate distributional patterns); data access (lehd.ces.census.gov; QWI API at api.census.gov/data/timeseries/qwi with full variable and geography documentation; LODES bulk flat-file download at lehd.ces.census.gov/data/lodes/ with state-level ZIP files for WAC/RAC/OD; OnTheMap at onthemap.census.gov; FSRDC Federal Statistical Research Data Centers for restricted microdata access; Python script using Census QWI API to download county-level 
employment for NAICS 23 construction industry workers age 25-34 for 2019/2022/2024, computing county-level recovery index, and ranking counties by young construction worker employment growth).
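
    Two of the calculations in this entry are simple enough to sketch offline: the county recovery index (employment relative to a 2019 base) and rolling block-level origin-destination counts up to counties. The input shapes and function names are hypothetical, and the roll-up assumes the first five characters of a 15-digit census block code are the state and county FIPS.

```python
from collections import Counter


def recovery_index(emp_by_year, base_year=2019):
    """Index each county's employment to 100 in the base year."""
    out = {}
    for county, series in emp_by_year.items():
        base = series.get(base_year)
        if not base:
            continue  # skip suppressed or missing base-year cells
        out[county] = {year: 100.0 * value / base for year, value in series.items()}
    return out


def rank_by_growth(index, year):
    """Counties sorted from highest to lowest index value in `year`."""
    pairs = [(county, s[year]) for county, s in index.items() if year in s]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def county_flows(od_rows):
    """Aggregate (home_block, work_block, jobs) rows to county-to-county flows."""
    flows = Counter()
    for home_block, work_block, jobs in od_rows:
        flows[(home_block[:5], work_block[:5])] += jobs
    return flows


if __name__ == "__main__":
    # Synthetic employment counts and block codes, for illustration only.
    emp = {"48201": {2019: 1000, 2022: 1100}, "06037": {2019: 2000, 2022: 1900}}
    idx = recovery_index(emp)
    print(rank_by_growth(idx, 2022))
    od = [("482012101001000", "482011000001001", 3),
          ("481576701011000", "482011000001001", 4)]
    print(county_flows(od))
```

    Because published cells carry noise infusion and suppression, small counties can show index swings that reflect disclosure protection rather than real employment change, so a minimum base-year employment filter is worth adding before ranking.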

  79. Q4Writing

    BEA Regional Accounts state/county economic data deep-dive published

    Long-form article on BEA Regional Accounts: Regional vs. National distinction (BEA regional program allocates national NIPA control totals to sub-national geographies using indicator data: wage and salary data from BLS QCEW, farm receipts from USDA, dividends and interest from IRS SOI, transfer payments from Treasury/SSA/HHS; five distinct products: SAGDP/SQGDP GDP by State annual and quarterly, SAINC/SQINC Personal Income by State annual and quarterly, CAINC Personal Income by County annual, CAGDP GDP by County annual published since 2019, MAGDP GDP by Metropolitan Statistical Area annual); GDP by State (annual back to 1997, quarterly back to 2005; NAICS industry detail to 2-digit in annual; top states by total GDP: California $3.5T, Texas $2.2T, New York $2.0T, Florida $1.4T, Illinois $900B; post-COVID growth leaders: Texas and Florida 15-20% real growth 2020-2023 from population/business migration and energy; California nominal GDP largest but real growth slower from housing costs and domestic outmigration; oil-state boom-bust cycles visible: North Dakota GDP grew 80%+ from 2007-2014 Bakken shale development, declined 2015-2016 when WTI dropped to $26, recovered after 2021; GDP per capita by state: Connecticut and Massachusetts at approximately $90k-95k, New York $85k, Washington state $80k driven by tech; Mississippi approximately $45k, West Virginia approximately $42k, roughly a 2:1 interstate gap); Personal Income by State (annual SAINC and quarterly SQINC tables with five income components: wage and salary disbursements from all employers, supplements including employer healthcare/retirement contributions, proprietors income including farming, property income including dividends/interest/rental, transfer payments including Social Security/Medicare/unemployment/veterans; subtracting contributions for government social insurance yields the personal income total; the 2020-2021 transfer payment surge visible as unprecedented expansion in transfer payment
component — exceeded wage income growth in many rural states; state-to-state migration income effects: Florida and Texas gained high-income households from California and New York creating per-capita income convergence — IRS migration data shows $39B in AGI moved from California 2020-2023); Personal Income by County (CAINC1 table: annual per-capita and total personal income for all 3,100+ counties; released approximately 18 months after reference year; income components available at county level via CAINC30 table series; identifies boom counties: Midland/Ector TX Permian Basin counties exceed $120k per capita during oil price spikes; San Francisco/San Mateo CA tech wealth concentration; agricultural commodity county income patterns — corn belt counties spike with crop price; Great Plains counties have high proprietors income from farm operators); GDP by MSA (MAGDP table: approximately 380 Metropolitan Statistical Areas, annual; largest metro economies: NYC-Newark-Jersey City MSA approximately $2T+, Los Angeles approximately $1.1T, Chicago approximately $700B, Dallas-Fort Worth approximately $600B, San Francisco-Oakland approximately $600B; top-10 MSA concentration at approximately 50% of US GDP; metro specialization: Las Vegas concentrated in arts/entertainment/recreation 25%+ of GDP, Houston in mining/oil 18%+, San Jose in information/tech 35%+, Orlando in tourism; metro-rural divergence post-COVID: rural counties gained population from remote work but productivity and GDP per capita remain far below metro average); state income tax policy connection (BEA personal income is the economic base for state income tax; California top marginal rate 13.3% vs. 
Texas/Florida/Nevada/Wyoming 0%; SALT federal deduction cap at $10,000 since TCJA 2017 effectively raises cost of living in high-tax states; IRS Statistics of Income migration data shows net out-migration of adjusted gross income from CA/NY/IL to TX/FL/AZ/TN 2020-2024; BEA personal income growth rates by state lag migration-adjusted changes by 1-2 years); transfer payments in regional accounts (SASUMMARY table and CAINC35 component tables show Social Security, Medicare, Medicaid, unemployment insurance, veterans benefits, SNAP, earned income tax credit by state; red state vs. blue state federal transfer receipt: Mississippi, West Virginia, Alabama, and Kentucky receive 30-40%+ of personal income from transfers; COVID transfer explosion 2020: unprecedented $4.6T in total federal payments including stimulus checks, enhanced unemployment, PPP visible as 50%+ spike in transfer component 2020-2021; 2022-2023 unwinding visible in BEA transfer data lagged by CAINC release timing); energy state boom-bust (BEA mining GDP by state captures oil/gas production cycles precisely: North Dakota SAGDP mining sector grew from $3B 2007 to $15B 2014 then collapsed to $5B 2016 as WTI hit $26; Wyoming coal mining GDP declined persistently 2012-2023; Texas mining GDP most volatile — Permian Basin drove Texas to briefly exceed California in GDP growth 2021-2022; New Mexico Permian Basin GDP grew 30%+ 2021-2022 making it fastest-growing state economy that year; Alaska Permanent Fund Dividend shows as transfer payment in personal income each October); BEA Regional API (apps.bea.gov/api/ with Regional dataset; key parameters: TableName (SAGDP1 for state GDP total, SAGDP2D for state GDP by industry, SAINC1 for state personal income, CAINC1 for county personal income, MAGDP1 for MSA GDP); LineCode (industry code or income component within table); GeoFips (two-digit state FIPS or five-digit county FIPS or MSA codes); Year as comma-separated list or ALL for full history; BEA FRED carries all 
state series with mnemonics like TXNGSP for Texas nominal GSP; bea.gov/itable interactive data tool for visual exploration; Python script that registers a free BEA API key, calls SAINC1 (the state personal income table) for all states 2010-2024, computes cumulative per-capita personal income growth, ranks states, identifies top-5 growth states vs. national average of approximately 40% real gain, and flags the COVID transfer payment spike using LineCode 20 transfer payments component).
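
    The request parameters listed above can be sketched as a small URL builder. This is a minimal illustration, not the script from the article: the full endpoint path, the `GeoFips="STATE"` shorthand for all states, and the use of LineCode 3 for per-capita personal income in SAINC1 are assumptions to check against the BEA API guide.

```python
from urllib.parse import urlencode

# Assumed full path under apps.bea.gov/api/ (verify against the BEA API guide).
BEA_ENDPOINT = "https://apps.bea.gov/api/data"


def bea_regional_url(api_key, table_name, line_code, geo_fips, years):
    """Build a GetData request URL for the Regional dataset."""
    year_param = "ALL" if years == "ALL" else ",".join(str(y) for y in years)
    params = {
        "UserID": api_key,
        "method": "GetData",
        "datasetname": "Regional",
        "TableName": table_name,
        "LineCode": str(line_code),
        "GeoFips": geo_fips,
        "Year": year_param,
        "ResultFormat": "JSON",
    }
    return BEA_ENDPOINT + "?" + urlencode(params)


def cumulative_growth(start_value, end_value):
    """Cumulative growth between two levels, as a fraction."""
    return (end_value - start_value) / start_value


# Hypothetical key; LineCode 3 is assumed to be per-capita personal income.
url = bea_regional_url("YOUR-KEY", "SAINC1", 3, "STATE", [2010, 2024])
growth = cumulative_growth(40_000, 56_000)  # illustrative levels, not BEA data
```

    Ranking states then reduces to sorting the per-state `cumulative_growth` values.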

  80. Q4Writing

    USDA NASS crop survey data deep-dive published

    Long-form article on USDA National Agricultural Statistics Service: 400+ surveys per year since 1867 reaching 3 million respondents, mandatory reporting authority backed by statute; Crop Production report (monthly August-November, final in January, acreage times yield per acre for corn/soybeans/winter wheat/spring wheat/cotton/rice/sorghum/sunflowers); geographic hierarchy (nine crop reporting districts per state, state summaries, national aggregates; QuickStats agg_level_desc parameter); WASDE World Agricultural Supply and Demand Estimates produced by WAOB World Agricultural Outlook Board (distinct from NASS, uses NASS acreage/yield estimates as inputs, adds usage/stocks/trade/price data for global supply-demand balance sheets; lockup procedures matching BLS — no early access; stocks-to-use ratio as the key market signal); commodity market impact (CME corn and soybean futures tick on every NASS release, August-November Crop Production reports especially volatile as they refine growing-season yield; 2012 drought: Iowa yield 147 bu/acre vs. 
170 trend, corn at $8.49/bushel and soybeans above $17 as NASS sequentially cut forecasts); QuickStats API parameters table (commodity_desc, statisticcat_desc, unit_desc, domain_desc, agg_level_desc, state_alpha, year, freq_desc; 50,000 record limit; bulk download at quickstats.nass.usda.gov; (D) suppression for confidentiality); livestock surveys (Cattle on Feed monthly — placements, marketings, total on feed; Hogs and Pigs quarterly — breeding herd and market hog inventory, farrowing intentions; Poultry Production weekly slaughter and monthly layer inventory); agricultural prices (Prices Received monthly average price for 50+ commodities, Prices Paid inputs index, parity ratio as historical farm income measure); Crop Progress weekly (Monday 3 PM release, 18 crops, developmental stage columns — emerged/silking/dough/dented/mature for corn, blooming/setting bolls/opening for cotton; Good/Excellent ratings are primary indicator traders watch; state-level breakdown from surveyed farmers); Python script using NASS QuickStats API to download state-level corn yield per acre for top 5 producing states over 20 years and plot time series.
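
    As a rough sketch of the arithmetic and query parameters mentioned above: production is acreage times yield, the 2012 Iowa figure can be expressed as a shortfall against trend, and a QuickStats query is a set of key/value filters. The parameter values shown (for example `"BU / ACRE"`) are illustrative assumptions and should be checked against the QuickStats parameter listings.

```python
def production(harvested_acres, yield_per_acre):
    """Crop production as acreage times yield per acre."""
    return harvested_acres * yield_per_acre


def shortfall_vs_trend(actual_yield, trend_yield):
    """Fractional deviation of actual yield from trend yield."""
    return (actual_yield - trend_yield) / trend_yield


def stocks_to_use(ending_stocks, total_use):
    """Stocks-to-use ratio from a supply-demand balance sheet."""
    return ending_stocks / total_use


def quickstats_params(api_key, state_alpha, year):
    """Filters for a state-level corn yield query (values are assumptions)."""
    return {
        "key": api_key,
        "commodity_desc": "CORN",
        "statisticcat_desc": "YIELD",
        "unit_desc": "BU / ACRE",
        "agg_level_desc": "STATE",
        "state_alpha": state_alpha,
        "year": str(year),
        "format": "JSON",
    }


# Iowa 2012: 147 bu/acre against a 170 bu/acre trend, roughly 13.5% below trend.
iowa_2012 = shortfall_vs_trend(147, 170)
params = quickstats_params("YOUR-KEY", "IA", 2012)
```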

  81. Q4Writing

    EIA energy data deep-dive published

    Long-form article on Energy Information Administration data: EIA statutory independence from DOE policy arm (established by the Department of Energy Organization Act of 1977, publishes data independently of administration policy; mandatory survey authority with civil penalty enforcement for non-reporters; mission is statistics not advocacy); Short-Term Energy Outlook STEO (monthly forecast for WTI/Brent crude, Henry Hub natural gas, regular gasoline/diesel, electricity retail prices, US crude production, natural gas production, renewable generation; OPEC+ interactions: EIA demand forecasts vs. OPEC production targets; SPR Strategic Petroleum Reserve release effects on WTI; STEO as political document with administration optics dimension); Weekly Petroleum Status Report WPSR (published every Wednesday 10:30 AM Eastern by EIA; most market-moving federal statistical release after jobs report; Cushing Oklahoma crude inventory, the delivery point for NYMEX WTI futures contracts; $1-2/barrel price sensitivity per 1M barrel surprise; refinery utilization rate; total petroleum stocks including gasoline/distillate/kerosene-jet; SPR inventory; weekly estimate of US crude production); Natural Gas Storage Report (weekly Thursday 10:30 AM; EIA-912 survey of storage operators; five regions: East, Midwest, Mountain, Pacific, South Central; working gas vs.
base gas; 2022 European energy crisis drove Henry Hub from $3/MMBtu to $9 in August; seasonal heating degree day relationship); EIA-860 power plant database (mandatory annual survey of all electric generators 1 MW or larger; ~15,000 generators; status codes OP operating/SB standby/RE retired/P planned/U under construction; Plant Code as persistent identifier linking EIA-860 to EIA-923; generator characteristics: nameplate capacity, prime mover technology, energy source code, state; planned retirements — coal capacity largest near-term retirement wave; planned additions — utility-scale solar and battery dominating 2023-2027 queue); EIA-923 fuel receipts survey (monthly fuel consumption, generation, and heat rates for electric generators; plant level data linked to Form 860; annual heat rate by technology documenting combined-cycle CCGT efficiency gains vs. older combustion turbines); Electric Power Monthly EPM (generation by fuel type, retail sales by sector, average retail electricity prices, capacity factors by technology; definitive archive of US electricity generation fuel mix transition from coal to gas to renewables); Petroleum Supply Monthly PSM (state-level crude oil production, crude imports by country of origin, refinery processing gains, product supplied as demand proxy); EIA Open Data API v2 (500,000+ series; v2 faceted structure with route/frequency/data/facets/sort; v1 legacy series IDs work via compatibility endpoint; key series: PET.RWTC.W WTI weekly, NG.RNGWHHD.W Henry Hub weekly, PET.W_EPC0_SAX_YCUOK_MBBL.W Cushing stocks; pagination via offset; registered API key from eia.gov); Python script using EIA v2 API to pull Cushing crude stocks and Henry Hub weekly prices, create dual-axis chart, annotate 2022 European crisis spike.
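
    The v2 faceted structure (route/frequency/data/facets/sort, with pagination via offset) might look like the sketch below. The route `petroleum/pri/spt` and the `RWTC` facet value are assumptions inferred from the legacy series ID PET.RWTC.W, and the 5,000-row page size is likewise an assumption; confirm all three in the EIA API browser before relying on them.

```python
from urllib.parse import urlencode

EIA_BASE = "https://api.eia.gov/v2"


def eia_v2_url(route, api_key, frequency, facets, offset=0, length=5000):
    """Build a v2 data request URL with facet filters and pagination."""
    params = [
        ("api_key", api_key),
        ("frequency", frequency),
        ("data[0]", "value"),
    ]
    for facet, values in facets.items():
        for value in values:
            params.append((f"facets[{facet}][]", value))
    params += [
        ("sort[0][column]", "period"),
        ("sort[0][direction]", "asc"),
        ("offset", offset),
        ("length", length),
    ]
    return f"{EIA_BASE}/{route}/data/?" + urlencode(params)


def page_offsets(total_rows, length=5000):
    """Offsets needed to page through total_rows rows."""
    return list(range(0, total_rows, length))


# Hypothetical key; route and facet value are assumptions, not confirmed names.
wti_url = eia_v2_url("petroleum/pri/spt", "YOUR-KEY", "weekly", {"series": ["RWTC"]})
```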

  82. Q4Writing

    Census building permits and housing starts deep-dive published

    Long-form article on Census Bureau housing construction data: three linked releases (Building Permits Survey BPS monthly, New Residential Construction joint Census/HUD release with housing starts and completions, New Residential Sales joint Census/HUD release with contract-signed timing; approximately 1-month permit-to-start lag and 6-month start-to-completion lag for single-family homes creating leading indicator value); BPS survey methodology (~20,000 permit-issuing jurisdictions covering 96% of US construction activity; non-permit areas estimated separately via start surveys; SAAR seasonally adjusted annual rate using Census X-13 ARIMA seasonal factors; geographic breakdown to MSA/county/place level from Census Building Permits data system; monthly data published around 19th of following month); Housing starts methodology (~900 SOC survey of construction areas; field agents physically visit sites; SFH single-family starts vs. multifamily 2-4 unit and 5+ unit distinction; permits-to-starts ratio varies by geography — non-permit rural areas have starts without permits; quarterly revision cycle based on additional survey data); historical context (2006 peak 2.07M SAAR all housing starts; 2009 trough 554K as mortgage crisis foreclosures overwhelmed new demand; 2020-2021 COVID surge to 1.8M driven by remote work migration, low-rate frenzy, lumber panic; 2022-2023 pullback as 30-year fixed mortgage went from 3.0% to 7.8% by October 2023 — fastest rate increase since 1980; 2024 SFH stabilization despite rates as lock-in effect created severe existing home supply shortage); SFH vs. 
multifamily bifurcation (2022-2023 divergence: SFH pulled back sharply with rate shock while 5+ unit multifamily held relatively strong through 2022 as rental demand remained elevated from unaffordable for-sale market; missing middle 2-4 unit historically underproduced due to zoning barriers); geographic data (HUD SOCDS State of the Cities Data Systems with building permit data back to 1980; MSA-level starts from New Residential Construction; Sun Belt construction concentration: Texas 15-18% of national SFH permits, Florida 10-12%; permit growth rates lagged in Midwest/Northeast Rust Belt; mountain states surge from remote work migration 2020-2022); New Residential Sales (Census monthly survey of builders and selling agents; contract-signed timing unlike NAR existing-home sales which use closing date — typically 30-60 days earlier; median new home price by region; months supply as supply/demand indicator; cancellation rate important for builder earnings signal in 2022 rate shock); Fed policy connection (30-year mortgage rate tied to 10-year Treasury + spread; rate transmission to housing: faster than most sectors because monthly payment math is immediate; mortgage lock-in effect — homeowners at 2.5-3.0% rates unwilling to sell into 7%+ market creating existential supply shortage; Beige Book uses housing starts as regional economic barometer; permits as Conference Board Leading Economic Index component); lumber and materials (CME random length lumber futures trading at $1,700/MBF in 2021 — more than quadruple pre-pandemic level — from mill closures in 2020 and demand surge; normalized to $400-500 range 2022-2023 as supply recovered; lumber represents 15-18% of SFH construction cost; concrete, copper, drywall as other key inputs); data access (Census Bureau BPS API at api.census.gov/data/timeseries/bps with unittype and geo_type parameters; FRED series: PERMIT total, HOUST total starts, HOUST1F SFH starts, HOUST5F 5+ unit starts; HUD SOCDS for historical MSA data; 
Python script using Census BPS API to download state-level permit data, compute 12-month rolling average, rank states by permit growth rate).
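
    The rolling-average and ranking step described above can be sketched with the standard library alone. The input shape (a dict of state code to a chronological list of monthly permit counts) is a hypothetical layout, and the sample numbers are made up for illustration, not Census data.

```python
def rolling_mean(values, window=12):
    """Trailing mean; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def rank_by_growth(permits_by_state, window=12):
    """Rank states by year-over-year growth in the rolling average."""
    growth = {}
    for state, series in permits_by_state.items():
        if len(series) < 2 * window:
            continue  # need two full windows to compare
        smooth = rolling_mean(series, window)
        latest, year_ago = smooth[-1], smooth[-1 - window]
        if year_ago:
            growth[state] = latest / year_ago - 1
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)


# Made-up monthly counts: 24 months per state.
sample = {"TX": [100] * 12 + [120] * 12, "OH": [50] * 12 + [45] * 12}
ranked = rank_by_growth(sample)
```

    Comparing rolling averages rather than single months dampens the month-to-month noise in permit counts.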

  83. Q4Writing

    BLS OEWS occupational employment wages deep-dive published

    Long-form article on BLS Occupational Employment and Wage Statistics program and Employment Projections: OEWS survey mechanics (1.1M establishment survey mailed semiannually in May and November; BLS uses 3-year rolling pooling creating ~3.3M total observations for publication; sampling stratified by NAICS industry × size class × state ownership type; state agencies collect under BLS contract; excludes self-employed, farm workers, private household workers, unpaid family workers; covers all private industry and all government levels); OEWS data structure (key fields: OCC_CODE SOC code, OCC_TITLE, TOT_EMP total employment, EMP_PRSE employment percent relative standard error, H_MEAN/A_MEAN hourly/annual mean wage, H_PCT10 through H_PCT90 wage percentiles, WAGE_ENT entry-level proxy 3rd quartile within job zone, WAGE_EXP experienced-level proxy, LOC_QUOTIENT employment location quotient vs. national share; geographic levels: national all-industries and by sector, state, 590+ MSAs, BOS balance of state areas); Standard Occupational Classification SOC (23 major groups, 98 minor groups, 459 broad occupations, 867 detailed occupations in 2018 SOC; six-digit codes XX-XXXX format; O*NET crosswalk adds task/skill/knowledge/ability descriptors at detailed level; Labor Market Information Institute LMI uses SOC as standard across BLS data); key OEWS findings (top-paying: surgeons $250,000+ mean annual, anesthesiologists, psychiatrists, oral surgeons, airline pilots $260,000+, chief executives; largest by employment: retail salespersons 4.5M, fast food and counter workers 3.5M, registered nurses 3.2M, home health aides 3.0M, stockers 3.1M); Employment Projections EP 2022-2032 (BLS biennial 10-year employment outlook by detailed occupation; methods: industry output projections using macroeconomic models → industry employment using IO ratios → occupational employment using staffing pattern matrices → occupational openings accounting for separations; fastest-growing occupations: 
home health and personal care aides +924,000 (+22%), nurse practitioners +123,000 (+46%), solar photovoltaic installers +52%; fastest-declining: word processors -68%, telephone operators -56%, cashiers -216,000 from self-checkout automation); Occupational Outlook Handbook OOH (BLS public-facing career guide at bls.gov/ooh; six-section profile: what they do, work environment, how to become one, pay, job outlook, similar occupations; used by high school counselors and workforce development programs; links to OEWS data; 2022-2032 OOH projections); O*NET occupational information network (DOL-sponsored database at onetonline.org; occupational data: tasks, tools/technology, work activities, skills, knowledge, abilities, work context, work values, job zones indicating typical education/experience; O*NET-SOC crosswalk not perfectly 1:1; O*NET Content Model as framework; skills-based hiring matching employers to O*NET skill vectors); wage inequality applications (90th-to-10th percentile ratio as within-occupation inequality metric; CPS microdata for demographic decomposition — race, gender, education wage gaps; minimum wage workers concentrated in food preparation and personal care at bottom of OEWS distribution; Autor Katz Kearney routine-biased technical change framework using OEWS/OES data to document occupational polarization); data access (FTP flat files: oe.data.0.All and oe.data.1.AllData at download.bls.gov; API series ID format for specific geographies OEUS00000000000000000ALLINDUSTRY; annual May tables at bls.gov/oes/tables.htm; Python script downloading national OEWS ZIP, filtering to SOC 29-xxxx and 31-xxxx healthcare occupations, scatter plot of median wage vs. employment with annotated labels for surgeons, RNs, MDs, pharmacists, physical therapists).
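
    Two of the metrics named above, the location quotient and the 90th-to-10th percentile ratio, reduce to simple ratios. The sketch below assumes rows are dicts keyed by the OEWS column names mentioned in the entry and that suppressed or top-coded cells arrive as non-numeric strings; both are assumptions about the file layout.

```python
def location_quotient(area_occ_emp, area_total_emp, nat_occ_emp, nat_total_emp):
    """Occupation's local employment share relative to its national share."""
    return (area_occ_emp / area_total_emp) / (nat_occ_emp / nat_total_emp)


def _to_float(cell):
    """Parse a wage cell; return None for suppressed or top-coded markers."""
    try:
        return float(str(cell).replace(",", ""))
    except ValueError:
        return None


def p90_p10_ratio(row):
    """Within-occupation inequality: 90th over 10th percentile hourly wage."""
    p90, p10 = _to_float(row["H_PCT90"]), _to_float(row["H_PCT10"])
    if p90 is None or p10 is None or p10 == 0:
        return None
    return p90 / p10


# Illustrative values only.
lq = location_quotient(200, 10_000, 1_000, 100_000)
ratio = p90_p10_ratio({"H_PCT10": "12.50", "H_PCT90": "50.00"})
```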

  84. Q4Writing

    FHWA highway infrastructure data deep-dive published

    Long-form article on FHWA federal highway datasets: National Bridge Inventory NBI (620,000+ public road bridges, mandatory biennial inspection under the FHWA National Bridge Inspection Standards NBIS, component condition ratings on a 0-9 scale for deck/superstructure/substructure, NBI Sufficiency Rating 0-100 composite weighted 55% structural adequacy / 30% serviceability / 15% essentiality, structurally deficient vs. functionally obsolete distinction: SD means deterioration requiring attention, not imminent collapse risk; Francis Scott Key Bridge context: catastrophic failure due to ship strike, not structural deficiency), Highway Performance Monitoring System HPMS (state-reported pavement condition using International Roughness Index IRI measured by laser profilometers at highway speed, Good/Fair/Poor thresholds: NHS interstate Good <95 IRI, Fair 95-170, Poor >170 inches/mile; 900,000+ road segments, annual state submissions, acknowledged inconsistency in state measurement practices), traffic monitoring (Annual Average Daily Traffic AADT from continuous ATR count stations expanded to full network using short-count factors, WIM weigh-in-motion stations for vehicle weight distribution, VMT computation: 3.2 trillion annual vehicle miles traveled), Highway Statistics annual publication (road mileage by functional class/jurisdiction/surface type, registered vehicles and licensed drivers, fuel consumption and gas tax revenues, motor carrier data), federal-aid structure (47,856 Interstate miles, 161,000 NHS miles, Federal Lands Highway, NHPP National Highway Performance Program formula funding), IIJA 2021 Infrastructure Investment and Jobs Act $40B bridge repair funding (first dedicated bridge formula program, $225M per year minimum, focus on structurally deficient NHS bridges), Highway Trust Fund solvency crisis ($0.184/gallon gas tax frozen since 1993 creating chronic shortfall, EVs bypass gas tax entirely creating revenue gap, IIJA $118B general fund transfer as stopgap, NEVI EV charging infrastructure $7.5B
program), Freight Analysis Framework FAF commodity-flow origin-destination matrices by mode (truck/rail/water/air/pipeline), VIUS Vehicle Inventory and Use Survey, border crossing freight data, Python script downloading state NBI CSV, identifying structurally deficient bridges, binning by sufficiency rating, plotting bar chart.
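
    The classification and binning step in that script can be sketched as follows. It assumes a bridge counts as structurally deficient when any rated component is 4 or below on the 0-9 scale, and it uses hypothetical dict keys (`deck`, `superstructure`, `substructure`, `sufficiency`) rather than the actual NBI column names.

```python
from collections import Counter

COMPONENT_KEYS = ("deck", "superstructure", "substructure")  # hypothetical keys


def lowest_condition(bridge):
    """Lowest numeric component rating; non-numeric codes such as 'N' are skipped."""
    ratings = [int(bridge[k]) for k in COMPONENT_KEYS
               if str(bridge.get(k, "N")).isdigit()]
    return min(ratings) if ratings else None


def is_structurally_deficient(bridge):
    """Assumed rule: any component rated 4 or lower."""
    low = lowest_condition(bridge)
    return low is not None and low <= 4


def bin_deficient_by_sufficiency(bridges, width=20):
    """Count structurally deficient bridges per sufficiency-rating bin."""
    counts = Counter()
    for bridge in bridges:
        if not is_structurally_deficient(bridge):
            continue
        start = int(float(bridge["sufficiency"]) // width) * width
        start = min(start, 100 - width)  # a rating of exactly 100 goes in the top bin
        counts[f"{start}-{start + width}"] += 1
    return dict(counts)


# Made-up records for illustration.
sample = [
    {"deck": "4", "superstructure": "6", "substructure": "7", "sufficiency": "48.5"},
    {"deck": "7", "superstructure": "7", "substructure": "N", "sufficiency": "92.0"},
    {"deck": "5", "superstructure": "3", "substructure": "5", "sufficiency": "100"},
]
bins = bin_deficient_by_sufficiency(sample)
```

    The resulting dict maps directly onto a bar chart of bin label against count.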

  85. Q4Writing

    BLS Current Employment Statistics monthly jobs report deep-dive published

    Long-form article on the BLS Current Employment Statistics program: two surveys released simultaneously on Jobs Friday (first Friday of each month at 8:30 AM Eastern), the Establishment Survey CES (580,000+ business establishments and government agencies sampled from UI records/QCEW, stratified by industry-state-size, response rate approximately 60% preliminary improving to 90%+ with subsequent revisions, source of nonfarm payroll employment headline), the Household Survey CPS (60,000 households per month, source of unemployment rate U-3 and labor force participation rate), definitional distinction causing the two surveys to often diverge (someone working even one hour/week is employed in CPS; CES counts jobs not workers so multiple jobholders appear multiple times; self-employed included in CPS but excluded from CES), net birth/death model (BLS model to account for establishment births and deaths between QCEW benchmarks, adds/subtracts estimated jobs from uncounted new/closed businesses, controversial in turning points like 2008 and COVID), key metrics table (total nonfarm payroll, private vs. 
government, 11 supersectors, average hourly earnings AHE, average weekly hours, diffusion index), three-tier revision cycle (preliminary → first revised → second revised with two subsequent monthly updates, then annual benchmark revision each February using complete QCEW), annual benchmark revision mechanics (BLS replaces CES sample-based estimates with nearly complete count from QCEW Unemployment Insurance records; the preliminary benchmark released in August 2024 indicated a downward revision of 818,000 jobs to the March 2024 employment level, the largest preliminary downward revision since 2009, and financial markets moved on its release), X-13ARIMA-SEATS seasonal adjustment (US Census Bureau program incorporating the Bank of Spain SEATS method, replacing X-12-ARIMA; models calendar effects, trading days, moving holidays; both seasonally adjusted and not seasonally adjusted series published; year-over-year comparisons can use NSA; debate about whether pandemic disrupted seasonal factors 2020-2022), industry dynamics (healthcare consistently adds jobs through economic cycles and adds millions post-COVID; government employment cycles around elections and census; manufacturing structural decline since 1980, from a 19.4M peak in 1979 to 12.8M today; leisure/hospitality most volatile: COVID lost 8.2M in two months, recovered by 2023), COVID collapse and recovery (April 2020: -20.5M jobs single month, largest one-month loss in recorded history; by July 2022 all jobs recovered on count basis but composition shifted dramatically to services and remote work; "great resignation" quit rate peaked 3.0% April 2022 from JOLTS complementary data), market impact of 8:30 AM release (front-running not possible for early access since BLS lockup ended 2012; Fed watches for labor market cooling as indicator of rate policy; 10-year treasury yield moves 5-15 bps on surprise; equity market volatility highest Jobs Friday mornings; ADP employment report released prior Wednesday as private-sector preview with frequent divergence from official data), Python
script using BLS API to download CES0000000001 total nonfarm and CES7000000001 leisure/hospitality, compute 12-month gains, plot recession-bar chart with NBER recession shading.
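
    A minimal sketch of the request body and the 12-month gain calculation, assuming the BLS public API v2 JSON payload shape and a chronological list of monthly employment levels. The sample levels are made up; the series IDs are the ones named in the entry.

```python
import json

BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"


def bls_payload(series_ids, start_year, end_year, api_key=None):
    """JSON body for a multi-series request (to be sent as an HTTP POST)."""
    body = {
        "seriesid": list(series_ids),
        "startyear": str(start_year),
        "endyear": str(end_year),
    }
    if api_key:
        body["registrationkey"] = api_key
    return json.dumps(body)


def twelve_month_gains(levels):
    """Change versus the same month a year earlier; None for the first 12 months."""
    return [None if i < 12 else levels[i] - levels[i - 12]
            for i in range(len(levels))]


payload = bls_payload(["CES0000000001", "CES7000000001"], 2019, 2024)
gains = twelve_month_gains(list(range(100, 114)))  # made-up levels
```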

  86. Q4Writing

    SEC EDGAR XBRL financial statement database deep-dive published

    Long-form article on SEC EDGAR XBRL machine-readable financials: XBRL mandate history (phased 2009-2011 by filer size, all US public companies now required, ~7,000 active filers, transition from HTML-only to inline iXBRL since 2020 where XBRL facts are embedded directly in HTML filing), US-GAAP taxonomy (FASB maintains taxonomy of 17,000+ concept elements with labels/references/definitions/calculation relationships; three key namespaces: us-gaap for financial statement concepts, dei document and entity information, srt SEC reporting taxonomy for industry-specific items; key income statement concepts: us-gaap:Revenues, us-gaap:CostOfGoodsSoldExcludingDepreciationDepletionAndAmortization, us-gaap:GrossProfit, us-gaap:OperatingExpenses, us-gaap:OperatingIncomeLoss, us-gaap:NetIncomeLoss, us-gaap:EarningsPerShareBasic; balance sheet: us-gaap:Assets, us-gaap:Liabilities, us-gaap:StockholdersEquity, us-gaap:CashAndCashEquivalentsAtCarryingValue, us-gaap:Goodwill, us-gaap:LongTermDebt; cash flow: us-gaap:NetCashProvidedByUsedInOperatingActivities; dei namespace: dei:EntityCentralIndexKey, dei:EntityCommonStockSharesOutstanding), Company Facts API (data.sec.gov/api/xbrl/companyfacts/{CIK}.json returns all XBRL-tagged facts for a company across all filings; structure: companyfacts object with cik/entityName/facts; facts organized by namespace→concept→units→array of fact values; each fact: accn (accession number), end (period end date), start (for duration facts), val (numeric value), filed (filing date), form (10-K/10-Q/etc.), frame (CY2023Q4I for instant, CY2023Q4 for duration — I = instant measurement like balance sheet date)), Company Concept API (data.sec.gov/api/xbrl/companyconcept/{CIK}/us-gaap/{concept}.json returns time series of a single concept for one company — preferred for clean longitudinal analysis of one metric), Frames API (data.sec.gov/api/xbrl/frames/us-gaap/{concept}/{unit}/{period}.json returns cross-sectional data: all companies reporting that 
concept for a specific period in one response; enables sector aggregation, peer comparison, screening; period codes: CY2023Q4 for calendar year Q4, CY2023 for annual), data quality challenges (custom extension elements: companies create new XBRL concepts when no standard fits, approximately 30% of all facts are custom extensions making cross-company comparison unreliable for those items; taxonomy concept changes: ASC 606 revenue recognition in 2018 caused mass concept renaming requiring linkage logic; dimensional data: segment and product line data uses XBRL dimensions adding complexity; different fiscal year ends creating calendar-year alignment issues; zero values vs. truly missing), analytical applications (valuation multiple screening using Frames API for P/E, EV/EBITDA across all Russell 1000; Beneish M-score fraud detection — all eight components available in XBRL; time-series financial modeling with concept continuity; ESG data extraction from disclosure items; academic event studies using standardized financials), bulk data access (SEC quarterly financial statement data sets ZIP at sec.gov/dera/data: num.txt financial facts, sub.txt filing submissions, pre.txt financial statement presentation structure, cal.txt calculation relationships; rate limits: 10 requests/second maximum, User-Agent header required identifying application), Python script fetching Apple CIK 0000320193 Company Facts, extracting annual revenue and net income from 10-K filings, deduplicating restatements, printing margin table, creating dual-axis matplotlib chart.
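
    The extraction and deduplication step can be sketched against the Company Facts structure described above (namespace → concept → units → array of facts). Keeping the most recently filed value for each period end is one possible way to handle restatements, chosen here for illustration; the 350-380 day window for "annual" durations is also an assumption.

```python
from datetime import date


def companyfacts_url(cik):
    """Company Facts URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{int(cik):010d}.json"


def annual_values(companyfacts, concept, unit="USD"):
    """Map period end -> value for full-year 10-K facts, latest filing wins."""
    facts = companyfacts["facts"]["us-gaap"][concept]["units"][unit]
    latest = {}
    for fact in facts:
        if fact.get("form") != "10-K" or "start" not in fact:
            continue
        days = (date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])).days
        if not 350 <= days <= 380:
            continue  # skip quarterly durations reported inside a 10-K
        kept = latest.get(fact["end"])
        if kept is None or fact["filed"] > kept["filed"]:
            latest[fact["end"]] = fact
    return {end: latest[end]["val"] for end in sorted(latest)}


# Made-up facts for illustration, not real filing data.
sample = {"facts": {"us-gaap": {"Revenues": {"units": {"USD": [
    {"start": "2022-10-01", "end": "2023-09-30", "val": 100, "filed": "2023-11-03", "form": "10-K"},
    {"start": "2022-10-01", "end": "2023-09-30", "val": 101, "filed": "2024-11-01", "form": "10-K"},
    {"start": "2023-07-02", "end": "2023-09-30", "val": 30, "filed": "2023-11-03", "form": "10-K"},
    {"start": "2023-10-01", "end": "2023-12-30", "val": 40, "filed": "2024-02-02", "form": "10-Q"},
]}}}}}
revenue = annual_values(sample, "Revenues")
```

    A real request would also need the identifying User-Agent header and should stay under the 10 requests/second limit noted above.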

  87. Q4Writing

    CMS Skilled Nursing Facility data deep-dive published

    Long-form article on CMS Care Compare SNF quality data: five-star composite rating system (health inspection stars weighted 50%, staffing stars 30%, quality measure stars 20% for overall star), the 3x4 scope/severity A-L deficiency grid (Immediate Jeopardy at J-L triggering mandatory civil money penalties starting $3,050/day), Payroll-Based Journal PBJ staffing system replacing self-reported data in 2016 with five HPRD metrics (RN, LPN, CNA, total nurse aide, total nursing), COVID-19 nursing home crisis (170,000+ deaths in nursing homes equaling 38% of early US COVID deaths, NHSN weekly mandatory reporting, targeted infection control surveys), Minimum Data Set MDS Resident Assessment Instrument driving both quality measures and PDPM reimbursement (five payment components: PT, OT, SLP, nursing, non-therapy ancillaries), private equity ownership transparency via PECOS beneficial ownership disclosures and research showing 5-10% worse outcomes under PE ownership, SNF star ratings effect on Medicare Advantage preferred provider networks and ACO contracts, and a Python script to download CMS Care Compare CSV files and compute state-level star rating distributions and deficiency rates.

  88. Q4Writing

    BLS occupational injuries SOII deep-dive published

    Long-form article on the BLS Survey of Occupational Injuries and Illnesses: SOII scope (~230,000 establishment survey, BLS/state partnership, OSHA 300 Log as basis), two data streams (summary incidence rates by industry/case type vs. case-and-demographic microdata with injury characteristics), OSHA recordkeeping requirements (Form 300 Log/300A Summary/301 Incident Report, recordability thresholds: work-related injury requiring more than first aid or resulting in DAFW/DJTR/other recordable), Total Recordable Incidence Rate formula (number of cases times 200,000 hours divided by total hours worked), industry variation table (construction 3.7/100 FTE, healthcare 4.8, transportation/warehousing 4.5, finance/insurance 0.6), musculoskeletal disorder supplement and ergonomics standard repeal 2001, Census of Fatal Occupational Injuries CFOI as separate fatal census using multi-source verification (~5,500/year, construction fatal four: falls/struck-by/caught-in/electrocution, logging and fishing highest rates), the pervasive underreporting problem (academic research shows SOII captures only 40-69% of actual injuries due to employer financial incentives and worker fear), NAICS sampling design and staffing agency misclassification, and a Python BLS API script to download TRIR series and compare incidence rates across major industries.
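
    The incidence rate formula above is short enough to show directly. The 200,000-hour base corresponds to 100 full-time workers at 40 hours a week for 50 weeks; the establishment figures in the example are made up.

```python
def trir(recordable_cases, hours_worked):
    """Total Recordable Incidence Rate per 100 full-time-equivalent workers."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return recordable_cases * 200_000 / hours_worked


# Hypothetical establishment: 12 recordable cases over 650,000 hours worked.
example_rate = trir(12, 650_000)
```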

  89. Q4Writing

    EPA Air Quality System deep-dive published

    Long-form article on the EPA Air Quality System: AQS monitor network overview (4,000+ sites, cooperative federalism with states/locals/tribes, hourly to annual data since 1980), six criteria pollutants framework (PM2.5 primary NAAQS 9 ug/m3 annual 2024, PM10, ozone 70ppb 8-hour, CO 35ppm 1-hour, SO2 75ppb 1-hour, NO2 100ppb 1-hour, lead 0.15 ug/m3 rolling 3-month average) with NAAQS primary vs. secondary standards, FRM vs. FEM monitor equivalence and SLAMS/NAMS/PAMS network types, Air Quality Index 0-500 scale (six categories Good through Hazardous, worst-of-pollutants daily AQI calculation), nonattainment designation process and State Implementation Plan requirements with NAAQS political revision cycle (2020 decision under Trump to retain the 12 ug/m3 annual PM2.5 standard, 2023 Biden proposal finalized in 2024 at 9 ug/m3), data quality (QA codes, 75% completeness threshold, AQS vs. AirNow real-time vs. PurpleAir low-cost sensor accuracy gaps), health burden analysis (Harvard Six Cities Dockery 1993 study, Pope ACS cohort, BenMAP model, 100,000+ annual PM2.5-attributable deaths), environmental justice monitoring gaps correlated with race/income and EPA EJScreen complement, wildfire smoke exceptional events provisions and 2020-2023 western US AQI events overwhelming attainment, and a Python EPA AQS API script to download California daily PM2.5 readings and identify NAAQS exceedance days.
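
    The worst-of-pollutants rule and the six AQI categories can be sketched as below. The function takes per-pollutant sub-index values that are already on the 0-500 scale; converting concentrations to sub-indices needs the EPA breakpoint tables, which are not reproduced here.

```python
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi):
    """Category name for an AQI value on the 0-500 scale."""
    for upper, name in AQI_CATEGORIES:
        if aqi <= upper:
            return name
    return "Hazardous"  # values reported above 500 stay in the top category


def daily_aqi(sub_indices):
    """Daily AQI is the highest pollutant sub-index; returns (pollutant, value, category)."""
    pollutant = max(sub_indices, key=sub_indices.get)
    value = sub_indices[pollutant]
    return pollutant, value, aqi_category(value)


# Made-up sub-indices for one monitor-day.
result = daily_aqi({"PM2.5": 162, "Ozone": 87})
```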

  90. Q4Writing

    HUD Point-in-Time homeless count deep-dive published

    Long-form article on HUD annual Point-in-Time homeless count: PIT methodology (last 10 days of January, 400+ CoC regions, mandatory since 2005, sheltered count from HMIS vs. unsheltered street count using teams of trained volunteers), 2023 scale (653,100 total: highest since reporting began, up 12% from 2022, California 28% of national total, ~40% unsheltered), Continuum of Care system (geographic organizational unit, McKinney-Vento grant competition, Coordinated Entry requirement), Homeless Management Information System HMIS (longitudinal individual-level tracking of shelter entries/exits/services, HUD data standards, returns to homelessness within 6/12/24 months as primary system performance measure, de-identification challenges for research), veteran homelessness (35,500+ veterans in 2023 PIT, HUD-VASH Housing Choice Voucher for veterans, VA Health Care for Homeless Veterans HCHV outreach, Project CHALENG Community Homelessness Assessment, Local Education and Networking Groups), chronic homeless definition (12+ consecutive months or 4+ episodes totaling 12 months with disabling condition), methodological limitations (one-night January snapshot affected by weather, volunteer counting variation, literal homelessness definition excluding doubled-up and couch-surfing, HUD-commissioned undercount research), COVID impacts (2021 count disruption, Project Roomkey hotel/motel isolation vouchers, 2023 surge from eviction moratorium expiration and inflation), Housing First evidence (rapid rehousing vs. transitional vs. PSH cost-effectiveness research, Finland Y-Foundation 70% chronic homeless reduction), and a Python script to download HUD Exchange PIT CSVs, aggregate by state, and compute per-capita homeless rates joined to Census population estimates.
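
    The join and per-capita step can be sketched with plain dicts. The state codes and numbers below are placeholders, not PIT or Census figures, and expressing the rate per 10,000 residents is a presentation choice assumed here.

```python
def rates_per_10k(pit_counts, populations):
    """Join PIT counts to population by state and compute a rate per 10,000 residents."""
    rates = {}
    for state, count in pit_counts.items():
        population = populations.get(state)
        if not population:
            continue  # skip states with no matching population estimate
        rates[state] = count / population * 10_000
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


# Placeholder inputs for illustration only.
pit_counts = {"AA": 5_000, "BB": 1_200, "CC": 300}
populations = {"AA": 2_000_000, "BB": 1_000_000}
rates = rates_per_10k(pit_counts, populations)
```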

  91. Q4Writing

    FAA aviation safety data deep-dive published

    Long-form article on federal aviation safety databases: four-database ecosystem (NTSB Aviation Accident Database covering every civil aviation accident since 1962 with probable cause analysis; FAA Accident/Incident Data System AIDS covering all accidents and incidents reported to FAA; Aviation Safety Reporting System ASRS operated by NASA since 1976 — voluntary confidential non-punitive near-miss and safety concern reports with 10-day enforcement immunity window creating a unique repository of normally-unreported safety information; FAA Wildlife Strike Database with voluntary strike reports); NTSB investigation process (mandatory reporting requirements for accidents and certain incidents; investigation hierarchy — NTSB major investigation vs. field investigation vs. accident brief based on severity and public interest; probable cause determined after investigation typically 12-18 months; factors: flight crew, weather, aircraft, ATC, airport, organizational, environmental; general aviation accidents 1,200-1,500/year with 350-400 fatalities; commercial Part 121 scheduled airlines: essentially zero US carrier fatal accidents 2009-2024 with exceptions); accident taxonomy (CFIT controlled flight into terrain; LOC-I loss of control in flight; VFR into IMC; midair collision; runway excursion; runway incursion; mechanical failure; weather; phase of flight distribution: approach and landing most fatal accidents, takeoff second, en route third; pilot error as probable cause 70%+ of general aviation accidents); Boeing 737 MAX crisis (Lion Air JT610 October 29 2018: 189 fatalities; Ethiopian Airlines ET302 March 10 2019: 157 fatalities; MCAS Maneuvering Characteristics Augmentation System: single AoA sensor input without crew knowledge or training; FAA Order 8110.4 certification delegation — Boeing Organization Designation Authorization allowed Boeing employees to self-certify; House Transportation Committee investigation: FAA management overrode safety engineers; FDR CVR 
data publicly released after NTSB investigation); ASRS confidentiality design (NASA acts as independent third party from FAA enforcement; reporter submits within 10 days, receives de-identified confirmation; FAA waiver for non-criminal first offense; report database at asrs.arc.nasa.gov searchable by hazard type, phase of flight, contributing factor; 1M+ reports; identified systemic problems before accidents: G-AWST runway incursion precursors, go-around non-standard procedures); runway incursions 2022-2023 spike (Category A: collision narrowly avoided; Category B: significant potential for collision; B and C categories increased 2022-2023 reaching 1,700+ incursions; FAA Safety Summit February 2023 after JFK taxi incident; ASAP Aviation Safety Action Programs: voluntary safety reporting at each airline under FAA oversight — identified similar taxonomy of runway hot spots); wildlife strikes (17,000 strikes reported 2022 — substantially underreported; Canada Goose double engine strike US Airways 1549 January 15 2009 at 2,800 feet; Smithsonian Institution National Museum of Natural History Feather Identification Lab: uses microscopy and DNA to identify bird species from remains — essential for engineering mitigation; database at wildlifestrike.faa.gov; height distribution: 90% occur below 3,500 feet; species: bird 97% of strikes, gulls/doves/sparrows/waterfowl most common by frequency, waterfowl/raptors most damaging); FAA Civil Aviation Registry (N-number database: every civil aircraft registration in the US since approximately 1930; ~200,000+ active registrations; fields: N-number, serial number, make/model/year, airworthiness certificate type standard/limited/experimental/primary, registrant name and address, aircraft category, aircraft type; bulk download at faa.gov; used for fleet age analysis — average US commercial airline fleet 15+ years; average general aviation aircraft 30+ years; experimental category covers homebuilt, exhibition/air race, and research 
aircraft); Python script downloading NTSB bulk accident CSV, filtering to Part 91 general aviation fixed-wing past 5 years, grouping by phase of flight, computing fatal accident frequency per phase, and cross-references to nhtsa-fars-fatality-data, bts-border-crossings, and fmcsa-safety-ratings.
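    The phase-of-flight grouping described above can be sketched in a few lines. This is a minimal illustration, not the article's script: the row layout and the field names (`far_part`, `category`, `phase`, `fatal`) are hypothetical stand-ins for whatever columns the NTSB bulk export actually uses, and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical rows standing in for the NTSB bulk export; real column names differ.
SAMPLE_ROWS = [
    {"far_part": "091", "category": "AIR", "year": 2023, "phase": "Approach", "fatal": True},
    {"far_part": "091", "category": "AIR", "year": 2022, "phase": "Approach", "fatal": False},
    {"far_part": "091", "category": "AIR", "year": 2021, "phase": "Takeoff", "fatal": False},
    {"far_part": "121", "category": "AIR", "year": 2023, "phase": "Enroute", "fatal": False},
    {"far_part": "091", "category": "HELI", "year": 2023, "phase": "Landing", "fatal": True},
    {"far_part": "091", "category": "AIR", "year": 2015, "phase": "Landing", "fatal": True},
]


def fatal_share_by_phase(rows, first_year):
    """Share of Part 91 fixed-wing accidents that were fatal, per phase of flight."""
    tallies = defaultdict(lambda: [0, 0])  # phase -> [fatal count, total count]
    for row in rows:
        # Keep only Part 91 fixed-wing accidents inside the year window.
        if row["far_part"] != "091" or row["category"] != "AIR":
            continue
        if row["year"] < first_year:
            continue
        tally = tallies[row["phase"]]
        tally[1] += 1
        if row["fatal"]:
            tally[0] += 1
    return {phase: fatal / total for phase, (fatal, total) in tallies.items()}


if __name__ == "__main__":
    for phase, share in sorted(fatal_share_by_phase(SAMPLE_ROWS, 2020).items()):
        print(f"{phase:<10} {share:.0%} fatal")
```

    A fatal share per phase is only one possible reading of "fatal accident frequency"; counting fatal accidents per phase without dividing by the total would be an equally reasonable choice.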

  92. Q4Writing

    NRC nuclear safety data deep-dive published

    Long-form article on Nuclear Regulatory Commission data: NRC overview (established 1975 from AEC split after concern that AEC both promoted and regulated nuclear power; regulates 94 operating power reactors at ~54 plants as of 2024, nuclear materials licenses, low-level waste disposal, high-level waste interim storage; four regional offices: Region I King of Prussia PA/northeast, Region II Atlanta/southeast, Region III Lisle IL/midwest, Region IV Arlington TX/west and south-central; publishes inspection findings and performance data online); Reactor Oversight Process ROP (seven cornerstones in three strategic performance areas; the three reactor-safety cornerstones covered here: Initiating Events — preventing reactor trips and loss of coolant accidents; Mitigating Systems — ensuring safety systems available when needed; Barrier Integrity — maintaining fuel cladding and reactor coolant pressure boundary; the others are Emergency Preparedness, Public Radiation Safety, Occupational Radiation Safety, and Security; each cornerstone has quantitative performance indicators; inspection program: ~2,500 baseline inspection hours per plant per year with additional inspections triggered by findings; action matrix: plants assigned to one of five columns — Column 1 Licensee Response routine monitoring, Column 2 Regulatory Response enhanced NRC engagement, Column 3 Degraded Cornerstone problem list review, Column 4 Multiple/Repetitive Degraded escalated enforcement, Column 5 Unacceptable Performance immediate action — license suspension possible); performance indicators (NRC publishes quarterly PI data on nrc.gov for every plant unit; key PIs: unplanned automatic scrams per 7000 critical hours — threshold for White >3.0, Yellow >6.0; safety system actuations per 7000 critical hours; safety system unavailability index; reactor coolant system identified leakage; all PIs have four color thresholds: Green very low significance, White low-to-moderate, Yellow substantial significance, Red high significance; threshold crossings trigger escalating NRC response; plant-specific action matrix status updated quarterly online); significance determination process SDP (each inspection finding evaluated for safety 
significance: defense-in-depth screening, then Phase 1 significance determination using pre-defined significance process for equipment/procedure findings; Phase 3 uses the plant-specific probabilistic risk assessment PRA; Green findings: no increased core damage frequency; White: very small increase CDF; Yellow: small increase CDF; Red: large increase CDF triggering escalated response; about 30-40 White/Yellow/Red findings per year across all plants); licensee event reports LERs (10 CFR 50.73 requires written LER within 60 days for: prohibited operation, manual scram, failure of single active component without causing scram, unanalyzed condition, degraded condition affecting safety function, events meeting reportability criteria; ~150-200 LERs per year; searchable at nrc.gov and in ADAMS; typical LER describes initiating event, immediate corrective actions, cause analysis root cause, extent of condition, and corrective actions; LER database shows recurring equipment failures and procedural vulnerabilities across the fleet — enabling industry-wide learning); TMI and Fukushima context (Three Mile Island March 28 1979: LOCA loss-of-coolant accident, operator confusion about core water level, partial core melt, small release of radioactive gases; no significant radiation health effects; catalyzed complete NRC overhaul: NUREG-0660 new requirements for emergency procedures, improved instrumentation, control room ergonomics, Emergency Response Organizations; Fukushima Daiichi March 11 2011: station blackout from tsunami, loss of cooling, three core melts, hydrogen explosions; NRC Near-Term Task Force NTTF: 12 orders and recommendations; FLEX program: portable backup equipment deployed at all US plants for beyond-design-basis events; reevaluation of flooding and seismic hazards; implementation complete 2015-2019); probabilistic risk assessment (every licensed plant has PRA model: level 1 core damage frequency CDF, level 2 large early release frequency LERF; industry 
average CDF ~1E-5 per reactor-year meaning 1 chance in 100,000 of core damage per reactor per year; LERF ~1E-6; PRAs in ADAMS as public documents; NRC uses risk-informed regulation: 10 CFR 50.65 maintenance rule uses PRA to prioritize maintenance; risk-informed licensing allows changes to technical specifications; NUREG-1150 flagship PRA study established industry methodology); ADAMS document system (Agency-wide Documents Access and Management System: 7M+ publicly available NRC documents dating from 1970s; full-text searchable at nrc.gov/reading-rm/adams; document types include inspection reports, LERs, orders, correspondence, NUREG reports, environmental impact statements, license applications; primary research tool for nuclear regulatory history; license renewal applications 800-1000 pages describing aging management programs); nuclear capacity factors and fleet economics (US nuclear capacity factors averaged 92-93% 2015-2024 — highest of any electricity generating technology due to high fixed cost economics driving maximum utilization; roughly a dozen reactors retired 2012-2023 (Kewaunee, Crystal River 3, San Onofre 2&3, Vermont Yankee, Fort Calhoun, Oyster Creek, Pilgrim, Three Mile Island 1, Indian Point 2&3, others) primarily due to competition from low-cost natural gas and flat electricity demand, not safety; NRC license renewals extend operation to 60-80 years; EIA publishes capacity factor data by plant); Python script parsing NRC quarterly PI XML files, and cross-references to eia-electricity-data, epa-toxic-release-inventory, and ferc-energy-enforcement.
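    The "1 chance in 100,000" reading of a core damage frequency is a per-reactor-year figure, so it helps to see how it scales across a fleet and over time. The sketch below is illustrative arithmetic only: it assumes every reactor shares the same constant CDF and that reactor-years are independent, neither of which a plant-specific PRA would claim. The function names are hypothetical.

```python
import math


def prob_at_least_one_event(cdf_per_reactor_year, reactors, years):
    """Exact probability of one or more core damage events, assuming
    independent reactor-years with an identical, constant CDF."""
    reactor_years = reactors * years
    return 1.0 - (1.0 - cdf_per_reactor_year) ** reactor_years


def poisson_approximation(cdf_per_reactor_year, reactors, years):
    """Rare-event approximation: 1 - exp(-rate * exposure)."""
    return 1.0 - math.exp(-cdf_per_reactor_year * reactors * years)


if __name__ == "__main__":
    cdf = 1e-5  # industry-average figure quoted in the article summary
    for fleet, span in [(1, 1), (100, 1), (100, 40)]:
        p = prob_at_least_one_event(cdf, fleet, span)
        print(f"{fleet:>3} reactors x {span:>2} years: {p:.4%}")
```

    The two functions agree closely whenever the CDF is small, which is why the simpler exponential form is commonly used for back-of-envelope work.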

  93. Q4Writing

    Bureau of Prisons federal inmate data deep-dive published

    Long-form article on federal prison population data: BOP overview (Federal Bureau of Prisons established 1930; manages 121 federal facilities including federal correctional institutions FCI, US penitentiaries USP, federal medical centers FMC, federal detention centers FDC, federal transfer centers FTC, prison camps adjacent to secure facilities; as of 2024 ~148,000 inmates in BOP facilities plus ~14,000 in privately contracted beds; peak population 219,000 in 2013; decline driven by reduced mandatory minimum sentences, compassionate release expansion, FIRST STEP Act reforms, and COVID releases; published statistics at bop.gov/about/statistics updated weekly); inmate population breakdown (by security level: minimum security camps ~17%, low security ~38%, medium security ~29%, high security USP ~12%, ADX administrative maximum Florence ~0.5%; by gender: male ~93%, female ~7%; by race: White ~57%, Black ~38%, Native American ~3%, Asian ~2%, with Hispanic ethnicity reported separately at ~30%; offense categories: drug 43.4%, weapons/firearms 19.8%, immigration 10.8%, sex offenses 10.4%, robbery 3.4%, extortion/fraud/bribery 3.0%, burglary/larceny 1.1%, other 8.1%; average sentence 8 years 5 months; average time served 4 years 11 months); drug offense concentration (21 USC 841 trafficking, original 1986 thresholds: mandatory minimum 5 years for 500g powder cocaine or 5g crack; 10 years for 5kg powder or 50g crack; 100:1 crack/powder ratio created by Anti-Drug Abuse Act 1986 — Black defendants received dramatically longer sentences than White defendants for comparable conduct because crack prosecutions fell disproportionately on Black defendants; Fair Sentencing Act 2010 reduced ratio to 18:1; FIRST STEP Act 2018 made FSA retroactive — ~3,000 resentencings; mandatory minimums still account for majority of federal drug sentences; United States v. 
Booker 2005 Supreme Court made Sentencing Guidelines advisory rather than mandatory but prosecutorial charging still drives minimums); FIRST STEP Act 2018 (bipartisan criminal justice reform; retroactive sentencing for crack cocaine; risk/needs assessment system: the PATTERN tool, Prisoner Assessment Tool Targeting Estimated Risk and Needs; earned time credits: minimum and low risk inmates earn 10-15 days per 30 days of productive activities; these credits can accelerate transfer to supervised release or halfway house; implemented slowly 2020-2024 amid controversy); BJS National Prisoner Statistics (Bureau of Justice Statistics — DOJ statistics agency distinct from BOP — collects annual data from all 50 state DOCs plus BOP via NPS survey; published in annual "Prisoners" bulletin; the definitive source for total US incarceration: ~1.9M people in state prisons, local jails, federal prisons combined before COVID; declined ~20% during COVID; state systems hold 87% of US prisoners for state crimes; BJS also publishes Survey of Prison Inmates SPI with individual-level characteristics, Mental Health of Prisoners, HIV in Prisons); US Sentencing Commission data (USSC publishes annual Statistical Report on every sentenced federal criminal case; data files at ussc.gov contain case-level records since 1991 with: offense guideline, criminal history category, guideline range, actual sentence, type of departure, mandatory minimum applicable, whether substantial assistance motion filed, defendant demographics — race, gender, age, citizenship, education; the canonical data source for federal sentencing disparity research; Starr and Rehavi 2014, controlling for observable case characteristics, found 11.5% longer sentences for Black males; USSC gender disparity study; Fast Facts publications for specific offense types); PACER federal court records (Public Access to Court Electronic Records; every federal criminal docket accessible; CR 
docket: complaint, arrest warrant, indictment or information, arraignment, plea or trial, presentence report PSR filed under seal, sentencing; PACER charges $0.10/page access fee; RECAP browser extension and Free Law Project CourtListener archives PACER documents for free; case.law provides state court appellate records; federal criminal dockets show every step from arrest through supervised release revocations); supervised release mechanics (Sentencing Reform Act 1984 eliminated federal parole — all federal sentences are determinate; after serving 85-100% of sentence inmates receive fixed "supervised release" term (SR replaces parole); SR conditions set at sentencing: reporting to probation officer, employment requirements, drug testing, geographic restrictions, association restrictions; violations can result in revocation and additional imprisonment up to SR term remaining; BJS publishes revocation statistics; probation officers supervise ~150,000 federal supervised releases); private prison contracting (CoreCivic formerly CCA and GEO Group hold 14-18% of federal inmates under DOJ BOP contracts; Obama DOJ 2016 phase-out order for private prisons reversed 2017; Biden DOJ 2021 EO 14006 mandating non-renewal of private prison contracts; stalled by congressional resistance and contract terms; USASpending.gov shows CoreCivic and GEO Group contracts: CoreCivic BOP contract worth ~$400M/year, GEO ~$300M/year; political lobbying by private prison firms on sentencing policy documented by Justice Policy Institute); Python script downloading BOP weekly population statistics HTML, parsing by offense category, computing current distribution, creating ranked bar chart visualization, and cross-references to fbi-ucr-crime-data, doj-false-claims-act, and nlrb-union-elections.
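    The ranked offense chart mentioned above can be approximated without fetching anything. This is a sketch only: it hard-codes the offense shares quoted in this summary rather than parsing the BOP statistics page, and it draws a text bar chart instead of a graphical one. `ranked_bars` is a hypothetical helper name.

```python
# Offense shares (percent) as quoted in the summary above.
OFFENSE_SHARES = {
    "Drug": 43.4,
    "Weapons/firearms": 19.8,
    "Immigration": 10.8,
    "Sex offenses": 10.4,
    "Robbery": 3.4,
    "Extortion/fraud/bribery": 3.0,
    "Burglary/larceny": 1.1,
    "Other": 8.1,
}


def ranked_bars(shares, width=40):
    """Return text bar-chart lines sorted from largest to smallest share."""
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    largest = ranked[0][1]
    lines = []
    for label, pct in ranked:
        # Scale each bar against the largest category; keep at least one mark.
        bar = "#" * max(1, round(pct / largest * width))
        lines.append(f"{label:<26}{pct:5.1f}% {bar}")
    return lines


if __name__ == "__main__":
    print("\n".join(ranked_bars(OFFENSE_SHARES)))
```

    Sorting before plotting is the only real design decision here; it keeps the "Other" bucket from visually outranking named categories it does not exceed.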

  94. Q4Writing

    USCIS immigration adjudication data deep-dive published

    Long-form article on USCIS data: USCIS overview (US Citizenship and Immigration Services: adjudicates immigration benefits only — not enforcement which is ICE/CBP; component agency within DHS; ~20,000 employees across 86 offices; funded almost entirely by application fees ~$3.6B/year rather than Congressional appropriations — fee increases can become barriers to access; processes family petitions I-130, employment petitions I-140, adjustment of status I-485, work authorization I-765, naturalization N-400, asylum I-589, DACA I-821D; publishes data through USCIS Data Hub at data.uscis.gov and Immigration Data and Statistics at uscis.gov); naturalization data (N-400 Application for Naturalization; requirements: 5 years LPR or 3 years if married to US citizen throughout that period; continuous residence in the US; physical presence for at least 30 of prior 60 months; good moral character — no disqualifying criminal history; English language ability — written and oral tested; civics knowledge — 10 oral questions from 100-question bank in current 2008 test, must answer 6+ correctly; fee $640 plus $85 biometrics, $725 total, through March 2024, then $760 on paper or $710 online from April 2024; USCIS publishes annual naturalization statistics by country of birth, state of residence, age group, country of prior citizenship; ~875,000 naturalizations FY2023; top countries: Mexico, India, Philippines, Cuba, Dominican Republic; peak: ~1M in FY2008 amid pre-election drive); LPR green card categories (family-sponsored: immediate relatives of US citizens unlimited — no per-country cap, no annual limit; approximately 450,000-500,000 immediate relative admissions per year; family preference: F-1 unmarried sons/daughters of USCs, F-2A spouses/children of LPRs, F-2B unmarried adult children of LPRs, F-3 married children of USCs, F-4 siblings of adult USCs — 226,000 combined annual cap with per-country limits; employment-based: EB-1 priority workers 40,000; EB-2 professionals with advanced degrees 40,000; EB-3 skilled workers 40,000; EB-4 special 
immigrants 10,000; EB-5 investors 10,000; total 140,000 with 7% per-country annual cap; diversity visa (DV) program: ~50,000 visas per year awarded by lottery to natives of underrepresented countries — 14 million applicants annually; refugee/asylee adjustment: not subject to an annual numerical cap); India and China EB backlog catastrophe (7% per-country cap: no more than 7% of the 140,000 annual EB green cards from any single country; India files approximately 70-80% of EB-2 and EB-3 petitions due to H-1B holder demographics; 70% of demand vs. 7% of supply creates backlog that grows every year; State Department Visa Bulletin shows the cutoff priority date for each preference category and country — EB-2 India cutoff: approximately January 2012 as of 2024, meaning an applicant with January 2024 priority date must wait until 2076+; Cato Institute Bier and Kovacs 2020 estimated 1.4M applicants in the EB backlog; 99% of EB-3 India applicants have priority dates before 2015; bipartisan bills to eliminate per-country caps (HR 1044/S 386): HR 1044 passed the House in 2019 and an amended S 386 passed the Senate in December 2020, but the two versions were never reconciled before the Congress ended); H-1B lottery (cap-subject: 65,000 regular cap plus 20,000 US Master's degree cap; FY2025 lottery: 470,342 registrations for 85,000 slots = 18% selection rate, down from 29% FY2021; DHS finalized a wage-ranked selection rule in January 2021, under which highest-wage registrants would be selected first to reduce low-wage H-1B filings; a federal court vacated the rule in September 2021 before it took effect; selection remained a random lottery; cap-exempt: universities, nonprofits, government research organizations — no cap, no lottery; top employers: Infosys, Wipro, Tata, Cognizant, Capgemini, Amazon, Google, Microsoft; LCA prevailing wage Level I-IV filings data published by DOL; H-1B Data Hub at USCIS shows approvals/denials by employer); asylum system (I-589 Application for Asylum or Withholding of Removal; two pathways: affirmative asylum = filed with USCIS if not in removal proceedings, 
adjudicated by asylum officers; defensive asylum = filed with immigration judge in EOIR proceedings; USCIS affirmative asylum backlog: 1.7M+ pending cases as of December 2023, up from 342,000 in FY2014; annual receipts exceeded adjudication capacity by 2017; affirmative asylum grant rates vary by nationality: Venezuelans ~67%, Chinese ~45%, Guatemalans ~18%, Mexicans ~10%; EOIR immigration court backlog: 3.3M+ pending cases; EOIR asylum grant rates: national average ~41% but varies from 0% to 95% by judge — the documented "judge lottery" in immigration court); DACA program (Deferred Action for Childhood Arrivals: Obama executive action June 2012; protects "Dreamers" brought to US as children from deportation for 2-year renewable periods; also grants work authorization; not a path to citizenship or green card; requirements: arrived in US before age 16, continuously resided since June 2007, born after June 1981, in school or graduated or military, no felony or significant misdemeanor; ~530,000 active DACA recipients December 2023; USCIS publishes quarterly DACA data by state and country of birth; Texas v. United States, Southern District of Texas, July 2021 declared DACA unlawful; 5th Circuit affirmed October 2022; DHS published final DACA rule August 2022; Supreme Court cert watch; those already with DACA can renew but no new initial grants); data access (USCIS Data Hub at data.uscis.gov; Socrata API with quarterly caseload data; DHS Yearbook of Immigration Statistics: comprehensive annual report covering all immigration benefits across all DHS agencies; TRAC Immigration at trac.syr.edu: court-level outcomes, judge-level approval rates, representation rates; State Department monthly NIV statistics at travel.state.gov; FOIA for individual case status); Python script and cross-references to uscis-h1b-data, dol-h2-visa-disclosures, and ice-enforcement-removals.
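    Two pieces of arithmetic in this summary are easy to check directly: the H-1B selection rate and the per-country ceiling that drives the employment-based backlog. The sketch below uses the figures quoted above; the backlog size passed to `years_to_clear` is a hypothetical input, and the model ignores real-world complications such as dependents, category spillover, and abandoned petitions.

```python
def selection_rate(slots, registrations):
    """Fraction of lottery registrations that can be selected."""
    return slots / registrations


def per_country_ceiling(annual_total, cap_share=0.07):
    """Maximum visas one country can receive under a per-country cap."""
    return annual_total * cap_share


def years_to_clear(backlog, annual_visas):
    """Naive queue estimate: backlog divided by yearly throughput."""
    return backlog / annual_visas


if __name__ == "__main__":
    rate = selection_rate(85_000, 470_342)       # FY2025 figures from the summary
    ceiling = per_country_ceiling(140_000)       # 7% of the EB annual total
    hypothetical_backlog = 500_000               # assumed value, for illustration only
    print(f"H-1B selection rate: {rate:.1%}")
    print(f"Per-country EB ceiling: {ceiling:,.0f} visas/year")
    print(f"Years to clear backlog: {years_to_clear(hypothetical_backlog, ceiling):.0f}")
```

    The ceiling applies across all employment-based categories combined, so the wait within a single category such as EB-2 would be longer than this naive division suggests.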

  95. Q3Writing

    FBI UCR crime statistics deep-dive published

    Long-form article on FBI Uniform Crime Reporting: UCR history (program launched 1929 from IACP proposal; voluntary participation of law enforcement agencies; evolved from manual tally to electronic submission; ~18,000 agencies in 2024 covering ~95% of US population); two parallel reporting systems (Summary Reporting System SRS: legacy system producing aggregate monthly offense counts in Part I and Part II tables; National Incident-Based Reporting System NIBRS: incident-level detail with offense/offender/victim/property/arrestee segments; FBI mandated transition by Jan 1 2021 but NYC, LA, Chicago and many others missed the deadline; 2021 national crime statistics missing ~40M people; most incomplete report in decades; 2022-2023 improved as agencies onboarded); Part I Index Crimes (8 offenses, seven tracked since 1930 plus arson: violent crime — murder and nonnegligent manslaughter [definition: willful killing not negligent], rape [legacy definition changed to broader 2013 definition], robbery [taking by force or threat], aggravated assault [assault causing serious injury or with deadly weapon]; property crime — burglary [unlawful entry to commit felony/theft], larceny-theft [the most common, includes shoplifting, pickpocketing], motor vehicle theft, arson [added 1979]); NIBRS incident structure (master incident record: incident date, location type, incident hour; up to 10 offense records each with offense type using 53 Group A offense categories including 09A murder and nonnegligent manslaughter, 13A aggravated assault, 23H all other larceny, 35A drug/narcotic violations, 40A prostitution; up to 99 offender records with age/sex/race/ethnicity; up to 999 victim records with injury type, relationship to offender, victim characteristics; property records with property type and loss amount; arrestee records when applicable; the hierarchy rule applied only in SRS — NIBRS counts all offenses per incident); 2020-2022 murder surge (national murder rate: 6.0 per 100K 2019 → 7.8 per 100K 2020, largest single-year increase +29.4% from 2019 
to 2020 since national tracking began in 1960; causes debated: COVID pandemic disruptions to courts and social services; police legitimacy crisis and "de-policing" following George Floyd May 2020; illegal firearms trafficking; breakdown in community conflict resolution mechanisms; ProPublica/Marshall Project analysis of city-level data: surge concentrated in jurisdictions with pre-existing violence concentrations — not a uniform national phenomenon; rates began declining 2022 in most major cities; 2023 preliminary data showed continued decline); hate crime data (published separately: ~11,000-13,000 reported incidents/year — widely believed to severely undercount; many agencies report zero or do not report at all; bias motivation categories: race/ethnicity/ancestry, religion, sexual orientation, gender identity, disability; offense types same as UCR but flagged; 2021-2023 anti-Asian hate crimes surged 300%+ from 2019 base in cities with significant Asian populations amid COVID-19 stigmatization; anti-LGBTQ+ crimes rose; anti-Semitic crimes rose in 2023 following October 7); LEOKA Law Enforcement Officers Killed and Assaulted (annual report: officers feloniously killed by weapon type firearm/vehicle/other; officers accidentally killed; officers assaulted with type of weapon; by region and agency type; historically 40-70 felonious killings per year; spikes in 2021 and 2022 — ambush-style killings increased); dark figure of crime and NCVS (UCR counts only crimes known to police; unreported crime = "dark figure"; Bureau of Justice Statistics National Crime Victimization Survey NCVS surveys ~240,000 people annually regardless of police contact; NCVS consistently shows 2-3x more victimization than UCR for property crimes and 2x+ for violent crimes except murder; NCVS showed declining victimization 2021-2022 even as UCR showed murder increases — reconciliation essential; NCVS uses stratified cluster sample of housing units, not voluntary police reporting); CDE API and data 
access (Crime Data Explorer at cde.fbi.gov; API endpoints: /api/participation/national, /api/summarized/agencies/offense-type, /api/summarized/national/offense-type; API key required via free registration; returns JSON; limited to aggregate summary data; full NIBRS microdata as annual CSV files at cde.fbi.gov; state and agency-level files); Python CDE API script pulling state-level violent crime rates per 100K, computing 2019-2021 murder rate change, and cross-references to nhtsa-fars-fatality-data, dol-osha-inspections, and fbi-nics-background-checks.
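    The difference between SRS and NIBRS counting is easiest to see on a toy example. The sketch below is a simplified illustration: the hierarchy list covers only the seven original Part I offenses in their conventional order, it ignores the special handling that arson and some other offenses receive in the real SRS rules, and the incident data is invented.

```python
from collections import Counter

# Simplified Part I hierarchy, most serious first (arson handling omitted).
HIERARCHY = [
    "murder",
    "rape",
    "robbery",
    "aggravated assault",
    "burglary",
    "larceny-theft",
    "motor vehicle theft",
]
RANK = {offense: position for position, offense in enumerate(HIERARCHY)}


def srs_counts(incidents):
    """Hierarchy rule: count only the most serious offense in each incident."""
    counts = Counter()
    for offenses in incidents:
        ranked = [offense for offense in offenses if offense in RANK]
        if ranked:
            counts[min(ranked, key=RANK.__getitem__)] += 1
    return counts


def nibrs_counts(incidents):
    """Incident-based counting: every distinct offense in an incident counts."""
    counts = Counter()
    for offenses in incidents:
        counts.update(set(offenses))
    return counts


def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 residents."""
    return count * 100_000 / population


if __name__ == "__main__":
    toy_incidents = [
        ["robbery", "murder"],
        ["burglary", "larceny-theft"],
        ["larceny-theft"],
    ]
    print("SRS:  ", dict(srs_counts(toy_incidents)))
    print("NIBRS:", dict(nibrs_counts(toy_incidents)))
```

    In the toy data the robbery disappears from the SRS tally because it occurred alongside a murder, which is exactly the undercount NIBRS was designed to remove.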

  96. Q3Writing

    CMS Medicare Hospital Cost Reports deep-dive published

    Long-form article on CMS Medicare Cost Reports and HCRIS: MCR overview (every Medicare-certified hospital files annual Medicare Cost Report MCR via MAC Medicare Administrative Contractor; forms: CMS-2552-10 for short-term acute hospitals, Long-Term Care Hospital LTCH, Inpatient Psychiatric Facility IPF, Inpatient Rehabilitation Facility IRF, Critical Access Hospital CAH, Children's Hospital; MCR reconciles actual costs to Medicare payments received — determining settlement payment due to or from hospital; CMS publishes "as submitted" cost reports on data.cms.gov and through HCRIS Healthcare Cost Report Information System at CMS and NBER hospital.nber.org; covers all Medicare-certified hospitals including for-profit, nonprofit, and government regardless of whether they file 10-K); Worksheet S statistical data (hospital identifier, provider type, fiscal year, beds by type, total inpatient days, Medicare inpatient days, Medicare discharges, outpatient visits, emergency visits, total FTEs by category, interns and residents by type for GME computation, rural/urban designation, critical access hospital status); Worksheet A cost centers (trial balance of all costs assigned to ~45 cost centers: routine inpatient nursing, ICU, CCU, surgical and obstetrical, anesthesia, radiology, laboratory, pharmacy, medical supplies, physical therapy, respiratory therapy, outpatient departments, emergency, clinic, dietary, housekeeping, medical records, nursing administration, central services, total plant overhead, employee benefits, administrative and general); Worksheet B cost allocation (stepdown method: allocate overhead cost centers to direct patient care using relative value unit statistics like pounds of laundry, square footage, meals served; each RVU denominator in Schedule B-1; the stepdown order matters — each allocated cost center absorbs from upstream but not downstream); Worksheet C reimbursable cost (computation of Medicare allowable cost: total costs after stepdown × 
Medicare utilization ratio; yields Medicare cost for inpatient and outpatient combined; the Medicare cost-to-charge ratio CCR = Medicare costs / Medicare charges — published by CMS for each hospital and used for outlier payment calculation); Worksheet E reimbursement calculation (total Medicare inpatient reimbursement: base DRG payments + outlier payments + DSH adjustment + IME adjustment + GME payments + uncompensated care pool allocation; Medicare outpatient reimbursement from OPPS APCs; total settlement = reimbursement minus interim payments already made); cost-to-charge ratio and charge master problem (US hospital charge masters have no relationship to actual costs or insurance payments; charge markup = total charges / total costs; ranges from 2x for small nonprofit community hospitals to 10x+ for for-profit hospital systems like HCA; CMS requires HCRIS CCR for outlier payment computation; researchers use CCR to deflate Medicare claims charge data to cost-equivalent values for cost-effectiveness analysis); DSH payments (DSH Disproportionate Share Hospital payment: adjustment for hospitals serving disproportionate share of low-income patients; CMS computes DSH percentage = Medicare SSI beneficiary days / total Medicare days + Medicaid days / total patient days; ACA 2014 restructured: uncompensated care pool = 75% of prior DSH reductions × Factor 1 percent of uninsured × Factor 2 each hospital's share of audited uncompensated care from Worksheet S-10; DSH payments visible in Worksheet E); teaching hospital adjustments (IME indirect medical education adjustment: statutory formula multiplier 1.35 × ((1 + intern-to-bed ratio)^0.405 - 1); the 1.35 multiplier embeds both direct teaching cost and a volume effect for tertiary case referral; GME direct graduate medical education: Medicare pays per-resident-amount PRA × number of FTE residents; PRAs frozen at hospital-specific 1984 costs with subsequent inflation adjustment limited; GME and IME together ~$12-15B/year to 
teaching hospitals; visible in Worksheet E); Worksheet S-10 uncompensated care (added 2016, with required CMS audits beginning 2020; reports charity care at cost and bad debt expense for DSH uncompensated care pool allocation; total charity care at cost as fraction of net patient revenue; differs from Form 990 Schedule H which uses charges not cost basis; the combination of S-10 and 990 Schedule H allows triangulation of charity care); HCRIS access (CMS publishes raw alpha-format files — each Worksheet with variable labels — at data.cms.gov and CMS research files; NBER hospital.nber.org provides consistent annual HCRIS extracts 1996+ with cost-normalized versions; key variables: NET_INCOME operating income, TOT_COSTS total costs, TOT_CHRGS total charges, NET_PAT_REV net patient revenue, DISCHRGS_MDCR_IPPS Medicare DRG discharges, MDCR_DAYS Medicare patient days, IME_PYMTS IME payments, GME_PYMTS GME payments); Python script downloading HCRIS 2552-10 from NBER, filtering to short-term acute CCN suffixes 0001-0879, computing CCR, charge markup, and operating margin by ownership type, and cross-references to cms-hospital-quality, cms-medicare-advantage, and medicare-part-b-data.
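    The ratios and formulas named in this summary are simple enough to write down as functions. The sketch below follows the formulas as stated above (CCR, charge markup, the IME multiplier, and the DSH patient percentage); the function and argument names are hypothetical, and the dollar and day figures in the usage example are invented.

```python
def cost_to_charge_ratio(total_costs, total_charges):
    """CCR = costs / charges."""
    return total_costs / total_charges


def charge_markup(total_costs, total_charges):
    """Markup = charges / costs, the inverse of the CCR."""
    return total_charges / total_costs


def ime_adjustment(resident_to_bed_ratio, multiplier=1.35, exponent=0.405):
    """IME factor = multiplier * ((1 + ratio) ** exponent - 1)."""
    return multiplier * ((1.0 + resident_to_bed_ratio) ** exponent - 1.0)


def dsh_patient_percentage(ssi_days, medicare_days, medicaid_days, total_days):
    """DSH percentage = SSI days / Medicare days + Medicaid days / total days."""
    return ssi_days / medicare_days + medicaid_days / total_days


if __name__ == "__main__":
    # Invented hospital: $250M in costs against $1B in gross charges.
    print(f"CCR:    {cost_to_charge_ratio(250e6, 1e9):.2f}")
    print(f"Markup: {charge_markup(250e6, 1e9):.1f}x")
    print(f"IME factor at 0.5 residents per bed: {ime_adjustment(0.5):.4f}")
    print(f"DSH pct: {dsh_patient_percentage(2_000, 20_000, 9_000, 60_000):.1%}")
```

    Keeping the IME multiplier and exponent as keyword arguments makes it straightforward to test how sensitive teaching payments are to either statutory parameter.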

  97. Q3Writing

    SBA 7(a) and 504 loan program data deep-dive published

    Long-form article on SBA loan program data: SBA lending overview (SBA does not lend directly — it guarantees loans made by approved private lenders; the guarantee means if borrower defaults SBA pays lender the guaranteed portion and then pursues borrower for recovery; this credit enhancement allows lenders to approve loans they would otherwise decline for small businesses that lack sufficient collateral or credit history; FY2023 7(a) approvals: ~$27B in 57,000 loans; 504 approvals: ~$7B in 5,300 projects; combined annual guarantee authority ~$40B); 7(a) program in depth (eligible borrowers = small businesses per SBA size standards by NAICS code — typically <500 employees for manufacturers, revenue-based limits for service industries ranging $7.5M-38.5M; eligible uses = working capital, equipment purchase, leasehold improvements, real property purchase/construction, business acquisition, refinancing of existing debt; maximum loan $5M; maturity: 25 years for real estate, 10 years for equipment and working capital; interest rate = prime rate + spread capped by SBA: 2.75pp for loans >$50K and maturity >7 years, 2.25pp for shorter; guarantee percentage: 85% for loans ≤$150K, 75% for loans >$150K; 7(a) Small Loan ≤$500K with simplified eligibility; SBA Express ≤$500K with 50% guarantee and 36-hour approval turnaround; CAPLines revolving credit for seasonal or cyclical working capital needs); 504 three-party structure (private lender first mortgage = 50% of total project cost at market rate; CDC Certified Development Company second mortgage = 40% of project cost funded by SBA-guaranteed debenture sold to investors at below-market rate; borrower equity = 10% of project cost, 15% for startups or single-purpose properties; CDC debenture sold at fixed rate to bond market with SBA guarantee — creating long-term fixed rate financing attractive for real estate; maximum CDC/SBA portion $5M basic, $5.5M for small manufacturers and energy-related projects; eligible uses limited to fixed 
assets: commercial real estate, equipment with 10+ year useful life; cannot use for working capital, inventory, or debt refinancing); public loan data (SBA publishes loan-level data for all approved 7(a) and 504 loans at sba.gov and data.sba.gov; FY2010 forward with some earlier data available; key 7(a) fields: LoanNumber, ApprovalDate, GrossApproval, SBAGuaranteedApproval, InitialInterestRate, TermInMonths, BusinessName, BorrCity, BorrState, BorrZip, NaicsCode, FranchiseCode, BusinessType corporation/LLC/sole proprietor, BusinessAge, LoanStatus charged off/paid-in-full/active/cancelled, GrossChargeOffAmount, SubprogramDescription SBAExpress/Standard, RuralUrban, LMIIndicator low-moderate income, BusinessDemographic minority/women/veteran owned); lender concentration (top 7(a) lenders by volume: Live Oak Bank #1 by number of loans specializing in veterinary, dental, funeral, agribusiness, brewery niches using proprietary industry underwriting models; Wells Fargo, JPMorgan, Bank of America for dollar volume; specialized SBA lenders: Newtek Business Services, CDC Small Business Finance; Community Advantage program: CDFI and nonprofit lenders targeting underserved communities with SBA guarantee for loans ≤$350K; 2014 OIG report identified high-risk SBA 7(a) lenders with charge-off rates 10-25% vs. 
program average; used public loan data to compute lender-level default rates); default and charge-off analysis (7(a) cumulative default rates by cohort historically 10-15% of loans by count; pandemic cohort: EIDL Economic Injury Disaster Loan direct SBA lending $400B+ not in public loan database; PPP Paycheck Protection Program separate table; SBA 7(a) post-pandemic cohort shows elevated charge-off rates for 2020-2022 vintages; the GrossChargeOffAmount field minus any SBA recovery = net loss to SBA; analysis methodology: group by approval year cohort, compute charge-offs as percentage of gross approvals weighted by dollar amount); SBIC program (Small Business Investment Company: SBA-licensed private equity and venture capital funds; raise low-cost debt from SBA to invest in US small businesses; $5-6B/year investment; Apple (1978) and Intel received early SBIC financing; SBIC data published annually but less granular than 7(a)/504 loan data); equity and access findings (SBA collects minority-owned, women-owned, veteran-owned flags; research consistently shows these groups receive smaller average loan amounts despite similar creditworthiness; Community Reinvestment Act scoring includes SBA lending in some frameworks; SBA 8(a) Business Development Program: separate from lending, channels federal contracts to socially and economically disadvantaged businesses, with 8(a) firms receiving set-asides and sole-source awards); Socrata API at data.sba.gov for programmatic access; Python script downloading 5 fiscal years of 7(a) loans, computing by-sector default rates and top-10 lender charge-off rates; and cross-references to usaspending-federal-contracts, fdic-call-report-data, and census-county-business-patterns.
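
    The cohort methodology described above (group by approval year, dollar-weighted charge-offs over gross approvals) can be sketched as follows. This is a minimal illustration, not the article's script: the sample rows are invented, ISO-formatted dates are assumed, and calendar year stands in for SBA fiscal year. Only the field names ApprovalDate, GrossApproval, and GrossChargeOffAmount come from the field list above.

```python
from collections import defaultdict

# Hypothetical sample rows; only the field names follow the 7(a) layout.
loans = [
    {"ApprovalDate": "2019-03-14", "GrossApproval": 250_000, "GrossChargeOffAmount": 0},
    {"ApprovalDate": "2019-08-02", "GrossApproval": 150_000, "GrossChargeOffAmount": 30_000},
    {"ApprovalDate": "2020-05-20", "GrossApproval": 400_000, "GrossChargeOffAmount": 100_000},
    {"ApprovalDate": "2020-11-09", "GrossApproval": 100_000, "GrossChargeOffAmount": 0},
]


def cohort_chargeoff_rates(rows):
    """Return the dollar-weighted charge-off rate per approval-year cohort."""
    approved = defaultdict(float)
    charged_off = defaultdict(float)
    for row in rows:
        # Assumption: ISO dates; calendar year is a stand-in for fiscal year.
        cohort = int(row["ApprovalDate"][:4])
        approved[cohort] += row["GrossApproval"]
        charged_off[cohort] += row["GrossChargeOffAmount"]
    return {c: charged_off[c] / approved[c] for c in sorted(approved) if approved[c] > 0}


rates = cohort_chargeoff_rates(loans)
for cohort, rate in rates.items():
    print(f"{cohort}: {rate:.1%}")
```

    Weighting by dollars rather than loan count keeps a handful of large defaults from being hidden behind many small performing loans.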

  98. Q3Writing

    BLS American Time Use Survey deep-dive published

    Long-form article on the BLS American Time Use Survey: ATUS overview (launched 2003 as collaborative BLS/Census Bureau survey; surveys ~10,000-26,000 Americans per year — budget cuts reduced sample size over time; participants are CPS household members who complete a 24-hour retrospective time diary for a randomly assigned reference day; the only federal dataset providing systematic measurement of all daily activities across the civilian non-institutional population age 15+; sample design: subsample of CPS households 2-5 months after CPS interview; ATUS response rate ~45%); time diary methodology (respondent recounts all activities in chronological order for the 24-hour period from 4am to 4am; interviewer codes activities using hierarchical ATUS Lexicon — 6-digit codes organized under 17 first-level categories with ~400 total activities; activity attributes: duration in minutes, location using 7 category location codes (home/workplace/someone else's home/restaurant/bar/place of worship/grocery store/sporting/recreational area/vehicle/other), who was present using "who codes" for spouse/partner/own child/other household/friends/coworkers/boss/other nonhousehold; secondary activities: simultaneous activities like exercising while watching TV; diary approach provides actual time rather than recalled estimates which suffer from recall bias — respondents in surveys consistently overestimate work hours and underestimate TV watching); 17 major activity categories (1-Personal care: average ~9.45 hours including sleeping 8.76 hours, washing/dressing, medical care; 2-Household activities: average 1.93 hours including food preparation 33 minutes, food cleanup 17 minutes, interior cleaning 30 minutes, laundry/linens 20 minutes, exterior maintenance 14 minutes, lawn/garden 20 minutes, animal/pet care 21 minutes; 3-Caring for household members: primary childcare activities — physical care of children, education, activities with children, supervision; 4-Caring for 
nonhousehold members: eldercare for parents, helping neighbors/friends; 5-Work and work-related: employed adults 7.6 hours on workdays; 6-Education; 7-Consumer purchases; 8-Professional/personal care services; 9-Household services; 10-Government services; 11-Eating and drinking: 67 minutes average including 20 minutes breakfast, 30 minutes dinner; 12-Socializing, relaxing, leisure: 5.0 hours including television 2.82 hours; 13-Sports/exercise/recreation: 0.3 hours average across population; 14-Religious/spiritual; 15-Volunteering: 0.13 hours; 16-Telephone calls; 17-Traveling: 1.2 hours); gender gap (the most replicated ATUS finding: women average 2+ hours/day more in household activities + caring for household members combined; men average ~0.5 hour/day more paid work time; men average ~0.4 hour/day more total leisure time — primarily television; the "total work" concept of Gershuny: paid work + unpaid work roughly equal by gender (Aguiar and Hurst 2007) — but leisure quality differs; COVID-2020 special module: mothers increased childcare by 0.5 hours/day while fathers increased by 0.3 hours/day during school closures; Dingel et al. 
2020 estimated teleworkable jobs during the pandemic from O*NET occupation characteristics, a benchmark often compared with ATUS work-at-home reports); parental time investment trend (parents with children under 6 average 2.3 hours/day primary childcare; Ramey and Ramey 2010 documented the intensive parenting trend: college-educated parents significantly increased childcare time 1975-2007 despite working more; the pressure-filled "helicopter parenting" norm among high-education parents documented through ATUS; time with children strongly stratified by parental education, with college graduates spending more time with children despite working similar hours); working from home (ATUS asks whether work was done at home on workdays; pre-2020: ~23% of employed Americans did some work from home on any workday; May-December 2020 (ATUS collection was suspended from mid-March to mid-May): 42%, the direct federal measurement of the remote work shift; 2022-2023: ~28-30%, suggesting the shift is structural; Dingel and Neiman 2020 (University of Chicago) classified occupations using O*NET work-context surveys and estimated that 37% of US jobs could be done entirely from home); leisure inequality (Aguiar and Hurst 2007 Quarterly Journal of Economics: leisure rose sharply for less-educated Americans 1965-2003, driven by TV watching; college graduates watch 3+ hours/day less TV than high school graduates but have similar total leisure; sports and exercise 5x more common in college graduates; the "leisure paradox": higher income associated with less time watching TV but more time in active leisure); ATUS data access (BLS at bls.gov/tus: three main files, ATUS Activity, ATUS Respondent, and ATUS Case Header; IPUMS-ATUS at ipums.org/time-use: harmonized 2003-2023 merged data with consistent variable names, CPS demographic characteristics linked, free researcher access after registration; ATUS activity lexicon downloadable; weights: TUFINWGT person-level weight for population estimates; must use weights for nationally representative estimates; ATUS summary tables published by BLS for major demographic 
groups); special modules (2010/2012/2013/2021 Well-Being Module: respondents rate each activity's pleasantness, meaningfulness, stress level, pain level — produces life satisfaction research data; 2006-2008 and 2014-2016 Eating & Health Module: detailed eating behaviors, grocery shopping, food preparation; 2011 and 2017-2018 Leave Module: paid leave availability, vacation taken; these modules make ATUS analytically unique for welfare economics beyond simple time allocation); Python script merging ATUS Activity and Respondent files by TUCASEID, computing weighted average minutes per day by major activity category separately for men and women, calculating gender gap in household/caregiving vs. paid work and leisure, and cross-references to bls-qcew-employment, bls-oews-occupational-wages, and census-acs-data.
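
    The weighted gender-gap calculation described above can be sketched like this. It is an illustrative outline under stated assumptions: the rows are invented, and the variable names TESEX (1 = male, 2 = female), TUTIER1CODE, and TUACTDUR24 are assumed to be available after merging on TUCASEID; TUCASEID and TUFINWGT are the only names taken from the text. Respondents who report zero minutes in a category stay in the denominator, which is what makes the result a population average rather than a participant average.

```python
from collections import defaultdict

# Hypothetical respondent-level rows (one per diary).
respondents = [
    {"TUCASEID": 1, "TUFINWGT": 2.0, "TESEX": 1},
    {"TUCASEID": 2, "TUFINWGT": 1.0, "TESEX": 1},
    {"TUCASEID": 3, "TUFINWGT": 3.0, "TESEX": 2},
]
# Hypothetical activity-level rows (many per diary).
activities = [
    {"TUCASEID": 1, "TUTIER1CODE": 2, "TUACTDUR24": 30},
    {"TUCASEID": 1, "TUTIER1CODE": 5, "TUACTDUR24": 480},
    {"TUCASEID": 3, "TUTIER1CODE": 2, "TUACTDUR24": 90},
    {"TUCASEID": 3, "TUTIER1CODE": 2, "TUACTDUR24": 30},
]


def weighted_minutes_by_sex(respondents, activities, tier1_code):
    """Weighted average minutes/day in one major category, keyed by TESEX."""
    minutes = defaultdict(float)
    for act in activities:
        if act["TUTIER1CODE"] == tier1_code:
            minutes[act["TUCASEID"]] += act["TUACTDUR24"]
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for resp in respondents:
        weight = resp["TUFINWGT"]
        # Diaries with no matching activity contribute zero minutes.
        numerator[resp["TESEX"]] += weight * minutes.get(resp["TUCASEID"], 0.0)
        denominator[resp["TESEX"]] += weight
    return {sex: numerator[sex] / denominator[sex] for sex in denominator}


household = weighted_minutes_by_sex(respondents, activities, tier1_code=2)
gap_minutes = household[2] - household[1]
print(household, gap_minutes)
```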

  99. Q3Writing

    FDIC Call Report quarterly banking data deep-dive published

    Long-form article on FDIC Call Report data: three filing forms (FFIEC 031 for banks with international offices, FFIEC 041 for domestic banks, FFIEC 051 for community banks under $5B assets since 2017 reducing regulatory burden); filing mechanics (due 30 days after quarter end, extended to 45 for large institutions; CEO attestation under penalty of perjury; FDIC/Fed/OCC triagency FFIEC coordination; reconciles to 10-K/10-Q for bank holding companies); RC balance sheet schedules in depth (RC-A: cash and due-from balances; RC-B: investment securities HTM vs. AFS with 7 security type categories including Treasuries, Agency, CMBS, municipal, corporate, ABS, other; RC-C: loans and leases by 15 categories covering C&I, construction, multifamily, 1-4 family residential, consumer installment, credit cards, agricultural, lease financing; RC-D: trading assets; RC-E: deposits broken into demand/NOW/MMDA/savings/time by maturity and CD size; RC-F/G: other assets and liabilities; RC: aggregate summary balance sheet with total assets, equity, AOCI); Schedule RI income statement (interest income by earning asset category; interest expense by funding type; provision for credit losses; noninterest income — service charges, fiduciary, trading, origination fees, securitization gains, other; noninterest expense — salaries and benefits, occupancy, professional fees, FDIC insurance premiums, other; pre-tax income, taxes, net income); asset quality Schedule RC-N (30-89 day past due by loan category; 90+ day past due; nonaccrual by loan category; troubled debt restructurings TDR; ALLL allowance for loan and lease losses — transitioned to CECL Current Expected Credit Loss model 2023 for large banks; net charge-off rate = charge-offs minus recoveries / average loans — the key credit cycle indicator); capital adequacy Schedule RC-R (CET1 Common Equity Tier 1, Tier 1, Total Capital ratios computed against risk-weighted assets; Tier 1 leverage ratio against average total assets; PCA Prompt 
Corrective Action thresholds: well-capitalized 10%+ total risk-based, adequately capitalized 8-10%, undercapitalized 6-8%, significantly undercapitalized <6%, critically undercapitalized <2% tangible equity; GSIB supplementary leverage ratio 5% vs. 3% for others); SVB 2022 Call Report warning signs (year-end 2022: HTM securities $91.3B booked at amortized cost with $15.9B unrealized loss excluded from CET1 under FASB rules; AFS securities with additional $2.5B unrealized loss flowing through OCI and reducing CET1; deposit concentration in venture-backed startups >$250K FDIC limit creating ~94% uninsured deposit ratio; FHLB advance dependence at $15B; net interest income sensitivity: rate shock analysis in RC-L off-balance-sheet showing severe NIM compression under rate hike scenario; all visible in December 2022 Call Report filed February 2023, 3 weeks before the March 8-10 2023 run); Texas Ratio (nonperforming assets including nonaccrual loans + 90+ day past due + OREO other real estate owned divided by tangible common equity plus loan loss reserves; >100% historically predicts bank failure with 80%+ accuracy during GFC; banks with Texas Ratio >100% in 2007 had 87% failure rate by 2012; still used by community bank analysts for screening); FDIC BankFind Suite API (banks.data.fdic.gov/api; /institutions endpoint for bank directory; /financials endpoint for quarterly metrics by RSSD ID or name; /history for open/close/merger events; key financial fields: REPDTE report date, ASSET total assets, NETINC net income, NIM net interest margin, INTINC interest income, NONII noninterest income, NONIX noninterest expense, LNLSNET net loans and leases, DEPLSAMT deposit liabilities, DRLNLS net charge-offs to average loans, ROA return on assets, ROE return on equity, INTEXP interest expense, EQUITY total equity capital); Python script using BankFind API to pull quarterly data for all community banks in a state, computing Texas Ratio + efficiency ratio + CET1 for each, ranking by 
Texas Ratio to screen for stress, and cross-references to fdic-bank-failures, fed-h8-bank-balance-sheets, and occ-bank-enforcement.
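
    The Texas Ratio screen described above reduces to a single division, sketched here with invented balance-sheet figures. The function follows the definition in the text (nonaccrual loans + 90+ day past due + OREO, over tangible common equity plus loan loss reserves); the bank names, amounts, and parameter names are hypothetical and are not BankFind field names.

```python
def texas_ratio(nonaccrual, past_due_90, oreo, tangible_common_equity, loan_loss_reserves):
    """Nonperforming assets divided by the capital cushion available to absorb them."""
    return (nonaccrual + past_due_90 + oreo) / (tangible_common_equity + loan_loss_reserves)


# Hypothetical banks, amounts in $ millions.
banks = {
    "Bank A (hypothetical)": dict(nonaccrual=30.0, past_due_90=10.0, oreo=5.0,
                                  tangible_common_equity=80.0, loan_loss_reserves=10.0),
    "Bank B (hypothetical)": dict(nonaccrual=70.0, past_due_90=20.0, oreo=30.0,
                                  tangible_common_equity=90.0, loan_loss_reserves=10.0),
}

# Rank from most to least stressed, then flag ratios above 100%.
ranked = sorted(((name, texas_ratio(**fields)) for name, fields in banks.items()),
                key=lambda item: item[1], reverse=True)
flagged = [name for name, ratio in ranked if ratio > 1.0]

for name, ratio in ranked:
    print(f"{name}: {ratio:.0%}")
```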

  100. Q3Writing

    BLS Multifactor Productivity deep-dive published

    Long-form article on BLS Multifactor/Total Factor Productivity data: two BLS productivity programs (Labor Productivity and Costs: output per hour, quarterly, widely watched; Multifactor Productivity: output per combined labor AND capital inputs, annual — the Solow residual capturing technological progress and efficiency gains); Solow residual and growth accounting (Robert Solow 1957 paper decomposed GDP growth into capital contribution weighted by capital income share + labor contribution weighted by labor income share + unexplained residual MFP; the residual averaged +1.5%/yr 1948-1973 "golden age," collapsed to +0.3%/yr 1973-1995 productivity slowdown caused by oil shocks/OPEC/structural adjustment — Dale Jorgenson and Griliches disputed measurement; recovered to +1.0%/yr 1995-2004 IT revolution; slowed to +0.4%/yr 2004-2019; 2020s: hotly debated whether generative AI will revive MFP to 1990s levels); BLS MFP measurement methodology (output = chained GDP or gross output for industry accounts; capital input = BLS capital services index aggregating structures/equipment/intellectual property products/inventories/land using Hall-Jorgenson rental price weights — accounts for both depreciation rate and user cost of capital; labor input = hours worked adjusted for composition changes: BLS computes a labor quality index weighting hours by education/experience/gender using CPS microdata; Tornqvist index aggregates sub-inputs using two-period average cost shares; MFP = Tornqvist output index minus Tornqvist combined input index); private business sector MFP back to 1947 (excludes general government, nonprofit institutions, owner-occupied housing — these sectors have no market output prices; separate manufacturing MFP for total manufacturing, durables, nondurables; computer and electronic products shows highest MFP growth because BEA uses hedonic quality-adjusted price indexes for computers: a processor that costs the same as 2010 but is 4x faster shows 4x more "output" 
— explaining apparent MFP surge in IT sectors); industry-level MFP (published for ~60 detailed NAICS industries from 1987; capital decomposed into equipment/structures/IP/inventories/land by industry; enables comparative productivity growth attribution: retail trade transformation 1987-2000 from Walmart logistics/big-box model; healthcare productivity puzzle — vast R&D and input growth but health outcomes improvement hard to monetize for output measurement); labor productivity vs. MFP (labor productivity = output/hours — can rise from capital deepening: more machines per worker with zero technological progress; MFP requires either genuine efficiency gain or technological progress; the distinction matters for real wage sustainability — real wages can only permanently rise at the rate of MFP growth in the long run; capital deepening raises labor productivity but doesn't require MFP improvement); unit labor costs (ULC = compensation per hour / output per hour = labor cost per unit of output; ULC growth = nominal wage growth minus labor productivity growth; Fed watches ULC as primary driver of "non-housing core services" PCE inflation; 2021: wage growth surged but productivity lagged → ULC spiked +6.7% — feeding services inflation; 2023: productivity recovered → ULC grew only 1.6% → services inflation moderated; FRED series: ULCNFB nonfarm business unit labor costs); productivity paradox and AI hypothesis (Robert Solow 1987 quip "you can see the computer age everywhere except in the productivity statistics"; computers installed throughout economy 1980s-1990s but BLS MFP showed no acceleration until 1995; Erik Brynjolfsson and Lorin Hitt proposed lagged learning-and-reorganization explanation — companies needed 7-10 years to restructure around computers; historical analog: electrification 1880-1930 only showed in productivity data ~25 years after widespread adoption as factory layouts restructured; AI hypothesis: LLMs deployed 2023 → productivity measurement lag → BLS 
MFP may not show AI contribution until 2028-2032; 2023 nonfarm business productivity grew 2.7%, the highest in 20 years, which could be an early AI signal or mean reversion from the 2022 productivity collapse); FRED series (OPHNFB = nonfarm business output per hour quarterly, seasonally adjusted annual rate; ULCNFB = nonfarm business unit labor costs; RCPHBS = real compensation per hour business sector; annual BLS MFP downloads from bls.gov/productivity/); Python script pulling quarterly OPHNFB and ULCNFB by their FRED series IDs (the BLS API identifies the same data with PRS-prefixed series IDs), computing 4-quarter rolling averages, plotting a dual-axis chart showing the 2021-2022 ULC surge and 2023 productivity recovery; and cross-references to bls-cpi-inflation, bea-gdp-accounts, and bls-qcew-employment.
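
    The growth-accounting arithmetic above can be illustrated with a small sketch. It assumes log-difference growth rates and a two-input (capital, labor) Tornqvist aggregate with two-period average cost shares, as the methodology describes; all index levels and shares are invented, and the function names are hypothetical.

```python
import math


def tornqvist_input_growth(inputs):
    """Sum of log input growth weighted by two-period average cost shares.

    Each item is (quantity_t0, quantity_t1, share_t0, share_t1).
    """
    return sum(0.5 * (s0 + s1) * math.log(q1 / q0) for q0, q1, s0, s1 in inputs)


def mfp_growth(output_t0, output_t1, inputs):
    """MFP growth = log output growth minus Tornqvist combined input growth."""
    return math.log(output_t1 / output_t0) - tornqvist_input_growth(inputs)


def ulc_growth(compensation_growth, productivity_growth):
    """Unit labor cost growth = hourly compensation growth minus productivity growth."""
    return compensation_growth - productivity_growth


# Hypothetical index levels: capital grows 4%, labor 1%, output 3%.
inputs = [
    (100.0, 104.0, 0.30, 0.30),  # capital services, 30% cost share
    (100.0, 101.0, 0.70, 0.70),  # composition-adjusted labor, 70% cost share
]
mfp = mfp_growth(100.0, 103.0, inputs)
ulc = ulc_growth(0.05, 0.02)
print(f"MFP growth: {mfp:.4f}, ULC growth: {ulc:.4f}")
```

    In this example most of the 3% output gain is explained by input growth, so the residual is only about one percentage point, which is the distinction between capital deepening and MFP that the article draws.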

  101. Q3Writing

    Medicaid enrollment and expenditure data deep-dive published

    Long-form article on Medicaid and CHIP enrollment and expenditure data: program overview (Medicaid Title XIX created by Social Security Amendments 1965; jointly financed federal-state program; states administer under federal rules with significant flexibility; mandatory populations: children in families below state thresholds, pregnant women, parents in low-income families, aged/blind/disabled receiving SSI; ACA 2010 expansion of adults to 138% FPL with 90% federal match, optional for states since 2012; ~90M total Medicaid+CHIP enrollees 2023 making it the largest health coverage program by headcount; combined federal+state spending ~$900B FY2023); data sources (monthly enrollment files published by CMS Medicaid.gov data hub: beneficiary counts by state and 8 eligibility groups including children, adult expansion, pregnant women, aged, blind/disabled, foster care, BHP, CHIP; T-MSIS Transformed Medicaid Statistical Information System: comprehensive claims and enrollment data replacing MSIS; TAF Transformed Analytic Files: CMS-processed T-MSIS research files for approved researchers; MBES Medicaid Budget and Expenditure System: quarterly expenditure by state and service category; managed care enrollment reports; Medicaid Drug Utilization database with NDC-level drug reimbursement); ACA Medicaid expansion (42 U.S.C. § 1396a(a)(10)(A)(i)(VIII) extended eligibility to adults 19-64 at or below 138% FPL; FMAP for expansion adults phased down from 100% to 90% by 2020; NFIB v. Sebelius 2012 made expansion optional; as of 2024 40 states plus DC had expanded, 10 had not; non-expansion states have a coverage gap for adults below 100% FPL, who earn too little for ACA exchange subsidies yet do not qualify for Medicaid; the expansion vs. non-expansion divide creates a natural experiment for health coverage effects); 
COVID enrollment surge and unwinding (Families First Coronavirus Response Act March 2020: offered states +6.2pp enhanced FMAP on condition that they not disenroll Medicaid beneficiaries during the PHE; continuous enrollment provision: states could not terminate coverage even if circumstances changed; enrollment grew from ~71M January 2020 to ~95M peak March 2023; the Consolidated Appropriations Act 2023 decoupled the provision from the PHE and ended it March 31 2023, so states began "unwinding" on April 1 2023 by re-verifying eligibility and disenrolling beneficiaries (the PHE itself ended May 11 2023); CMS guidance gave states up to 12 months to initiate renewals; states disenrolled more than 20M people through 2024; the unwinding was the largest coverage loss event since program founding; procedural disenrollments, where people lose coverage for administrative reasons like a wrong address rather than true ineligibility, were the focus of federal monitoring); FMAP mechanics (Federal Medical Assistance Percentage: the share of Medicaid costs the federal government pays, computed annually based on state per capita income relative to national average using three-year average formula; statutory floor 50%, ceiling 83%; range: 50% for high-income states (California, New York, Connecticut, Massachusetts, New Jersey) to 77-78% for Mississippi; enhanced FMAP: ACA expansion adults 90%; CHIP enhanced match (E-FMAP) of 65-85%; temporary FMAP enhancements as fiscal stimulus: ARRA 2009 +6.2pp through December 2010; FFCRA 2020 +6.2pp through March 2023; Consolidated Appropriations Act 2023 phased it down to +5pp for April-June 2023, +2.5pp for July-September 2023, and +1.5pp for October-December 2023); managed care dominance (~70% of Medicaid beneficiaries enrolled in MCOs as of 2023 up from 10% in 1991; states contract with private managed care organizations that receive capitated payments per member per month PMPM; CMS requires actuarially sound capitation rates based on fee-for-service cost data adjusted for risk; HEDIS quality measures and Adult/Child Core Set reported annually; CMS requires encounter data 
submission to T-MSIS; managed care has grown due to cost predictability for states and claimed care-coordination benefits); dual eligibles (~12M people are dually enrolled in both Medicare and Medicaid; highest-cost Medicaid population: average cost $35,000+/year vs. $8,000 for non-duals; Medicaid pays Medicare Parts A/B/D premiums, cost-sharing, and services not covered by Medicare including long-term care; care coordination historically fragmented between the two programs; Financial Alignment Initiative demonstrations 2011-2021; Fully Integrated Dual Eligible SNPs FIDE-SNPs and PACE Programs as integrated delivery models); long-term care (Medicaid pays 42% of all US long-term care expenditure including nursing facility and home and community-based services HCBS; Section 1915(c) and Section 1115 waivers fund home-based alternatives to nursing facilities, which now outnumber nursing facility slots; nursing facility MECL Medicaid rate-setting; spend-down: individuals with assets above Medicaid limits "spend down" to eligibility after nursing home placement, in practice depleting the savings their families would otherwise inherit); Python script downloading 24 months of Medicaid enrollment from Medicaid.gov Socrata API, computing state enrollment change from January 2023 pre-unwinding baseline, ranking states by percentage decline during unwinding, flagging expansion vs. non-expansion status; and cross-references to cms-medicare-advantage, cms-hospital-quality, and hhs-oig-exclusions.
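
    The unwinding ranking described above (percentage change from a January 2023 baseline, sorted by decline, with an expansion flag) might look roughly like this. The state labels and enrollment counts are placeholders rather than real data, and the function name is hypothetical; fetching from the Medicaid.gov API is left out so the sketch stays self-contained.

```python
def rank_unwinding_declines(baseline, latest, expansion_status):
    """Return (state, pct_change, is_expansion) tuples, largest decline first."""
    rows = []
    for state, base in baseline.items():
        pct_change = (latest[state] - base) / base * 100.0
        rows.append((state, round(pct_change, 1), expansion_status[state]))
    # Most negative change (steepest decline) sorts first.
    return sorted(rows, key=lambda row: row[1])


# Placeholder figures: enrollment in January 2023 and in the latest month.
baseline = {"State A": 1_000_000, "State B": 500_000, "State C": 2_000_000}
latest = {"State A": 800_000, "State B": 450_000, "State C": 1_900_000}
expansion_status = {"State A": False, "State B": True, "State C": True}

ranking = rank_unwinding_declines(baseline, latest, expansion_status)
for state, pct, expanded in ranking:
    label = "expansion" if expanded else "non-expansion"
    print(f"{state}: {pct:+.1f}% ({label})")
```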

  102. Q3Writing

    NLRB union election data deep-dive published

    Long-form article on NLRB union election records: election system overview (NLRA 1935 Section 9 established NLRB election process; 30%+ authorization card showing required to file petition and get election; NLRB regional offices (33 regions) review petition, determine appropriate bargaining unit, and schedule election; historically ~2,000-2,500 RC representation elections per year; results certified if union wins simple majority of votes cast), petition types (RC representation: union-filed seeking certification; RD decertification: workers seeking to remove incumbent union require 30% showing — union can win by withdrawing when losing; RM employer-filed when employer has objective basis to doubt majority union status such as loss of majority support; AC amendment of certification; UC unit clarification; RC elections account for ~80%+ of all elections), election data structure (NLRB publishes election data through Case Activity Tracking System CATS and annual statistical reports; key fields: case number format XX-RC-NNNNNN with region prefix; petition date; election date; employer name, address, NAICS code; union name and local/international affiliation; number eligible voters; votes for union; votes against union; challenged ballots; void ballots; election result — certification issued, election failed, no action; tally of ballots filed on NLRB.gov public case page; NLRB API at api.nlrb.gov with case search endpoints), win rate historical trajectory (1936-1950: win rate >65% as labor movement was ascending; 1950s-1970s: win rate 60-65% while absolute election numbers peaked ~7,000-8,000/year; 1981 PATCO air traffic controller firing by Reagan October 1981 signaled to employers that unions were politically vulnerable — win rate fell to 45% by mid-1980s; 1980s: union avoidance industry expanded — consultants, attorneys, mandatory captive audience meetings, delays; election volume collapsed as employers made organizing more costly; 1990s-2000s: win rate recovered 
to 55-60% but on much lower volume ~2,000/year; 2010-2021: win rate rose to 65-70% as NLRB general counsels became more union-sympathetic but volume remained depressed; 2022-2024 organizing surge: Starbucks Workers United filed first petition August 2021 at Buffalo Elmwood store; won December 2021; by 2024 had won 350+ elections at Starbucks properties; Amazon Labor Union independent union organized Staten Island JFK8 ~8,000-worker warehouse April 2022; first Amazon union in US history; graduate students at major universities; REI, Trader Joe's, Apple retail stores; NLRB election petitions filed: FY2022 2,510, the highest since 2015; win rate rose to 68-72%; absolute number lower than 1970s peak but accelerating), unit determination (appropriate unit must share "community of interest": similar wages, benefits, working conditions, common supervision, common management; NLRB regional director determines appropriate unit; employer and union can stipulate to a unit or litigate; Specialty Healthcare 2011 NLRB decision: "micro-units" of small groups allowed, which employers argued made organizing too easy; PCC Structurals 2017 overruled Specialty Healthcare and American Steel Construction 2022 restored it; unit determination is the first strategic battleground), blocking charge reform and election bar (union can file ULP charges "blocking" election if employer commits unfair labor practices during organizing; critics argue unions use blocking charges to avoid decertification when they anticipate losing; NLRB 2020 rule: blocking charges no longer automatically delay elections; Biden NLRB 2024 "Fair Choice-Employee Voice" rule rescinded the 2020 change; after election: election bar prevents new petition for 12 months; contract bar: existing CBA precludes decertification for up to 3 years of contract term), rapid response election rule 2023 (Biden NLRB August 2023 "representation election procedures" rule: shortens the pre-election hearing timeline and defers most eligibility disputes until after the vote; regional director can schedule election 21 days after petition in most cases vs. 
5-6 weeks previously; the goal: reduce employer anti-union campaign time; unions strongly favored the rule; employer groups argued insufficient time to communicate with workers; the rule has faced legal challenges), employer anti-union tactics (mandatory captive audience meetings: historically legal under NLRB precedent; NLRB General Counsel memo GC 22-04 of April 2022 urged the Board to hold captive audience meetings unlawful as coercive under NLRA Section 8(a)(1); employer groups disputed; courts divided; discipline or discharge of organizing leaders: unlawful under Section 8(a)(3) and a common basis for ULP charges; promising benefits during organizing: illegal Section 8(a)(1) coercion; surveillance of organizing meetings: generally illegal, with observation of open activity on company property the main exception; permanent replacement of striking workers: legal in the US unlike most of the world, and a major tool management has used against strikes), card check and PRO Act (authorization card majority, when more than 50% of workers sign authorization cards, historically gave some unions recognition without election under Wagner Act; Taft-Hartley 1947 clarified: employer can always insist on secret ballot election regardless of card count; Employee Free Choice Act EFCA proposed mandatory card-check recognition; died in Senate 2009 despite 60-vote supermajority briefly available; Protecting the Right to Organize PRO Act 2021: passed House, never received a Senate floor vote), Python script downloading NLRB election data CSVs for FY2019-2024 from NLRB.gov, filtering to RC elections, computing annual win rate, total eligible voters, and count by union affiliation category, plotting the organizing surge with Starbucks and ALU elections highlighted, and cross-references to nlrb-ulp-filings, dol-osha-inspections, and dol-wage-hour-enforcement.
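
    The annual RC win-rate calculation described above can be sketched as follows. The rows and column names (case_type, election_date, votes_for, votes_against) are hypothetical stand-ins, not the actual NLRB CSV headers. The sketch assumes the federal fiscal year starts October 1 and counts a win only when votes for the union exceed votes against, consistent with the simple-majority rule noted in the overview.

```python
from collections import defaultdict
from datetime import date


def fiscal_year(day):
    """Federal fiscal year: October-December belong to the following year."""
    return day.year + 1 if day.month >= 10 else day.year


# Hypothetical election records.
elections = [
    {"case_type": "RC", "election_date": date(2021, 12, 9), "votes_for": 20, "votes_against": 9},
    {"case_type": "RC", "election_date": date(2022, 3, 10), "votes_for": 10, "votes_against": 10},
    {"case_type": "RC", "election_date": date(2022, 4, 1), "votes_for": 30, "votes_against": 12},
    {"case_type": "RD", "election_date": date(2022, 5, 1), "votes_for": 4, "votes_against": 9},
    {"case_type": "RC", "election_date": date(2022, 11, 1), "votes_for": 5, "votes_against": 9},
]


def rc_win_rates(rows):
    """Share of RC elections won by the union, keyed by fiscal year."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        if row["case_type"] != "RC":
            continue  # skip RD, RM, and other petition types
        fy = fiscal_year(row["election_date"])
        total[fy] += 1
        if row["votes_for"] > row["votes_against"]:  # a tie is a loss
            wins[fy] += 1
    return {fy: wins[fy] / total[fy] for fy in sorted(total)}


win_rates = rc_win_rates(elections)
print(win_rates)
```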

  103. Q3Writing

    DOL Wage and Hour enforcement deep-dive published

    Long-form article on the DOL Wage and Hour Division: WHD mandate and organization (~1,000 investigators across 50+ district offices enforcing FLSA, Davis-Bacon Act, Service Contract Act, FMLA, MSPA, and H-2A/H-2B labor standards); FLSA mechanics ($7.25/hr minimum wage unchanged since July 2009, overtime at 1.5x for hours over 40/week per workweek not pay period, salary basis test and salary threshold $684/week since 2020, three white-collar exemptions requiring both salary basis and duties test); WHISARD database (Wage and Hour Investigative Support and Reporting database — WHD's public disclosure file of all enforcement actions with findings, updated annually; key fields: case_id, trade_nm, legal_nm, city, state_cd, naics_cd, act_id (FLSA/DBA/SCA/FMLA/MSPA), violation_cnt, bw_atp_amt back wages assessed, ee_atp_cnt employees owed back wages, flg_pen civil money penalty flag, cmp_assd_amt penalty amount assessed, open_date/close_date); enforcement scale (~$200-300M back wages per year; ~200,000-300,000 workers annually; fiscal year 2022: $228M recovered for 190,000 workers; FY2023: $274M for 163,000 workers; industry concentration: agriculture/food processing, restaurants/food service, hospitality/hotel, garment/apparel manufacturing, construction); violation taxonomy (minimum wage — most common in agriculture and restaurant; overtime — most back wages in construction and security; off-the-clock work — retail and healthcare; tip theft — FLSA 203(m) as amended by Consolidated Appropriations Act 2018 prohibiting employer retention of tips except for tip credits; child labor FLSA 212 — 14-15 year olds limited hours, 16-17 prohibited from hazardous occupations; recordkeeping — prerequisite for all other violations); worker misclassification (FLSA economic reality test vs. 
the IRS common-law control test and the ABC test used by some states; DOL Jan 2024 final rule "Employee or Independent Contractor Classification" reinstating six-factor totality-of-circumstances approach; misclassification allows avoidance of minimum wage, overtime, benefits, workers' comp; liquidated damages equal to back wages make total recovery 2x); H-2A enforcement (DOL OFLC certifies H-2A agricultural employers; WHD enforces adverse effect wage rate AEWR by state, housing at no charge, transportation from consulate to worksite to home country, 3/4 guarantee, workers' compensation insurance; H-2A certifications grew from 60,000 in 2012 to 370,000+ in 2023; H-2A violations ~10% of back wages annually); Davis-Bacon Act (prevailing wage requirements on federal contracts >$2,000 for construction alteration or repair; DOL publishes wage determinations by county and trade classification; certified weekly payrolls required from contractors and subcontractors; debarment from future federal contracts for willful violators; helper classification controversy over whether lower-paid helpers can be used alongside journeymen); Service Contract Act (prevailing wages and fringe benefits on federal service contracts >$2,500; covers security guards, janitors, data entry operators, IT support, food service; wage determinations by locality; successor contractor must honor predecessor's collective bargaining agreement for first year); notable enforcement cases (Asplundh Tree Expert $95M 2017 penalty, which resolved charges of unlawfully employing unauthorized workers rather than overtime violations; Holloway Sportswear hot goods provision, under which WHD can block shipment of goods produced in violation of FLSA; restaurant tip pooling franchise liability cases; Walmart pharmacy off-the-clock work 2023); FLSA criminal prosecution (Section 216(a) willful violations punishable by a fine of up to $10,000 and/or up to 6 months imprisonment, with imprisonment available only after a prior conviction; DOJ prosecutes referrals from WHD; primarily used for egregious child labor and wage theft cases); Python script downloading WHD public enforcement ZIP, filtering 
to past 5 fiscal years with back-wage findings, grouping by 2-digit NAICS code, computing total back wages and workers owed per sector, and ranking sectors by total back wages with cross-references to dol-osha-inspections, dol-h2-visa-disclosures, and bls-qcew-employment.
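
    The sector roll-up described above (group by 2-digit NAICS, sum back wages and workers owed, rank by back wages) can be sketched as below. The rows are invented; the column names naics_cd, bw_atp_amt, and ee_atp_cnt follow the field list given for the WHD file, and treating naics_cd as a string whose first two characters are the sector is an assumption.

```python
from collections import defaultdict

# Hypothetical enforcement rows using the WHD field names listed above.
cases = [
    {"naics_cd": "722511", "bw_atp_amt": 12000.0, "ee_atp_cnt": 15},
    {"naics_cd": "722513", "bw_atp_amt": 8000.0, "ee_atp_cnt": 10},
    {"naics_cd": "111998", "bw_atp_amt": 30000.0, "ee_atp_cnt": 40},
    {"naics_cd": "236220", "bw_atp_amt": 0.0, "ee_atp_cnt": 0},
]


def back_wages_by_sector(rows):
    """Total back wages and workers owed per 2-digit NAICS sector, ranked."""
    wages = defaultdict(float)
    workers = defaultdict(int)
    for row in rows:
        if row["bw_atp_amt"] <= 0:
            continue  # keep only cases with back-wage findings
        sector = str(row["naics_cd"])[:2]
        wages[sector] += row["bw_atp_amt"]
        workers[sector] += row["ee_atp_cnt"]
    return sorted(((s, wages[s], workers[s]) for s in wages),
                  key=lambda item: item[1], reverse=True)


sector_ranking = back_wages_by_sector(cases)
for sector, total, owed in sector_ranking:
    print(f"NAICS {sector}: ${total:,.0f} owed to {owed} workers")
```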

  104. Q3Writing

    BLS PPI producer price index deep-dive published

    Long-form article on the Bureau of Labor Statistics Producer Price Index: PPI overview (measures average change over time in selling prices received by domestic producers for their output; complements CPI which measures buyer prices; published monthly since 1902 — called Wholesale Price Index until 1978; ~100,000 price quotes from ~25,000 establishments; all industries excluding farms, households, and nonprofit institutions), three indexing systems (Final Demand PPI launched Nov 2013 with data back to Dec 2009: headline measure covering prices for goods/services/construction sold for personal consumption, capital investment, government, and export; Intermediate Demand PPI: prices for goods/services used as inputs by other producers, broken into processed vs. unprocessed goods at three stages; Traditional Commodity-Based PPI: 10,000+ commodity indexes organized by type of product, the legacy system in continuous use since 1902), FD-PPI structure (FD-Goods = food + energy + core goods; FD-Services = trade services margin + transportation/warehousing + other services; FD-Construction = new construction inputs; the trade services component measures the margin between buying and selling price of wholesale and retail trade — not the price of traded goods themselves, making it a profit margin index), stage-of-processing pipeline (unprocessed goods: raw materials — crude oil, farm products, unprocessed fish; processed goods for intermediate demand: flour, diesel, steel; unprocessed goods for intermediate demand: crude energy; then finished goods for intermediate demand; then final demand; 2021-2022 supply chain inflation: pipeline pressure at every stage in sequence confirmed supply-push origin), PPI vs. 
CPI leading indicator relationship (PPI for final demand goods leads CPI for goods by 2-3 months; the 2021-2022 surge: PPI FD goods peaked at +22.9% YoY June 2022, approximately 3 months before corresponding CPI goods peak; not true for services where PPI trade margins diverge from CPI service costs; spread between PPI and CPI for same goods category measures retailer margin compression/expansion), key FRED series IDs (PPIFIS Final Demand total; PPIFAF Final Demand Foods; PPIFAE Final Demand Energy; PPICOR Final Demand Less Foods and Energy; WPSID61 Intermediate Demand; PPIACO All Commodities traditional series; WPSFD49207 Processed Goods for Intermediate Demand), BLS API access (api.bls.gov/publicAPI/v2/timeseries/data/ endpoint; PCU + NAICS producer industry codes for industry-specific series; registration key required for more than 25 series or 10 years of data per query), service-sector PPIs (launched 1985; airline transportation, trucking freight, physician services, hospital outpatient, portfolio management; hospital services PPI vs. 
CPI medical care divergence: PPI measures what insurers/Medicare pay producers while CPI measures what consumers pay for insurance — the gap is insurer and government intermediation), use in economic modeling (BEA uses PPI in GDP implicit price deflator calculation; Davis-Bacon wage determinations reference regional construction cost indexes for escalation; Fed uses PPI as leading indicator for PCE deflator, its preferred inflation measure; long-term contracts use PPI escalation clauses by commodity), notable episodes (2021-2022: Final Demand goods peaked +22.9% June 2022 driven by energy/metals/food/supply chain; April 2020: energy PPI collapsed as WTI oil went negative; 2008 commodity super-cycle: PPIACO peaked +27% July 2008 then collapsed 30% in 6 months), Python script pulling PPIFIS/PPIFAF/PPIFAE/PPICOR from BLS API for past 5 years, computing 12-month percent change for each, plotting 4-line chart with June 2022 surge annotated, and cross-references to bls-cpi-inflation, bea-gdp-accounts, and bls-qcew-employment.
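
    The 12-month percent change step mentioned above is simple enough to sketch without any API call. The helper name and the toy index levels below are illustrative assumptions, not BLS data.

```python
def twelve_month_pct_change(index_values):
    """Year-over-year percent change for a chronological list of monthly index levels.

    The first 12 entries have no year-ago base and are returned as None.
    """
    changes = []
    for i, value in enumerate(index_values):
        if i < 12:
            changes.append(None)
        else:
            base = index_values[i - 12]
            changes.append(round((value / base - 1.0) * 100.0, 1))
    return changes


# Toy series: an index that rises one point per month from 100.
levels = [100.0 + i for i in range(13)]
changes = twelve_month_pct_change(levels)
print(changes[-1])
```

    Applied to each of the four series in turn, the same function would produce the lines for the chart.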

  105. Q3Writing

    Census PL 94-171 redistricting data deep-dive published

    Long-form article on the Census Bureau PL 94-171 redistricting data product: PL 94-171 mandate (Public Law 94-171 of 1975 requires the Census Bureau to provide states with block-level population data needed for legislative redistricting by April 1 of the year following each decennial census; 2020 Census data delivered August 2021, slightly delayed by COVID-19 production disruptions), five data tables (P1: Race alone for total population, 71 categories including all multi-race combinations; P2: Hispanic or Latino and Not Hispanic or Latino by Race — the key table for VRA analysis, 73 categories; P3: Race for population 18 years and over — the voting-age population table; P4: Hispanic/Latino and race for 18+; H1: Occupancy Status — occupied vs. vacant housing units; summary table H1 used for apportionment count verification), geographic hierarchy (nation → region → division → state → county → county subdivision → place → census tract → block group → census block; blocks are the finest unit with ~8 million nationwide averaging ~40 people; PL 94-171 delivers all geographies including blocks; congressional districts require exact population equality hence block-level needed to minimize deviation to <1 person), one-person-one-vote case law (Reynolds v. Sims 1964: both chambers of state legislature must be apportioned by population; Wesberry v. Sanders 1964: congressional districts must have nearly equal populations; Kirkpatrick v. Preisler 1969: states must make good-faith effort to achieve exact mathematical equality in congressional districts; allowable deviations: congressional districts <1 person practical minimum vs. state legislative ±10% maximum under Brown v. 
Thomson 1983), 2020 apportionment results (US resident population 331,449,281; the apportionment population of 331,108,434, which covers the 50 states plus overseas federal personnel and excludes DC, is the number used for House apportionment via the Huntington-Hill method of equal proportions; seats gained: Texas +2, Colorado/Florida/Montana/North Carolina/Oregon each +1; seats lost: California/Illinois/Michigan/New York/Ohio/Pennsylvania/West Virginia each -1; New York's 89-person near-miss: if 89 additional NY residents had been counted the state would have kept its 27th seat rather than dropping to 26; one of the highest-stakes potential Census errors of 2020), differential privacy and TopDown Algorithm (2020 was first Census to apply formal differential privacy to published statistics; Census Bureau developed TopDown Algorithm: state total populations are held exact (invariant, as required for constitutional apportionment), while mathematical noise calibrated to a privacy-loss parameter epsilon is injected at county/tract/block-group/block levels, with relative error growing as geographies get smaller; noise injection means block-level racial group counts may be unreliable for small populations; controversy: small rural counties, small racial minority groups at block level may have significant noise; Census Bureau later published the Demographic and Housing Characteristics file DHC with its own noise injection, and DHC data is preferred over PL 94-171 for some applications), 63-combination race/ethnicity schema (1997 OMB race categories: White, Black/African American, AIAN, Asian, NHPI, Some Other Race — 6 categories; allowing multi-race combinations generates 2^6 - 1 = 63 race subcategories in P1; Hispanic/Latino is a separate ethnicity: in P2 the Not Hispanic population is broken out across all 63 race categories while Hispanic/Latino is a single row; total P2 cells: Hispanic/Latino + NH/White alone + NH/Black alone + NH/AIAN alone + NH/Asian alone + NH/NHPI alone + NH/SOR alone + 57 NH multi-race combinations = 64 detailed categories, plus 9 total and subtotal rows = 73 cells), Census API access (base URL api.census.gov/data/2020/dec/pl; variable format P2_001N total population, P2_002N Hispanic or Latino, P2_005N NH White 
alone, P2_006N NH Black alone, P2_007N NH AIAN alone, P2_008N NH Asian alone; geographic filter for block-level: for=block:*&in=state:XX county:YYY tract:ZZZZZZ; state FIPS 2-digit, county 3-digit, tract 6-digit all required for block queries), Voting Rights Act Section 2 (VRA 1965 prohibits electoral practices discriminating based on race; Section 2 requires drawing majority-minority districts where the Gingles three-part test is met: minority group sufficiently large and geographically compact to form majority in a district, minority group politically cohesive, white majority votes sufficiently as a bloc to usually defeat minority-preferred candidate; Allen v. Milligan 2023 reaffirmed Section 2 applies to congressional redistricting; Alabama required to draw second majority-Black congressional district), partisan gerrymandering and Rucho (Rucho v. Common Cause 2019: purely partisan gerrymandering claims are nonjusticiable political questions beyond the reach of federal courts; only racial gerrymandering, VRA violations, and First Amendment challenges are federal questions; political science efficiency gap and wasted vote metrics developed as state-law standards; some state supreme courts have struck partisan maps under state constitutional provisions), Python script using Census API to download P2 table for all tracts in a specified state, computing NH White/NH Black/Hispanic/other shares per tract, aggregating to county level, and cross-references to census-acs-data, census-county-business-patterns, and hmda-mortgage-disclosure.
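
    The Huntington-Hill method of equal proportions named above can be sketched in a few lines: every state starts with one seat, and each remaining seat goes to the state with the highest priority value, population / sqrt(n(n+1)), where n is its current seat count. The three toy "states" and their populations are made up for illustration.

```python
import heapq
from math import sqrt


def huntington_hill(populations, seats):
    """Apportion `seats` among states by the method of equal proportions."""
    if seats < len(populations):
        raise ValueError("every state must receive at least one seat")
    allocation = {state: 1 for state in populations}
    # Max-heap via negated priorities; priority for a state's next seat is
    # population / sqrt(n * (n + 1)), with n its current number of seats.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        allocation[state] += 1
        n = allocation[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return allocation


toy_populations = {"A": 6000, "B": 3000, "C": 1000}
result = huntington_hill(toy_populations, 10)
print(result)
```

    With the real apportionment populations and 435 seats, the same loop is what makes a near-miss like New York's possible: the last seat is decided by a very small gap between two priority values.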

  106. Q3Writing

    Treasury TIC capital flows deep-dive published

    Long-form article on the Treasury International Capital system: TIC system overview (established 1974, monthly and annual surveys, primary federal source on cross-border flows of securities between US and foreign residents), main TIC reports (monthly major foreign holders of US Treasury securities — the most-watched table showing country-level holdings updated with ~6-week lag; TIC-S monthly transaction data covering gross purchases and sales of long-term securities by foreigners and by Americans in foreign markets; TIC-B monthly short-term position data covering banking and non-banking claims and liabilities; SHLA annual survey of foreign holdings of US securities providing the most comprehensive snapshot; SHCA annual survey of US holdings of foreign securities), top foreign holders as of recent data (Japan ~$1.1T largest official holder; China peak ~$1.3T in 2013 declined to ~$800B by 2024 amid geopolitical tension; United Kingdom ~$700B — primarily Euroclear/Clearstream custodian accounts not UK beneficial owners; Luxembourg and Cayman Islands each $400-500B reflecting offshore custodian and fund domicile effects), custodian country problem (TIC reports holdings by custodian country — the country where the custodian or clearing institution holds the securities — not beneficial owner country; this systematically overstates UK/Luxembourg/Belgium/Cayman holdings and understates China/Japan/Middle East; SHLA survey partially corrects this with beneficial owner data but only annually; SWIFT and Euroclear routing means a Chinese sovereign wealth fund holding Treasuries through a London custodian appears as UK in monthly data), Belgium/Euroclear anomaly (Belgium showed implausible $200-400B in Treasury holdings 2012-2015 — confirmed to be Euroclear accounts for multiple countries using Belgian custodian; a naive reading of custodian data would mistake Belgium for a sovereign accumulator), China "financial nuclear option" analysis (repeated claims that China could weaponize $800B+ in Treasury holdings; analysis: a rapid sale would primarily harm China through price impact on its own portfolio; Treasuries are the deepest most liquid market globally — the Fed could absorb sales through QE; no historical precedent for sovereign weaponization of reserve holdings; more realistic concern is the announcement effect and coordination with allied holders; 2022 Russia sanctions showed reserves can be immobilized — China has diversified toward gold since 2014), sudden stop risk and 2008 flight-to-safety (TIC data showed net foreign inflows into Treasuries during 2008 crisis — flight to safety from foreign holders; "sudden stop" risk applies more to emerging market funding but any disruption in Treasury market function would trigger Fed backstop; SHLA vs. monthly data reconciliation important for understanding positioning), FRED series HQFLUSQ and Treasury.gov major holders table, Python script downloading the official Treasury Excel workbook parsing country rows and plotting Japan/China/UK holdings over time, and cross-references to treasury-ofac-sanctions, fed-h8-bank-balance-sheets, and bea-gdp-accounts.

  107. Q3Writing

    CDC WONDER mortality database deep-dive published

    Long-form article on the CDC WONDER mortality query system: system overview (WONDER = Wide-ranging ONline Data for Epidemiologic Research; CDC-maintained web interface and API providing access to US death certificate data going back to 1999 for ICD-10-coded causes, 1979-1998 for ICD-9; underpins virtually all US mortality research; data sourced from state vital statistics offices who receive death certificates from funeral directors, attending physicians, and medical examiners), death certificate pipeline (physician/ME completes cause of death section with underlying cause — the disease or injury that initiated the chain of events leading to death — plus up to 20 contributing/multiple causes; cause of death coded to ICD-10 by NCHS nosologists; coded records transmitted to CDC NCHS; published with ~11 month lag for annual data; monthly provisional data available via VSRR Vital Statistics Rapid Release with ~12-week lag), ICD-10 code taxonomy (C codes: malignant neoplasms/cancers; I codes: circulatory system diseases including I21 acute MI, I63 stroke, I11 hypertensive heart disease; F codes: mental/behavioral disorders including F10-F19 substance use; G codes: nervous system including G30 Alzheimer; J codes: respiratory including J18 pneumonia, J44 COPD; K codes: digestive; V-Y codes: external causes of morbidity and mortality — the crucial category for injuries/overdoses/violence/accidents; specific drug codes within external causes: T36-T50 poisoning by drug type, T40.0 opium T40.1 heroin T40.2 other opioids natural/semi-synthetic T40.3 methadone T40.4 synthetic opioids other fentanyl T40.5 cocaine T43.6 psychostimulants — critical for wave-specific opioid analysis), WONDER query interface and suppression rules (WONDER web interface allows queries by year, age group, race/ethnicity, sex, state/county, cause of death ICD code; counts below 10 suppressed to protect privacy — critical limitation for small counties and rare causes; API also available for 
programmatic access; compressed mortality file CMF available for bulk download from CDC FTP), age-adjusted rates and 2000 US Standard Population (death rates are age-adjusted to the year 2000 US Standard Population of 274.6M to allow cross-time and cross-population comparisons; crude rates dominated by demographic aging; age-adjusted rates remove confounding; the 2000 standard is fixed — all CDC publications use same reference), three-wave opioid crisis in ICD-10 data (wave 1: T40.2-T40.3 natural/semi-synthetic opioid deaths begin rising from ~5,000/year 1999 to peak ~17,000/year 2010-2011 — OxyContin and Vicodin era; wave 2: T40.1 heroin deaths rise from ~3,000/year 2010 to peak ~15,000/year 2016-2017 as prescription opioid crackdowns shifted users to heroin; wave 3: T40.4 synthetic opioid primarily fentanyl deaths rise from ~5,000/year 2016 to ~73,000/year 2022 — fentanyl contamination of heroin and illicit pill supply; total drug overdose deaths: ~16,000/year 1999, ~70,000/year 2017, ~107,000/year 2022; at its peak the fentanyl wave is roughly four times larger than the prescription opioid wave (~73,000 vs. ~17,000 deaths/year) and crosses demographics previously spared), Case-Deaton deaths of despair research (Princeton economists Anne Case and Angus Deaton published 2015 paper showing rising all-cause mortality among middle-aged non-Hispanic white Americans without college degrees — unique among wealthy countries; excess deaths concentrated in suicide (X60-X84 intentional self-harm codes), T40-T42 drug poisoning, K70-K73 alcoholic liver disease — the three "deaths of despair"; subsequent research confirmed geographic concentration in deindustrializing regions; Deaton 2015 Nobel Prize elevated dataset visibility; CDC WONDER is the primary data source for deaths-of-despair replication), COVID-19 mortality and U07.1 code (ICD-10 emergency code U07.1 COVID-19 Virus Identified added 2020; U07.2 COVID-19 Virus Not Identified for clinical diagnoses without positive test — less commonly used; COVID deaths: ~350,000 2020, ~450,000 2021, 
~244,000 2022; excess mortality analysis comparing observed all-cause deaths to expected based on prior trends suggests COVID deaths undercounted by 30-50% in official U07.1 count — capturing deaths from overwhelmed healthcare system and untreated conditions), NCHS Multiple Cause of Death public use microdata (fixed-width ASCII file published annually via NBER and CDC FTP; one record per death with state/county, age at death, sex, race/Hispanic origin, education, marital status, place of death, underlying cause, up to 20 multiple causes, and number of entity axis conditions; ~3.1M records/year), Python script parsing NCHS MCOD fixed-width file, extracting overdose deaths by drug-specific T40 subcodes, computing state-level age-adjusted rates using 2000 standard population weights, plotting three-wave fentanyl transition chart, and cross-references to cdc-brfss-behavioral-risk, cms-hospital-quality, and medicare-part-d-data.
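
    Age adjustment by direct standardization, as described above, weights each age-specific death rate by that age group's share of a fixed standard population. The sketch below uses two made-up age groups and weights; a real analysis would use the published 2000 US Standard Population weights.

```python
def age_adjusted_rate(deaths, population, std_weights, per=100_000):
    """Directly standardized death rate per `per` people.

    All three arguments are dicts keyed by the same age-group labels.
    Weights are normalized here, so raw standard-population counts also work.
    """
    if not (deaths.keys() == population.keys() == std_weights.keys()):
        raise ValueError("age groups must match across all inputs")
    total_weight = sum(std_weights.values())
    rate = 0.0
    for group in deaths:
        age_specific = deaths[group] / population[group]
        rate += age_specific * (std_weights[group] / total_weight)
    return rate * per


# Illustrative numbers only.
toy_deaths = {"0-44": 10, "45+": 40}
toy_population = {"0-44": 100_000, "45+": 50_000}
toy_weights = {"0-44": 0.6, "45+": 0.4}

adjusted = age_adjusted_rate(toy_deaths, toy_population, toy_weights)
crude = sum(toy_deaths.values()) / sum(toy_population.values()) * 100_000
print(round(adjusted, 1), round(crude, 1))
```

    The crude and adjusted figures differ because the toy population is younger than the toy standard, which is exactly the confounding the adjustment is meant to remove.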

  108. Q3Writing

    BLS JOLTS job openings deep-dive published

    Long-form article on the Bureau of Labor Statistics Job Openings and Labor Turnover Survey: JOLTS origin and design (launched December 2000 with data back to December 2000; monthly survey of ~21,000 business establishments across all nonfarm private industries and government; sample rotates over three-year periods; four core metrics published monthly: job openings rate, hires rate, total separations rate, and separations broken into quits/layoffs-and-discharges/other; all expressed as rates per 100 employees for industry/size/region comparisons), four core metrics in depth (job openings: number of positions open on last business day of month that are available for hire, actively being recruited for, and starting within 30 days — industry breakdown by 2-digit NAICS; hires: all additions to payroll during the month including recalls, new hires, and transfers from other locations; total separations = quits + layoffs/discharges + other separations; quits: voluntary separations initiated by employee — the quits rate is the highest-signal metric as it measures worker confidence in labor market alternatives; layoffs/discharges: involuntary separations initiated by employer including layoffs, firings, and downsizing), quit rate as labor market thermometer (the JOLTS quits rate is the best single indicator of labor market tightness: workers quit when they have confidence in finding better jobs; quits rate averaged ~1.9-2.1%/month 2014-2019; collapsed to 1.4% in April 2020 COVID lockdown; surged to record 3.0% in April 2022 — the "Great Resignation" peak; the unprecedented 3.0% rate across all industries including service sectors indicated labor market generational tightness; Fed used JOLTS quits as key indicator in aggressive 525bp rate hike campaign 2022-2023; quits decelerated to ~2.3% by late 2023 as labor market rebalanced), Beveridge Curve rightward shift (standard economics: job openings and unemployment should trade off along a stable Beveridge Curve; 2021-2022 
showed an unprecedented rightward shift — historically high openings at low unemployment, meaning the economy needed more openings to achieve the same unemployment rate; interpretation: elevated job openings reflecting labor hoarding, industry restructuring demands, geographic mismatch, skills gaps, and early retirement; the rightward shift implied NAIRU higher than pre-pandemic suggesting more restrictive monetary policy needed to return to balance), labor hoarding phenomenon (2022-2023 saw employer reluctance to lay off workers despite slowing business due to memories of 2021 talent shortages; layoffs remained near historical lows even as hiring slowed; resulted in productivity slowdown — hours worked rose faster than output; JOLTS data showed the combination: openings declining, hires declining, quits declining, but layoffs not rising — classic labor hoarding signal that was resolved by 2024 normalization), JOLTS vs. alternative labor market indicators (Indeed/LinkedIn job postings: higher frequency daily vs. monthly JOLTS lag, but not normalized to workforce size, different job definitions, platform coverage varies; Conference Board Help Wanted Online: 16,000+ online job boards; all alternative sources showed same directional 2022 peak and 2023 deceleration confirming JOLTS signal; JOLTS advantage: official BLS imprimatur, historical consistency, industry breakdown, unit consistency with CPS and CES), FRED series IDs (JTSJOL job openings level; JTSHIL hires level; JTSQUL quits level; JTSLAL layoffs level; JTSQUR quits rate — the key policy signal; JTSLR layoffs rate; all seasonally adjusted; all available at FRED), Python script using fredapi pulling JTSJOL and UNRATE series, plotting Beveridge Curve scatter with date annotation showing 2020 collapse, 2021 recovery, 2022 rightward shift, and 2023-2024 normalization, and cross-references to bls-qcew-employment, bls-oews-occupational-wages, and fed-h8-bank-balance-sheets.
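
    The rate arithmetic behind these series is short enough to sketch. As I understand the BLS definitions, the job openings rate divides openings by employment plus openings, while hires, quits, and layoffs rates divide by employment alone; the function names and input figures below are illustrative assumptions.

```python
def job_openings_rate(openings, employment):
    """Openings as a percent of filled plus open positions."""
    return 100.0 * openings / (employment + openings)


def turnover_rate(flow, employment):
    """Hires, quits, or layoffs during the month as a percent of employment."""
    return 100.0 * flow / employment


# Toy levels in thousands, not actual JOLTS values.
openings_rate = job_openings_rate(openings=10, employment=90)
quits_rate = turnover_rate(flow=3, employment=150)
print(openings_rate, quits_rate)
```

    A Beveridge Curve scatter then plots one (unemployment rate, job openings rate) point per month.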

  109. Q3Writing

    NHTSA FARS traffic fatality census deep-dive published

    Long-form article on the NHTSA Fatality Analysis Reporting System: FARS overview (established 1975, annual census — not a sample — of all fatal traffic crashes occurring on US public roads; covers all 50 states, DC, Puerto Rico; ~38,000-43,000 fatal crashes per year involving 40,000-45,000 fatalities; data collected by state coordinators from police accident reports, death certificates, hospital medical records, and toxicology reports; publicly available through NHTSA CDAN Query Tool and FTP bulk download), three-table data structure (Accident table: one record per crash — state, county, city, route type, latitude/longitude, crash date/time, atmospheric conditions, road geometry, light conditions, harmful event, manner of collision, school bus involvement, rail grade crossing, speed limit, fatalities count, persons injured, drunk driving flag; Vehicle table: one record per vehicle involved — registration state, vehicle year/make/model/body type, special use, travel speed, sequence of events up to four codes, hit-and-run, fire, jackknife, vehicle role; Person table: one record per person — age, sex, person type driver/passenger/pedestrian/cyclist, seating position, restraint use, airbag deployment, ejection, injury severity, alcohol BAC, alcohol test status, drug presence, police-reported alcohol impairment), key variable codes and values (HARM_EV harmful event first: 14=motor vehicle in transport, 8=tree, 3=embankment, 17=guardrail face, 54=pedalcycle, 5=pedestrian — highest HARM_EV values reveal what objects or entities are struck; MAN_COLL manner of collision: 0=single vehicle, 1=rear-end, 6=angle, 2=front-to-front, 3=head-on; LGT_COND light condition: 1=daylight, 2=dark-lighted, 3=dark-not-lighted, 4=dawn/dusk — nearly 50% of fatalities occur in dark conditions despite most miles driven in daylight; DRUNK_DR: count of drunk drivers in crash; BAC fields: BAC_RESULT test result in 0.001 increments, BAC_STATUS test status — imputation used when testing not 
performed using Bayesian model; injury severity INJSEV: 4=fatal, 3=incapacitating, 2=non-incapacitating, 1=possible, 0=no injury), historical trends and COVID anomaly (annual fatalities: ~55,000/year early 1970s declining to 37,000-43,000 range 2005-2019 — reflecting seatbelts, airbags, safer infrastructure, lower blood alcohol limits; 2020 COVID anomaly: vehicle miles traveled declined 13% from 2019 but fatalities rose 7% — fatality rate per 100M miles spiked 24% to 1.37, highest since 2007; explanation: empty roads encouraging speeding, impaired driving, seatbelt non-use; the VMT-fatality correlation that held for decades broke decisively in 2020), alcohol-impaired driving trend (NHTSA defines alcohol-impaired driving as BAC 0.08+ in any driver; alcohol-impaired fatalities: ~20,000-25,000/year in early 1980s declining to ~10,000-11,000/year by 2011 and holding at ~10,500/year 2015-2022; the 50%+ decline attributable to: 0.08 per se laws nationwide by 2004, sobriety checkpoints, MADD campaigns, enhanced DUI enforcement, rideshare availability 2012+; despite plateau, alcohol-impaired fatalities remain ~28% of all traffic deaths — the single largest category), pedestrian fatality rise (pedestrian fatalities: ~4,300/year 2010-2012 declining, then rising to 7,500/year 2022 — 74% increase in a decade; contrasts with overall traffic fatality trend; explanations: smartphone distraction, larger vehicles with worse pedestrian visibility, urban miles driven mix shifting toward pedestrian-heavy areas, opioid crisis increasing pedestrian risk-taking; pedestrian fatality rate per 10,000 people varies dramatically by state — New Mexico, Mississippi, Louisiana, Florida highest; fatality analysis critical for NHTSA grant allocation under FAST Act and IIJA infrastructure law Safe Streets programs), CRSS companion dataset (Crash Report Sampling System replaced NASS GES in 2016; national probability sample of police-reported crashes regardless of injury severity; ~50,000 
crashes/year sample representing ~6M annual crashes; same data elements as FARS but covers minor injury and property-damage-only crashes; essential for computing rates since FARS only has fatals), CDAN NHTSA Query Tool for summary tables plus bulk SAS/CSV/DBF data files available from NHTSA FTP, Python script joining FARS Accident/Person tables, filtering to pedestrian fatalities by state, computing per-capita rates using Census ACS population estimates, ranking states, and cross-references to fbi-nics-background-checks, nfip-flood-insurance, and bls-qcew-employment.
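
    The join-and-rank step described above can be sketched on a handful of hand-made records. The field names (`case_id`, `state`, `person_type`, `injury_severity`) and the pedestrian code 5 are assumptions for illustration; the real FARS column names and code values should be taken from the NHTSA coding manual for the year in question.

```python
from collections import Counter

PEDESTRIAN = 5  # assumed person-type code for pedestrians
FATAL = 4       # injury severity 4 = fatal, as listed above


def pedestrian_fatality_rates(accidents, persons, population, per=100_000):
    """Join Person rows to Accident rows by case ID and rank states by rate."""
    state_by_case = {a["case_id"]: a["state"] for a in accidents}
    deaths = Counter()
    for p in persons:
        if p["person_type"] == PEDESTRIAN and p["injury_severity"] == FATAL:
            deaths[state_by_case[p["case_id"]]] += 1
    rates = {state: deaths[state] / pop * per for state, pop in population.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative records only.
toy_accidents = [
    {"case_id": 1, "state": "NM"},
    {"case_id": 2, "state": "NM"},
    {"case_id": 3, "state": "VT"},
]
toy_persons = [
    {"case_id": 1, "person_type": 5, "injury_severity": 4},
    {"case_id": 2, "person_type": 5, "injury_severity": 4},
    {"case_id": 2, "person_type": 1, "injury_severity": 2},
    {"case_id": 3, "person_type": 1, "injury_severity": 4},
]
toy_population = {"NM": 2_000_000, "VT": 600_000}

ranking = pedestrian_fatality_rates(toy_accidents, toy_persons, toy_population)
print(ranking)
```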

  110. Q3Writing

    CMS Medicare Advantage plan data deep-dive published

    Long-form article on CMS Medicare Advantage data: program origin (Balanced Budget Act 1997, expanded Medicare+Choice to Medicare Advantage under MMA 2003), enrollment milestone (51% of 66M Medicare beneficiaries as of 2024 = ~33M in MA, first majority crossover), CMS data ecosystem (Landscape files: annual plan-level benefits/premiums/drug formularies/star ratings by county; monthly enrollment files by contract/plan/state/county; Contract and Enrollment data quarterly snapshots; MA Ratebook with county-level benchmark rates; Chronic Conditions prevalence data for MA enrollees), Star Ratings system (1-5 stars, 40+ measures across 5 domains: staying healthy screening/vaccines, managing chronic conditions, member experience CAHPS survey, member complaints and access, drug pricing Part D; Quality Bonus Payment QBP mechanism adds 5% to county benchmark for 4+ star plans; 72% of MA beneficiaries in 4+ star plans 2024; stars drive supplemental benefit generosity since plans below benchmark receive a rebate they must spend on dental/vision/hearing/OTC/premium reduction), benchmark-bid-rebate payment mechanics (CMS sets county-level benchmarks as blend of MA payment rates and FFS costs; plans submit annual bids for estimated cost to serve average beneficiary; plans bidding below benchmark receive a rebate percentage of the difference which must fund supplemental benefits; below-benchmark bids therefore fund the dental/vision/hearing benefits that differentiate MA from traditional Medicare), HCC risk adjustment model (CMS-HCC Hierarchical Condition Category model adjusts per-capita payments for beneficiary health status — plans receive higher capitation for sicker enrollees; all diagnoses from the prior year fed into the model; upcoding controversy: MA plans have financial incentive to document as many diagnoses as possible to inflate risk scores; OIG estimates $10-30B+ in excess payments attributable to risk score manipulation annually; GAO has repeatedly flagged this 
issue since 2012; CMS risk score audits RADV Risk Adjustment Data Validation have recovered only a fraction of estimated overpayments due to methodology disputes), market concentration (UHC UnitedHealthcare ~29%, Humana ~19%, CVS/Aetna ~12%, BCBS affiliates ~15% combined, Centene ~5%, Cigna ~4% — top 3 control ~60% of all MA enrollees; market consolidated dramatically from fragmented regional plans 2003-2010 to national insurer dominance 2015-2024), prior authorization controversy (MA plans can require prior authorization for covered services unlike traditional Medicare which generally does not; OIG 2022 report: 13% of prior authorization denials were for services that met coverage criteria — plan-level variation from 0% to 45% in denial rates; KFF analysis: MA plans deny 6-7% of PA requests vs. near-zero for same services in traditional Medicare; CMS issued new prior authorization transparency rules 2023 requiring quarterly public reporting of PA request volumes, approvals, denials by plan), Python snippet downloading CMS monthly enrollment file, aggregating by contract/plan/state, computing parent organization market share by state, identifying states where single insurer controls 50%+ of MA market, and cross-references to CMS Medicare Part D prescribing data, CMS Open Payments physician payments, and CMS Hospital Quality metrics.
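
    The market-share step described above reduces to a grouped sum and a threshold check. The sketch below works on made-up (state, parent organization, enrollment) tuples rather than the actual CMS enrollment file, whose layout is not assumed here.

```python
from collections import defaultdict


def dominant_insurers(rows, threshold=0.5):
    """Return {state: (parent_org, share)} where one parent holds >= threshold."""
    by_state = defaultdict(lambda: defaultdict(int))
    for state, parent, enrollment in rows:
        by_state[state][parent] += enrollment
    dominant = {}
    for state, parents in by_state.items():
        total = sum(parents.values())
        leader, count = max(parents.items(), key=lambda kv: kv[1])
        share = count / total
        if share >= threshold:
            dominant[state] = (leader, share)
    return dominant


# Illustrative enrollment counts only.
toy_rows = [
    ("AL", "Org A", 400), ("AL", "Org A", 200), ("AL", "Org B", 400),
    ("TX", "Org A", 300), ("TX", "Org B", 300), ("TX", "Org C", 400),
]
dominant = dominant_insurers(toy_rows)
print(dominant)
```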

  111. Q3Writing

    IRS Statistics of Income deep-dive published

    Long-form article on the IRS Statistics of Income program: history (published annually since 1916, sourced from administrative tax return data — the definitive official record of US taxable income and tax liability as actually filed, distinct from Census survey-based income estimates), individual income tax statistics (1040 data published annually with ~2-year lag; tabulations by AGI Adjusted Gross Income class from $1-$5K through $10M+ in 22 brackets; for each bracket: number of returns, AGI, wages and salaries, taxable interest, qualified dividends, business income, capital gains realized, IRA distributions, Social Security benefits, total income, adjustments to income, AGI, standard vs. itemized deductions, taxable income, tax before credits, AMT, child tax credit, EITC, education credits, total income tax after credits; state-level SOI available as supplement), income concentration findings (top 1% AGI threshold ~$700K, top 0.1% ~$3.3M, top 0.01% ~$15M in recent data; top 1% receives ~20% of all AGI and pays ~40% of all federal income taxes; top 1% receives ~70% of all realized capital gains — making the capital gains preferential rate 20% + 3.8% NIIT vs. 37% ordinary income rate the central mechanism of income-at-top policy debate; Piketty-Saez UC Berkeley World Inequality Database uses SOI data as the primary source for US top income share time series since 1913), EITC and refundable credit distribution (~25M returns claiming EITC, ~$65B in credits annually; SOI tables show EITC amount by number of qualifying children 0/1/2/3+, by AGI range, documenting the phase-in/plateau/phase-out structure and the benefit cliff; EITC is the largest federal program for working families with income below the refundability threshold), High-Income Tax Returns publication (separate IRS SOI annual report specifically on $200K+ and $1M+ returns; median effective federal income tax rate for $1M+ returns fluctuates 25-30% over 2010-2024 vs. 
37% statutory top marginal rate; gap explained by: capital gains and qualified dividends taxed at 20% max plus 3.8% NIIT; itemized deductions for charitable contributions mortgage interest and state/local taxes; business losses pass-through; percentage depletion; other preferences), estate tax statistics (annual tabulations of Form 706 estate tax returns filed; gross estate composition: real estate/stocks/bonds/closely held business interests/retirement assets/life insurance; estate tax paid; the stepped-up basis provision: assets inherited receive a basis step-up to fair market value at death eliminating capital gains tax on all appreciation during decedent lifetime — estimated $100B+ in annual foregone capital gains revenue; estate tax raised ~$18-25B/year despite ~$150-200B gross estates above exemption — the stepped-up basis is the larger revenue issue), corporate income tax SOI (1120 filing statistics: number of returns/total receipts/business receipts/net income-deficit/taxable income/income tax before credits/total tax credits including R&D/foreign/general business; shows effective corporate tax rate decline from ~22-25% actual 2005-2017 to ~12-15% actual 2018-2022 post-TCJA; GILTI global intangible low-taxed income and BEAT base erosion anti-abuse tax added 2018 partially offset rate reduction), IRS Public Use File (de-identified microdata sample ~200,000 records, available through IRS directly or via NBER for approved researchers; income/deductions/credits fields with topcoded sensitive values; essential for microsimulation modeling of tax policy changes using TAXSIM model from NBER), Python snippet downloading SOI individual complete report Excel, parsing AGI class table, computing income share/tax share/effective rate for each bracket, plotting progressive tax structure chart, and cross-references to BEA GDP accounts for national income context, IRS Form 990 for nonprofit sector, and SEC EDGAR for corporate income and deductions.
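
    The per-bracket arithmetic described above (income share, tax share, effective rate) can be sketched as below. The two brackets and their dollar totals are invented for illustration and do not come from any SOI table.

```python
def bracket_summary(brackets):
    """Compute income share, tax share, and effective rate for each AGI class."""
    total_agi = sum(b["agi"] for b in brackets)
    total_tax = sum(b["tax"] for b in brackets)
    return [
        {
            "label": b["label"],
            "income_share": b["agi"] / total_agi,
            "tax_share": b["tax"] / total_tax,
            "effective_rate": b["tax"] / b["agi"],
        }
        for b in brackets
    ]


# Illustrative totals in billions of dollars.
toy_brackets = [
    {"label": "under $100K", "agi": 600.0, "tax": 30.0},
    {"label": "$100K and over", "agi": 400.0, "tax": 70.0},
]
summary = bracket_summary(toy_brackets)
for row in summary:
    print(row["label"], round(row["income_share"], 3),
          round(row["tax_share"], 3), round(row["effective_rate"], 3))
```

    A bracket whose tax share exceeds its income share is paying an above-average effective rate, which is the progressivity the chart is meant to show.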

  112. Q3Writing

    OSHA workplace inspection and enforcement data deep-dive published

    Long-form article on the OSHA enforcement database: scope (OSHA created by the Occupational Safety and Health Act of 1970, effective April 28 1971, the date now observed as Workers' Memorial Day; covers ~130M US workers at ~10M workplaces; federal OSHA has jurisdiction for private-sector workers in states without State Plans; 28 states and territories operate State Plans approved by federal OSHA with at least as stringent standards including California Cal/OSHA, Michigan MIOSHA, Washington L&I, North Carolina, Kentucky, Tennessee; state plan inspections are collected separately and are not in the federal OSHA database), inspection types (unprogrammed: worker complaints are the most common trigger, and OSHA prioritizes them by severity from phone/fax inquiry to on-site inspection; referrals from other agencies or media reports; related-injury follow-up; fatality/catastrophe mandatory on-site for any work-related death or 3+ inpatient hospitalizations within 24 hours; programmed: National Emphasis Programs NEP targeting specific hazardous industries by planned inspection schedule, with current NEPs including combustible dust, heat illness, tree care operations, amputations, process safety management for petroleum refining; follow-up inspections verifying abatement of prior violations), citation taxonomy with 2024 penalty maximums (Willful: employer knowingly violated or showed plain indifference to OSH Act requirements, maximum $161,323 per violation, most serious category; Repeat: cited for substantially similar condition within 5 years of final order, same $161,323 max; Serious: substantial probability of death or serious physical harm, maximum $16,131 per item; Other-than-Serious: violation has direct relationship to safety but not likely to cause death or serious injury, $16,131 max; De Minimis: technical violation not materially affecting safety, no penalty; Failure-to-Abate: citation remains uncorrected after abatement date, up to $16,131 per day per outstanding violation), database schema (activity_nr inspection ID, estab_name, 
site_address/city/state/zip, naics_code, insp_type L/E/C/J/R/S/U codes, open_date/close_date, union_status, total_current_penalty after informal conference reduction, total_initial_penalty before reduction, nr_in_estab employee count; violation table: citation_id/issuance_date/hazsub hazardous substance/violtype/gravity/penalty/standard the 29 CFR section violated), top-cited standards (fall protection 29 CFR 1926.501 construction, #1 consistently for 12+ years; Hazard Communication 1910.1200 chemical labeling and SDS; Scaffolding 1926.451; Respiratory protection 1910.134; Control of hazardous energy Lockout/Tagout 1910.147; Powered industrial trucks forklifts 1910.178; Machine guarding 1910.212; Electrical wiring 1910.305; falls are also #1 in the OSHA Fatal Four: falls/struck-by/electrocution/caught-in-between account for ~60% of construction fatalities), major incidents in data (Imperial Sugar refinery 2008 Port Wentworth GA: combustible sugar dust explosion, 14 killed, $8.78M proposed penalty among the largest in OSHA history at the time, prompted OSHA to reissue and intensify its combustible dust NEP; Deepwater Horizon 2010 BP: PSM process safety management violations in onshore facilities; Amazon warehouse injury controversy: third-party studies using OSHA 300 log data and OSHA inspection records showed Amazon recordable injury rates 6.6/100 workers vs. 
industry average 3.3/100 in 2020, spurred political pressure leading to targeted OSHA inspections), OSHA Severe Injury Reporting System (SIRS: since January 1 2015 all employers must report to OSHA within 8 hours any work-related fatality; within 24 hours any work-related inpatient hospitalization/amputation/loss of eye — creates near-real-time stream of severe incidents searchable on OSHA website), data access (enforcedata.dol.gov OSHA enforcement search, data.dol.gov API, bulk data downloads from OSHA website), Python snippet downloading OSHA inspection and violation CSV data, filtering to inspections closed in past 3 years with penalty >0, groupby NAICS 3-digit sector, computing total and average penalty per inspection by sector, identifying 15 most-penalized industries, and cross-references to BLS QCEW for workforce size context, DOL H-2 visa disclosures for guest worker enforcement, and CFPB enforcement actions for federal enforcement comparison.
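
    The sector ranking described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: the rows are hypothetical, the column names are taken from the schema listed in the entry, the date format is assumed to be ISO `YYYY-MM-DD`, and the three-year window is approximated as 1,095 days.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical inspection rows; only the column names come from the schema above.
INSPECTIONS = [
    {"activity_nr": 1, "naics_code": "238160", "close_date": "2023-05-01", "total_current_penalty": 12000.0},
    {"activity_nr": 2, "naics_code": "238990", "close_date": "2022-11-15", "total_current_penalty": 8000.0},
    {"activity_nr": 3, "naics_code": "311313", "close_date": "2023-02-10", "total_current_penalty": 50000.0},
    {"activity_nr": 4, "naics_code": "311313", "close_date": "2019-01-10", "total_current_penalty": 90000.0},  # too old
    {"activity_nr": 5, "naics_code": "493110", "close_date": "2023-08-20", "total_current_penalty": 0.0},      # no penalty
]


def penalties_by_sector(rows, as_of, years=3, top_n=15):
    """Rank 3-digit NAICS sectors by total current penalty over recent closed inspections."""
    cutoff = as_of - timedelta(days=365 * years)  # approximate window; ignores leap days
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if not row["close_date"]:
            continue  # inspection still open
        closed = date.fromisoformat(row["close_date"])
        if closed < cutoff or row["total_current_penalty"] <= 0:
            continue
        sector = row["naics_code"][:3]  # first three digits identify the NAICS subsector
        totals[sector] += row["total_current_penalty"]
        counts[sector] += 1
    ranked = sorted(totals, key=totals.get, reverse=True)[:top_n]
    # Each tuple: (sector, total penalty, average penalty per inspection)
    return [(s, totals[s], totals[s] / counts[s]) for s in ranked]


if __name__ == "__main__":
    for sector, total, average in penalties_by_sector(INSPECTIONS, as_of=date(2024, 1, 1)):
        print(sector, total, average)
```

    Averages here are per penalized inspection, so a sector with a few very large citations can outrank one with many small ones; pairing the totals with workforce size would give a per-worker view.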

  113. Q3Writing

    HMDA mortgage disclosure deep-dive published

    Long-form article on the Home Mortgage Disclosure Act dataset: HMDA 1975 origin and Dodd-Frank 2010 transfer to CFPB administration, coverage (roughly 5,000 financial institutions, 9M+ loan applications per year, ~80-85% of all US mortgage originations), full post-2018 enhanced field schema (LEI institution identifier, loan type conventional/FHA/VA/USDA, loan purpose home purchase/refinance/cash-out/improvement, action taken code 1=originated/2=approved-not-accepted/3=denied/4=withdrawn/5=file-closed-incomplete/6=purchased/7=preapproval-denied/8=preapproval-approved-not-accepted, state/county/census tract property location, applicant race with 5 OMB categories and subcategories, ethnicity, sex, age ranges, income in $1,000s, co-applicant fields, rate spread APR minus APOR for covered high-priced loans, HOEPA status, lien status, purchaser type 1=Fannie Mae/2=Ginnie Mae/3=Freddie Mac/4=Farmer Mac/5=private securitizer/6=commercial or savings bank, with further codes for other purchaser types, credit score model, combined loan-to-value CLTV, debt-to-income DTI ratio, automated underwriting system AUS results such as Desktop Underwriter or Loan Product Advisor, formerly Loan Prospector), denial reason codes 1-9 (1=DTI, 2=employment history, 3=credit history, 4=collateral, 5=insufficient cash, 6=unverifiable information, 7=incomplete application, 8=mortgage insurance denied, 9=other; analyzing denial reasons by race can reveal differential treatment patterns), redlining investigation methodology (CFPB and DOJ map denial rates by census tract by applicant race, compare minority vs. 
non-minority application rates in minority-majority MSA areas, geographic clustering of denials, statistical disparity analysis; landmark cases: Cadence Bank $8.5M 2021, Trustmark National Bank $5M 2021 announced alongside DOJ's Combating Redlining Initiative, City National Bank (Los Angeles) $31M 2023, the largest DOJ redlining settlement at the time; academic research confirming persistence of 1930s HOLC redline map patterns in modern HMDA denial geographies), CFPB HMDA Explorer and HMDA Platform API (ffiec.cfpb.gov/hmda, API at ffiec.cfpb.gov/v2/data-browser-api/view/csv with year/state/county/MSA/institution/action/loan-type/race filters, public LAR flat file annual snapshot), pre-2018 vs. post-2018 structural break (2015 Final Rule dramatically expanded fields including credit score/DTI/CLTV/LTV/AUS/pricing data, effective 2018 filing year; pre-2018 data has a longer historical series but fewer variables), CRA Community Reinvestment Act connection (federal bank regulators use HMDA in CRA examinations assessing whether banks meet LMI census tract credit needs, CRA ratings Outstanding/Satisfactory/Needs-to-Improve/Substantial-Noncompliance affect merger approval, HMDA is the lending test evidence base), Python snippet downloading HMDA modified LAR for a given year/state, filtering to conventional home purchase applications, computing approval and denial rates by applicant race for each county, calculating Black-to-White denial rate disparity ratio, identifying 10 counties with the largest racial gap, and cross-references to CFPB enforcement actions for redlining cases, FHFA House Price Index for mortgage market context, and HUD LIHTC for affordable housing complement.
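
    The county-level disparity ratio described above can be sketched as below. This is a hedged illustration, not the article's code: the rows are synthetic, the field names `county_code`, `derived_race`, and `action_taken` are assumed to match the Data Browser CSV, and the denial-rate denominator (originated + approved-not-accepted + denied) is one common convention chosen here as an assumption.

```python
from collections import defaultdict

DENIED = 3
DECIDED = {1, 2, 3}  # originated, approved-not-accepted, denied (assumed denominator)


def make_rows(county, race, n_originated, n_denied):
    """Build synthetic LAR-like rows for the example."""
    rows = [{"county_code": county, "derived_race": race, "action_taken": 1}
            for _ in range(n_originated)]
    rows += [{"county_code": county, "derived_race": race, "action_taken": DENIED}
             for _ in range(n_denied)]
    return rows


def denial_disparity(rows, group="Black or African American", reference="White"):
    """Return (county, group denial rate / reference denial rate), largest gap first."""
    counts = defaultdict(lambda: [0, 0])  # (county, race) -> [denied, decided]
    for r in rows:
        if r["action_taken"] not in DECIDED:
            continue  # withdrawn, incomplete, purchased, and preapproval rows are skipped
        cell = counts[(r["county_code"], r["derived_race"])]
        cell[1] += 1
        if r["action_taken"] == DENIED:
            cell[0] += 1
    ratios = {}
    for county in {c for c, _ in counts}:
        g = counts.get((county, group))
        ref = counts.get((county, reference))
        if not g or not ref or ref[0] == 0:
            continue  # ratio undefined without both groups and a nonzero reference rate
        ratios[county] = (g[0] / g[1]) / (ref[0] / ref[1])
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical county codes and counts.
ROWS = (
    make_rows("39049", "Black or African American", 2, 2)
    + make_rows("39049", "White", 3, 1)
    + make_rows("39035", "Black or African American", 3, 1)
    + make_rows("39035", "White", 3, 1)
    + [{"county_code": "39035", "derived_race": "White", "action_taken": 4}]  # withdrawn
)

if __name__ == "__main__":
    print(denial_disparity(ROWS)[:10])
```

    A raw ratio like this does not control for credit score, DTI, or loan size, so it flags counties for closer statistical review rather than establishing discrimination.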

  114. Q3Writing

    Census ACS American Community Survey deep-dive published

    Long-form article on the Census Bureau American Community Survey: replacement of decennial long form (2010 Census went short-form only — ACS took over continuous demographic measurement), survey operations (3.5 million addresses mailed questionnaires per year in monthly waves via the Census Master Address File, ~2.3% annual sample of US housing units, combined mail/internet/CATI/CAPI response modes), coverage (social characteristics: educational attainment, school enrollment, disability, veteran status, grandparents, place of birth/citizenship/year of entry, language at home, ancestry; economic characteristics: employment status, occupation SOC codes, industry NAICS codes, class of worker, wage and salary income, self-employment income, interest/dividends/rent income, Social Security income, retirement income, SNAP/public assistance income, poverty status using SPM and OPM; housing characteristics: tenure own vs. rent, monthly housing costs with gross rent and owner costs, number of rooms/bedrooms, plumbing/kitchen facilities, year structure built, vehicles available, heating fuel, year householder moved in; demographic: age/sex/race 5 OMB categories/Hispanic origin/two-or-more races, household relationship/type/size), 1-year vs. 
5-year product distinction (1-year ACS: calendar year estimate published September following year, covers geographies with 65,000+ population, meaning all states/large counties/large cities/many congressional districts; 5-year ACS: published December, pools 5 calendar years of data for all Census geographies including census tracts ~4,000 people and block groups ~1,500 people, the smallest ACS geography; 5-year averaging means estimates span a range like 2019-2023 and should not be treated as point-in-time; margins of error are smaller than 1-year estimates for the same area due to the larger pooled sample; trend analysis requires non-overlapping periods), geography hierarchy (FIPS codes: nation → region → division → state 2-digit → county 3-digit → county subdivision → place → census tract 6-digit → block group 1-digit → block, with blocks only in the decennial census; PUMAs Public Use Microdata Areas of 100,000+ population used in PUMS microdata files), margin of error and statistical reliability (90% confidence intervals published with every estimate; coefficient of variation CV = MOE/1.645/estimate; Census Bureau recommends caution for CV >20%, treating CV >40% as unreliable; MOE propagation for derived estimates using Census Bureau formula for ratios and differences; small population subgroups at small geographies often unreliable), Census Bureau Data API (api.census.gov/data/{year}/acs/acs5 for 5-year, acs/acs1 for 1-year; variables follow table/line/suffix pattern: B=base detailed table, S=subject table aggregated, DP=data profile, C=collapsed table; example B19013_001E = table B19013 Median Household Income line 001 Estimate, M suffix for Margin of Error; for= and in= geographic parameters; Census Reporter censusreporter.org as variable lookup tool; 50-variable limit per API call, so larger pulls must be split across multiple requests), key research tables (B19013 median household income, B17001 poverty status and rate, B25064 median gross rent, B25003 tenure own/rent, B15003 educational attainment detail, B03003 Hispanic or Latino origin, B02001 race, 
B08301 means of transportation to work commute mode, S1901 income summary, B27001 health insurance coverage by age/sex, DP03 selected economic characteristics data profile), Python snippet using Census API to pull B19013_001E median income and B25064_001E median gross rent for all census tracts in Ohio from 5-year ACS, computing rent-to-income burden ratio, identifying 20 most cost-burdened tracts, handling -666666666 missing value sentinel, outputting with county and tract name lookups, formula grant applications (HUD CDBG, Head Start, Title I school funding, Medicaid FMAP all use ACS income data for allocation), 2020 ACS disruption (COVID pandemic reduced response rates in some areas, Census Bureau applied adjustment procedures), 2020 decennial Census race question redesign creating discontinuity in MENA Middle Eastern North African category and multiracial coding affecting historical comparability, and cross-references to HUD LIHTC affordable housing, Census County Business Patterns establishment data, and CDC BRFSS for demographic health analysis.
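
    The reliability checks and the rent-burden calculation described above can be sketched as follows. The CV formula and the -666666666 sentinel come from the entry; the ratio MOE uses the Census Bureau's published approximation for ratios, and treating 12 × median gross rent over median household income as a burden measure is a simplifying assumption of this sketch (a ratio of medians, not a median of ratios).

```python
import math

MISSING = -666666666  # sentinel for unavailable estimates, as noted above
Z90 = 1.645           # ACS margins of error are published at the 90% confidence level


def clean(value):
    """Map the missing-value sentinel (or None) to None; otherwise return a float."""
    return None if value is None or value == MISSING else float(value)


def cv(estimate, moe):
    """Coefficient of variation: standard error (MOE / 1.645) divided by the estimate."""
    return moe / Z90 / estimate


def ratio_moe(num, moe_num, den, moe_den):
    """Approximate MOE of num/den when the numerator is not a subset of the denominator."""
    ratio = num / den
    return math.sqrt(moe_num ** 2 + ratio ** 2 * moe_den ** 2) / den


def rent_burden(median_gross_rent, median_income):
    """Annualized median gross rent as a share of median household income, or None."""
    rent, income = clean(median_gross_rent), clean(median_income)
    if rent is None or income is None or income <= 0:
        return None
    return 12 * rent / income


if __name__ == "__main__":
    print(cv(50_000, 8_225))                      # about 0.10, i.e. CV of 10%
    print(rent_burden(1_000, 48_000))             # 0.25
    print(rent_burden(MISSING, 48_000))           # None
    print(ratio_moe(12_000, 1_200, 48_000, 4_800))
```

    Tracts whose CV exceeds the 40% threshold mentioned above are better dropped or aggregated before ranking, since their burden ratios are mostly noise.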

  115. Q3Writing

    BLS Consumer Price Index deep-dive published

    Long-form article on the BLS Consumer Price Index: history (BLS has published a CPI since 1913, current methodology continuously updated, base period 1982-84=100 making current index ~310-320), three CPI variants (CPI-U urban consumers covering ~93% of US population including all city residents regardless of income, the headline inflation number; CPI-W urban wage earners and clerical workers covering ~29%, the Social Security COLA adjustment index; Chained CPI-U C-CPI-U accounting for consumer substitution behavior when prices change, which uses a Tornqvist superlative index formula, averages 0.25-0.30 percentage points below CPI-U per year, used in federal income tax bracket indexing since 2018 TCJA), basket construction (Consumer Expenditure Survey CE data from ~30,000 consumer units surveyed on spending, supplemented by Point of Purchase Survey identifying where items are bought; weights updated biennially from 2002 and annually since 2023; approximate 2022-2024 expenditure weights: housing/shelter 34.8%, transportation 15.7%, food 14.3%, medical care 7.1%, education and communication 6.9%, recreation 5.3%, apparel 2.6%, other 13.3%), Owners Equivalent Rent methodology (OER represents 24-26% of total CPI alone, the largest single component; measures what homeowners would pay to rent their own home using a rental equivalence approach rather than actual home prices; surveyed from renters in the same neighborhood; the 12-18 month lag between actual market rent changes and OER CPI response created the shelter inflation persistence controversy 2022-2023, when the Zillow observed rent index peaked June 2022 but OER CPI peaked April 2023), CPI-U vs. PCE deflator comparison (Fed targets 2% inflation using PCE Personal Consumption Expenditures price index not CPI; PCE differs: broader scope includes third-party purchases like employer/Medicare healthcare spending, uses chain-weights that update more frequently reflecting substitution, shelter weight smaller ~16% vs. 
CPI 34%, PCE typically 0.25-0.50pp below CPI-U; the two are published by separate agencies: BLS publishes CPI, BEA publishes PCE), core CPI (headline minus food and energy, the policy-preferred signal for underlying inflation trend; food and energy are excluded because they are volatile and mean-reverting, so core better represents persistent inflation; core CPI and headline diverge during commodity price shocks and then reconverge), 2021-2023 inflation episode (COVID supply chain disruptions mid-2021 triggered first inflation surge; March 2021 ARPA $1.9T stimulus added demand; CPI-U peaked June 2022 at 9.1% YoY, the highest since November 1981; drivers: used car prices +26% YoY peak February 2022 due to chip shortage/rental fleet liquidation, gasoline +49% June 2022 due to Russia-Ukraine war, food at home +13% June 2022; shelter inflation lagged with OER peaking April 2023 at 8.1%; Fed raised fed funds rate 525bp March 2022-July 2023, fastest since 1980s; core services ex-shelter proved stickiest and remained elevated into 2024-2025), Social Security COLA and federal spending indexation ($1.4T+ federal spending adjusts via CPI annually: Social Security COLA uses CPI-W Q3-to-Q3, with 2023 COLA 8.7% largest since 1981, 2024 COLA 3.2%, 2025 COLA 2.5%; CSRS federal civil service retirement CPI-W; federal income tax brackets CPI-U before 2018 and Chained CPI-U since TCJA 2018; TIPS Treasury Inflation-Protected Securities principal adjusts to CPI-U, so coupon payments scale with it; $1.6T TIPS market directly linked), BLS API and FRED access (BLS API v2 at api.bls.gov/publicAPI/v2/timeseries/data/, free key for higher rate limits; key FRED series: CPIAUCSL CPI-U seasonally adjusted, CPIAUCNS not seasonally adjusted, CPILFESL core CPI ex food and energy SA, CUSR0000SAH1 shelter CPI, CPIENGSL energy, CPIFABSL food and beverages, CUSR0000SAF11 food at home, CUSR0000SEHA rent of primary residence, CUSR0000SEHC OER), Python snippet using BLS API to pull monthly CPI-U/core/shelter/energy 10-year history, compute 12-month percent change for each, plot four-series chart of the 
2021-2023 inflation episode with peak annotations, and cross-references to BEA GDP accounts for PCE deflator context, BLS QCEW for wage-price dynamics, and Federal Reserve H.8 for monetary policy response.
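
    The 12-month percent change at the heart of that snippet is simple enough to sketch without the API call. The function below is an illustration only; the `YYYY-MM` keying and the four index levels are assumptions made for the example and should be replaced with values pulled from BLS or FRED.

```python
def yoy_percent_change(series):
    """series maps 'YYYY-MM' -> index level; returns 'YYYY-MM' -> 12-month percent change."""
    changes = {}
    for period, level in series.items():
        year, month = period.split("-")
        prior = f"{int(year) - 1}-{month}"  # same month, one year earlier
        if prior in series:
            changes[period] = (level / series[prior] - 1.0) * 100.0
    return changes


# Illustrative not-seasonally-adjusted index levels (1982-84=100); verify against BLS before use.
CPI_U = {
    "2021-05": 269.195,
    "2021-06": 271.696,
    "2022-05": 292.296,
    "2022-06": 296.311,
}

if __name__ == "__main__":
    changes = yoy_percent_change(CPI_U)
    peak = max(changes, key=changes.get)  # period with the largest 12-month change
    for period in sorted(changes):
        print(period, f"{changes[period]:.1f}%")
    print("peak:", peak)
```

    Year-over-year rates are conventionally computed from not-seasonally-adjusted levels, since the same calendar month is compared and seasonal effects largely cancel.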

  116. Q3Writing

    CDC BRFSS behavioral risk survey deep-dive published

    Long-form article on the CDC Behavioral Risk Factor Surveillance System: founding and scope (established 1984 by CDC, conducted continuously in all 50 states plus DC and US territories, ~450,000 adult interviews per year making it the largest health survey in the world by sample size, state health departments conduct the interviews with CDC providing methodology/questionnaire/repository), survey design evolution (1984-2010 landline-only random-digit dialing RDD, 2011 cell phone sampling frame added to address declining landline coverage — created a trend discontinuity in some series because cell-only adults have different health profiles than landline adults, current combined LLCP landline-cellphone file is the primary analysis file), questionnaire structure (Core module: asked every state every year — self-rated general health, healthy days past 30 days for physical and mental health, health insurance coverage, personal doctor access, cost barriers to care, aerobic activity and muscle-strengthening exercise, fruit/vegetable consumption, tobacco use current/former/never, heavy alcohol use and binge drinking, HIV testing ever, seatbelt use always/nearly always; Optional modules: states choose from sleep, diabetes, COPD, depression PHQ-8, sexual behavior, oral health, cognitive decline, falls prevention — adds depth in states that opt in; State-added questions: individual states may add policy-relevant local questions), key national findings (obesity BMI≥30 secular trend from ~14% in 1984 to ~36% by 2022 — the most-cited BRFSS time series, current smoking ~13% adult prevalence down from ~40% 1960s, diagnosed diabetes ~10% with ~38% prediabetes, physical inactivity ~25% report no leisure-time activity, geographic clustering — Mississippi/West Virginia/Louisiana highest obesity and inactivity, Utah/Colorado lowest, rural-urban gradient visible in every indicator), survey weighting methodology (iterative proportional fitting also called raking: calibrates sample 
weights to Census population counts simultaneously on multiple dimensions — age group/sex/race-ethnicity/educational attainment/marital status/home ownership/phone ownership type; `_LLCPWT` is the final analysis weight; `_STSTR` is the stratum variable; analysts must use complex survey methods — R survey package with svydesign(), Stata svy: prefix, SAS PROC SURVEYLOGISTIC — naive unweighted analysis produces biased estimates), key variables (all begin with underscore prefix for derived variables: `_STATE` FIPS state code, `SEXVAR` sex, `_AGEG5YR` 14 five-year age groups, `GENHLTH` 1-5 self-rated health, `PHYSHLTH`/`MENTHLTH` bad days past 30, `_BMI5` BMI×100 derived from self-reported height/weight, `_RFSMOK3` current smoker indicator, `ALCDAY5` drinks in past 30 days, `_TOTINDA` exercise binary, `HLTHPLN1` health insurance), PLACES project (CDC derived county-level and census tract-level estimates from BRFSS using multilevel regression and poststratification MRP — 27 health measures for all 3,000+ counties and 72,000+ census tracts, addresses BRFSS design limitation that state-stratified sample cannot produce reliable sub-state estimates in small areas; available via CDC Open Data portal Socrata API and bulk download), health equity findings (consistent demographic breakdowns enable disparity analysis: Black adults have higher hypertension/diabetes/obesity rates at every income level not fully explained by socioeconomic confounders; Hispanic paradox visible in some indicators — lower mortality despite lower income; disability/self-reported health gradient closely correlated with later hospitalization), Python snippet downloading BRFSS LLCP XPT file for recent year using pyreadstat, constructing survey design with statsmodels, computing `_LLCPWT`-weighted obesity prevalence by state and age group, identifying 10 highest and lowest obesity states among adults 18-44, data limitations (self-reported height/weight underestimates actual BMI by ~5% compared to measured 
values; telephone coverage bias excludes homeless/institutionalized/non-English speakers; 2011 methodology change creates discontinuity; health literacy variation affects recall accuracy; state seasonal variation in data collection months), and cross-references to CMS Hospital Quality outcomes data, Medicare Part D prescription patterns, and CMS Open Payments physician payment data.
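
    The weighted point estimate described above can be sketched in plain Python. This is a hedged illustration with synthetic records: it uses the `_STATE`, `_BMI5`, and `_LLCPWT` variables named in the entry and covers only the weighted prevalence, not standard errors, which would still require design-based methods using `_STSTR`.

```python
from collections import defaultdict

OBESE_BMI5 = 3000  # _BMI5 stores BMI x 100, so 3000 corresponds to BMI 30.0


def weighted_obesity_by_state(records):
    """Return {state FIPS: weighted share of respondents with BMI >= 30}."""
    weighted_obese = defaultdict(float)
    weighted_total = defaultdict(float)
    for rec in records:
        bmi = rec.get("_BMI5")
        if bmi is None:
            continue  # respondents with missing height/weight are excluded
        weight = rec["_LLCPWT"]
        weighted_total[rec["_STATE"]] += weight
        if bmi >= OBESE_BMI5:
            weighted_obese[rec["_STATE"]] += weight
    return {state: weighted_obese[state] / total
            for state, total in weighted_total.items()}


# Synthetic records for two states; weights and BMI values are invented.
RECORDS = [
    {"_STATE": 28, "_BMI5": 3200, "_LLCPWT": 100.0},
    {"_STATE": 28, "_BMI5": 2500, "_LLCPWT": 300.0},
    {"_STATE": 8, "_BMI5": 3100, "_LLCPWT": 50.0},
    {"_STATE": 8, "_BMI5": 2400, "_LLCPWT": 450.0},
    {"_STATE": 8, "_BMI5": None, "_LLCPWT": 999.0},  # missing BMI, ignored
]

if __name__ == "__main__":
    for state, share in sorted(weighted_obesity_by_state(RECORDS).items()):
        print(state, f"{share:.1%}")
```

    Comparing this against the unweighted share in the same records shows how much the raking weights move the estimate, which is the bias the entry warns about.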

  117. Q3Writing

    FHFA House Price Index deep-dive published

    Long-form article on the Federal Housing Finance Agency House Price Index: repeat-sales methodology origin (Bailey/Muth/Nourse 1963 first developed, Case-Shiller popularized, FHFA uses weighted repeat sales WRS weighting down transactions with extreme price changes as potential data errors or renovations not pure appreciation), coverage (conforming mortgages purchased or guaranteed by Fannie Mae and Freddie Mac — excludes jumbo loans above conforming limit, FHA/VA loans, cash purchases, commercial properties; conforming loan limit history: $647,200 in 2022, $726,200 in 2023, $766,550 in 2024, $806,500 in 2025 for most areas, $1.2M+ in designated high-cost areas), index variants (purchase-only HPI: uses purchase transactions only, excludes refinancing appraisals, most cited version; all-transactions HPI: includes appraisals from cash-out and rate-term refinancings — larger sample but noisier; expanded-data HPI: adds FHA and USDA Rural Development loan data improving coverage in lower-price market segments and geographic areas with few conforming mortgages, published since 2015; Distressed-sales-excluded HPI: removes REO and short sales for cleaner market signal), geographic coverage (national, 9 Census divisions, 50 states plus DC, 400+ metropolitan statistical areas, quarterly ZIP code HPI with 3-quarter publication lag — ZIP level has higher volatility in small areas), 2020-2022 pandemic housing surge (national HPI rose 40%+ between Q1 2020 and Q2 2022 — fastest appreciation since 1975 index origin, driven by pandemic demand shift to suburban/rural work-from-home enabled relocation plus historically sub-3% 30-year mortgage rates plus constrained supply from zoning restrictions/material costs/labor shortages/homebuilder capacity limits; 2022-2023 rate shock slowdown as 30-year mortgage jumped from 3% to 7%+ — demand collapsed but sellers unwilling to trade low locked-in rate for higher-rate new mortgage creating inventory lock-in suppressing sales volume but 
not prices in most markets; 2024+ stabilization with tight inventory floor supporting prices despite affordability stress), FHFA vs. Case-Shiller vs. CoreLogic vs. Zillow comparison (FHFA: conforming only, national to ZIP, free, quarterly with experimental monthly for national and 10 metros; Case-Shiller: 20 metro areas, all arms-length sales, widely cited monthly, paywalled for raw data but all major series on FRED as CSUSHPINSA etc.; CoreLogic: proprietary, broadest transaction universe including all liens, used by GSEs and Fed for risk modeling, not freely available; Zillow ZHVI Zillow Home Value Index: all homes including off-market using Zestimate hedonic model, monthly, free download at zillow.com/research), HMDA linkage (Home Mortgage Disclosure Act data provides loan-level origination records underlying HPI — HMDA captures volume, denial rates, pricing, and demographic patterns at the transaction level; FHFA HPI aggregates to price trend signal), Python snippet pulling FHFA purchase-only quarterly state HPI from FHFA bulk download site, computing YoY percent change for most recent 8 quarters, identifying top 5 and bottom 5 states by appreciation, plotting state heat map, data limitations (conforming-only coverage misses luxury/jumbo market and cash-heavy markets like NYC condos; WRS smoothing can lag turning points; ZIP-level estimates volatile in low-volume markets; 3-quarter ZIP publication lag reduces timeliness), and cross-references to HUD LIHTC affordable housing database, NFIP flood insurance property risk data, and CFPB enforcement actions on mortgage market.

  118. Q3Writing

    IRS Form 990 nonprofit financial disclosure deep-dive published

    Long-form article on IRS Form 990 nonprofit public financial disclosures: legal framework (IRC Section 501(c) tax exemption requires public reporting as accountability quid pro quo for exemption from federal income tax, Form 990 is a public inspection document under IRC 6104 — must be made available to any requester within 30 days, IRS publishes images at Tax Exempt Organization Search TEOS), which organizations file what (Form 990 full: gross receipts ≥$200K or total assets ≥$500K; Form 990-EZ: gross receipts $50K-$200K and assets <$500K; Form 990-N e-Postcard: gross receipts ≤$50K — minimal disclosure, just confirmation of continued existence; Form 990-PF: all private foundations regardless of size — more detailed grants paid and investment return disclosures; Form 990-T: unrelated business income tax from activities unrelated to exempt purpose; religious organizations — churches — specifically exempted from all 990 filing requirements), key disclosure sections (Part I Summary: mission statement, prior year/current year total revenue/expenses/net assets; Part III Program Service Accomplishments: narrative description of programs and expenditures; Part VII Compensation: all officers/directors/trustees plus 5 highest-compensated employees earning >$100K — name, title, average hours, reportable compensation from org, from related orgs, other compensation; Part IX Statement of Functional Expenses: 26 expense line items each split between program services/management-general/fundraising — the basis for program expense ratio charity watchdog metrics; Schedule A Public Support Test: 509(a)(1) vs 509(a)(2) public charity status, one-third support test calculation; Schedule B Schedule of Contributors: donors giving >$5,000 disclosed to IRS but name/address NOT publicly available — protects donor privacy; Schedule H Hospital: community benefit expenditures, financial assistance/charity care, unreimbursed Medicaid, research, community programs — basis for hospital 
community benefit analysis; Schedule L Transactions with Interested Persons: loans to officers/directors/key employees, business transactions with insiders, grants to related persons; Schedule R Related Organizations: parent orgs, subsidiaries, joint ventures, unrelated partnerships), AWS S3 bulk XML dataset (IRS has posted electronically filed 990 XML to s3://irs-form-990/ since 2016, covers 2011+ e-file submissions with annual index JSON at https://s3.amazonaws.com/irs-form-990/index_{year}.json, ~4M filings total, excludes paper filers who submit scanned PDFs not in structured dataset, XML schema changed multiple times — version handling required, EIN as stable identifier across organization name changes), ProPublica Nonprofit Explorer API (propublica.org/nonprofits, API at https://projects.propublica.org/nonprofits/api/v2/organizations/{EIN}.json, returns financial summary by fiscal year, rate limit 100 requests/minute, also provides PDF filing links for non-XML paper filers, Business Master File crosswalk to NTEE National Taxonomy of Exempt Entities codes), research applications (executive pay benchmarking: hospital CEO pay Part VII vs. CMS hospital quality metrics from Care Compare; charity efficiency: program expense ratio = Part IX program services / total expenses, fundraising ratio = Part IX fundraising / total contributions — charity watchdog metrics for GuideStar/Charity Navigator; dark money: 501(c)(4) social welfare organizations must file 990 with financial totals but are NOT required to disclose donors publicly unlike 501(c)(3), enabling tracking of Crossroads GPS/Priorities USA/NRA political spending without donor transparency; hospital market concentration: Schedule H community benefit and charity care total vs. 
tax exemption value estimated by Lown Institute; university endowment: Harvard $50B+ Yale $40B+ Princeton $35B+ annual investment returns and alternative asset allocation disclosed in 990 Part X balance sheet and Part XI reconciliation), data quality limitations (18-month maximum lag from fiscal year end to IRS processing to public availability, electronic filing not universal — paper filers excluded from XML corpus, expense functional allocation inconsistency between organizations making program ratio comparisons imperfect, Schedule B contributor data private preventing donor identification, organization size threshold means smallest nonprofits on 990-N with minimal disclosure), Python snippet downloading IRS 990 annual index from AWS, filtering to full 990 filers with total assets >$100M, fetching individual XML with ElementTree parsing, extracting EIN/name/total revenue/total expenses/program service revenue/highest officer compensation, computing exec-to-revenue ratio, outputting CSV, and cross-references to SEC EDGAR public company financials, USAID foreign assistance implementing partner nonprofit data, and DOJ Corporate Prosecution Registry nonprofit fraud enforcement.
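
    The ElementTree step and the two efficiency ratios described above can be sketched against an inline sample. Everything about the XML here is an assumption for illustration: the namespace and element names follow one commonly seen e-file schema version, the EIN and amounts are invented, and, as the entry notes, real filings need per-version handling.

```python
import xml.etree.ElementTree as ET

NS = {"irs": "http://www.irs.gov/efile"}  # assumed namespace; check each filing's schema

# Hypothetical filing fragment; element names are assumptions and vary by schema version.
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader><Filer><EIN>000000000</EIN></Filer></ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYContributionsGrantsAmt>500000</CYContributionsGrantsAmt>
      <TotalFunctionalExpensesGrp>
        <TotalAmt>1000000</TotalAmt>
        <ProgramServicesAmt>800000</ProgramServicesAmt>
        <FundraisingAmt>50000</FundraisingAmt>
      </TotalFunctionalExpensesGrp>
    </IRS990>
  </ReturnData>
</Return>"""


def amount(root, path):
    """Read a numeric element by namespaced path, or None if it is absent or empty."""
    text = root.findtext(path, namespaces=NS)
    return float(text) if text else None


def efficiency_ratios(xml_text):
    """Compute program expense ratio and fundraising ratio from a 990 XML string."""
    root = ET.fromstring(xml_text)
    expenses = ".//irs:IRS990/irs:TotalFunctionalExpensesGrp/"
    total = amount(root, expenses + "irs:TotalAmt")
    program = amount(root, expenses + "irs:ProgramServicesAmt")
    fundraising = amount(root, expenses + "irs:FundraisingAmt")
    contributions = amount(root, ".//irs:IRS990/irs:CYContributionsGrantsAmt")
    return {
        "ein": root.findtext(".//irs:Filer/irs:EIN", namespaces=NS),
        # Part IX program services / total expenses
        "program_ratio": program / total if program is not None and total else None,
        # Part IX fundraising / total contributions
        "fundraising_ratio": (fundraising / contributions
                              if fundraising is not None and contributions else None),
    }


if __name__ == "__main__":
    print(efficiency_ratios(SAMPLE_XML))
```

    Returning None for missing fields, rather than zero, keeps filings with an unrecognized schema version from silently skewing ratio comparisons.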

  119. Q3Writing

    Federal Reserve H.8 bank balance sheet deep-dive published

    Long-form article on the Federal Reserve H.8 Assets and Liabilities of Commercial Banks statistical release: publication cadence (every Friday 4:15 PM ET for the week ending prior Wednesday, reflects FR 2644 weekly selected balance sheet survey of approximately 875 large banks plus quarterly universe Call Report data interpolated for smaller institutions), scope (all domestically chartered US commercial banks plus US branches and agencies of foreign banks, aggregated not institution-level — key distinction from quarterly Call Reports which are institution-level), history (data available back to 1973 on FRED, series redesigned multiple times reflecting accounting and regulatory changes), current scale ($23T+ total assets for all commercial banks as of 2025, dwarfing the $8T Fed balance sheet), full assets side taxonomy (cash assets: vault cash/cash items in process/balances due from depository institutions/balances with Federal Reserve Banks including required reserves and excess reserves — the QE reserve injection signal; securities: Treasury and agency securities split held-to-maturity HTM and available-for-sale AFS, mortgage-backed securities, other securities including municipal and corporate bonds — the $650B unrealized HTM loss problem from 2022-2023 rate rise that contributed to SVB/Signature/First Republic failures; loans and leases: commercial and industrial C&I loans as business cycle leading indicator, real estate loans split 1-4 family residential/commercial real estate CRE/construction and land development/farmland/home equity lines of credit/multifamily, consumer loans split credit cards/auto/student/other, interbank loans, agricultural loans; other assets: premises equipment/intangibles/goodwill/other real estate owned OREO from foreclosures), liabilities taxonomy (deposits: large time deposits $100K+ as institutional/corporate deposits most sensitive to rate competition, small time deposits CDs under $100K retail, savings deposits including money 
market deposit accounts MMDA, demand deposits checking accounts non-interest-bearing, the deposit run monitoring series; borrowings: federal funds purchased overnight interbank lending market/securities sold under repurchase agreements repo/FHLB Federal Home Loan Bank advances as primary short-term wholesale funding/subordinated debt/other), equity residual as buffer, five bank groups disclosed (all commercial banks headline, domestically chartered commercial banks excluding foreign bank US branches, large domestically chartered banks = 25 largest by assets, approximately JPMorgan/BofA/Wells/Citi and 21 others, small domestically chartered banks = 26th bank and below, the community and regional bank universe most sensitive to local credit conditions, foreign-related institutions = US branches of Deutsche/Barclays/HSBC/BNP/etc.), SVB crisis signal (March 2023 single-week deposit outflow of $98B from small domestically chartered banks as depositors fled to TBTF large banks; this flight-to-safety deposit concentration was visible in near real time in the weekly H.8 data, while large bank deposits surged in an offsetting move; concurrent HTM securities unrealized loss problem at smaller banks with bond portfolios purchased at pandemic-era low yields now worth less as rates rose; SVB had a $117B AFS and HTM portfolio vs. $175B deposits, and when it was forced to sell AFS securities at a loss to meet redemptions, the sale triggered a confidence collapse), H.8 vs. 
Call Report comparison (H.8 is weekly aggregate with publication lag of 4 days from reference date — high frequency but no institution-level detail and limited line-item granularity; Call Reports FR Y-9C filed quarterly by bank holding companies and FFIEC 031/041 by commercial banks — 70+ pages of detailed income statement/balance sheet/capital ratios/off-balance-sheet commitments/derivatives/CRE concentration — FFIEC CDR at cdr.ffiec.gov, FDIC Statistics on Depository Institutions SDI for query interface), FRED access (all H.8 series on FRED with documented series IDs: LOANS total loans and leases, BUSLOANS C&I loans, REALLN real estate loans, CONSUMER consumer loans, DPSACBW027SBOG total deposits all commercial banks, DPSACBSL large bank deposits, DPSACBSM small bank deposits — the SVB divergence series, WRMFNS M2), Python snippet using fredapi to pull C&I loan growth and large-vs-small bank deposit levels, computing YoY percent change, plotting divergence during 2023 banking stress with SVB/Signature collapse dates annotated, research applications (credit cycle: C&I loan YoY growth as leading indicator of business investment capex, turning negative before recessions; deposit flight: large-small differential as real-time bank stress indicator; QE/QT: reserve balance changes reflect Fed asset purchase/runoff impacts on banking system liquidity; monetary transmission: loan volume response to fed funds rate changes with lag estimation), and cross-references to OFAC sanctions compliance for banking, CFPB enforcement actions on consumer finance, and NCUA credit union data for non-bank depository institutions.
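
The year-over-year calculation mentioned for the C&I loan and deposit series can be sketched without any API access. This is a minimal illustration, not the article's snippet: it assumes the weekly levels have already been pulled (for example with fredapi) into plain lists, and the function names `yoy_pct_change` and `growth_gap` are hypothetical.

```python
def yoy_pct_change(levels, periods_per_year=52):
    """Year-over-year percent change for an evenly spaced series.

    Returns None where no year-ago observation exists (or it is zero).
    periods_per_year=52 assumes weekly data such as H.8.
    """
    changes = []
    for i, level in enumerate(levels):
        if i < periods_per_year or levels[i - periods_per_year] == 0:
            changes.append(None)
        else:
            year_ago = levels[i - periods_per_year]
            changes.append((level / year_ago - 1.0) * 100.0)
    return changes


def growth_gap(large_levels, small_levels, periods_per_year=52):
    """Large-bank minus small-bank YoY deposit growth, in percentage points."""
    large = yoy_pct_change(large_levels, periods_per_year)
    small = yoy_pct_change(small_levels, periods_per_year)
    return [
        None if a is None or b is None else a - b
        for a, b in zip(large, small)
    ]
```

A widening positive gap would correspond to the large-versus-small deposit divergence described for March 2023; the threshold at which that counts as stress is left to the analyst.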

  120. Q3Writing

    Census County Business Patterns deep-dive published

    Long-form article on the Census Bureau County Business Patterns annual series: origin and history (published since 1964, sourced from the Business Register — the Census Bureau master list of all US businesses maintained via administrative records, federal tax filings, and Economic Census), coverage (all private nonfarm establishments with at least one paid employee — excludes self-employed, domestic workers, railroad employees under RRB, agricultural production employees NAICS 111-112, and most government employees; this is the key difference from QCEW which includes government), key data fields (GEO_ID FIPS geographic code, NAICS code at 2-6 digit level, ESTAB establishment count, EMP mid-March paid employment, PAYANN annual payroll in thousands, PAYQTR1 first-quarter payroll, employment size class breakdown — establishments with 1-4/5-9/10-19/20-49/50-99/100-249/250-499/500-999/1000+ employees), disclosure avoidance methodology change at 2017 (pre-2017: cell suppression for establishments fewer than 3 or single-employer concentration 80%+ with flag codes A-H indicating employment ranges; 2017+: noise infusion / differentially private noise added to employment and payroll cells — no suppression, all cells present but values slightly perturbed, breaks historical comparability), Nonemployer Statistics companion series (NES: self-employed with no paid employees, sole proprietors, partnerships — annual receipts by NAICS and county, growing in importance for gig economy analysis), three-way comparison CBP vs. QCEW vs. 
Economic Census (QCEW = UI administrative records, includes government, quarterly, 5-month lag, better real-time tracking; CBP = Business Register census, excludes government, annual 18-month lag, best for establishment size structure; Economic Census = quinquennial 2/7 years, most detailed including receipts/value added/capex, gold standard but infrequent — CBP is the annual bridge), Census Bureau API access (api.census.gov/data/{year}/cbp, get=ESTAB,EMP,PAYANN with NAICS2017 and GEO_ID parameters, for=county, in=state FIPS), research applications (location quotient for industrial clustering — county share of NAICS employment vs. national share; tracking small vs. large employer concentration; health services access — physician offices NAICS 621111 and hospital establishments 622110 per capita; economic development baseline for manufacturing-dependent vs. service-dependent counties), Business Dynamics Statistics companion (BDS: adds entry/exit and job creation/destruction flows — establishment births/deaths/expansions/contractions — enabling Haltiwanger et al. finding that young firms not small firms drive net job creation), Python snippet querying Census API for Ohio counties, filtering to manufacturing NAICS 31-33, computing employment and establishment counts, calculating location quotient vs. national share, identifying top 10 manufacturing-concentrated counties with suppression handling, and cross-references to BLS QCEW employment data, BEA GDP by Industry, and BLS OEWS occupational wages.
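
The location quotient described above (county share of industry employment divided by the national share) reduces to a few lines. The sketch below is an illustration under assumptions: the input tuples are a hypothetical layout rather than the Census API response format, and rows with missing or zero employment are simply skipped.

```python
def location_quotient(local_industry_emp, local_total_emp,
                      national_industry_emp, national_total_emp):
    """County industry share divided by national industry share."""
    local_share = local_industry_emp / local_total_emp
    national_share = national_industry_emp / national_total_emp
    return local_share / national_share


def top_concentrated(counties, national_industry_emp, national_total_emp, n=10):
    """Rank counties by location quotient.

    counties: iterable of (name, industry_emp, total_emp) tuples
    (hypothetical layout). Rows with missing or zero values are skipped.
    """
    scored = []
    for name, industry_emp, total_emp in counties:
        if not industry_emp or not total_emp:
            continue
        lq = location_quotient(industry_emp, total_emp,
                               national_industry_emp, national_total_emp)
        scored.append((name, lq))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]
```

An LQ above 1 means the county is more specialized in the industry than the nation as a whole.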

  121. Q3Writing

    NAEP education assessment deep-dive published

    Long-form article on the National Assessment of Educational Progress (The Nation's Report Card): history and governance (1969 first NAEP assessment, 1988 NAGB National Assessment Governing Board created as independent policy body, 1990 first Trial Urban District Assessment TUDA, 2001 NCLB No Child Left Behind mandated biennial 4th and 8th grade reading/math state NAEP, administered by NCES/IES under ETS and Westat contracts), three NAEP programs (National NAEP — representative national samples by race/income/disability/ELL/school type; State NAEP — state-representative samples enabling cross-state comparison; TUDA Trial Urban District Assessment — 27 large urban districts including NYC/LA/Chicago/Houston/Fresno/Detroit/Atlanta; plus Long-Term Trend NAEP at ages 9/13/17 using stable item sets since 1971 for mathematics and 1969 for reading), subjects (Reading, Mathematics, Science, Writing, US History, Geography, Civics, Technology and Engineering Literacy, Arts — Reading and Math are the primary policy focus), the 0-500 scale and NAGB achievement levels (Basic = partial mastery of prerequisite knowledge and skills fundamental to proficient work; Proficient = solid academic performance and competency over challenging subject matter; Advanced = superior performance — cut scores set by NAGB through Modified Angoff method, periodically reviewed), 2022 4th grade results (33% at or above Proficient in reading — down 3 points from 2019 largest 30-year decline, 36% in math — down 5 points first-ever math score decline), COVID learning loss evidence (2022 the first post-pandemic full-year assessment: 8th grade math down 8 points erasing nearly two decades of gains, urban districts hardest hit with Chicago -8/Detroit -10 reading points, first Nation's Report Card showing absolute decline in both subjects simultaneously), achievement gap tracking (White-Black gap ~25 reading points since 1992 — narrowed slightly in early years widened post-COVID, income gap proxied by 
free/reduced-price lunch eligibility — FRL students ~26 points below non-FRL in 4th reading 2022, disability gap ~30 points, ELL gap ~38 points), state comparison function (NAEP allows honest cross-state comparison because every state takes same test on same scale — unlike state tests where proficiency definitions vary wildly: some states set Proficient at NAEP Basic level, NAEP vs. state proficiency comparison reveals inflation, Mississippi paradox: Mississippi raised state standards 2013, test prep improved, NAEP scores improved dramatically from 49th to top 20 in 4th grade reading by 2022 showing standards-based reform can work), plausible values methodology (matrix booklet design: each student answers only a fraction of all items — neither feasible nor ethical to administer entire item pool to each student; IRT item response theory calibration across all items; 5 plausible values imputed per student drawn from posterior distribution of theta given item responses and background questionnaire variables via conditioning variables and multiple imputation; analysts must use all 5 PVs and combine estimates per Rubin's rules for valid standard errors — using a single PV or the mean produces biased standard errors; NAEP Primer technical documentation covers this in detail), NAEP Data Explorer API (nationsreportcard.gov/ndecore, REST API for querying aggregate state and national results by year/grade/subject/demographic — documented endpoint with variable parameters for statcode, grade, subject, year, stattype scale score vs. 
achievement level), Python snippet using the API to fetch all 50 states plus DC reading and math average scale scores for 4th and 8th grade, compute state rankings, and identify states with largest COVID-era 2019-2022 decline, federal data ecosystem connections (CCD Common Core of Data for school-level demographics and enrollment, IPEDS postsecondary data, EDFacts for state-reported outcome data, BEA for economic context of education investment), data limitations (no sub-state geography below urban district level, cross-sectional not longitudinal individual tracking, NAGB achievement-level cut-score critique that Proficient is set above international norms making US appear worse, no teacher-student linkage in public data), and cross-references to Census County Business Patterns for local economic context, BEA GDP for education investment macroeconomic context.
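
The rule for combining plausible values can be made concrete with Rubin's rules. The sketch below assumes the analyst already has one point estimate and one sampling variance per plausible value (in practice NAEP sampling variances come from replicate weights, which is not shown here); the function name is hypothetical.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-PV estimates with Rubin's rules.

    estimates: one statistic (e.g. a mean scale score) per plausible value.
    sampling_variances: the sampling variance of each of those statistics.
    Returns (point_estimate, standard_error).
    """
    m = len(estimates)
    point = mean(estimates)              # average across plausible values
    within = mean(sampling_variances)    # average sampling variance
    between = variance(estimates)        # imputation variance, divides by m - 1
    total = within + (1 + 1 / m) * between
    return point, total ** 0.5
```

Using a single plausible value drops the between-imputation term entirely, which is why the resulting standard error is too small.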

  122. Q3Writing

    BLS OEWS occupational wage statistics deep-dive published

    Long-form article on the BLS Occupational Employment and Wage Statistics program (renamed from OES Occupational Employment Statistics in 2021): survey design (mail survey of ~200,000 nonfarm business establishments per reference period, 3-year rolling panel with semi-annual waves of ~66,000 each conducted in May and November reference months, stratified sample by state/MSA/industry/establishment size, ~70% response rate, excludes self-employed — unlike CPS which captures them), SOC classification (SOC 2018 Standard Occupational Classification: 23 major groups — Management/Business-Financial/Computer-Math/Architecture-Engineering/Life-Physical-Social Science/Community-Social Service/Legal/Education/Arts-Entertainment-Sports/Healthcare Practitioners/Healthcare Support/Protective Service/Food Prep/Building-Grounds/Personal Care/Sales/Office-Admin/Farming-Fishing/Construction/Installation-Maintenance/Production/Transportation/Military; 98 minor groups; 461 broad occupations; 867 detailed occupations; examples: 15-0000 Computer and Mathematical → 15-1200 Software and Web Developers → 15-1252 Software Developers; SOC revision crosswalks needed for historical time series), all wage percentile fields (OCC_CODE SOC code, OCC_TITLE, TOT_EMP employment estimate, EMP_PRSE percent relative standard error, H_MEAN/H_PCT10/H_PCT25/H_MEDIAN/H_PCT75/H_PCT90 hourly wages, A_MEAN/A_PCT10/A_PCT25/A_MEDIAN/A_PCT75/A_PCT90 annual equivalents at 2,080 hours, MEAN_PRSE, LOC_QUOTIENT employment concentration vs. 
national average), H-1B prevailing wage connection (OFLC uses OEWS to set prevailing wage floors for H-1B and PERM labor certifications: Level I = 17th percentile entry-level with minimal experience, Level II = 34th percentile qualified workers, Level III = 50th percentile median fully competent, Level IV = 67th percentile fully competent with additional specialized requirements; employers must pay at or above applicable level; this governs hundreds of thousands of tech/engineering/healthcare H-1B petitions per year), key occupational wage findings (software developers $127K median annual; data scientists $108K; statisticians $99K; surgeons $252K+; orthodontists $208K+; oral/maxillofacial surgeons; anesthesiologists; fast food workers 3.8M employment, the largest occupation group, retail salespersons 3.5M, registered nurses 3.1M; location quotient reveals clustering — petroleum engineers LQ 15+ in North Dakota/Wyoming/Texas, marine architects in Maine/Virginia, gaming occupations in Nevada), OEWS vs. CPS vs. 
QCEW comparison (CPS = household survey of 60,000 residence-based including self-employed, monthly frequency, best for demographic breakdowns of employed workers, smaller sample means wider CIs for detailed occupations; QCEW = UI administrative records, establishment-based, industry-level wages only as average weekly wage not by occupation, no percentile distribution, quarterly; OEWS = establishment survey occupation-level with full percentile distribution, best for HR benchmarking and H-1B compliance), data access (bls.gov/oes annual XLS files by area and by industry, BLS API with OEUS prefix series IDs, DataUSA/IPUMS visualization layers), Python snippet downloading national OEWS XLS ZIP, filtering to SOC 15-xxxx Computer and Mathematical occupations, sorting by annual median, plotting horizontal bar chart of top 20 highest-paid tech occupations with 25th-75th interquartile range, limitations (18-month effective lag from rolling 3-year panel, SOC revision breaks long series, 2,080-hour annualization assumption, self-employment excluded, non-metropolitan areas have higher standard errors), and cross-references to BLS QCEW industry-level employment data, BEA GDP by industry, and DOL H-2 visa disclosure data.
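
The filter-and-sort step of the OEWS workflow can be sketched on rows that have already been loaded from the XLS file. The column names come from the field list above; the sample rows are illustrative only (rounded medians quoted in this entry plus one hypothetical row), and treating any non-numeric wage cell as missing is an assumption about how the file marks unavailable or top-coded values.

```python
def to_number(cell):
    """Parse a wage cell; return None for any non-numeric marker."""
    try:
        return float(str(cell).replace(",", ""))
    except ValueError:
        return None


def top_paid(rows, soc_prefix="15-", n=20):
    """Return (title, annual median) pairs for one SOC major group, highest first."""
    kept = []
    for row in rows:
        median = to_number(row["A_MEDIAN"])
        if row["OCC_CODE"].startswith(soc_prefix) and median is not None:
            kept.append((row["OCC_TITLE"], median))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:n]


# Illustrative rows only, not an extract of the published file.
sample_rows = [
    {"OCC_CODE": "15-2041", "OCC_TITLE": "Statisticians", "A_MEDIAN": "99,000"},
    {"OCC_CODE": "15-1252", "OCC_TITLE": "Software Developers", "A_MEDIAN": "127,000"},
    {"OCC_CODE": "15-2051", "OCC_TITLE": "Data Scientists", "A_MEDIAN": "108,000"},
    {"OCC_CODE": "15-9999", "OCC_TITLE": "Hypothetical suppressed row", "A_MEDIAN": "*"},
    {"OCC_CODE": "29-9999", "OCC_TITLE": "Hypothetical non-tech row", "A_MEDIAN": "#"},
]
```

Calling `top_paid(sample_rows, n=2)` keeps only the 15-xxxx rows with numeric medians and returns the two highest.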

  123. Q3Writing

    USPTO patent database deep-dive published

    Long-form article on the USPTO patent grant and application database: patent system basics (35 USC §154 20-year term from filing date for utility patents, 15-year for design patents effective June 2023 under Patent Law Treaties Implementation Act, plant patents for asexually reproduced distinct new plant varieties; three requirements: novel under 35 USC §102 prior art search, non-obvious under §103 obviousness standard, useful; adequate written description and enablement under §112), three patent types (utility patents — machines, processes, compositions of matter, new plant varieties, software/business methods, 90%+ of all patents; design patents — ornamental appearance of functional item, D prefix; plant patents — P prefix, much smaller volume), corpus scale (4M+ utility patent grants in PatentsView since 1976, 600,000+ new applications per year, 350,000+ annual grants, ~50% grant rate overall, average 24-month pendency from filing to first office action, 3-year total pendency from filing to grant), what the data contains (PatentsView fields: patent_id, patent_number, patent_type, patent_date grant date, patent_title, patent_abstract, num_claims, application_id, filing_date; CPC Cooperative Patent Classification codes replacing USPC — hierarchical A/B/C/D/E/F/G/H sections → class → subclass → group → subgroup; inventor disambiguation: inventor_id, inventor_name, city/state/country, PatentsView uses machine learning disambiguation to link same inventor across multiple patents; assignee: organization name, type individual/company/government/research institution, location; citation network: backward citations examiner-cited prior art, forward citations received indicating impact/importance; claims text — independent claims define scope, dependent claims narrow further), patent families and continuation strategy (continuation: claims priority to parent application, same disclosure but new claims, no additional prior art exposure; continuation-in-part CIP: new 
matter added but priority claimed from parent filing date for overlapping disclosure; divisional: restriction requirement forces splitting one application into multiple covering different inventions; patent families tracked by parent_id linkages in PEDS; pharma evergreening — filing continuations with new claims on delivery method/dosage/formulation/new indication to extend effective exclusivity beyond 20-year base patent, documented in Orange Book supplemental patent filings), patent quality controversy (Alice Corp v. CLS Bank International 2014 Supreme Court — abstract idea exception to patent eligibility under 35 USC §101 invalidated thousands of software and fintech patents, Mayo/Alice framework two-step test: is claim directed to abstract idea/natural phenomenon/law of nature? if yes, does claim add inventive concept? test applied inconsistently by courts and PTAB; Inter Partes Review IPR at Patent Trial and Appeal Board created by AIA America Invents Act 2011 — post-grant validity challenge mechanism, ~50% institution rate on petitions, 70%+ cancellation rate on instituted IPRs — NPE favorite defense; non-practicing entity NPE/patent troll ecosystem — Risch academic research estimates $29B/year direct NPE litigation costs for defendants, legislative reform efforts stalled), access points (PatentsView patentsview.org API endpoint api.patentsview.org/patents/query, bulk CSV downloads including patent/inventor/assignee/citation tables, gold standard research dataset with ML disambiguation; USPTO BDSS Bulk Data Storage System: raw weekly XML grant files and Tuesday/Thursday application publications; Google Patents Public Data on BigQuery free tier with full text search; PEDS Patent Examination Data System for prosecution history, office action counts, RCE requests, examiner statistics), Python snippet querying PatentsView API for patents with CPC subclass G06N (Computing; Calculating; Counting — artificial intelligence) granted in last 5 years, groupby 
assignee_organization, count grants, show top 20 AI patent holders, research applications (citation-weighted patent counts as R&D output proxy in economics literature, CPC technology space mapping shows innovation concentration, corporate IP strategy analysis — Apple/IBM/Samsung/Qualcomm concentration in mobile/semiconductor, Orange Book pharma linkage tracking which patents cover which drugs, inventor location geocoding enables geographic innovation cluster mapping — Silicon Valley/Boston/Austin/Seattle concentration), and cross-references to FDA Drug Approvals for Orange Book pharma patent linkage, SEC EDGAR for corporate R&D spending vs. patent output, and BEA GDP for innovation as output driver.
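
The group-and-count step for AI patent holders is simple once records are in memory. The sketch below assumes each record is a dict with a list of CPC codes and an `assignee_organization` string; that layout is hypothetical and is not the PatentsView response schema.

```python
from collections import Counter


def top_assignees(patents, cpc_prefix="G06N", n=20):
    """Count grants per assignee for patents carrying a CPC code with the given prefix.

    patents: iterable of dicts with 'cpc_codes' (list of str) and
    'assignee_organization' (str). Hypothetical record layout.
    """
    counts = Counter()
    for patent in patents:
        if any(code.startswith(cpc_prefix) for code in patent["cpc_codes"]):
            counts[patent["assignee_organization"]] += 1
    return counts.most_common(n)
```

Each patent is counted once per assignee record even if it carries several G06N codes; co-assigned patents would need one record per assignee.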

  124. Q3Writing

    BEA GDP and National Accounts deep-dive published

    Long-form article on the Bureau of Economic Analysis National Income and Product Accounts: the GDP expenditure identity C+I+G+(X-M) and each component in depth (PCE split into durable/nondurable/services; gross private domestic investment covering fixed investment and inventory changes; government consumption expenditures and gross investment at federal/state/local; net exports of goods and services), real vs. nominal GDP and the GDP deflator (PCE price index as the Fed's preferred inflation measure, relationship to CPI), three estimate vintages (advance estimate 30 days after quarter-end based on incomplete data, second estimate 60 days with trade and inventory revisions, third estimate 90 days, plus annual revisions and comprehensive benchmark revisions every 5 years), GDP by State (quarterly, 21 industry sectors NAICS, 1998-present, BEA Regional Data), GDP by Industry (annual, 71 detailed industries, gross value added and contribution by sector), personal income and outlays accounts (PI = compensation + proprietors income + rental income + net interest + dividends + transfer payments - contributions to government social insurance, DPI disposable personal income, PCE spending, personal saving rate), corporate profits before/after tax including inventory valuation and capital consumption adjustments, national saving and investment identity, BEA API query structure (apps.bea.gov/api/data with DataSetName=NIPA&TableName=T10101 for GDP components, BEA API registration for key), NIPA table numbering system (Table 1.1.1 percent change, 1.1.2 contributions, 1.1.5 seasonally adjusted annual rates current dollars, 1.1.6 chained dollar real series), FRED as the easiest access path (GDPC1 real GDP, GDP nominal, PCEPI deflator, PSAVERT saving rate, all FRED series IDs), Python snippet querying BEA API for quarterly real GDP components and plotting contribution waterfall chart for most recent quarter, and cross-references to BLS CPI inflation data, BLS QCEW employment by 
industry, and Federal Reserve H.6 money supply.
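
The expenditure identity and the API query structure can be sketched together. The endpoint, `DataSetName=NIPA`, and `TableName=T10101` come from the entry above; the remaining parameter names (`UserID`, `method`, `Frequency`, `Year`, `ResultFormat`) are assumptions that should be checked against BEA's API user guide, and no request is actually sent.

```python
from urllib.parse import urlencode


def bea_nipa_url(user_id, table_name="T10101", frequency="Q", year="2024"):
    """Build a NIPA query URL; parameter names other than DataSetName and
    TableName are assumed, not confirmed by the source."""
    params = {
        "UserID": user_id,
        "method": "GetData",
        "DataSetName": "NIPA",
        "TableName": table_name,
        "Frequency": frequency,
        "Year": year,
        "ResultFormat": "JSON",
    }
    return "https://apps.bea.gov/api/data?" + urlencode(params)


def gdp_from_expenditures(c, i, g, x, m):
    """GDP = C + I + G + (X - M), all in the same units."""
    return c + i + g + (x - m)
```

Net exports enter with their sign, so a trade deficit (M greater than X) subtracts from the total.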

  125. Q3Writing

    FDA drug approval database deep-dive published

    Long-form article on the FDA CDER Drugs@FDA database and drug approval pathways: three application types (NDA New Drug Application for brand-name chemical entities — requires safety and efficacy clinical trial data via 505(b)(1) full NDA or 505(b)(2) hybrid relying partly on published literature; BLA Biologics License Application for large-molecule biologics — proteins, antibodies, vaccines, gene therapies — approved under Public Health Service Act Section 351; ANDA Abbreviated New Drug Application for generic drugs — requires bioequivalence demonstration not full clinical trials — Hatch-Waxman Act 1984 framework), Orange Book official title "Approved Drug Products with Therapeutic Equivalence Evaluations" (TE codes: A therapeutically equivalent including AB bioequivalent, B not therapeutically equivalent, BX data insufficient to determine — AB is the gold standard for generic substitution, the 1975 Mac-Gray memo origin), patent and exclusivity listings in Orange Book (each NDA must list all patents that could be infringed, generics must certify Para I/II/III/IV, Para IV triggering 30-month stay and potential 180-day generic exclusivity for first filer), exclusivity types (NCE New Chemical Entity 5-year exclusivity for new molecular entities with no active moiety previously approved, 3-year clinical exclusivity for new conditions of use/dosage forms requiring new clinical studies, 7-year Orphan Drug Exclusivity for drugs designated for rare diseases under OOPD, 6-month pediatric exclusivity as a reward added to NCE or other exclusivity, 12-year biologic exclusivity under BPCIA for reference products), expedited designation programs (Breakthrough Therapy for serious conditions with preliminary evidence of substantial improvement over available therapies; Fast Track for serious conditions filling unmet need — rolling review allowed; Priority Review for significant improvement in treatment setting review goal at 6 months vs. 
standard 10 months; Accelerated Approval for surrogate endpoint or intermediate clinical endpoint reasonably likely to predict clinical benefit — subject to post-market confirmatory trials), Aduhelm (aducanumab) Accelerated Approval controversy 2021 (three FDA advisory committee members resigned, large-scale payer coverage denial by Medicare, confirmatory trial EMERGE/ENGAGE conflicting results, $56,000/year price, FDA overruled advisory committee vote, CMS 2022 national coverage determination restricting coverage to CED in clinical trials), OxyContin NDA 1995 and Purdue Pharma opioid crisis — Sackler family role and DOJ criminal plea, Wegovy/semaglutide 2021 obesity approval and supply shortage impact, Drugs@FDA database access at accessdata.fda.gov and the bulk download Application_Documents.zip, OpenFDA drugs API endpoint /drug/drugsfda.json with full structured approval history, FDA product labeling API for approved label text, Python snippet querying OpenFDA for all NME (new molecular entity) approvals 2015-2024 by therapeutic area with approval time distribution, and cross-references to CMS Medicare Part D prescribing data, ClinicalTrials.gov trial registry, and SEC EDGAR for pharma company financials.

  126. Q3Writing

    CMS Medicare Part B physician procedure data deep-dive published

    Long-form article on the CMS Medicare Part B Physician and Supplier Public Use File: Social Security Amendments 1965 Medicare Part B creation (supplementary medical insurance covering physician services, outpatient, and durable medical equipment — premium-financed vs. Part A hospital trust fund), the AMA-backed 1979 federal court injunction against disclosure of physician-level Medicare data (Florida Medical Association v. HEW), vacated in 2013, first PUF release 2014 by CMS (covering 2012 data), full schema (NPI national provider identifier, provider_last_org_name/provider_first_name, provider_credentials, provider_entity_type individual/organization, provider_address city/state/ZIP, provider_type specialty code, HCPCS code/description, place_of_service facility vs. non-facility, hcpcs_drug_indicator, tot_benes beneficiaries, tot_srvcs services, tot_bene_day_srvcs, avg_sbmtd_chrg submitted charges, avg_mdcr_alowd_amt Medicare allowed amount, avg_mdcr_pymt_amt Medicare payment, avg_mdcr_stdzd_pymt standardized payment removing geographic wage index variation), 2022 data scale (1.1M providers, 13,500 HCPCS codes, $424B in submitted charges, $200B in Medicare allowed amounts), Medicare Physician Fee Schedule mechanics (RVU relative value units — work RVU physician time/skill/complexity, practice expense RVU, malpractice RVU — each multiplied by geographic practice cost index GPCI and conversion factor $34.89/RVU in 2024), standardized payment purpose (removing GPCI geographic adjustment reveals real volume and utilization differences not just cost-of-living), Lucentis vs. Avastin anti-VEGF ophthalmology controversy (ranibizumab Lucentis $2,000/injection vs. 
bevacizumab Avastin compounded $50/injection for wet AMD and diabetic macular edema, both made by Genentech, Genentech refused to seek FDA approval for Avastin ophthalmic use to protect Lucentis revenue, ASP+6% Medicare payment incentive aligning physician payment with drug cost — Medicare paid $3B+ in Lucentis from 2013-2017 when Avastin would have been clinically equivalent, CATT trial proving non-inferiority), Salomon Melgen MD ophthalmologist $21M Medicare fraud case 2012-2013 (inflating claims for Lucentis administration, billing for services not rendered — single ophthalmologist became highest-paid Medicare provider in the US), how to identify billing outliers (by HCPCS code: compare avg_sbmtd_chrg vs. avg_mdcr_alowd_amt markup ratio, flag providers with tot_srvcs >99th percentile and avg markup >3x average), data download from data.cms.gov, anti-VEGF injection Python filter (HCPCS J0178/J2778/Q2041/J3490 codes for Eylea/Lucentis/Avastin/other intravitreal injections, groupby provider, compute avg_sbmtd_chrg distribution, flag outliers), and cross-references to CMS Open Payments physician payment data, CMS Medicare Part D prescribing data, and NCUA credit union data.
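
The billing-outlier rule described above (service volume above the 99th percentile combined with a markup more than three times the average) can be sketched for the rows of a single HCPCS code. Field names follow the schema listed in the entry; the function name and the use of `statistics.quantiles` for the percentile cutoff are illustrative choices, not the article's method.

```python
from statistics import mean, quantiles


def flag_outliers(rows, percentile=99, markup_multiple=3.0):
    """Flag providers within one HCPCS code.

    rows: list of dicts with 'npi', 'tot_srvcs', 'avg_sbmtd_chrg',
    'avg_mdcr_alowd_amt'. Needs at least two rows.
    Returns the NPIs whose volume exceeds the percentile cutoff and whose
    charge-to-allowed markup exceeds markup_multiple times the mean markup.
    """
    markups = [r["avg_sbmtd_chrg"] / r["avg_mdcr_alowd_amt"] for r in rows]
    mean_markup = mean(markups)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    volume_cutoff = quantiles([r["tot_srvcs"] for r in rows], n=100)[percentile - 1]
    return [
        r["npi"]
        for r, markup in zip(rows, markups)
        if r["tot_srvcs"] > volume_cutoff and markup > markup_multiple * mean_markup
    ]
```

A flag is only a screening signal; high volume and high submitted charges can have legitimate explanations such as subspecialty referral patterns.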

  127. Q3Writing

    BLS QCEW county employment and wages dataset deep-dive published

    Long-form article on the BLS Quarterly Census of Employment and Wages: administrative origin (UI/ES-202 Unemployment Insurance wage records filed by employers as legal obligation, BLS compiles quarterly from state workforce agencies), coverage (97%+ of all US wage and salary employment — all employers participating in UI system: private-sector employers of any size plus federal/state/local government; excluded: self-employed/sole proprietors, domestic workers, railroad workers covered by RRB, agricultural workers on farms employing fewer than 10 workers, religious organizations exempted from UI), geographic detail (nation/state/MSA/county — 3,100+ counties, full FIPS hierarchy), industry detail (6-digit NAICS down to county level, aggregated at higher levels for disclosure suppression), disclosure suppression rules (cell with fewer than 3 establishments or where one establishment represents 80%+ of employment — suppressed with 0 values, disclosure avoidance protecting individual employer confidentiality), key data fields (area_fips, own_code ownership — federal/state/local/private, industry_code NAICS, agglvl_code aggregation level, size_code employer size class, year, qtr quarter, disclosure_code N for suppressed, qtrly_estabs establishment count, month1/2/3_emplvl employment at 12th of month, total_qtrly_wages, taxable_qtrly_wages, qtrly_contributions UI contributions, avg_wkly_wage derived as total_qtrly_wages divided by average monthly employment, then divided by 13 weeks), QCEW vs. CES distinction (CES Current Employment Statistics is a sample survey ~145,000 businesses for monthly jobs report — faster but less geographic/industry detail; QCEW is the universe census — 8M+ establishments but 5-month publication lag; CES uses QCEW as its annual benchmark), QCEW vs. 
LAUS (Local Area Unemployment Statistics covers residence-based labor force/employment/unemployment; QCEW is establishment-based — worker counted at employer location not home address, double-counted commuters), BLS bulk data access (www.bls.gov/cew/downloadable-data.htm, quarterly CSVs by area and industry, annual averages, data dictionaries with area_titles.csv and industry_titles.csv), API alternative via api.bls.gov series IDs (ENU prefix for county employment), data limitations (5-month lag, suppressed cells in rural counties, NAICS reclassification breaks in historical series, ownership codes vary by database), Python snippet downloading state QCEW CSV for a given quarter and computing top-5 highest avg_wkly_wage industries by county (handling suppression, joining titles, output formatted table), and cross-references to BEA GDP by Industry, BLS OEWS occupational wages, and Census County Business Patterns for complementary establishment data.
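
The average-weekly-wage derivation and the suppression check can be sketched on a single QCEW row. The sketch uses the average of the three monthly employment levels as the denominator and the field names listed in the entry; returning None for suppressed or zero-employment rows is an illustrative choice.

```python
def avg_weekly_wage(row):
    """Derive average weekly wage from one quarterly QCEW row.

    row: dict with 'disclosure_code', 'total_qtrly_wages',
    'month1_emplvl', 'month2_emplvl', 'month3_emplvl'.
    Returns None when the cell is suppressed (disclosure_code 'N')
    or average employment is zero.
    """
    if row.get("disclosure_code") == "N":
        return None
    avg_employment = (
        row["month1_emplvl"] + row["month2_emplvl"] + row["month3_emplvl"]
    ) / 3
    if avg_employment == 0:
        return None
    return row["total_qtrly_wages"] / avg_employment / 13
```

Filtering on `disclosure_code` before ranking matters because suppressed cells carry zeros, which would otherwise look like industries with no wages.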

  128. Q3Writing

    FBI NICS firearm background check data deep-dive published

    Long-form article on the FBI NICS monthly background check dataset: Brady Act 1993 origin and November 1998 launch, FBI CJIS operation of NICS, POC state systems that interface with NICS (18 full-POC states process checks themselves), three queried databases (NCIC for criminal history lookups, Interstate Identification Index Triple-I for state rap sheets, NICS Index for disqualifiers not in NCIC — mental health adjudications, illegal aliens, dishonorable discharge, domestic violence misdemeanor, restraining orders), four outcomes (Proceed, Deny, Delay — 3-business-day window, Cancel), 18 U.S.C. § 922(g) nine prohibited categories and why mental health and DV records are chronically underreported in state submissions, default proceed/Charleston loophole (Dylann Roof 2015 — drug arrest in ambiguous jurisdiction triggered delay that was never resolved, dealer legally transferred the gun, nine people killed at Emanuel AME Church, NICS Denial Notification Act debate over extending 3-day window), full transaction type taxonomy (handgun/long gun/other/multiple transfers; prepawn/redemption; rental; private sale types for universal background check states; permit and permit_recheck — Kentucky inflation case where permit rechecks 12x inflated counts), why NICS checks do not equal gun sales one-to-one (multi-gun transactions, permit inflation, private sales without universal background check laws), BuzzFeed News NICS dataset (GitHub BuzzFeedNews/nics-firearm-background-checks, PDF-parsed monthly CSVs 1998-present, standard research reference), major demand spikes (January 2013 Newtown/Obama executive action panic buying; December 2015/2016 San Bernardino/Orlando/election; March-June 2020 COVID civil unrest fears — historic single-month records; January 2021 Biden inauguration), state-level permit inflation adjustment methodology, Python snippet computing transfer proxy sum and 12-month rolling average with annotated spikes, and cross-references to ATF crime gun trace 
data (traces are the back-end, NICS is the front-end), FBI NIBRS crime data, and NHTSA FARS fatality data.
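
The transfer proxy and 12-month rolling average can be sketched in plain Python. The column names (`handgun`, `long_gun`, `other`, `multiple`) are assumed to match the monthly CSV layout, and the unweighted sum is a simplification: it deliberately leaves out permit and permit_recheck counts, and it does not weight multi-gun transactions.

```python
TRANSFER_COLUMNS = ("handgun", "long_gun", "other", "multiple")


def transfer_proxy(row):
    """Sum the transfer-type columns of one state-month row; blanks count as 0."""
    return sum(int(row.get(col) or 0) for col in TRANSFER_COLUMNS)


def rolling_mean(values, window=12):
    """Trailing rolling mean; None until a full window is available."""
    out = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out
```

Comparing a month's proxy against its trailing 12-month mean is one simple way to mark the demand spikes listed above.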

  129. Q3Writing

    HUD LIHTC affordable housing database deep-dive published

    Long-form article on the Low-Income Housing Tax Credit program data: Tax Reform Act 1986 creation (IRC Section 42), $10B+ annual tax credit allocation, developer syndication mechanics (investors receive 10-year credit stream in exchange for equity, CRA credit motivation for banks and insurers), 15+15 year compliance + extended use period = 30-year minimum affordability commitment, HUD national database provenance (Novogradac and Abt Associates under HUD contract, huduser.gov), full field schema (project name/address/city/state/ZIP/census tract/county FIPS/HUD region, allocation year, year placed in service, number of low-income units, total units, 4%/9% credit type, new construction vs. rehab, target population — family/elderly/special needs, income limit — 50%/60% AMI, tax credit dollar amount, housing agency allocating credits, project type), 9% credit (annual allocation cap ~$3.13/capita or $3.6M minimum, highly competitive, new construction preferred) vs. 4% credit (available in unlimited quantity but requires private activity bond financing, acquisition-rehab focus, less competitive), scale (50,000+ projects placed in service 1987-2023, 3.5M+ low-income units, ~8% of 44M total US rental units), state HFA Qualified Allocation Plan mechanics (scoring criteria: location in high-opportunity areas, transit proximity, serving very low-income households, rural vs. urban balance, supportive services), gentrification tension (opportunity area siting vs. 
displacement of existing affordable tenants), National Housing Preservation Database/NHPD (PAHRC, combines LIHTC + HUD Section 8 project-based + USDA RD + HOME — comprehensive federally assisted housing picture, preservation cliff analysis for expiring affordability), data limitations (geocoding gaps, missing credit amounts, state reporting variation, 1-2 year lag, no tenant data), Python snippet downloading HUD LIHTC CSV and computing per-1,000-residents units by state and credit type using Census ACS population, and cross-references to HUD fair housing complaints, HMDA mortgage lending, and Census ACS demographic data.

  130. Q3Writing

    CFTC Commitments of Traders report deep-dive published

    Long-form article on the CFTC Commitments of Traders weekly positioning dataset: 1962 origin, Tuesday-close positions published Friday 3:30 PM ET, CFTC reportable threshold (producers/merchants 200+ contracts for most markets), four report formats (Legacy/Traditional — Commercial hedgers/Non-Commercial speculators/Non-Reportable small traders, published since 1986; Disaggregated — Producer-Merchant-Processor-User/Swap Dealers/Managed Money/Other Reportables, introduced 2009 to separate swap dealer positions from commercial hedgers after 2008 crisis; TFF Traders in Financial Futures — Dealer-Intermediary/Asset Manager-Institutional/Leveraged Funds/Other Reportables for equity index/Treasury/currency/VIX markets; Supplemental — selected combined futures+options with Index Trader category), key data fields (market_and_exchange_names, as_of_date, open_interest_all, long/short/spreading positions by category, change_in columns, trader counts, pct_of_oi), all markets covered (WTI/Brent crude, natural gas, gold, silver, copper, corn, wheat, soybeans, sugar, coffee, cotton, live cattle, lean hogs, S&P 500/Nasdaq E-mini, 2yr/10yr/30yr Treasury, EUR/USD/JPY/USD, VIX, Bitcoin futures), COT as contrarian indicator (Managed Money extreme net long = crowded trade = potential reversal risk; Managed Money extreme net short = short squeeze risk; Commercial hedger position as smart-money proxy; Briese COT Index methodology normalizing net position within 3-year range), academic and practitioner research limitations (3-day publication lag, OTC derivatives and spot markets excluded, CFTC trader classification self-reported, historical category methodology changes), data access (cftc.gov annual historical ZIP files back to 1986, file naming convention), Python snippet downloading current Legacy annual ZIP and computing net non-commercial position and 52-week z-score for WTI crude and gold, and cross-references to EIA electricity data, FERC energy enforcement, and SEC Form 4 insider 
trading.

  131. Q3Writing

    OFAC sanctions SDN list and Consolidated Sanctions List deep-dive published

    Long-form article on the OFAC sanctions compliance database: OFAC statutory authority (IEEPA, TWEA, various sanctions-specific statutes), civil penalty exposure up to $1.3M per violation and criminal penalties up to $1M and 20 years imprisonment, SDN list (~8,000 entries: individuals, entities, vessels, aircraft) with full record schema (UID unique identifier, lastName/entityName, firstName, middleName, sdnType individual/entity/vessel/aircraft, programList — IRAN/DPRK/RUSSIA/SDGT/SDNTK/CUBA/VENEZUELA/BURMA/SYRIA/GLOMAG/TCO, title, vessel fields call sign/vess_type/tonnage/GRT/vess_flag/vess_owner, addressList with all known addresses, akaList with all aliases, idList with passport/national ID/tax ID/registration document numbers, remarks), all other OFAC lists in Consolidated Sanctions List (SSI Sectoral Sanctions Identifications for Russia-directed restrictions, NS-MBS Non-SDN Menu-Based Sanctions for China, PLC Non-SDN Palestinian Legislative Council, FSE Foreign Sanctions Evaders, NS-CMIC Non-SDN Chinese Military-Industrial Complex Companies), 50% rule in full (any entity 50%+ owned by one or more SDNs is subject to SDN restrictions even if not listed; aggregate ownership across multiple SDN-owned entities; requirement to look through complex ownership structures), major sanctions programs (IRAN — oil/banking blockade; DPRK — weapons programs; RUSSIA — Ukraine-related EO13661/62/13685 and later EO14024 post-2022 invasion; SDGT — Al-Qaeda/ISIS/Hamas/Hezbollah designated global terrorists; SDNTK — drug kingpins Sinaloa/CJNG; CUBA; GLOMAG Global Magnitsky human rights; TCO transnational criminal organizations), three access formats (XML the most complete, CSV, fixed-width text) plus delta files for incremental updates and OFAC API, compliance screening challenges (transliteration variation — Arabic/Chinese/Russian name rendering, phonetic normalization, patronymic structures; vessel IMO number vs. MMSI vs.
flag-of-convenience; cryptocurrency wallet addresses appearing in idList field; 50% rule ownership chain resolution), enforcement case history (Binance $4.3B 2023 largest in OFAC history, Microsoft $3.3M 2023 Azure cloud services, Apple Pay/Tencent $1.27M 2023, Kraken $362K 2022, BitGo $98K 2020), Python snippet parsing SDN XML with ElementTree extracting entity name/type/programs/aliases and outputting CSV, how financial institutions use OFAC (real-time transaction screening, batch customer screening, automated fuzzy-match engines — Refinitiv World-Check/Dow Jones Risk & Compliance/MSCI OFAC), and cross-references to FinCEN SAR data, FARA foreign agents, and DOJ False Claims Act.

  132. Q3Writing

    EPA Toxic Release Inventory data deep-dive published

    Long-form article on the TRI dataset: EPCRA Section 313 statutory basis (1986, Emergency Planning and Community Right-to-Know Act, Bhopal methyl isocyanate 1984 origin — 3,000+ deaths, community right-to-know mandate), reporting threshold requirements (10+ employees in covered SIC/NAICS — manufacturing/mining/electric utilities/federal facilities; general thresholds 25,000 lbs processed or 10,000 lbs otherwise used; PBT lower thresholds 0.1 g for dioxins, 10 lbs for mercury), chemical list (800+ chemicals and categories: lead, mercury, arsenic, benzene, toluene, formaldehyde, dioxins, PCBs, 2024 PFAS additions — 180+ per- and polyfluoroalkyl substances), full dataset field structure (TRI Facility ID, facility name, DUNS/EIN, address, lat/lon, SIC/NAICS, chemical name, CAS number, reporting year, on-site air stack releases, on-site air fugitive releases, on-site water surface discharge, on-site land surface impoundment, off-site transfers by disposition — waste treatment/recycling/energy recovery/POTW/disposal, production ratio, waste minimization activities), release types explained (stack vs. fugitive distinction, on-site vs. off-site transfer categories and their 11 sub-types), scale and trend (3.4B lbs 1988 → ~650M lbs on-site releases 2023, ~75% decline, four drivers: process improvements/pollution prevention/chemical substitution/better management, caveat that production-normalized comparison matters), Form R vs. Form A (full reporting vs. simplified certification for facilities meeting de minimis or 500-lb criteria), TRI Explorer at epa.gov/triexplorer vs.
TRI Basic Plus bulk annual CSV at epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-calendar-year, environmental justice spatial join methodology (TRI lat/lon + Census ACS tract demographics + EPA EJScreen scores — proximity-based exposure caveat), RSEI model (dispersion modeling → toxicity weighting → population exposure calculation, three-step process converting release quantities to comparative risk estimates), notable facilities and chemicals (Valero Texas refineries for BTEX/benzene, secondary lead smelters for lead compounds, chlor-alkali plants for mercury), Python snippet downloading Basic Plus zip and computing air release totals by 2-digit NAICS sector, limitations (self-reported estimation method variation, threshold gaps for small facilities, pounds ≠ dose, PFAS data recency and incomplete coverage), and cross-references to EPA ECHO enforcement, Census ACS tract data, and PHMSA pipeline safety.

  133. Q3Writing

    CMS Hospital Quality / Care Compare database deep-dive published

    Long-form article on CMS hospital quality measurement: Care Compare platform history (Hospital Compare 2005 → unified Care Compare 2021 at care.compare.cms.gov), coverage (~6,000 Medicare-certified hospitals including acute care/critical access/psychiatric/long-term acute), five measure categories (1. Process measures — percentage compliance with evidence-based protocols: aspirin for AMI, antibiotic prophylaxis for surgical patients, VTE prophylaxis; 2. Outcome measures — 30-day risk-standardized mortality rates for AMI/heart failure/pneumonia/COPD/CABG/hip-knee; 30-day risk-standardized readmission rates; 3. Patient Experience HCAHPS — ten survey domains: nurse communication, doctor communication, hospital staff responsiveness, communication about medicines, discharge information, care transition, cleanliness, quietness, overall hospital rating, willingness to recommend; 4. Structural measures — intensivist in ICU, EHR adoption, safe surgery checklist; 5. Efficiency/Spending — Medicare Spending Per Beneficiary MSPB, episode 3 days before to 30 days after discharge), risk adjustment methodology (hierarchical logistic regression, patient-level comorbidity risk scores, risk-standardized rate vs. 
raw rate, social risk factors debate), Overall Star Rating (1-5 stars, seven measure groups, k-means clustering, academic center/teaching hospital controversy due to case-mix), HAC Reduction Program (six metrics: CLABSI/CAUTI/SSI colon/SSI hysterectomy/C.diff/MRSA bloodstream + PSI-90 patient safety composite; worst-quartile facilities receive 1% Medicare payment penalty, $2B+ total penalties since 2015), Value-Based Purchasing (2% withhold redistributed based on Total Performance Score across clinical outcomes/HCAHPS/spending/safety domains, budget-neutral), suppression rules (<25 patients — important limitation for rural hospitals), data access (data.cms.gov/provider-data bulk ZIP downloads, Care Compare API, quarterly refresh), Python snippet downloading readmission CSV, filtering to heart failure, computing distribution by ownership type and identifying outliers, for-profit vs. non-profit patterns, critical access hospital star rating disadvantage, and cross-references to HHS OIG exclusions, CMS Open Payments, and SAMHSA treatment data.

  134. Q3Writing

    SEC EDGAR XBRL financial statements dataset deep-dive published

    Long-form article on the SEC EDGAR structured financial data system: XBRL mandate timeline (2009 large accelerated filers, 2010 accelerated filers, 2011 all remaining filers), filer universe (~7,000 active operating companies, ~13,000 total including ETFs/closed-end funds), US-GAAP taxonomy structure (dei: namespace for company facts — EntityCommonStockSharesOutstanding/EntityPublicFloat; us-gaap: namespace for financial concepts — Revenues/NetIncomeLoss/Assets/LiabilitiesAndStockholdersEquity/OperatingCashFlow/LongTermDebt; ifrs-full: for foreign private issuers), two main data products (EDGAR Company Facts API at data.sec.gov/api/xbrl/companyfacts/CIK{CIK}.json returning all reported facts as nested JSON by namespace/concept/unit/period; EDGAR Frames endpoint at data.sec.gov/api/xbrl/frames/us-gaap/{concept}/USD/CY{year}Q{n}I.json for cross-sectional screening of all companies in a given period-concept combination), Submissions API at data.sec.gov/submissions/CIK{CIK}.json for filing history, EDGAR full-text search at efts.sec.gov/LATEST/search-index, bulk FSN quarterly dataset (num.tsv/sub.tsv/tag.tsv/pre.tsv at sec.gov/dera/data/financial-statements), three data quality issues (XBRL extension elements — custom company-specific concepts not in taxonomy create comparison gaps; restated periods — same concept/period reported multiple times across filings, deduplicate by selecting latest filed date; unit inconsistencies — millions vs. 
thousands reported in unit field), SEC EDGAR rate limits (10 req/sec, User-Agent required with email), Python snippet fetching Q4 frames for revenue two years and computing YoY growth ranking, use cases (factor investing: value/growth/quality from fundamentals; earnings quality: accruals and revenue recognition anomalies; goodwill tracking: acquisition patterns; financial distress screening; government contractor analysis by joining to USASpending), and cross-references to SEC Form 4 insider trading, SEC Form 8-K material events, and SEC enforcement actions.

  135. Q3Writing

    Corporate Prosecution Registry DPA/NPA database deep-dive published

    Long-form article on the Corporate Prosecution Registry (Brandon Garrett / Duke Law, data at lib.law.virginia.edu/Garrett/corporate-prosecution-registry/, bulk CSV/Excel download): three federal corporate resolution types (DPAs — charges filed, prosecution deferred pending compliance conditions, dismissed if met/resumed if not; NPAs — no charges filed, agreement on conditions and fine, most favorable; guilty pleas — actual criminal conviction, increasingly rare for large corporations due to collateral consequences: federal contractor debarment under FAR, banking license revocation, exclusion from Medicare/Medicaid under 42 U.S.C. § 1320a-7), database fields (company name, parent company, industry, charge, statute violated, resolution type, DOJ division — Criminal Division/individual USAO, year, fine amount, monitor required, monitor name, compliance requirements, self-disclosure flag, cooperation credit), scale (400+ corporate resolutions since 1990, $30B+ in total fines), DOJ Corporate Enforcement Policy evolution (Filip Factors 2008, Yates Memo September 2015 — individual accountability prerequisite for corporate credit, Monaco Doctrine 2021-2022 — repeat offenders face steeper penalties, monitors more common), self-disclosure discount mechanics (up to 50% off bottom of sentencing guidelines), too-big-to-jail critique (Eric Holder Senate testimony 2013: "Some of these institutions have become too large... 
it has an inhibiting impact on our ability to bring resolutions that you might expect"), HSBC DPA 2012 ($1.9B, money laundering for Mexican Sinaloa cartel and sanctions violations for Cuba/Iran/Sudan/Libya/Burma — no criminal charges, compliance monitor, five-year DPA), Pfizer NPA 2009 ($2.3B with subsidiary guilty plea — off-label promotion of Bextra, largest criminal fine in US history at the time), Boeing DPA 2021 ($2.5B, 737 MAX MCAS fraud, subsequent guilty plea 2024 after DOJ found Boeing violated DPA), Alstom guilty plea 2014 ($772M FCPA for bribery in Indonesia/Egypt/Saudi Arabia/the Bahamas/Taiwan), Goldman Sachs 1MDB subsidiary guilty plea 2020 ($2.9B, first major bank to plead guilty in decades), compliance monitor system (independent court-appointed, 1-3 year terms, full access to books/records and personnel, quarterly reports to DOJ, monitor selection controversy — John Ashcroft as Zimmer Biomet monitor without competitive selection process, DOJ now requires more transparent selection), FCPA as dominant category (anti-bribery and books-and-records provisions, parallel DOJ/SEC civil enforcement, joint DOJ-SEC FCPA Guide), Python snippet loading CPR CSV, filtering to DPA/NPA, aggregating fines by industry and year, and cross-references to DOJ False Claims Act, SEC enforcement actions, and lobbying disclosure data.

  136. Q3Writing

    USAID foreign assistance and ForeignAssistance.gov data deep-dive published

    Long-form article on US foreign assistance transparency data: USAID mission (civilian foreign aid and development — health, food security, democracy/governance, economic growth, education, environment, humanitarian assistance), ForeignAssistance.gov origin (Foreign Aid Transparency and Accountability Act FATAA 2016, all-agency common reporting format), dataset full field structure (country, fiscal year, implementing mechanism, agency, funding account, activity name, category — economic development/health/democracy/security/humanitarian, sub-category, transaction type obligation vs. disbursement, amount, implementing organization), USASpending.gov award-level complement (contractor/grantee names), scale ($50B+ total annual; USAID ~$30B; PEPFAR $5-6B/year), PEPFAR (President's Emergency Plan for AIDS Relief — Bush 2003 authorization, $110B+ appropriated, 20M+ people on antiretrovirals, data.pepfar.gov transparency portal), top recipient countries by total assistance (Afghanistan/Iraq — security-driven, Israel/Egypt — strategic alliance, Ukraine — post-2022 surge, Ethiopia/South Sudan/Jordan — humanitarian/development), implementing partner ecosystem (Chemonics International, DAI Global, RTI International, FHI 360, Pact, IRC, World Food Programme, CARE, Johns Hopkins — Washington Beltway contractor concentration vs. local organization tension), three access methods (ForeignAssistance.gov bulk annual CSV 2001-present, FA.gov API, USASpending.gov API for award-level contractor data), 2025 DOGE/USAID restructuring (site outage, stop-work orders on thousands of grants/contracts, USAID-State merger implications for data continuity), four researcher use cases (development economics impact evaluation, PEPFAR spending vs. 
UNAIDS HIV outcomes linkage, contracting concentration analysis, political determinants of aid allocation), Python snippet downloading FA.gov health sector data and computing top-10 health-aid recipients by year, and cross-references to USASpending federal contracts, DOJ False Claims Act settlements, and NIH research grants.

  137. Q3Writing

    PCAOB audit inspection and enforcement data deep-dive published

    Long-form article on the PCAOB public accounting oversight system: SOX Section 101 creation (2002 after Enron/Arthur Andersen/WorldCom — Arthur Andersen certificate revoked, criminal obstruction conviction later overturned), SEC oversight of PCAOB structure, four core functions (Registration of 10,000+ public accounting firms in 50+ countries; Standard-setting replacing GAAS with PCAOB auditing standards AS 2xxx series for public company audits; Inspection — annual for firms with 100+ SEC clients Big Four/GT/BDO/Moss Adams, triennial for smaller firms; Enforcement — disciplinary proceedings against firms and individuals), four public data products (Inspection Reports Part I public deficiency findings and Part II quality control criticisms public after 12-month non-remediation; Enforcement Orders; Registration database with firm status/country/size; Staff Guidance/Standards), Big Four annual inspection deficiency rates (percentage of engagements with at least one Part I.A deficiency, historically 20-40%+ in elevated years), Critical Audit Matters (CAMs) in audit reports under AS 3101 since 2019, KPMG inspection-data theft scandal (2018 — partners received stolen PCAOB inspection selection lists to prepare targeted clients, criminal charges, $50M SEC penalty in 2019), Holding Foreign Companies Accountable Act HFCAA 2020 (Chinese auditors serving US-listed Chinese companies Alibaba/NIO/JD refused PCAOB inspection citing state secrecy laws, 3-consecutive-year delisting threat, PCAOB-CSRC 2022 agreement allowing limited Chinese auditor inspection for first time), Enron/AA consulting-to-audit conflict context driving the auditor independence rules, enforcement examples (Deloitte Brazil $8M 2016, individual CPA bars and suspensions), Python scraper for PCAOB inspection report listings parsing firm name/report date/deficiency counts from Part I summaries, research applications (auditor quality proxy studies, Chinese auditor HFCAA compliance
tracking, connecting to SEC 10-K auditor opinion data), and cross-references to SEC enforcement actions, SEC Form 8-K Item 4.01 auditor changes, and DOJ False Claims Act healthcare audit failures.

  138. Q3Writing

    Medicare Part D prescribing data deep-dive published

    Long-form article on the CMS Medicare Part D Prescriber datasets: Part D background (Medicare Modernization Act 2003, 2006 launch, 50M+ enrollees, $200B+ annual spending, PDP standalone and MA-PD plan structure, low-income subsidy/Extra Help population as clean reference), CMS data publication methodology (10+ claims threshold for public disclosure, annual release with 2-year lag, three dataset cuts — by Provider/by Provider and Drug/by Geography and Drug), full field schema (provider NPI/name/specialty/city/state/zip, brand name, generic name, HCPCS code, total claims count, total day supply, total drug cost, beneficiary count, claim-weighted average beneficiary age/risk score, brand/generic flag), scale (1.1M+ providers, 5,700+ unique drugs, $100B+ in visible spending per year, suppression bias from 10-claim threshold), specialty-drug class correlations (oncologists/cancer immunotherapy, psychiatrists/antipsychotics and antidepressants, pain specialists/opioids), geographic variation in opioid prescribing (10x range between lowest and highest states), ProPublica Prescriber Checkup opioid investigation (top-1% prescribers = 25%+ of opioids, CMS prescribing limitation program for persistent outliers), 2-SD/3-SD outlier methodology for peer-group comparison within specialty, specialty drug revolution (GLP-1 semaglutide/tirzepatide rapid cost surge in Part D claims data, cancer immunotherapy academic-center concentration, gross-cost caveat on rebates), Python snippet for paginated Socrata API download with opioid filter and state/specialty aggregation, five dataset joins (NPPES NPI registry, CMS Open Payments payment-prescribing correlation, HHS OIG LEIE exclusion verification, DEA ARCOS historical, FDA FAERS adverse events), data caveats (gross cost vs. net post-rebate, suppression bias for small prescribers, dispensed vs. 
prescribed gap, specialty label imprecision from NPPES taxonomy), and cross-references to CMS Open Payments, HHS OIG exclusions, and CDC drug overdose mortality data.

  139. Q3Writing

    CMS Open Payments Sunshine Act database deep-dive published

    Long-form article on the Physician Payments Sunshine Act disclosure system: statutory basis (ACA Section 6002, 42 U.S.C. § 1320a-7h — manufacturer reporting mandate for drugs/devices/biologics/medical supplies covered by Medicare/Medicaid/CHIP), 2022 CAA expansion to physician assistants/nurse practitioners/CRNAs/clinical social workers as covered recipients, three payment categories (General Payments: 26 nature-of-payment types from Food and Beverage through Royalty/License; Research Payments: contracts and grants; Ownership and Investment Interests: physician/family equity holdings), all 26 nature-of-payment categories in detail (Food and Beverage — largest transaction count, ~$0.10-$0.50/physician/meal, lowest average; Consulting Fee and Speaker Fee — highest conflict-of-interest relevance; Royalty/License — high-dollar bilateral agreements; Charitable Contribution; Gift; Entertainment; Honoraria; Travel and Lodging; Education; Grant; Research), full database schema (physician NPI, name, specialty, state, city, manufacturer name, associated covered drug/device name/type, payment amount, payment date, nature of payment, form of payment cash vs. in-kind, dispute status, record_id), scale ($12B+ total since 2013, ~$3.5B/year, 2,000+ reporting companies, 1.8M+ covered recipients, Pfizer/AstraZeneca/Medtronic among top payers), general vs. research payment ratio (research dominates dollar volume but general payments drive COI concern), enforcement cases (GSK $3B DOJ 2012 — improper physician payments for off-label promotion; Novartis $678M 2020 — sham speaker programs; Insys Therapeutics — opioid bribe scandal pre-2014 effective date), academic literature (DeJong et al. 
2016 JAMA Internal Medicine — meals/brand prescribing correlation; 2019 BMJ — speaker fees/opioid prescribing correlation), three access methods (OpenPaymentsData.CMS.gov search UI, bulk annual CSV downloads by category, CMS Open Payments API at developer.cms.gov), Python snippet downloading consulting/speaker payments >$5,000 and aggregating by NPI/manufacturer for top physician-company pairs, dataset joins to NPPES/Medicare Part D/FDA drug approvals/DOJ FCA settlements, limitations (self-disclosed unverified, NPI gaps for non-prescribers, de minimis $10 threshold, misclassification risk), and cross-references to DOJ False Claims Act, FDA FAERS adverse events, and NIH research grants.

  140. Q3Writing

    ATF crime gun trace data and FFL directory deep-dive published

    Long-form article on ATF firearms data: NTC trace mechanics (chain-of-commerce reconstruction, manufacturer→FFL→crime scene, 500,000+ annual traces), eTrace law enforcement submission system, four published data products (Crime Gun Trace Reports — annual state-level PDFs/tables; Firearms Trace Data by State — source-state cross-tabulation; FFL directory — all licensed dealers with 11 license type codes 01-dealer through 11-manufacturer-ammunition; AFMER — annual production by type/manufacturer), Tiahrt Amendment (2003, renewed in annual appropriations — prohibits public release of individual trace data, blocks municipal lawsuit discovery, source: gun industry protection after 2000s litigation wave), time-to-crime as straw purchase signal (<3 years = trafficking indicator, >10 years = theft/secondary market), iron pipeline documented via state-level trace data (Georgia/Florida/Virginia/Carolinas → New York/DC flow, permissive-to-restrictive state gun trafficking), ghost gun/PMF surge (1,700 recoveries 2016 → 19,000+ 2021, NTC invisibility, ATF 2022 serialization rule), AFMER production trends (pistol dominance, AR-pattern surge post-2004 assault weapons ban expiry, suppressor market growth post-NFA trust reform), Python snippet downloading FFL directory CSV, filtering type 01/02 dealers, computing per-100k-capita by state using Census population data, and journalist use cases for iron pipeline mapping, FFL density correlation, straw purchase hot-spot identification.

  141. Q3Writing

    CPSC product recall and SaferProducts.gov database deep-dive published

    Long-form article on CPSC product safety data: CPSC jurisdiction (Consumer Product Safety Act 1972, 15,000+ consumer product categories — exclusions: FDA drugs/devices/food, NHTSA vehicles, ATF firearms, FCC/FAA/EPA), two databases (Recalls since 1974 — 400-500/year; SaferProducts.gov since 2011 — 300,000+ consumer incident reports), full recall database field schema (recall_number, date, product description, hazard, remedy types, units recalled, recalling firm, country of origin), SaferProducts.gov fields (product type, report date, age/gender/severity, incident description), NEISS hospital-based injury surveillance as a third data source, CPSIA 2008 provisions (third-party testing for children's products, 100 ppm lead limit, phthalate restrictions, tracking label requirement, SaferProducts.gov database mandate), four landmark cases (IKEA Malm dresser tip-overs 8 deaths 2016-2019 → $50M settlement → STURDY Act federal mandatory stability standard; Fisher-Price Rock 'n Play 32 infant deaths recalled 2019 after Consumer Reports investigation → Safe Sleep for Babies Act 2021; Chinese drywall sulfur gas corrosion 2009-2010 structural product recall; Takata airbag CPSC/NHTSA boundary question), Section 15(b) 24-hour reporting requirement, Section 12 mandatory recall authority (rarely invoked — litigation risk, due process), voluntary recall as structured coercion, access channels (recalls.gov joint portal, CPSC API, SaferProducts.gov, FOIA for full incident data), Python snippet querying CPSC recall API for children's product recalls, and cross-references to FDA medical device recalls, NHTSA vehicle recalls, and FDA warning letters.

  142. Q3Writing

    FEMA NFIP flood insurance claims and policy data deep-dive published

    Long-form article on the National Flood Insurance Program datasets: NFIP origin (National Flood Insurance Act 1968, private insurers unable to profitably cover flood risk, federal backstop), two OpenFEMA datasets (NFIP Claims — every paid claim since 1978; NFIP Policies — active policy snapshot), Claims dataset full field inventory (year of loss, flood zone AE/AO/VE/X, building/contents coverage amounts, amounts paid building/contents, occupancy type single-family/2-4-family/5+-units/non-residential, state, county FIPS, census tract, community number, elevation certificate status, pre/post-FIRM, original construction date, cause of flood — riverine/coastal/etc.), Policies dataset fields (effective/expiration dates, building/contents coverage, deductibles, flood zone, FIPS, occupancy type), scale (5M+ active policies, $1.3T+ total coverage, ~40,000 claims in average year, 2M+ in hurricane years), flood zone taxonomy (AE — base flood 1% annual chance FIRM-mapped; AO — shallow flooding; VE — coastal wave action; X — moderate/minimal risk 0.2% annual chance), the FIRM map modernization and CRS community discount system, four major loss events (Katrina 2005 $16B; Sandy 2012 $8B; Harvey 2017 $9B; Ian 2022), NFIP $36B+ Treasury borrowing and structural fiscal deficit, repetitive loss properties (0.35% of policies = 10.6% of claims paid), severe repetitive loss category, NFIP reauthorization political cycle, Risk Rating 2.0 (2021-2022 methodology overhaul — property-level risk models replacing flood zone flat rates, 70% of policies saw increases, some coastal +400%, political backlash), OpenFEMA API access (fimaNfipClaims and fimaNfipPolicies endpoints, $skip/$top pagination), Python snippet downloading claims by state and computing average payment by flood zone and occupancy type, and cross-references to NOAA Storm Events, Census ACS tract data, and PHMSA pipeline safety.

  143. Q3Writing

    FARA foreign agent registration database deep-dive published

    Long-form article on the FARA eFile database: statute history (22 U.S.C. §§ 611-621, 1938 origins targeting Nazi propaganda, 1966 reorientation toward lobbying/political activities), FARA scope (persons acting as agents of foreign principals — governments, parties, political interests — must register with DOJ NSD FARA Unit and disclose activities, expenditures, political propaganda), Section 613 exemptions in full (diplomatic personnel, US government officials, commercial activities not serving a predominant foreign principal interest, legal representation, religious/scholastic/humanitarian, Section 613(h) LDA short-form exemption for lobbyists registered under LDA for foreign commercial clients), four document types (Initial Statement — registrant/foreign principal/relationship; Semiannual Supplemental Statements — activities, propaganda distributed, receipts/disbursements; Exhibit A — description of activities; Exhibit B — copies of agreements), key data fields (Registration_Number, Registrant_Name, Foreign_Principal, Country, Supplemental_Statement_Period, Receipts, Disbursements, Description_of_Services), eFile API (efile.fara.gov/api/v1/ — registrants, supplemental statements, exhibits, foreign principals), scale (~700-800 active registrants, $700M+ annual disclosed receipts), top foreign principal countries (Saudi Arabia, UAE, Japan, Israel, China, Turkey), who files (lobbying firms Podesta Group/BGR Group/Mercury Public Affairs, law firms, PR companies, consultants), DOJ enforcement history (historically lax, few pre-2016 prosecutions, criminal penalty up to 5 years imprisonment), Paul Manafort case ($17M Ukrainian consulting, 2017 retroactive filing, 2018 conviction — first FARA conviction in decades), Mueller investigation FARA enforcement push, retroactive filing wave 2016-2020, FARA-LDA dataset join methodology using firm name matching, Python snippet fetching all registrants and supplemental statements, aggregating receipts by country, ranking 
top-10, and journalist use cases for network mapping, retroactive filer identification, FARA-LDA-FEC three-way join, DOJ MLARS cross-reference.

  144. Q3Writing

    FDA medical device recall database deep-dive published

    Long-form article on the FDA CDRH device recall database: three recall classes (Class I — reasonable probability of serious adverse health consequences or death; Class II — temporary or reversible adverse health consequences; Class III — not likely to cause adverse health consequences), recalls vs. corrections vs. removals vs. market withdrawals vs. safety alerts taxonomy, full field schema (recall_number, firm_name, product_description, classification, recall_initiation_date, recall_termination_date, center CDRH/CBER, voluntary vs. mandatory, distribution_pattern, reason_for_recall, action_taken, quantity, MedWatch notification, 510k/PMA cross-reference), scale (~1,000–1,500/year, voluntary dominance, mandatory recall authority under FD&C Act Section 518(e) rarely invoked), device vs. drug recall differences (implanted devices — surgical explantation vs. watchful waiting; software/firmware OTA correction), four landmark recalls (DePuy ASR metal-on-metal hip, recalled 2010 with roughly $4B in subsequent settlements, among the largest device settlements on record; Philips Respironics CPAP/BiPAP PE-PUR foam degradation 5.5M+ units 2021; Medtronic HeartWare HVAD permanent 2021 discontinuation; Stryker/Biomet metal-on-metal hips), MAUDE/MDR signal chain (manufacturer 30-day MDR, user facility 10-day MDR, voluntary MedWatch, signal-to-recall lag), OpenFDA device recall API (api.fda.gov/device/recall.json — key parameters, pagination), Python snippet querying Class I cardiovascular device recalls, research applications (510(k) predicate chain join, firm-level quality signals, recall rate by category), mandatory recall authority debate, and cross-references to FDA 510(k), FAERS, CPSC, and Warning Letters.

  145. Q3Writing

    NCUA credit union 5300 call report and enforcement data deep-dive published

    Long-form article on NCUA credit union financial and enforcement data: NCUA as the independent federal agency chartering federal credit unions and administering the NCUSIF (credit union equivalent of FDIC deposit insurance), credit union structure (member-owned cooperatives, tax-exempt, common bond requirements — community/occupational/associational, shares vs. deposits nomenclature, no external equity), scale (4,700+ federally insured CUs, $2.4T+ assets, 135M+ members, highly fragmented with median CU ~$50M vs. median bank ~$200M+), 5300 Call Report quarterly bulk data (key fields: CU_NUMBER, total assets ACCT_010, shares ACCT_018, net loans ACCT_025B, net income ACCT_115, ROA ACCT_700A, net worth ratio — quarterly bulk CSV download at ncua.gov), net worth ratio vs. bank Basel III risk-weighted capital (simpler leverage-style ratio), full PCA threshold ladder (Well Capitalized 7%+, Adequately Capitalized 6-7%, Undercapitalized 4-6%, Significantly Undercapitalized 2-4%, Critically Undercapitalized <2%) with mandatory PCA triggers, enforcement taxonomy (LUA informal/unpublished, Consent Order, Cease and Desist, Civil Money Penalty, Prohibition Order, Conservatorship, Liquidation), NCUA as receiver/liquidating agent (unlike FDIC P&A preference), the 2009 corporate credit union crisis (WesCorp $34B and U.S. Central $57B conservatorship from MBS exposure, $28.5B TCCUSF resolution, final 2021 assessments, wholesale CU network restructuring), consolidation trend (12,000 in 1980 → 4,700 today, merger approval process, fintech scale pressure), Python snippet downloading ZIP and computing net worth ratio PCA buckets, and cross-references to FDIC SDI, OCC enforcement, Federal Reserve H.8 bank balance sheet data.
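
    The PCA threshold ladder above can be sketched as a small classifier. This is a minimal illustration, not the article's own snippet: it assumes the net worth ratio is simply net worth divided by total assets, expressed in percent, and it ignores any additional risk-based requirements that may apply to particular credit unions. The function names are hypothetical.

```python
def net_worth_ratio(net_worth: float, total_assets: float) -> float:
    """Net worth as a percentage of total assets (simplifying assumption)."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return 100.0 * net_worth / total_assets


def pca_bucket(ratio_pct: float) -> str:
    """Map a net worth ratio (in percent) to the PCA category listed above."""
    if ratio_pct >= 7.0:
        return "Well Capitalized"
    if ratio_pct >= 6.0:
        return "Adequately Capitalized"
    if ratio_pct >= 4.0:
        return "Undercapitalized"
    if ratio_pct >= 2.0:
        return "Significantly Undercapitalized"
    return "Critically Undercapitalized"
```

    Each boundary is treated as inclusive on its lower edge, so a ratio of exactly 6% lands in Adequately Capitalized; applied to a 5300 Call Report extract, the same function could be mapped over one row per CU_NUMBER.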

  146. Q3Writing

    CFPB enforcement actions database deep-dive published

    Long-form article on the CFPB enforcement action record: enforcement actions vs. consumer complaints distinction (formal legal proceedings vs. complaint intake, 18-36 month lag from complaint surge to enforcement), statutory authority (Dodd-Frank §1054 civil federal court, §1053 administrative adjudication, 19 enumerated consumer financial laws including TILA/RESPA/ECOA/FCRA/FDCPA/EFTA, plus UDAAP), civil money penalty tiers (Tier 1: $5,576/day general violations; Tier 2: $27,878/day reckless; Tier 3: $1,115,128/day knowing), UDAAP breakdown (unfair — substantial injury, not reasonably avoidable, benefits don't outweigh costs; deceptive — false/misleading representations; abusive — Dodd-Frank addition, does not require confusion, exploits consumer limitation/conditions), database field-by-field description (company, date, forum, law violated, violation type, outcome, relief), five landmark cases (Wells Fargo $3.7B 2022 auto loan/mortgage/deposit failures; Navient $195M 2022 student loan servicer abuses; BancorpSouth $10.6M 2016 redlining; Credit Acceptance $27.2M 2021 subprime auto; MoneyGram $18M 2023 remittance fraud), scale (200+ actions, $20B+), political cycle analysis (Obama 2012-2014 ramp-up, Trump 2017-2020 pullback to ~20 actions/year, Biden restoration, 2025 funding structure challenges), supervision iceberg (examination authority over banks $10B+ and nonbanks, enforcement is public tip), Python scraper using requests/BeautifulSoup with pagination and regex relief-amount extraction, and cross-references to CFPB consumer complaints, OCC enforcement, DOJ Housing Civil Enforcement Section, and state AG coordination.

  147. Q3Writing

    BTS border crossing entry data deep-dive published

    Long-form article on the BTS Border Crossing Entry Data: monthly counts at every US land port of entry on the US-Mexico border (~170 ports) and US-Canada border (~120 ports) going back to 1996, full crossing type taxonomy (Personal Vehicles, Personal Vehicle Passengers, Trucks, Truck Containers Loaded, Truck Containers Empty, Buses, Bus Passengers, Trains, Train Passengers, Pedestrians — precise definitions for each), the CBP port code system, seven key data fields (port_code, port_name, state, border, measure, date, value), access via BTS Transstats download interface and Socrata API (data.transportation.gov dataset ID keg4-3bc2), COVID-19 collapse quantified by measure (pedestrians -93%, bus passengers -98%, personal vehicles -82%, trucks -28% — asymmetric collapse reflecting goods vs. people crossing patterns), San Ysidro as busiest pedestrian crossing (US-Mexico, San Diego-Tijuana urban corridor, 25M+ pedestrian crossings pre-COVID), Laredo as busiest truck crossing (15-20% of all US-Mexico truck traffic, Laredo-Nuevo Laredo bridge complex), US-Canada border seasonal patterns (summer tourism peak, automotive industry shutdown dips), four researcher use cases (nearshoring/supply chain truck trend analysis, CBP trade statistics correlation, immigration/asylum pattern detection, economic activity leading indicator), Python snippet downloading via Socrata API, filtering US-Mexico trucks, aggregating monthly, identifying COVID trough, ranking top ports, and cross-references to CBP trade statistics, ICE enforcement removals, and FMCSA carrier safety ratings.
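
    As a rough sketch of the Socrata access path described above, the following builds a paginated SoQL query URL for the dataset and computes a percentage change such as the COVID trough figures. It makes no network call. The filter values ("US-Mexico Border", "Trucks") are assumptions about how the border and measure columns are populated, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Standard Socrata resource URL pattern for the dataset ID named above.
BASE = "https://data.transportation.gov/resource/keg4-3bc2.json"


def build_query(border: str, measure: str, limit: int = 1000, offset: int = 0) -> str:
    """Build a SoQL query URL filtered by border and measure, ordered by date."""
    where = f"border = '{border}' AND measure = '{measure}'"
    params = {"$where": where, "$order": "date", "$limit": limit, "$offset": offset}
    return f"{BASE}?{urlencode(params)}"


def pct_change(before: float, after: float) -> float:
    """Percentage change from a baseline month to a comparison month."""
    return 100.0 * (after - before) / before


# Assumed column values; check them against the live dataset before relying on them.
url = build_query("US-Mexico Border", "Trucks")
```

    Paging through the full result set would mean repeating the request with offset increased by limit until an empty page comes back.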

  148. Q3Writing

    ClinicalTrials.gov registry and results data deep-dive published

    Long-form article on ClinicalTrials.gov: FDAAA 801 registration mandate (all applicable clinical trials must register within 21 days of enrolling the first participant and submit results within 12 months of primary completion), the full NCT record schema (NCT ID, sponsor, phase, study type, condition, intervention, eligibility criteria, primary/secondary endpoints, enrollment count, status, start/completion dates, results submission), AACT PostgreSQL database from Duke/CTTI (normalized schema, 50+ tables, free PostgreSQL access), 50%+ results non-reporting compliance failure and the STAT News and Science analyses of selective outcome reporting, publication bias mechanics (positive results 60%+ more likely to publish than negative), the Clinical Trials Transformation Initiative and the 2023 FDAAA enforcement push, GLP-1 agonist (semaglutide/tirzepatide) trial explosion as dataset growth case study, Python snippet querying AACT and computing per-sponsor results reporting rate, and journalist use cases for pre-publication trial outcome tracking and approval conflict-of-interest analysis.

  149. Q3Writing

    FDA 510(k) medical device clearance database deep-dive published

    Long-form article on the FDA 510(k) premarket notification pathway: three-class device classification system (Class I exempt, Class II 510(k) substantial equivalence, Class III PMA clinical evidence required), the substantial equivalence standard (same intended use and same/different technological characteristics with no new safety questions), the K-number database fields (K-number, date received/decided, applicant, device name, 510(k) type — traditional/abbreviated/special, product code, predicate K-number, decision code), predicate daisy-chain problem (device cleared against a 1976-era predicate through a chain of predicates, accumulating drift from original), De Novo pathway for novel low-risk devices (creates new Class II category when no predicate exists), metal-on-metal hip implant and pelvic mesh controversies (both cleared via 510(k) substantial equivalence before clinical failures emerged), OpenFDA 510(k) API and CDRH premarket approvals database, Python snippet querying device clearances by product code and computing median review time by decision year, and policy context on 510(k) reform proposals and MATTER Act.

  150. Q3Writing

    DOL H-2 visa guest worker disclosure data deep-dive published

    Long-form article on the OFLC H-2A and H-2B quarterly disclosure files: H-2A (agricultural, no annual cap — cap-free since 1986 Immigration Reform and Control Act) and H-2B (non-agricultural seasonal, 66,000 cap per fiscal year with USCIS and Congressional supplemental allocations), OFLC disclosure file fields (employer name/address, job title, SOC code, worksite state/county, requested workers, certified workers, wages offered, AEWR for H-2A, corresponding employment requirement), H-2A growth trajectory (60,000 certifications in 2012 to 370,000+ in 2023 driven by farm labor shortage and DACA uncertainty), H-2B supplemental visa politics (Congress regularly adds 30,000-64,000 supplemental visas for landscaping, seafood processing, hospitality), adverse effect wage rate (AEWR) mechanics as H-2A minimum wage floor by state and crop type, access at oflc.dol.gov/performancedata.html (quarterly Excel files by program), Python snippet for downloading and analyzing H-2A wage ratios vs. AEWR by employer and state, and journalist use cases for wage underpayment investigation and recruiter fee abuses.

  151. Q3Writing

    OCC bank enforcement actions database deep-dive published

    Long-form article on the OCC enforcement database: four-tier enforcement action taxonomy (informal — Commitment Letter and Memorandum of Understanding; formal — Formal Agreement, Consent Order, Cease-and-Desist Order), termination-of-charter authority as ultimate sanction, full field inventory (bank name/charter type, effective date, enforcement type, docket number, termination date, link to order document), BSA/AML enforcement pattern (anti-money-laundering failures as single largest category — FinCEN coordination, OFAC sanctions screening failures), Wells Fargo consent order cascade (2018 asset cap from Fed, 2018 OCC $500M fine for unsafe practices, 2022 $3.7B CFPB order for auto/mortgage/deposit failures — multi-regulator coordination), search and bulk access at enforcement.occ.treas.gov, Python scraper for OCC enforcement table parsing and trend analysis, how OCC actions compare to Fed/FDIC/CFPB simultaneous enforcement, and compliance officer use cases for peer bank monitoring and enforcement velocity tracking.

  152. Q3Writing

    FERC energy market enforcement data deep-dive published

    Long-form article on FERC enforcement: jurisdiction scope (interstate electricity transmission, wholesale markets, natural gas pipelines, LNG terminals, hydropower — not retail electricity, oil commodity, nuclear safety), two enforcement mechanisms (civil penalties up to $1.4M/day/violation under Energy Policy Act 2005; disgorgement of unjust profits), California energy crisis 2000-2001 policy background, three landmark cases (JP Morgan Ventures $410M 2013 CAISO/MISO uplift gaming, Barclays Bank $488M 2013 western US hub price manipulation, Coaltrain Energy $17M 2015 PJM uplift gaming), enforcement database (annual reports at ferc.gov and IN-prefix dockets on eLibrary), market data (EQR bilateral electricity transactions and Natural Gas Transparency pipeline flows under Order 720), three access channels (eLibrary, Power BI dashboard, FERC API), Python snippet for eLibrary enforcement docket searching and penalty extraction, and energy market analyst/journalist use cases.

  153. Q3Writing

    SAMHSA treatment facility and admissions data deep-dive published

    Long-form article on the SAMHSA N-SSATS facility survey and TEDS admissions dataset: N-SSATS field-by-field breakdown (facility name/address/type — hospital inpatient/residential/outpatient/OTP, ownership — for-profit/nonprofit/government, services offered, special programs, payment accepted, license/certification, beds), TEDS admissions/discharge records (substance of primary concern, age, sex, race, employment, living situation, referral source, prior treatment, expected payment, previous episodes — de-identified, state-level), access at datafiles.samhsa.gov and BHSIS API, findtreatment.gov/N-SSATS backend relationship, MAT landscape (methadone OTP-only vs. buprenorphine any waivered provider), rural zero-OTP county gaps, Python snippet mapping OTP density against CDC overdose death rates by county, three TEDS research use cases, and data quality limitations.

  154. Q3Writing

    SEC enforcement actions database deep-dive published

    Long-form article on the SEC enforcement record: three channels (Administrative Proceedings under Exchange Act Section 15(b) — bars/suspensions/cease-and-desist; Litigation Releases — civil federal court actions; Do Not Pay final orders), source pages (litreleases.shtml and admin.shtml), full record field inventory (defendant, action type, statutes violated, charges, sanction types — disgorgement/prejudgment interest/civil penalties/officer-director bar/broker bar/IA bar/auditor suspension/admission posture/dates), statistics (700-800 actions/year, $4-5B annual recoveries, Liu v. SEC disgorgement limits and Fair Fund distributions), whistleblower program (Section 21F, 10-30% awards over $1M, $400M+ annual volume, $279M single-award record), Python scraper for Litigation Releases and AP pages, journalist use cases (serial violator identification, revolving door tracking, administration enforcement priority quantification), and limitations (informal resolutions, sealed documents, aggregated penalty reporting).

  155. Q3Writing

    HHS OIG LEIE healthcare exclusion data deep-dive published

    Long-form article on the HHS OIG List of Excluded Individuals/Entities: statutory framework (42 U.S.C. § 1320a-7 and § 1320a-7a, strict-liability $10,000/service CMP exposure), mandatory exclusion grounds (four types, 5-year minimum, 10-year and permanent escalations), permissive exclusion grounds (misdemeanor fraud, license revocation, SAM mirroring, audit obstruction), full LEIE field breakdown (NPI, UPIN, EIN, EXCLTYPE, EXCLDATE, REINDATE, WAIVERDATE), monthly full-file vs. supplemental update file access workflow, SAM.gov EPLS comparison (LEIE healthcare-specific, SAM government-wide procurement — both needed), Python script with Latin-1 encoding, NPI exact match, normalized name exact match, rapidfuzz token-sort-ratio fuzzy matching with threshold, and reinstatement filtering, exclusion type code triage by severity, OIG model compliance guidance requirements (pre-employment, monthly ongoing, SAM cross-check, documentation, response protocol), and reinstatement mechanics.
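
    The matching tiers described above (NPI exact, normalized name exact, fuzzy token-sort, reinstatement filter) might look roughly like the sketch below. To stay standard-library only it substitutes difflib for rapidfuzz, so scores will not be identical to rapidfuzz's token_sort_ratio. The single "name" key, the 90-point threshold, and the treatment of all-zero NPI and REINDATE values as "empty" are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def normalize(name: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    return " ".join(cleaned.split())


def token_sort_score(a: str, b: str) -> float:
    """0-100 similarity after sorting tokens, so word order is ignored."""
    sa = " ".join(sorted(normalize(a).split()))
    sb = " ".join(sorted(normalize(b).split()))
    return 100.0 * SequenceMatcher(None, sa, sb).ratio()


def match_tier(candidate: dict, excluded: dict, threshold: float = 90.0) -> Optional[str]:
    """Return the strongest match tier for one candidate/LEIE row pair, or None."""
    # Assumption: a blank or all-zero REINDATE means the party is still excluded.
    if excluded.get("REINDATE", "").strip("0"):
        return None
    npi = candidate.get("npi", "")
    # Assumption: an all-zero NPI in the LEIE row means "not on file".
    if npi.strip("0") and npi == excluded.get("NPI"):
        return "npi"
    if normalize(candidate["name"]) == normalize(excluded["name"]):
        return "name_exact"
    if token_sort_score(candidate["name"], excluded["name"]) >= threshold:
        return "name_fuzzy"
    return None
```

    Fuzzy hits are best treated as leads for manual review rather than confirmed exclusions, since common names will collide at any threshold.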

  156. Q3Writing

    SEC Form 8-K material event disclosures deep-dive published

    Long-form article on SEC Form 8-K: the 4-business-day filing window, full item taxonomy across nine sections and 33 items (1.01 material agreements, 1.03 bankruptcy, 1.05 cybersecurity incidents, 2.01 acquisitions, 2.02 earnings releases, 2.06 impairments, 3.01 delisting, 4.01 auditor change, 4.02 non-reliance, 5.01 control change, 5.02 executive departures, 7.01 Reg FD, 8.01 other events), December 2023 Item 1.05 cybersecurity disclosure rule (materiality standard, four-day clock, annual 10-K complement), Item 2.02 earnings release pattern (Exhibit 99.1 as canonical market-moving earnings event), EDGAR access paths (full-text search, quarterly index files, Submissions API), filtering by item type using full-text search, Python snippet for Item 4.02 non-reliance filing surveillance as fraud early-warning signal, and journalist/investor use cases (executive departure tracking, auditor changes, impairment as lagging stress indicator, cybersecurity monitoring).

  157. Q3Writing

    NHTSA vehicle recall database deep-dive published

    Long-form article on the NHTSA recall database: statutory authority (National Traffic and Motor Vehicle Safety Act 1966), recall record fields (campaign number, manufacturer, initiation date, affected vehicle/unit count, defect description, consequence, remedy, VIN range), consumer complaint database (60,000+ annual complaints at safercar.gov, ODI investigation number, complaint description, incident date, vehicle VIN), API endpoints (api.nhtsa.dot.gov/recalls/recallsByVehicle and complaintsByVehicle) and bulk downloads, Takata airbag inflator recall (70M vehicles, 19 automakers, 400+ injuries, 28+ deaths from metal shrapnel — largest in US history), EV/Samsung SDI lithium-ion battery fire recall acceleration, 70-75% completion rate tracking and why completion stalls, Python snippet querying NHTSA API by make/model/year with complaint-to-recall lead-time computation, and journalist methodology for pre-recall pattern detection using complaint database.

  158. Q3Writing

    DOL Form 5500 pension and 401(k) data deep-dive published

    Long-form article on the ERISA Form 5500 filing system: plan coverage (defined-benefit pensions, 401(k)/403(b)/profit-sharing, health and welfare plans), key data fields (EIN/plan number, plan name/type, participant counts — active/retired/other, total plan assets, contributions, benefit payments, investment return, administrators, service providers, auditor), major schedules (A insurance, C service providers with compensation >$5,000 for 401(k) fee disclosure and ERISA litigation, H large-plan financials, MB/SB actuarial for DB plans — AFTAP thresholds 60%/80% and minimum required contributions), EFAST2 access (efast.dol.gov portal and bulk CSV at dol.gov/agencies/ebsa — data from 2009 onward), scale (750,000+ filings annually, $10T+ in total assets), Python snippet for downloading annual Form 5500 data and computing average 401(k) expense ratios by plan size, and ERISA attorney/journalist use cases for excessive fees, imprudent investments, and underfunded pension identification.

  159. Q3Writing

    Senate LDA lobbying disclosure data deep-dive published

    Long-form article on the Lobbying Disclosure Act disclosure system: LDA filing contents (registrant/client names, lobbyist identities with covered-official history, 79 issue codes, specific bill numbers, income vs. expenses in $10,000 rounding bands, active/terminated status), two-document structure (LD-1 registration within 45 days + quarterly LD-2 activity reports), ~$4B/year aggregate lobbying spending, top categories (healthcare/pharma, finance/insurance, defense, energy), lda.senate.gov REST API (no key required, OpenAPI documented) and bulk XML downloads, LDA vs. FARA boundary and Section 3(h) exemption for foreign commercial clients, LD-203 semiannual campaign contribution reports and FEC join methodology, Python snippet aggregating by client and issue code, revolving-door analysis via covered-official-position field, and journalist methodology for bill number extraction, committee markup/contribution joins, and rulemaking comment-period analysis.

  160. Q3Writing

    USASpending federal contracts database deep-dive published

    Long-form article on the USASpending/FPDS-NG federal contracts dataset: FPDS-NG vs. FABS data provenance, full field inventory (PIID, awarding agency/sub-agency/contracting office, recipient UEI/DUNS/address, action type, award types — definitive contract/purchase order/delivery order/BPA call, three obligation amounts, description of requirement, NAICS/PSC, place of performance, competition type, contract vehicle, small business/set-aside flags), scale ($750B+ FY2024, DOD 60-65%), top five contractors (Lockheed Martin/Boeing/RTX/General Dynamics/Northrop Grumman), small business set-aside system (8(a)/HUBZone/WOSB/SDVOSB with statutory goals), FPDS-NG vs. USASpending API comparison, Python snippet using spending_by_award and spending_by_category endpoints, SAM.gov UEI/debarment cross-reference, and journalist use cases for no-bid contracts, contractor concentration, September spending anomalies, and revolving door tracking.

  161. Q3Writing

    EIA electricity data deep-dive published

    Long-form article on EIA electricity sector data covering Form 923 (monthly plant-level net generation and fuel consumption), Form 861 (annual utility retail sales/revenues/pricing/customer counts/net metering), Form 860 (annual generator inventory — every generator nameplate capacity/prime mover/fuel type/lat-lon/operational status/retirement date), and EIA-930 Hourly Electric Grid Monitor (hourly generation by fuel type and BA interchange flows in near real-time, published with ~1-hour delay, covering 65 Balancing Authorities); the coal-to-gas-to-renewables transformation visible at the plant level (coal 52% of US generation in 2000 to 16% in 2023, natural gas 17% to 43%, wind near zero to 10%, utility-scale plus small-scale solar to 5.5%, nuclear stable at ~19-20%); EIA API v2 structure with free key, electricity routes (/electricity/retail-sales/, /electricity/facility-fuel/, /electricity/rto/daily-region-data/); ERCOT Texas grid isolation (largely outside FERC jurisdiction, not synchronously connected to Eastern or Western Interconnects) and the February 2021 Winter Storm Uri generation collapse (natural gas wellhead freeze-offs, wind turbine failures, nuclear offline — 246 deaths, $200B+ damages visible in EIA-930 hourly generation data); Python snippet using EIA API to pull monthly net generation by energy source 2010-2024 and plot stacked area chart of the energy transition; and cross-references to EPA Toxic Release Inventory for power plant air emissions, FERC energy enforcement, and CFTC Commitments of Traders for energy futures.

  162. Q3Writing

    CFPB consumer complaint database deep-dive published

    Long-form article on the CFPB Consumer Complaint Database: full schema (complaint ID, date received/sent, product/sub-product, issue, company, state/ZIP, submission channel, company response — closed with/without relief/in progress, consumer consent, narrative text), scale (5M+ complaints since July 2011), credit reporting surge (Equifax/Experian/TransUnion at 60%+ share driven by COVID forbearance errors, mixed files, identity theft disputes), company response and relief rates (15-day timely response standard, monetary vs. non-monetary vs. without-relief categories), access methods (bulk CSV and api.consumerfinance.gov/data/complaints), Python code for bulk load and paginated API fetch with product/company relief/timely stats and credit bureau outlier analysis, enforcement connection (complaint data as supervision input, 18-36 month enforcement lag, credit bureau consent orders), journalist use cases, and limitations (volume ≠ harm frequency, complaint ≠ violation).

  163. Q3Writing

    FINRA BrokerCheck data deep-dive published

    Long-form article on FINRA BrokerCheck: individual record contents (CRD number, registrations — Series 7/63/65/66/6/24/3, ten-year employment history, five disclosure categories with sub-types — customer dispute, regulatory action, criminal, civil, financial), firm record contents (ownership, registration, arbitration history), customer complaint statistics (30,000+ annually, 15-20% recovery rate), the recidivist broker/cockroach effect (Egan-Matvos-Seru research, 7% disclosure rate, re-hire patterns, high-disclosure firm concentration), BrokerCheck web interface and API endpoint and FINRA public data bulk downloads, Python snippet for name search and CRD-level disclosure tallying, IAPD gap for RIA-only advisers, ten-year lookback cutoff, expungement mechanics, and attorney/journalist use cases for systematic misconduct screening.

  164. Q3Writing

    SEC Form 4 insider trading disclosures deep-dive published

    Long-form article on SEC Form 4: Section 16(a) mechanics (Securities Exchange Act 1934 original requirement, Sarbanes-Oxley 2002 acceleration to 2-business-day filing deadline from prior month+40-day lag, EDGAR mandatory electronic filing June 2003), three forms (Form 3 initial beneficial ownership when becoming insider, Form 4 changes within 2 business days, Form 5 annual catch-up for exempt/overlooked transactions due 45 days after fiscal year end), full field schema (reporting person CIK as stable identifier vs. name strings, issuer CIK as join key to all other EDGAR filings, relationship checkboxes officer/director/10%-owner plus free-text title, transaction date vs. filing date distinction for event studies, security type common stock vs. derivatives table, transaction code critical letter, shares transacted, price per share zero for grants, direct D vs. indirect I ownership through trusts/LLCs, post-transaction shares owned for position reconstruction), transaction code taxonomy in full (P open-market purchase at prevailing price with personal capital — the highest-signal code, the insider is making a costly voluntary commitment; S open-market sale — informationally ambiguous because insiders sell for diversification/taxes/personal expenditures/pre-planned 10b5-1 programs not necessarily because bearish; A grant or award — compensation, no discretion, no signal; D disposition to issuer typically tax withholding; M option exercise; F payment of exercise price or tax withholding; G gift charitable or family; J other catch-all), 10b5-1 plan analysis (Rule 10b5-1 adopted 2000 — affirmative defense for trades under pre-established written plans adopted when not in possession of MNPI; Jagolinzer 2009 Accounting Review found insiders gaming plans: entering near expected positive news, exploiting short cooling-off period, executing immediately after; Cohen/Malloy/Pomorski 2012 Journal of Finance confirmed plan trades outperformed non-plan trades — the opposite of what non-informational framing predicted; December 2022 SEC amendments: 90-day cooling-off for officers/directors, 30-day for other insiders, mandatory disclosure of plan adoption/modification/termination in 10-Q and 10-K, limits on single-trade plans; 10b5-1 footnote parsing required to identify plan trades — no structured XML field), EDGAR access (company search, quarterly bulk index form.idx fixed-width at full-index/{year}/QTR{n}/, EDGAR submissions API at data.sec.gov/submissions/CIK{CIK}.json, EDGAR full-text search efts.sec.gov, 10 req/sec rate limit with User-Agent required; OpenInsider.com and Quiver Quantitative as secondary databases), academic research consensus (Seyhun 1992 Journal of Business: aggregate insider buying predicts broad market returns; Lakonishok & Lee 2001 Review of Financial Studies: purchases predict 3-6% abnormal returns especially small-cap, sales carry no significant predictive power; Jeng/Metrick/Zeckhauser 2003 Review of Economics and Statistics: 6% abnormal returns over 6 months; Cohen/Malloy/Pomorski 2012: opportunistic vs. routine decomposition — opportunistic purchases predict 8%+ 12-month returns; cluster buying — 3+ insiders within same company within 30 days — amplifies signal substantially), practical applications (quant factor investing, corporate governance monitoring of actual vs. compensation equity retention, investigative journalism for suspicious timing, M&A arbitrage tracking activist position building), Python snippet scanning EDGAR quarterly bulk index for P-coded direct officer purchases >$100K, extracting XML to get issuer/filer/date/shares/price, outputting ranked list, and cross-references to SEC EDGAR financials, Corporate Prosecution Registry for enforcement cases, and PCAOB audit oversight.
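
    The EDGAR access paths and the P-code purchase filter described above can be sketched without any network call. The URL patterns follow the paths named in the entry, with the CIK zero-padded to ten digits in the submissions URL. The User-Agent string and the transaction dict keys (code, ownership, shares, price) are hypothetical placeholders, not EDGAR field names.

```python
# Hypothetical contact string; SEC asks callers to identify themselves.
HEADERS = {"User-Agent": "Example Research admin@example.com"}


def submissions_url(cik: int) -> str:
    """Submissions API URL; the CIK is zero-padded to ten digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def form_index_url(year: int, quarter: int) -> str:
    """Quarterly form.idx bulk index URL."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1-4")
    return f"https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{quarter}/form.idx"


def is_large_direct_purchase(txn: dict, min_value: float = 100_000) -> bool:
    """True for P-coded, directly held purchases worth more than min_value."""
    return (
        txn["code"] == "P"
        and txn["ownership"] == "D"
        and txn["shares"] * txn["price"] > min_value
    )
```

    Grants (code A) report a price of zero, so they fall out of the value test even before the code check; any real fetch loop would also need to stay under the 10 requests per second limit.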

  165. Q3Writing

    FDA FAERS adverse drug event database deep-dive published

    Long-form article on the FDA Adverse Event Reporting System: seven-file quarterly schema (DEMO/DRUG/REAC/OUTC/RPSR/THER/INDI joined by PRIMARYID/CASEID), MedDRA five-level hierarchy (LLT → PT → HLT → HLGT → SOC with 27 SOCs and SMQs), reporting hierarchy (manufacturer 15-day alert requirement, ~90% manufacturer share, Weber effect, voluntary underreporting <1%), FAERS Public Dashboard and quarterly bulk ZIPs and OpenFDA API, disproportionality analysis (PRR and ROR formulas with a/b/c/d cells, EMA thresholds PRR≥2/chi-squared≥4/N≥3, FDA EBGM system), signal cases (rosiglitazone/Avandia 2007 black box, rofecoxib/Vioxx 2004 retrospective, SSRIs + adolescent suicidality 2004/2006 black box), CASEID vs. PRIMARYID deduplication (15-25% count reduction), Python PRR analysis snippet, and limitations (underreporting, notoriety bias, no denominator, confounding by indication).
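
    For reference, the PRR and ROR formulas with a/b/c/d cells can be written out as below. In the usual 2x2 layout, a is reports with the drug and the event, b is the drug with other events, c is other drugs with the event, and d is other drugs with other events. The chi-squared here is the plain Pearson statistic without continuity correction, which is an assumption; implementations differ on that point. Zero cells will raise ZeroDivisionError in this sketch.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting odds ratio: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def is_signal(a: int, b: int, c: int, d: int) -> bool:
    """Apply the thresholds quoted above: PRR >= 2, chi-squared >= 4, N >= 3."""
    return a >= 3 and prr(a, b, c, d) >= 2 and chi_squared(a, b, c, d) >= 4
```

    With a=20, b=80, c=100, d=9800 the PRR is about 19.8 and the ROR is 24.5. Counts should be taken after CASEID deduplication, since duplicate reports inflate a directly.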

  166. Q3Writing

    College Scorecard federal data deep-dive published

    Long-form article on the College Scorecard: data provenance (IPEDS, FAFSA, federal loan/Pell records linked to IRS earnings via NSLDS), institution-level metrics (completion rate, median earnings at 1/5/10 years, median debt for completers vs. non-completers, repayment rate, Pell share, loan rate, median family income, part-time/first-gen/independent/gender breakdown), earnings-debt gap framework ($40k debt / $40k earnings threshold, Gainful Employment rule context), for-profit sector patterns (completion gaps, debt vs. community college, low repayment rates, pre-closure signals), program-level CIP data (software vs. liberal arts vs. art earnings differentials, added 2020), Scorecard API access and bulk CSV, Python code for computing earnings-debt ratio by sector (public/private nonprofit/for-profit), journalist use cases for program-level accountability, and limitations (aid-population coverage bias, cohort timing lag, completion rate denominator).

  167. Q3Writing

    CISA Known Exploited Vulnerabilities catalog deep-dive published

    Long-form article on the CISA KEV catalog: catalog contents (CVE ID, vendor/project, product, vulnerability name, date added, description, required action, due date for federal agencies, notes), CISA addition criteria (must have CVE ID + credible active-exploitation evidence + clear remediation guidance), BOD 22-01 federal patching mandate (2-week critical / 6-month general deadlines for FCEB agencies), access methods (JSON and CSV at cisa.gov/known-exploited-vulnerabilities-catalog, updated frequently), catalog growth (287 CVEs at November 2021 launch → 1,000+ by 2023), top vendors by KEV count (Microsoft, Apple, Cisco, Adobe, Google plus legacy/industrial systems), KEV vs. CVSS comparison (active exploitation vs. theoretical severity), EPSS as complementary resource, Python snippet for downloading KEV JSON and joining to NVD CVE data for CVSS scores, and practical patching strategy (KEV as minimal patch list vs. full CVSS triage).

  168. Q3Writing

    USCIS H-1B visa data deep-dive published

    Long-form article on the H-1B visa dataset: DOL Labor Condition Application (employer, SOC code, job title, wage rate, prevailing wage level, worksite location — quarterly OFLC bulk disclosures) and USCIS H-1B Employer Data Hub (annual approval/denial counts by employer), H-1B mechanics (65,000 cap + 20,000 master's exemption, lottery, H-1B1 and E-3 cousins), true shape of the program (IT staffing companies Infosys/Tata/Cognizant/Wipro dominate, not big tech; India-born workers 70%+; IT/computer occupations 60%+ of LCAs), prevailing wage Level I-IV system and how staffing firms use Level I to undercut wages, Python snippet for downloading OFLC Excel and computing employer wage ratios vs. prevailing wage, and journalist methodology for investigating wage suppression and H-1B fraud.

  169. Q3Writing

    DOJ False Claims Act settlement database deep-dive published

    Long-form article on False Claims Act data: statutory mechanics (31 U.S.C. §§ 3729-3733, treble damages, per-claim penalties), qui tam relator mechanism (filing under seal, DOJ election to intervene or decline, 15-30% relator share), public data sources (DOJ annual FCA statistics PDFs and individual settlement press releases), sector breakdown (healthcare 70-80% of annual recoveries — upcoding/kickbacks/medically unnecessary; defense procurement #2; COVID relief fraud surge 2020-2023), $70B+ recovered since 1986, major cases (GlaxoSmithKline $3B 2012, Abbott $1.5B 2012, Purdue Pharma $8.3B 2020), qui tam statistics (relator-initiated cases risen from 30% to 80%+ of FCA activity), Python scraper for downloading and parsing DOJ press releases into a structured settlements database, and compliance/journalist use cases.

  170. Q3Writing

    NIH research grant data deep-dive published

    Long-form article on the NIH RePORTER grant database: dataset schema (award number, PI name/institution, project title, abstract, total cost, fiscal year, funding IC, activity code, study section, keywords, clinical trial registration), activity code taxonomy (R01 flagship research, R21 exploratory, P01 program projects, U cooperative agreements, T training, K career development, SBIR/STTR), funding by IC (NCI ~20%, NHLBI, NIAID COVID windfall), geographic concentration (Boston, SF, Baltimore, NY, Durham), NIH RePORTER API with Python snippet for disease-area search and institutional concentration mapping, indirect cost rate mechanics (50-70% overhead at major R1 universities), payline and study section peer review, R1 vs. HBCU funding disparities, and journalism/policy applications (COVID-19 tracking, opioid funding shifts, political targeting of research areas).

  171. Q3Writing

    CDC drug overdose mortality data deep-dive published

    Long-form article on CDC overdose mortality datasets: three waves of the opioid epidemic (Wave 1 prescription opioids from 1999, Wave 2 heroin from 2010, Wave 3 fentanyl from 2013, with fentanyl accounting for 70,000+ of 107,000+ total drug deaths in 2022), ICD-10 underlying cause codes (X40-X44 accidental, X60-X64 intentional self-harm, X85 assault, Y10-Y14 undetermined) and multiple-cause codes T40.1-T40.4/T40.5/T43.6 for specific drug categories, CDC WONDER access (query interface, ten-death suppression rule, XML batch API), VSRR Provisional Counts (6-month lag, Socrata CSV endpoint, 12-month rolling format), geographic patterns (Appalachian concentration in Wave 1 vs. nationwide fentanyl dispersal), racial disparities (Black and Native American rates now exceeding white rates, driven by fentanyl in the stimulant supply), Python snippet using Socrata API to pull and pivot VSRR data, and policy applications (SAMHSA SOR grants, naloxone distribution targeting, syringe services, HRSA treatment capacity planning).

  172. Q3Writing

    FDIC bank failure data deep-dive published

    Long-form article on the FDIC bank failure list and call report data: full failure dataset schema (cert, charter class, faildate, fund as BIF/SAIF/DIF, transaction types P&A/IDT/OA, assets, deposits, resolution cost), three historical failure waves (S&L Crisis 1980–1994 with 3,000+ failures, GFC 2008–2012 with 500+ failures, and the 2023 SVB/Signature/First Republic episode, which produced three of the 15 largest US bank failures in one year), deposit insurance limit history ($5K→$40K→$100K→$250K), BIF vs. SAIF vs. DIF mechanics, Texas Ratio formula (non-performing assets / (tangible common equity + ALLL)), with a ratio above 100% predicting failure at >85% accuracy, FDIC BankFind Suite API and bulk CSV access, Python snippet for downloading failures and computing state-level failure rates by era, and multi-signal screening methodology (Texas Ratio, non-accrual %, Tier 1 capital, CRE concentration, duration mismatch).
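
The Texas Ratio described in this entry can be sketched in a few lines. This is a minimal illustration, not the article's own code: the function names and the sample figures are hypothetical, and the 100% flag threshold is taken from the summary above.

```python
def texas_ratio(non_performing_assets: float,
                tangible_common_equity: float,
                loan_loss_reserves: float) -> float:
    """Non-performing assets divided by (tangible common equity + ALLL)."""
    cushion = tangible_common_equity + loan_loss_reserves
    if cushion <= 0:
        # A non-positive cushion makes the ratio meaningless.
        raise ValueError("tangible common equity + ALLL must be positive")
    return non_performing_assets / cushion


def is_flagged(ratio: float, threshold: float = 1.0) -> bool:
    """True when the ratio exceeds the threshold (1.0 means 100%)."""
    return ratio > threshold


# Hypothetical bank, figures in $ millions (not real call report data).
ratio = texas_ratio(120.0, 80.0, 20.0)
print(f"Texas Ratio: {ratio:.0%}, flagged: {is_flagged(ratio)}")
```

In practice the inputs would come from call report fields; which fields map to each term is left open here.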

  173. Q3Writing

    FDA warning letter database deep-dive published

    Long-form article on the FDA warning letter database: the five enforcement tool tiers (untitled letter < warning letter < import alert < consent decree < criminal referral), legal status of warning letters as warnings not orders, full dataset field structure (date, company, subject, issuing office, response deadline, Form 483 precursor), five violation categories (pharma cGMP, food HACCP, device 510(k)/MDR, dietary supplement NDIG/structure-function, clinical investigator), bulk access via scraping (no official CSV download), Chipotle 2016 norovirus and NECC 2012 compounding pharmacy meningitis outbreak case studies, Python scraping and analysis snippet, repeat violator detection methodology, and compliance team enforcement escalation forecasting.

  174. Q3Writing

    MSHA mine safety database deep-dive published

    Long-form article on the MSHA three-dataset system: Mine listing (mine ID, commodity, surface/underground, status, employment), Accident/Injury/Illness (every reportable accident since 1983 with injury type, days lost, narrative), and Violations (every citation since 1983 with law section, penalty, contested, S&S flag) — all linked by Mine ID at ard.msha.gov. Significant-and-substantial designation (Mathies Coal four-part test, elevated penalties, POV scrutiny), Pattern of Violations closure mechanism, Upper Big Branch disaster (29 killed, 2010 — pre-disaster violation record analysis), commodity breakdown (underground coal/surface coal/metal-nonmetal), Python join script computing S&S rate as a fatality leading indicator, post-2010 coal fatality decline vs. persistent metal/nonmetal hazards, and journalist investigative methodology for pre-accident violation records.

  175. Q3Writing

    USCG marine casualty database deep-dive published

    Long-form article on the USCG BARD and MCPD databases: BARD covering recreational boating accidents (state-reported, vessel type, operator sobriety/experience, primary cause, fatalities, injuries, property damage — threshold: death/injury/disappearance or $2,000+ damage), MCPD covering commercial vessel casualties (groundings, sinkings, fires, machinery failures, pollution incidents), key BARD findings (alcohol as primary cause 15%+ of fatalities, capsizing and flooding as leading incident types, life jacket non-use in majority of drowning deaths), geographic concentration (Great Lakes, Gulf Coast, Florida), NTSB marine investigation relationship, access methods (BARD annual statistics at uscgboating.org; MCPD via FOIA/NTSB Marine Accident Reports; MISLE system), Python snippet for analyzing BARD annual statistics, and journalist use cases for manufacturer defects and rental company safety records.

  176. Q3Writing

    FMCSA carrier safety ratings deep-dive published

    Long-form article on the FMCSA SAFER/MCMIS carrier database: the three official safety ratings (Satisfactory/Conditional/Unsatisfactory from compliance reviews), the seven SMS BASICs (Unsafe Driving, HOS Compliance, Driver Fitness, Controlled Substances, Vehicle Maintenance, HazMat Compliance, Crash Indicator — each as a percentile among similar carriers), carrier data fields (USDOT, MC/MX, fleet size, commodity, inspection counts, OOS rates), three access channels (SAFER Web, SMS Portal, bulk download), the ~20%/~5% vehicle/driver OOS benchmark, insurer and freight broker due diligence use cases, Sperl v. C.H. Robinson broker liability ($23M verdict), Python bulk screening workflow, and NTSB investigation linkage.

  177. Q3Writing

    CBP US trade statistics deep-dive published

    Long-form article on US import/export trade statistics: CBP entry document and AES export filing flows into Census Foreign Trade Division, the CIF vs. customs-value vs. FOB distinction, five data dimensions (commodity HTS, partner country, district/port, time, value/quantity), full HTS chapter hierarchy decoding methodology, the three-layer CBP→Census→FT-900 separation, access channels (USA Trade Online, Census Foreign Trade API, FRED, bulk downloads), top 10 trading partners and leading import/export HTS chapters, Section 301 China tariffs and Vietnam/Mexico diversion signals, Python workflow for downloading and computing diversion ratios, investigative journalism use cases (sanctions evasion, counterfeit goods, supply chain concentration), and limitations (de minimis gap, country-of-origin vs. country-of-shipment noise).

  178. Q3Writing

    ICE enforcement and removal operations data deep-dive published

    Long-form article on ICE ERO annual datasets: arrests/detentions/removals/returns definitional distinctions, dataset fields (country of origin, criminality, removal method, fiscal year, field office, violation type), access channels (ICE ERO PDFs, TRAC-ICE at Syracuse, DHS OIS Yearbook), removal trend timeline (Obama FY2013 peak 432k → Biden FY2021 59k), interior vs. border enforcement split and expedited removal under 8 U.S.C. § 1225(b)(1), criminal vs. non-criminal designation nuances, nationality composition shift (Central American triangle → Venezuela/China post-2020), sanctuary jurisdiction and ICE detainer data (Form I-247), detention facility statistics, TRAC-ICE analytical use cases, and Python code for downloading and analyzing removal trends by nationality.

  179. Q3Writing

    BLS CPI-U inflation database deep-dive published

    Long-form article on the Consumer Price Index for All Urban Consumers: CE Survey derivation and 87% urban scope, eight expenditure weight categories (housing ~36%, food ~14%, transportation ~15%, energy ~7%, medical ~7%), headline vs. core CPI (food/energy stripped), CPI-U vs. PCE deflator (weight methodology, substitution, scope), seasonal adjustment (SA vs. NSA, X-13ARIMA-SEATS), BLS API series IDs (CUUR0000SA0, CUUR0000SA0L1E), Python workflow for 10-year YoY inflation computation, the June 2022 9.1% peak and stimulus/supply-chain/energy causes, shelter inflation persistence via OER methodology (12-18 month lag vs. Zillow ZORI), Boskin Commission four bias types and BLS responses, and 8:30 AM release market impact mechanics.
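
The year-over-year computation mentioned in this entry reduces to comparing each month's index level with the same month one year earlier. The sketch below assumes the index has already been fetched into a dict keyed by 'YYYY-MM'; the function name and the index values are illustrative placeholders, not BLS data.

```python
def yoy_inflation(index_by_month: dict[str, float]) -> dict[str, float]:
    """Map 'YYYY-MM' to percent change versus the same month a year earlier.

    Months without a matching prior-year observation are skipped.
    """
    result = {}
    for period, value in index_by_month.items():
        year, month = period.split("-")
        prior_period = f"{int(year) - 1}-{month}"
        prior_value = index_by_month.get(prior_period)
        if prior_value:
            result[period] = (value / prior_value - 1.0) * 100.0
    return result


# Made-up index levels chosen only to illustrate the arithmetic.
sample = {"2021-06": 200.0, "2021-07": 201.0, "2022-06": 218.2}
for period, pct in yoy_inflation(sample).items():
    print(period, f"{pct:.1f}%")
```

Using not-seasonally-adjusted series for year-over-year comparisons is the usual convention, since the twelve-month span cancels most seasonality.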

  180. Q3Writing

    SSA disability award statistics deep-dive published

    Long-form article on SSDI and SSI disability award data: the two-program structure (SSDI requires work history, SSI is needs-based), data structure covering awards by state/diagnosis/age/gender/decision level, scale (8.3M SSDI + 7.5M SSI beneficiaries), the ALJ hearing backlog (average 2+ year wait at peak), geographic variation (West Virginia, Arkansas, Alabama highest per-capita rates), top disabling conditions (musculoskeletal + mental disorders ~60%), SSA Annual Statistical Supplement access, Python analysis of state-level award rates, and policy significance covering fraud detection, program integrity, and the SSDI Trust Fund solvency timeline.

  181. Q3Writing

    NLRB unfair labor practice case data deep-dive published

    Long-form article on the NLRB ULP case management system: Section 7 and 8 violation types (employer interference, domination, retaliation, bad-faith bargaining; union duty-of-fair-representation), the case lifecycle from charge filing through regional investigation to ALJ hearing and Board decision, data structure (docket number format, key fields, disposition codes), key statistics (20,000–25,000 annual charges, ~65% dismissed or withdrawn, ~1% reaching Board decisions), the 2022–2024 Starbucks and Amazon organizing surge reflected in multi-region charge concentration, NLRB case search API access, Python snippet for downloading and analyzing disposition rates by industry, and significance for journalists covering labor organizing and corporate accountability.

  182. Q2Writing

    BLS JOLTS job openings and labor turnover deep-dive published

    Long-form article on the Job Openings and Labor Turnover Survey: what JOLTS measures vs. the unemployment rate (demand-side labor market dynamics), survey design (~16,000 establishment respondents, December 2000 onward), the quits rate as a worker-confidence proxy (November 2021 Great Resignation peak of 3.0% / 4.5M quits/month), how the Fed used the job-openings-to-unemployed ratio (peaked at ~2.0 in 2022) to calibrate rate hikes, BLS Public Data API access with series ID structure, Python snippet using requests and pandas, industry breakdown (healthcare stable/moderate, leisure-hospitality highest quits 4–5.5%, professional services elevated openings), journalist use cases for covering labor market stories, and revision and seasonal adjustment caveats.

  183. Q2Writing

    FTC Consumer Sentinel Network fraud data deep-dive published

    Long-form article on the FTC Consumer Sentinel Network: scope and partner organizations (FTC, IC3/FBI, CFPB, SSA OIG, BBB, state AGs, CAFC), data structure (complaint category, reported amount, payment method, contact attempt, state, age range, media source, report date, data contributor), access channels (public Data Books vs. bulk record-level via FOIA/data-use agreement), top fraud categories (imposter scams, online shopping, identity theft with 2023 figures), payment method analysis (wire transfer, cryptocurrency, gift cards with median-loss context), age demographics (younger consumers report more, older consumers lose more money), state-level per-capita patterns (Florida, Georgia, Nevada, Delaware, Maryland), Python analysis snippet with four cross-tabulations, journalist and researcher use cases, and policy significance for FTC enforcement priorities.

  184. Q2Writing

    FRA railroad accident and grade crossing data deep-dive published

    Long-form article on the FRA Form 54 and Form 57 databases: two-form database overview, Form 54 structure (railroad/location/track class/train type/speed/cause codes), six-group cause taxonomy with subcodes (H human factors, T track, E equipment, S signal, M misc, W weather), Form 57 grade crossing fields (DOT crossing ID, highway user type, sight obstruction, warning device), 80% safety improvement since 1980s (wayside detectors, PTC, track geometry cars), Safety Data Portal access, three research use cases (track class derailment analysis, grade crossing risk modeling, East Palestine hazmat context), and cross-references to NTSB accident reports, PHMSA hazmat, and FRA track inspection data.

  185. Q2Writing

    PBGC terminated pension plan data deep-dive published

    Long-form article on PBGC trusteed pension data: single-employer vs. multiemployer program distinction, three termination types (standard/distress/PBGC-initiated), full field listing (plan name/EIN/CUSIP, termination type, participants, unfunded vested benefits, PBGC claims, funding ratio), major terminations (Bethlehem Steel $3.7B, United Airlines $3.2B, Delphi $6.2B), the 2024 maximum guarantee ($83,400/year at 65, airline pilot impact), three research use cases (industry NAICS concentration, funding ratio distribution analysis, geographic concentration), the multiemployer crisis (Central States, MPRA 2014, ARPA 2021 Special Financial Assistance), and cross-references to DOL Form 5500, SEC 10-K pension footnotes, and PACER bankruptcy records.

  186. Q2Writing

    USGS earthquake ComCat database deep-dive published

    Long-form article on the USGS ComCat earthquake catalog: global M2.5+ events back to 1900, core data fields (id/time/lat-lon/depth/mag/magType/magSource/locationSource/azimuthalGap/minimumDistance/rms/nst/place), magnitude type comparison (Mw/Ml/Mb/Ms/Md with saturation limits), the Oklahoma induced seismicity surge 2009-2015 (Arbuckle formation, M3+ count peaking at 900/year, OCC restrictions), FDSN event API query parameters, three research use cases (fault trace mapping via DBSCAN clustering, induced seismicity statistical separation, Gutenberg-Richter recurrence interval), and cross-references to USGS ShakeMap/HAZUS, EPA UIC program, and NOAA storm events.

  187. Q2Writing

    EPA ECHO enforcement database deep-dive published

    Long-form article on EPA ECHO: four statutory programs (CAA, CWA, RCRA, TSCA), 800k+ regulated facilities, data structure (FACS_ID, SIC/NAICS, violation type, formal action taxonomy, penalty assessed vs. paid, SNC/HPV compliance status), the Significant Non-Compliance/High Priority Violator designation system, four access channels (facility search, bulk download, API, enforcement dashboard), three research use cases (SNC duration analysis, EJScreen environmental justice spatial join, penalty-to-violation ratio for chronic violators), state/federal delegated program split, TRI/RMP/press release cross-references, and four named limitations.

  188. Q2Writing

    NOAA Storm Events 60-year weather database deep-dive published

    Long-form article on the NOAA Storm Events Database: 50+ NWS event types, temporal coverage gaps (tornadoes back to 1950, most types fully recorded from 1996+), the K/M/B damage encoding quirk, tornado path fields (F/EF scale, path length/width, begin/end lat-lon for GIS mapping), three access methods (query tool, FTP bulk CSV, NOAA API), three research use cases (flash flood deaths spatial analysis, billion-dollar disaster trend tabulation, Tornado Alley vs. Dixie Alley shift by decade), and cross-references to FEMA NFIP flood claims, USGS NEHRP, and NWS SPC archives.
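
The K/M/B damage encoding noted in this entry stores dollar amounts as strings with a magnitude suffix, so they need decoding before any arithmetic. A minimal sketch follows; the function name is hypothetical, and treating blank or suffix-only values as missing is an assumption rather than documented NOAA behavior.

```python
from typing import Optional

_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_damage(raw: Optional[str]) -> Optional[float]:
    """Convert strings such as '10.00K' or '2.5M' to dollars.

    Returns None for missing values, blank strings, or a bare suffix.
    """
    if raw is None:
        return None
    text = raw.strip().upper()
    if not text:
        return None
    suffix = text[-1]
    if suffix in _MULTIPLIERS:
        number = text[:-1]
        if not number:
            return None  # a bare 'K' carries no amount
        return float(number) * _MULTIPLIERS[suffix]
    return float(text)  # plain number with no suffix


for value in ["10.00K", "2.5M", "1B", "", "500"]:
    print(repr(value), "->", parse_damage(value))
```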

  189. Q2Writing

    IRS Form 990 political organizations dark money deep-dive published

    Long-form article on 527 and 501(c)(4) political organization disclosures: the two-tier structure (527 files Form 8871/8872 with IRS + FEC if federal; 501(c)(4) files Form 990 with IRS only), why 501(c)(4)s are the dark money vehicle (no donor disclosure, primary purpose doctrine), Form 990 parts I/IV/VII/Schedule C/Schedule O, the IRS bulk XML S3 release (irs-form990 bucket, index files by year), ProPublica Nonprofit Explorer API, three research use cases (Schedule C overspend detection, executive compensation, pass-through revenue ratio), 527 database gaps, cross-references to FEC/FARA/SEC, and the 2012 e-file cliff.

  190. Q2Writing

    NTSB aviation accident database deep-dive published

    Long-form article on 60 years of US civil aviation accident data: accident vs. incident distinction, ~1,300-1,500 accidents/year, six key field categories (event/location, aircraft, operator, injury counts, phase of flight, weather/IMC), probable cause structure (pilot error 70%, mechanical 15-20%), GA vs. commercial 20x fatal rate gap, three access paths (query UI, AADS bulk CSV/MDB, investigative docket), three research use cases (VFR-into-IMC geography, aging aircraft maintenance trends, 1980-to-present fatal rate trends), cross-references to FAA Aircraft Registry and enforcement actions, and limitations (US civil only, incident incompleteness, probable cause not legally binding).

  191. Q2Writing

    PHMSA pipeline incident database deep-dive published

    Long-form article on 50 years of significant pipeline accident data across four PHMSA databases (gas distribution, gas transmission/gathering, hazardous liquid, LNG): the significant-incident threshold, data structure and key fields, seven-category cause taxonomy, three notable incidents (San Bruno 2010 8 dead/$1.6B penalty, Colonial Pipeline Alabama 2016, Enbridge Kalamazoo 2010), four data access routes, three research use cases (corrosion failure trends, third-party excavation damage, Gulf Coast geographic concentration), the penalty gap (proposed vs. final after consent agreements), and cross-references to NTSB/EPA ECHO/SEC 8-K.

  192. Q2Writing

    USDA SNAP participation data deep-dive published

    Long-form article on FNS SNAP data: monthly state-level participation table (1969-present), 57-year historical trajectory (1969 baseline through 2023 emergency allotment end), COVID emergency allotments ($142B, March 2020-February 2023), the February 2023 cliff (average $95/month loss), state variation via ABAWD waivers and broad-based categorical eligibility, three research use cases (recession response lag, 2023 cliff measurement, participation rate vs. poverty population by state), and cross-references to USDA WIC, Census ACS poverty, SNAP retailer data, and BLS CPI Food at Home.

  193. Q2Writing

    Census ACS 5-year tract data deep-dive published

    Long-form article on the American Community Survey: 5-year vs. 1-year estimate tradeoffs, geographic hierarchy from nation to block group, five key variable groups (income/poverty, housing, demographics, education, employment) with table variable codes, four access methods (Census API, data.census.gov, tidycensus/cenpy, bulk FTP), FIPS GEOID construction, three research use cases joining ACS to HMDA/OSHA/SNAP, four limitations including the 2020 differential privacy TopDown Algorithm impact on small-area counts.
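
The FIPS GEOID construction mentioned in this entry is a fixed-width concatenation: a 2-digit state code, a 3-digit county code, and a 6-digit tract code, giving an 11-digit tract GEOID. The helper names below are hypothetical, and the tract code in the example is illustrative only.

```python
def tract_geoid(state_fips: str, county_fips: str, tract_code: str) -> str:
    """Build an 11-digit tract GEOID, zero-padding each component."""
    parts = (
        str(state_fips).zfill(2),
        str(county_fips).zfill(3),
        str(tract_code).zfill(6),
    )
    expected_widths = (2, 3, 6)
    for part, width in zip(parts, expected_widths):
        if not part.isdigit() or len(part) != width:
            raise ValueError(f"invalid FIPS component: {part!r}")
    return "".join(parts)


def split_tract_geoid(geoid: str) -> tuple[str, str, str]:
    """Split an 11-digit tract GEOID into (state, county, tract)."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("tract GEOID must be 11 digits")
    return geoid[:2], geoid[2:5], geoid[5:]


# State 06 and county 037 are California and Los Angeles County.
print(tract_geoid("6", "37", "206032"))
```

Keeping GEOIDs as strings matters when joining to other datasets, because integer parsing silently drops the leading zero on states such as California (06).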

  194. Q2Writing

    HUD FHEO fair housing complaint data deep-dive published

    Long-form article on HUD fair housing complaints: 7-9k complaints/year, disability at 55% of basis (ESA accommodation failures driving surge), race at 20%, the two-filing-path structure (HUD administrative vs. federal court), full data schema (basis, issue type, property type, respondent type, disposition, closure, monetary relief), three research use cases (geographic concentration of race complaints, lender vs. landlord by basis, conciliation rate by basis), and cross-references to HMDA, DOJ Housing Civil Enforcement Section, and CFPB ECOA.

  195. Q2Writing

    BJS National Prisoner Statistics deep-dive published

    Long-form article on the National Prisoner Statistics program: three data forms (NPS-1A annual stock, NPS-1B monthly, NPS-1C offense/demographics), custody vs. jurisdiction distinction, 1972-2009 incarceration boom (mandatory minimums, Truth in Sentencing, 1994 Crime Bill, drug war), post-2009 decline (fiscal pressure, Fair Sentencing Act, COVID decarceration), racial disparity trends and the 2012 race classification discontinuity, three research use cases (drug offense composition, parole revocations, state-by-state rate comparison), and cross-references to FBI NIBRS, EEOC, and USDA SNAP.

  196. Q2Writing

    OSHA inspection and citation enforcement deep-dive published

    Long-form article on the OSHA enforcement dataset: 2.5M+ workplace inspections since 1972, inspection types (programmed/complaint/referral/accident follow-up), the three-table structure (establishment/inspection/violation), the 40-70% penalty reduction gap between proposed and final penalties, the severe violator enforcement program (SVEP), egregious citations, construction Fatal Four analysis, agriculture/poultry/meatpacking underinspection patterns, federal contractor SAM.gov cross-reference, state-plan integration challenges, and OSHA Form 300 ITA companion data.

  197. Q2Writing

    DOL Wage and Hour Division enforcement deep-dive published

    Long-form article on the WHD WHISARD database: 300k+ concluded investigation records 2005-present, FLSA minimum wage/overtime/child labor, H-2A/H-2B temporary worker protections, FMLA, Davis-Bacon prevailing wage, back wages found vs. collected gap, repeat violator pattern, restaurant/agriculture industry breakdown, Davis-Bacon federal contractor compliance, and cross-reference with SAM.gov debarments, NLRB unfair labor practice filings, and EEOC discrimination charges.

  198. Q2Writing

    NHTSA FARS traffic fatality data deep-dive published

    Long-form article on the Fatality Analysis Reporting System: 1.1M+ US traffic deaths since 1975 across 30+ relational data files (accident, vehicle, person), key variable groups (crash date/location/roadway, vehicle type/make/model, person demographics/alcohol/restraint), 50-year trends (drunk driving down from 60% to 28%, pedestrian death surge since 2009), three research use cases (state pedestrian VMT-adjusted rates, teen vs. elderly driver patterns, large truck severity asymmetry), VIN-to-recall cross-reference methodology, and CRSS companion dataset for non-fatal crashes.

  199. Q2Writing

    FBI NIBRS incident-level crime data deep-dive published

    Long-form article on the National Incident-Based Reporting System: six segment types (administrative, offense, property, victim, offender, arrestee), Group A vs. Group B offenses, coverage expansion from 30% to 70% of US population 2016-2023 including California late adoption, three research use cases (drug offense rural vs. urban breakdown, clearance rate disparity by victim demographics, school/university incident tracking), and limitations (dark figure of unreported crime, voluntary participation bias, enforcement-intensity distortion for drug offense counts).

  200. Q2Writing

    FEC campaign finance bulk data deep-dive published

    Long-form article on the FEC bulk data portal: seven key files per two-year cycle (indiv, pas2, itpas2, oppexp, cm, ccl, cn), the 16-code committee type taxonomy with super PAC (O) and 527 group (I) classification, the structural gap between super PAC disclosure and 501(c)(4) dark money opacity, the SpeechNow.org v. FEC (2010) and Citizens United (2010) legal architecture, shell-company donor detection via EMPLOYER/ZIP/timing signals in the individual contribution file, the fuzzy donor entity resolution problem across cycles (NAME+ZIP+EMPLOYER+OCCUPATION), PAC-to-PAC money flow graph construction and Louvain community detection with NetworkX, and the OpenFEC REST API Schedules A/B/E endpoints. Cross-reference against IRS Form 990 via ProPublica Nonprofit Explorer for 501(c)(4) revenue gap analysis.

  201. Q2Writing

    EEOC employment discrimination charge data deep-dive published

    Long-form article on the EEOC charge statistics and FOIA-released charge-level data: the five statutes enforced (Title VII, ADEA, ADA, EPA, GINA), the two-tier data structure (public aggregate statistics at eeoc.gov/data and FOIA-released charge-level extracts), the 18% merit resolution rate and its analytical caveats, the $535M+ in annual conciliation recoveries without admission of wrongdoing, large employer repeat-appearance patterns and the 42 USC 2000e-6 pattern-or-practice authority, the post-ADAAA disability charge surge after 2009, the EEO-1 workforce composition FOIA release enabling charge-rate-per-employee computation, Python analysis of year-over-year basis trends and NAICS sector disability charge acceleration, and cross-reference opportunities with NLRB ULP cases, OSHA Section 11(c) retaliation complaints, and DOL OFCCP compliance evaluations for federal contractors.

  202. Q2Writing

    HHS-OCR HIPAA breach database deep-dive published

    Long-form article on the HHS-OCR breach portal (the "Wall of Shame"): the 45 CFR Part 164 Subpart D regulatory basis, the OCR portal and bulk CSV download, key fields (Covered Entity Type, Type of Breach, Location of Breached Information, Business Associate Involved, Web Description), the shift from laptop theft (2009–2014) to ransomware dominance (80%+ of affected individuals by 2022), Change Healthcare (190M records, 2024) as the largest healthcare breach in US history, business associate multiplier effect (MOVEit, Change Healthcare), settlement narrative mining (Anthem $16M, Premera $6.85M, MAPFRE $2.2M), Python analysis recipe for breach rate by covered entity type and hacking trend lines, and cross-reference with CISA KEV, HHS-OIG exclusions, and SEC 8-K cybersecurity disclosures.

  203. Q2Writing

    SBA PPP loan FOIA data analysis published

    Long-form article on the $793B PPP bailout dataset: 11.8M loans, the FOIA fight for borrower names, fraud signals (debarred contractors, nonexistent businesses, EIN mismatches), SAM.gov and IRS BMF cross-reference, DOJ prosecution record.

  204. Q2Writing

    STOCK Act congressional trading deep-dive published

    Long-form article on the Stop Trading on Congressional Knowledge Act (5 U.S.C. § 13104): where House Clerk scanned-PDF disclosures live, how Quiver Quantitative and Capitol Trades structure the data, the amount-range schema, the 45-day lag and non-compliance pattern ($200 fine), committee assignment cross-references (SSCI/HPSCI for COVID-19 briefing trades, Armed Services for defense stocks, Senate Commerce for CHIPS Act semiconductor trades), a Python CAPM event-study methodology for computing cumulative abnormal returns on congressional purchases, and the family-exemption and blind-trust gaps.
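
The event-study idea in this entry can be sketched as follows: fit a market model on an estimation window, then sum the differences between actual and predicted returns over the event window. This is a simplified illustration that assumes a zero risk-free rate (so CAPM reduces to a regression on raw market returns); the function names and return series are hypothetical.

```python
from statistics import mean


def fit_market_model(stock: list[float], market: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of stock = alpha + beta * market."""
    if len(stock) != len(market) or len(stock) < 2:
        raise ValueError("need two equal-length series with 2+ observations")
    market_mean, stock_mean = mean(market), mean(stock)
    variance = sum((m - market_mean) ** 2 for m in market)
    if variance == 0:
        raise ValueError("market returns have zero variance")
    covariance = sum(
        (m - market_mean) * (s - stock_mean) for m, s in zip(market, stock)
    )
    beta = covariance / variance
    alpha = stock_mean - beta * market_mean
    return alpha, beta


def cumulative_abnormal_return(est_stock, est_market, event_stock, event_market):
    """Sum of (actual - predicted) returns over the event window."""
    alpha, beta = fit_market_model(est_stock, est_market)
    return sum(
        s - (alpha + beta * m) for s, m in zip(event_stock, event_market)
    )


# Synthetic returns: the stock tracks 2x the market plus 0.1% per day.
est_market = [0.01, -0.02, 0.03, 0.00]
est_stock = [2 * m + 0.001 for m in est_market]
car = cumulative_abnormal_return(est_stock, est_market, [0.031, 0.051], [0.01, 0.02])
print(f"CAR over event window: {car:.4f}")
```

A real study would use a much longer estimation window and anchor the event window at the disclosed transaction date; both choices are left to the analyst here.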

  205. Q2Data hub

    Federal Regulatory Data Hub — 230 datasets, 35M+ records, CC0 1.0

    Cross-agency regulatory catalog spanning SEC, FDA, OFAC, DOJ, EPA, CFPB, IRS, FEMA, CDC, NHTSA, FAA, CFTC, CMS, MSHA, OSHA and 40+ other agencies. Entity bridge joins every regulatory event for a company in one query. REST + MCP + JSON-LD + Markdown surfaces. 38+ MCP tools. CC0 1.0 universal license.

  206. Q2Writing

    EOIR asylum grant rate disparities write-up published

    Long-form article on the Executive Office for Immigration Review bulk datasets: asylum case outcomes by judge, nationality, and court; the documented 5%–90%+ judge-level grant rate spread for identical nationalities; TRAC Immigration cross-reference methodology; administrative closure distortion; Python analysis of grant rate coefficient of variation within courts; and cross-reference opportunities with DHS removal data, CBP encounter data, and UNHCR country-of-origin conditions.
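
The within-court coefficient of variation mentioned in this entry is the standard deviation of judge-level grant rates divided by their mean, computed separately per court. The sketch below assumes records arrive as (court, judge, grants, decisions) tuples; that layout, the function name, and the sample rows are hypothetical, and the population standard deviation is one reasonable choice among several.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grant_rate_cv(records):
    """Return {court: coefficient of variation of judge grant rates}.

    Courts with fewer than two judges, or a zero mean rate, are skipped.
    """
    rates_by_court = defaultdict(list)
    for court, _judge, grants, decisions in records:
        if decisions > 0:
            rates_by_court[court].append(grants / decisions)

    result = {}
    for court, rates in rates_by_court.items():
        average = mean(rates)
        if len(rates) >= 2 and average > 0:
            result[court] = pstdev(rates) / average
    return result


# Synthetic rows, not EOIR data.
rows = [
    ("Court A", "Judge 1", 10, 100),
    ("Court A", "Judge 2", 90, 100),
    ("Court B", "Judge 3", 50, 100),
    ("Court B", "Judge 4", 50, 100),
]
print(grant_rate_cv(rows))
```

A higher value means more judge-to-judge dispersion within the same court; comparing like with like would also require holding nationality constant.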

  207. Q2Writing

    HMDA mortgage lending disparities deep-dive published

    Long-form article on the CFPB HMDA Snapshot National Loan-Level Dataset: 10M+ applications per year, Parquet and pipe-delimited CSV bulk download, action_taken schema (originated/approved/denied/withdrawn), derived_race/sex/ethnicity fields, denial_reason_1–4, LEI lender identifier. Income-stratified Black/white denial rate ratio analysis using pandas, tract-level join to ACS demographic data to surface redlining patterns, reverse redlining signal via rate_spread concentration in high-minority tracts, lender-level chi-squared significance testing, FHA/VA loan share steering analysis. The 2018 Kraninger rule rollback: credit score field exempted for ~1,700 smaller depositories. Cross-reference opportunities: CRA examination ratings, DOJ redlining settlements (Trustmark, City National, Trident), CFPB ECOA enforcement actions.

  208. Q2Writing

    CPSC recall database deep-dive published

    Long-form article on the CPSC product safety recall database: 9,800+ recalls since 1973, the voluntary recall process under 15 USC 2064, how manufacturers negotiate hazard language with CPSC staff, recall effectiveness data showing 10–30% median return rates, the SaferProducts.gov companion dataset, the 15 USC 2055(a) confidentiality gap that withholds pre-recall incident reports, hazard taxonomy (fire/burn, tip-over, entrapment, choking, carbon monoxide, lead/chemical), Python analysis of fire hazard year-over-year trend with SEC EDGAR cross-reference, furniture tip-over epidemic, lithium battery recall acceleration, and children's sleep product market restructuring.

  209. Q2Writing

    Seven dataset deep-dives: NLRB, PCAOB, USAID, ATF, Corporate Prosecution, DEA ARCOS, SEC 13F

    NLRB union election records: RC and RD cases, the 2021–2024 organizing surge, the 100k export cap workaround, industry breakdowns, cross-reference with OSHA and CFPB. PCAOB deficiency data: 26% of Big 4 audits reviewed in 2023 carried Part I.A findings — auditors signed off without sufficient evidence. USAID foreign assistance: foreignassistance.gov archived before DOGE removal, $1.5T disbursements 2001–2024 reconstructed. ATF Federal Firearms Licensees: 75k active dealers monthly CSV, Tiahrt Amendment redactions mapped, geographic cross-reference. Corporate Prosecution Registry (Duke/UVA): every federal DPA/NPA/plea since 1990 including agreements DOJ refused to release under FOIA. DEA ARCOS: 380M controlled-substance transaction records from MDL 2804 discovery — pill distribution by county, manufacturer, pharmacy. SEC Form 13F: 6,000 institutional managers, quarterly position disclosures, CUSIP-to-ticker join methodology.

  210. Q2Writing

    FARA, NFIP flood claims, and pharma payment map — three new dataset deep-dives

    FARA foreign agent registrations: the DOJ bulk endpoint buried in Oracle APEX, daily CSV of every DC lobbying firm registered for a foreign principal, cross-reference opportunities with LDA lobbying, OFAC SDN, and Congressional roll-call votes. FEMA NFIP flood claims: 2.7M paid claims, repetitive-loss properties that have been paid more than their assessed value, ZIP-level resolution after 2019 address redaction, climate signal in yearOfLoss distribution. CMS Open Payments + Medicare Part D join: 100M+ pharma payment records cross-referenced with physician prescribing patterns, ProPublica Dollars for Docs methodology, LEIE exclusion cross-reference.

  211. Q2Writing

    Compliance screening deep-dive: 30+ lists, 0–100 risk score

    Long-form article on the screening endpoint: list severity multipliers, recency decay, entity resolution pipeline. Covers OFAC SDN/Non-SDN, FinCEN, SAM, OIG, SEC, CFPB, FDIC, OCC, FINRA, CFTC, PCAOB, DOJ, EPA, MSHA, OSHA, NHTSA, FDA warning letters, UFLPA, BIS, Trade.gov CSL, CISA KEV.

  212. Q2Writing

    Compliance entity resolution write-up published

    Long-form article on how the Federal Regulatory Data Hub resolves entity identity across 30+ compliance lists to reduce false positives: three-stage pipeline (identifier join 34%, FTS5 canonical name 41%, Jaro-Winkler fuzzy 18%), false positive taxonomy (same-name different entity 47%, subsidiary-parent 28%, historical name 16%, transliteration 9%), EntityResolutionResult confidence-to-action mapping (MATCH ≥0.90, PROBABLE_MATCH 0.72–0.90, POSSIBLE_MATCH 0.60–0.72), 99.1% recall, 98.7% precision at ≥0.90 threshold, and weekly analyst-feedback calibration loop with PSI drift detection.
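
    The confidence-to-action bands above can be sketched as a small lookup. This is a minimal illustration, not the hub's implementation: the function name, the `NO_MATCH` label for scores below 0.60, and the choice to treat each lower bound as inclusive are assumptions.

```python
from enum import Enum


class MatchAction(Enum):
    MATCH = "MATCH"
    PROBABLE_MATCH = "PROBABLE_MATCH"
    POSSIBLE_MATCH = "POSSIBLE_MATCH"
    NO_MATCH = "NO_MATCH"  # assumed label for scores below 0.60


def action_for_confidence(confidence: float) -> MatchAction:
    """Map a resolution confidence in [0, 1] to an action band.

    Lower bounds are treated as inclusive (an assumption; the entry
    does not say which side of 0.72 or 0.90 a boundary score falls on).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    if confidence >= 0.90:
        return MatchAction.MATCH
    if confidence >= 0.72:
        return MatchAction.PROBABLE_MATCH
    if confidence >= 0.60:
        return MatchAction.POSSIBLE_MATCH
    return MatchAction.NO_MATCH
```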

  213. Q2Writing

    Federal entity name matching write-up published

    Long-form article on the name-matching pipeline behind the Federal Regulatory Data Hub entity bridge: 34% of cross-dataset links have no shared identifier and require name matching; NFKC normalization + legal suffix stripping pipeline; OFAC alias explosion (44K aliases from 12K SDN entries); SEC Exhibit 21 subsidiary mapping; three-pass matching (exact → Jaro-Winkler ≥ 0.88 → TF-IDF cosine ≥ 0.72); 1.4% combined false positive rate; sanctions evasion heuristics (character substitution, token reordering, jurisdiction hop); entity_confidence field and 0.7× risk score multiplier for fuzzy matches.

  214. Q2Writing

    Canonical entity IDs write-up published

    Long-form article on how the Federal Regulatory Data Hub generates and maintains stable canonical IDs across 197 federal datasets: deterministic SHA-256 ID generation from entity type and normalized name, EntityVersion history for merge events (0.3% of ingestion batches) and split events (<0.01%), EntityAlias table with AKA/FKA/NFE/PHONETIC types, entity_id_mapping lookup for source ID to canonical_id (1.8ms), and subscriber continuity guarantees when source identifiers change.
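
    Deterministic ID generation from entity type and normalized name might look roughly like the sketch below. The normalization steps (NFKC, case folding, whitespace collapsing), the NUL separator, and the hex encoding are all assumptions made for illustration; the entry only states that SHA-256 is applied to the entity type and normalized name.

```python
import hashlib
import unicodedata


def normalize_name(name: str) -> str:
    """Assumed normalization: NFKC, case-fold, collapse whitespace."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


def canonical_id(entity_type: str, name: str) -> str:
    """Derive a stable ID from (entity type, normalized name).

    The NUL byte keeps the two fields from running together, so
    ("ab", "c") and ("a", "bc") cannot hash to the same input.
    """
    material = f"{entity_type.casefold()}\x00{normalize_name(name)}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

    Because the ID depends only on the normalized inputs, re-ingesting the same entity from a different source yields the same identifier without a lookup.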

  215. Q2Writing

    Cross-agency entity graph deep-dive published

    Long-form article documenting the entity_master bridge: three-pass ID resolution across CIK, ticker, UEI, LEI, DUNS, NPI; TF-IDF fuzzy matching for free-text datasets (OFAC, EPA, DOJ); parallel D1 query fan-out with <200ms edge latency.

  216. Q2Writing

    Regulatory change alerts write-up published

    Long-form article on how the Federal Regulatory Data Hub detects regulatory record changes and delivers near-real-time webhooks to compliance subscribers: conditional GET polling for OFAC SDN (10-minute window), bulk-file hash delta for SAM.gov exclusions (30-minute window), RSS polling for EDGAR 8-K filings (10-minute window), HMAC-SHA256-signed Cloudflare Queue delivery with at-least-once semantics, per-entity subscription filters via the entity bridge, entity_name_patterns fuzzy matching, batched delivery for bulk sanctions packages, and idempotency_key deduplication for retry-safe processing.
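
    On the subscriber side, HMAC-SHA256 verification combined with idempotency_key deduplication is what makes at-least-once delivery safe to retry. The receiver below is a hypothetical sketch: the class, the return labels, and the in-memory key set are assumptions, and the real header names and payload format are not specified in this entry.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


class WebhookReceiver:
    """Hypothetical subscriber that verifies, then deduplicates."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._seen_keys: set[str] = set()  # a real system would persist this

    def handle(self, body: bytes, signature: str, idempotency_key: str) -> str:
        expected = sign_body(self._secret, body)
        # Constant-time comparison avoids leaking how many characters matched.
        if not hmac.compare_digest(expected, signature):
            return "rejected"
        # A redelivered event carries the same key and is acknowledged
        # without being processed a second time.
        if idempotency_key in self._seen_keys:
            return "duplicate"
        self._seen_keys.add(idempotency_key)
        return "processed"
```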

  217. Q2Writing

    Entity subscription layer write-up published

    Long-form article on how the Federal Regulatory Data Hub lets compliance teams subscribe to regulatory events for specific entities: EntitySubscription model (entity_master_id pinned vs. pattern-based), EntityChangeEvent payload with cross-agency context, severity scoring table (ofac_sdn=10, sam_debarment=8, epa_echo_violation=4), cross-list fan-out via Cloudflare Queue, O(1) pinned lookup vs ~12ms pattern resolution latency, bulk portfolio monitoring across 200 counterparties, RSS/Atom feed channel with ETag, and four-tier subscription quota model.

  218. Q2Org

    Site rebuild · governance + methodology published

    Public methodology and governance pages; press kit; intel feed; full Schema.org graph; RSS / JSON syndication. Intel briefs, structured data coverage, and cross-linking across all three flagship projects.

  219. Q1Data hub

    Regulatory API v1 — cross-agency entity bridge, 150 datasets

    Public launch of api.ai-analytics.org. Cross-agency entity bridge keyed on CIK, ticker, UEI, LEI, DUNS, NPI. Compliance screening across 30+ enforcement lists in one GET. MCP server with 38+ tools. EDGAR, openFDA, OFAC, EPA ECHO, SAM.gov, USAspending, FinCEN, FDIC, CMS, CDC, NIST NVD, CISA KEV.

  220. Q1Writing

    Federal Regulatory API design write-up published

    Long-form article covering the api.ai-analytics.org design: no-auth CC0 REST endpoints, cross-agency entity resolution in one GET, 38+ MCP tools for Claude and GPT agent workflows, JSON-LD structured data, /today.md and /llms.txt AI-facing surfaces, and the Cloudflare edge caching strategy.

  221. Q1Writing

    Regulatory data MCP server write-up published

    Long-form article on building the regulatory data MCP server: 38 tools across screening, entity lookup, sanctions intelligence, and dataset access; screen_entity Zod schema with confidence_threshold; Claude Desktop config; three rate-limit tiers (free 60/min, api_key 300/min, enterprise 1200/min); and the eight tool categories from compliance screening to real-time change webhooks.

  222. Q1Writing

    Federal dataset ingest ETL write-up published

    Long-form article on the ETL pipeline behind the Federal Regulatory Data Hub: three source categories (structured REST APIs, bulk file downloads, HTML scrapers), OFAC SDN conditional GET with ETag, SAM.gov paginated delta sync, schema drift detection with BREAKING vs ADDITIVE classification, record-level hash-based delta detection for bulk sources, per-source retry budgets and staleness alerting, and the ingest timing nuances (OFAC updates at 10am ET; CISA KEV at any time).

  223. Q1Writing

    Swarm SDK double ratchet write-up published

    Long-form article on how the Swarm SDK implements the Double Ratchet algorithm for drone-to-drone messaging: adapting Signal Protocol's KDF chains for ML-KEM-768 post-quantum initial key exchange (encapsulation ratchet replacing the ECDH step), the root/sending/receiving chain KDF structure with HKDF-SHA-256, header encryption to hide ratchet state from passive observers, out-of-order message handling via a sliding key cache (100-key lookahead, 200-message eviction), MAVLink v2 TUNNEL framing, and performance benchmarks on STM32H7 (1.8ms full encrypt) and Jetson Nano.

  224. Q1Writing

    Swarm SDK Sealed Sender write-up published

    Long-form article on the Sealed Sender implementation in Swarm SDK: recipient-issued SenderCertificate (Ed25519-signed, 48-hour TTL), SealedSenderEnvelope construction with ephemeral X25519 key per message, HKDF-SHA-256 with "SealedSender_v1" domain label, AES-256-GCM encryption, zero relay-visible sender field, four decryption failure modes (DecryptionError, CertificateExpired, CertificateSignatureInvalid, SenderKeyMismatch), CertificateCache with CERTIFICATE_REFRESH control message, and integration with Sender Keys group messaging for per-recipient sealed envelopes.

  225. Q1Writing

    Swarm SDK v0.3 feature deep-dive published

    Long-form technical write-up on the three v0.3 capability areas: Sender Keys for O(1) group encryption (SenderKeyState with chain key ratchet, 0.7ms encrypt on STM32H7 vs. 1.8ms Double Ratchet, 23× reduction for 32-drone broadcasts); Sealed Sender using ML-KEM-768 encapsulation to hide drone identity from mesh participants (+1,108 bytes per message); and deniable HMAC authentication (HMAC-SHA-256 over Double-Ratchet-derived key, neither party can prove the other generated the MAC). 127 new tests, 302 total.

  226. Q2Writing

    Swarm SDK key rotation write-up published

    Long-form article on automated cryptographic material refresh in field-deployed drone meshes: RotationScheduler Rust struct with SPK 7-day timer, OTP replenishment at <20 keys, staggered mesh rotation via device_id.hash_u64() % 86_400 jitter, BKPSRAM zeroization with 0xFF pattern verification, KeyRevocationAnnouncement gossip flood with TTL=infinity, and IK rotation 7-day overlap window procedure.

  227. Q1Writing

    Swarm SDK key management write-up published

    Long-form article on cryptographic key management for drone fleets: on-device ML-KEM-768 + X25519 identity keypair generation at provisioning, three-tier Fleet CA hierarchy (Root CA on air-gapped HSM → Fleet CA on ground station → 90-day device certificates), pre-provisioned mission cert bundles for offline peer authentication, 7-day signed prekey rotation broadcast over the gossip mesh, in-flight device revocation via authenticated RevocationMessage with gossip propagation (~30-second swarm-wide), and emergency wipe procedure with SRAM scrub (180ms on STM32H7).

  228. Q1Writing

    Swarm SDK device enrollment write-up published

    Long-form article on how a new drone goes from factory state to trusted mesh participant: factory-provisioned ML-KEM-768 + X25519 identity keypairs (FactoryProvisionedIdentity struct), Ed25519 factory-signing-key signature on the CSR, Fleet CA 90-day device certificate (cert_serial + FleetCaSignature), three enrollment transport options (USB tether, provisional RF, air-drop batch), SignedPreKeyBundle gossip mesh announcement with EnrollmentAnnouncement, pioneer bootstrap for the first device in a fleet, and re-enrollment flow at certificate expiry.

  229. Q1Writing

    Swarm SDK X3DH session establishment write-up published

    Long-form article on how the Swarm SDK establishes drone-to-drone sessions using Extended Triple Diffie-Hellman: PrekeyBundle structure (identity key, SignedPreKey, OneTimePreKey list, DeviceCertificate), four DH operations with HKDF-SHA-256 derivation, OTP single-use consumption and auto-replenishment (refresh_threshold=3), PQ adaptation replacing DH1 with ML-KEM-768 encapsulation, Fleet CA bundle verification chain, handoff from X3DH shared secret to Double Ratchet initial state, and STM32H7/Jetson Nano benchmarks (PQ X3DH init p50 62ms/6.8ms).

  230. Q1Writing

    Swarm SDK prekey management write-up published

    Long-form article on OneTimePreKey lifecycle in the Swarm SDK: batch generation of 100 OTPs using STM32H7 TRNG (0.25ms, BKPSRAM 64-byte per-key layout), PreKeyBundleAnnouncement gossip flood (TTL=7, dedup by device_id+prekey_id), OtpConsumed gossip message for remote consumption tracking, optimistic consumption to prevent double-use, OTP exhaustion fallback to SignedPreKey (SessionInitResult::NoOtpAvailable, forward-secrecy degradation logged), 7-day SPK rotation with 0xFF SRAM zeroization, late-joiner BundleRequest/BundleResponse relay, and per-device storage budget (6.4KB OTP in BKPSRAM, ~192KB peer cache in SRAM1).

  231. Q1Writing

    Swarm SDK gossip mesh write-up published

    Long-form article on the gossip mesh protocol inside the Swarm SDK: GossipNode with VecDeque<MessageId> deduplication (capacity 1000), k=3 bounded fanout with 250ms ± 50ms gossip interval, TTL=7 hop limit, GossipMessageHeader with sender_id and hop_count, CausalMessage Lamport clock ordering for key management messages, hop-count propagation bounds (N=64 → 4 hops), anti-entropy reconciliation via AntiEntropyDigest (200 IDs) and response payloads on a 5-second cadence, ISOLATED mode and 1MB outgoing buffer for partition handling, and STM32H7 benchmarks (gossip tick p50 3.2ms, dedup p50 0.8ms).
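
    The hop-count bound (N=64 → 4 hops) follows from the fanout: under an idealized model where the set of informed nodes grows by a factor of k per hop, the smallest h with k^h ≥ N is the bound. The helper below is an illustrative sketch of that arithmetic, not SDK code; the function name and the idealized growth model are assumptions.

```python
def hops_to_cover(node_count: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= node_count (idealized gossip)."""
    if node_count < 1 or fanout < 2:
        raise ValueError("need node_count >= 1 and fanout >= 2")
    hops = 0
    reachable = 1  # the originating node
    while reachable < node_count:
        reachable *= fanout  # each informed node informs `fanout` peers
        hops += 1
    return hops
```

    With k=3, three hops reach at most 27 nodes and four reach 81, so a 64-node swarm needs four; the TTL of 7 leaves headroom for duplicate deliveries and lossy links.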

  232. Q1Writing

    Swarm SDK mesh transport write-up published

    Long-form article on the MeshTransport reliability layer for drone RF links: sliding window ARQ (Go-Back-N base + selective ACK extension), DashMap peer_states and BTreeMap retransmit_queue (deadline-keyed), EWMA RTT estimation (alpha=0.125), 25-byte DATA frame header with sequence number and SACK bitmap, 228-byte payload matching MAVLink MTU, 3-retry retransmit cap with peer disconnect on exhaustion, transparent fragmentation for multi-frame messages (EnrollmentAnnouncement=2 frames, SealedSenderEnvelope=5–6 frames), multi-channel bonding across 2.4GHz and 5.8GHz radios, and STM32H7/Jetson Nano benchmarks (throughput 82% efficiency at 5% loss, <1% at 60%).
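
    EWMA RTT estimation with alpha=0.125 uses the same smoothing constant as the classic TCP retransmission timer. A minimal sketch, assuming the first sample seeds the estimate directly (the class name and that seeding rule are assumptions, not taken from MeshTransport):

```python
from typing import Optional


class RttEstimator:
    """Exponentially weighted moving average of round-trip samples."""

    def __init__(self, alpha: float = 0.125) -> None:
        self.alpha = alpha
        self.srtt_ms: Optional[float] = None

    def update(self, sample_ms: float) -> float:
        if self.srtt_ms is None:
            # Assumed: the first measurement initializes the estimate.
            self.srtt_ms = sample_ms
        else:
            # New estimate keeps 87.5% of the old value, 12.5% of the sample.
            self.srtt_ms = (1 - self.alpha) * self.srtt_ms + self.alpha * sample_ms
        return self.srtt_ms
```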

  233. Q2Writing

    Swarm SDK v0.4 feature deep-dive published

    Long-form technical write-up on the four v0.4 capability areas: Situational Awareness API (signed position broadcasts, sensor fusion, dead-reckoning); EW Coordination protocol (EwEvent messages, anti-replay, frequency hop plans); Adversarial Resilience (traffic morphing to 6 fixed sizes, ±15% timing jitter, store-and-forward ring buffer, degraded-channel mode); and RF Fingerprinting (IQ sample analysis, passive emitter triangulation, commercial drone radio signature library). 163 new tests, 465 total.

  234. Q2Writing

    Swarm situational awareness write-up published

    Long-form article on how the swarm coordination layer maintains a shared operational picture: Ed25519-signed 124-byte position broadcast frames (60-byte PositionFrame #[repr(C, packed)] + 64-byte signature), Extended Kalman Filter fusing GPS/IMU/barometric altitude into a 15-DOF state estimate with HDOP-scaled noise, dead-reckoning for up to 90 seconds without GPS (DR_TIMEOUT_SECS=90, 0.04 × t² uncertainty), adjusted_separation_m() with 3-sigma RSS uncertainty, and a probabilistic gossip protocol (FORWARD_PROBABILITY=0.60, MAX_HOPS=4, SEEN_CACHE_SIZE=512) achieving 94.2% frame delivery at 340ms median latency across a 2km × 2km field deployment.

  235. Q2Writing

    Swarm SDK embedded Rust write-up published

    Long-form article on porting the Swarm SDK cryptographic core to no_std Rust on the STM32H7 Cortex-M7: Cargo feature flags gating std/embedded builds, 96KB static heap with alloc-cortex-m (CortexMHeap, [MaybeUninit<u8>; 98_304]), pre-allocated VecDeque deduplication ring (capacity 1000), in-place AES-GCM encryption via AeadInPlace to avoid heap allocation, hardware CRYP peripheral integration (0.14ms vs. 0.61ms software, 4.3×), memory.x linker script (DTCM stack / SRAM1 heap / SRAM4 key material), and binary size optimization from 1.2MB to 284KB (opt-level="z", lto=true, codegen-units=1, panic="abort", strip="symbols").

  236. Q1Writing

    Swarm SDK MAVLink v2 integration write-up published

    Long-form article on how the Swarm SDK wraps post-quantum encrypted mesh traffic in MAVLink v2 SWARM_MESH_FRAME messages: 18-byte SwarmFragHeader design (distribution_id, fragment_index, fragment_count), 235 bytes usable ciphertext per frame, per-message reassembly buffer with 5-second TTL, PX4 uORB subscription, ArduPilot SERIAL proxy, MAVSDK MavlinkPassthrough, and fragment count by message type (routine SenderKeyMessage = 1 frame; SealedSenderEnvelope = 6 frames due to 1,088-byte ML-KEM-768 ciphertext).

  237. Q1Writing

    Swarm SDK operational security write-up published

    Long-form article on Swarm SDK traffic analysis resistance: MessageSizeBin enum (TINY_64/SMALL_128/MEDIUM_256/LARGE_512/XLARGE_1024/JUMBO_2048 bytes), bin_for_size() padding function, JitterConfig struct (base_interval_ms=1000, jitter_fraction=0.15, max_backlog=32), next_tx_time_ms() using STM32H7 TRNG, StoreForwardBuffer ring (VecDeque capacity 128) with token-bucket rate limiting, FramePriority enum (CRITICAL/HIGH/NORMAL/LOW), OperationalMode state machine (Normal → Degraded at >70% loss for >30s, EmergencyBeacon sends only position+auth every 60s), and jitter increase to ±30% in degraded mode.
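
    Size binning and timing jitter can be illustrated in a few lines. The sketch borrows the names bin_for_size and next_tx_time_ms from the entry, but the signatures are assumptions, the code is Python rather than the SDK's Rust, and `secrets.SystemRandom` stands in for the STM32H7 TRNG.

```python
import secrets

# The six fixed sizes from the MessageSizeBin enum, in bytes.
SIZE_BINS = (64, 128, 256, 512, 1024, 2048)

_rng = secrets.SystemRandom()  # stand-in for a hardware TRNG


def bin_for_size(payload_len: int) -> int:
    """Return the smallest fixed size that can hold the payload."""
    if payload_len < 0:
        raise ValueError("payload length cannot be negative")
    for size in SIZE_BINS:
        if payload_len <= size:
            return size
    raise ValueError("payload exceeds the largest bin (2048 bytes)")


def next_tx_time_ms(now_ms: int, base_interval_ms: int = 1000,
                    jitter_fraction: float = 0.15) -> int:
    """Schedule the next send at base interval +/- jitter_fraction."""
    spread = base_interval_ms * jitter_fraction
    return now_ms + round(base_interval_ms + _rng.uniform(-spread, spread))
```

    Padding every message up to its bin means an observer sees only six possible lengths; with the defaults, consecutive sends fall between 850 ms and 1150 ms apart.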

  238. Q1Writing

    Swarm SDK message framing write-up published

    Long-form article on how the Swarm SDK wire-formats, fragments, and packs encrypted messages for MAVLink v2 transport: the 16-byte SwarmFrameHeader (magic [0x57,0x00], version, frame_type, sequence, total_frames, frame_index, payload_len, message_id, CRC-16), 237-byte max payload, fragment_message() Rust implementation, ReassemblyState keyed on (source_device_id, message_id) with 5-second timeout, CONTROL frame HMAC-SHA256 authentication, ACK/NACK message-level protocol, and STM32H7 benchmarks (fragment 200B → 0.09ms, HW AES-GCM → 0.14ms).
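
    The fragment count for a message follows directly from the 237-byte maximum payload. The sketch below splits and rejoins raw bytes only; it deliberately omits the 16-byte header, whose exact field layout is not given here, and the Python function names are illustrative rather than the SDK's fragment_message().

```python
MAX_PAYLOAD = 237  # bytes of payload per frame, from the entry above


def fragment_payload(message: bytes, max_payload: int = MAX_PAYLOAD) -> list[bytes]:
    """Split a message into chunks of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if not message:
        return [b""]  # assumed: an empty message still occupies one frame
    return [message[i:i + max_payload]
            for i in range(0, len(message), max_payload)]


def reassemble(fragments: list[bytes]) -> bytes:
    """Concatenate in-order fragments back into the original message."""
    return b"".join(fragments)
```

    A 200-byte message fits in one frame, while a 500-byte message needs three (237 + 237 + 26).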

  239. Q1Writing

    Swarm SDK cryptography design write-up published

    Long-form technical write-up of the Swarm SDK cryptographic architecture: ML-KEM-768 + X25519 hybrid key exchange, Double Ratchet forward secrecy, Sender Keys for O(1) group encryption, gossip mesh routing, and the CNSA 2.0 compliance mapping.

  240. Q1Swarm

    Swarm SDK v0.4

    Situational Awareness, EW Coordination, Adversarial Resilience, RF Fingerprinting & Tracking. 163 new tests (465 total).

  241. Q1Coverage

    Voidly coverage hits 200 countries

    37+ probe nodes spanning every continent. 80-domain test list. 5-minute scan cadence. Cross-source verification against OONI / CensoredPlanet / IODA.

  242. Q1Tooling

    MCP server — 83 tools

    voidly-ai/mcp-server enables Claude / GPT / agent frameworks to query the censorship dataset directly.

2025

76 milestones
  1. Q4Writing

    Real-time event pipeline write-up published

    Long-form article on Voidly's probe-to-alert pipeline: inline anomaly scoring under 50ms, parallel async OONI and IODA API queries, the confidence-threshold publication ladder (Observed → Corroborated → Verified), the two-window alert-fatigue guard that prevents false-alarm flooding, and the nightly CensoredPlanet retroactive pass that fills the batch-export coverage gap.

  2. Q4Writing

    Voidly probe run lifecycle write-up published

    Long-form article on the complete probe-side execution path: MeasurementTask struct (domain, protocols, priority, jitter_ms, scheduled_window, expected_control_resolver), four measurement phases (DNS DnsResult with anomaly flag, TCP TcpResult with rst_injected, TLS TlsResult with dual-SNI probing, HTTP HttpResult with body_simhash and blockpage_match), ProbeResult assembly and Ed25519 signing, zstd level-1 batch compression (50 results OR 5-minute timer), QUIC upload with 30s/5m/20m exponential backoff. p50/p99 timing: DNS 22ms/180ms, TCP 28ms/320ms, TLS 180ms/850ms, HTTP 380ms/1800ms.

  3. Q4Writing

    Voidly probe networking write-up published

    Long-form article on how Voidly probes stay connected and upload data from hostile networks: QUIC/443 transport (indistinguishable from HTTPS, passes CG-NAT), domain fronting via CDN SNI for environments that block the ingest hostname, TLS certificate pinning (SPKI SHA-256) against ISP MITM interception, three multiplexed QUIC streams (heartbeat, measurement upload, telemetry), local SQLite buffer (500 MB cap, 48h TTL) for disconnection resilience, exponential backoff reconnect, zstd compression (3.2× ratio), and metered-connection data budget tracking.

  4. Q4Writing

    Voidly probe local buffer write-up published

    Long-form article on how Voidly probes preserve measurement data during upload failures: 72-hour SQLite ring buffer with 50,000-row cap and anomaly-safe eviction, LZ4 batch compression (47KB → 9KB median), exponential backoff retry (30s → 4h), priority queue for anomalous measurements, per-chunk Ed25519-signed upload with per-chunk acknowledgment for partial delivery resumption, and 0.003% measurement loss rate across 37 probes over 6 months.

  5. Q4Writing

    Voidly probe architecture deep-dive published

    Long-form write-up on the Tauri 2 + boringtun + tun-rs desktop probe application: userspace WireGuard with Cloudflare's boringtun, tun-rs TUN device (utun on macOS, /dev/net/tun on Linux, Wintun on Windows), X25519-Dalek on-device key generation stored in OS keychain, and the operator-safety design constraints that shaped every technical decision.

  6. Q4Writing

    Probe test runner deep-dive published

    Long-form article on the Rust async engine that orchestrates concurrent measurements inside the Tauri probe: tokio Semaphore with 3 permits, MeasurementState machine (Pending → Running → Success/Error/Timeout), per-layer timeout budgets (DNS 3s, TCP 5s, TLS 8s, HTTP 15s, total 30s), tokio::time::timeout wrapping each protocol layer, Ed25519 measurement signing before upload, mpsc channel upload queue (capacity 200), zero retries for NetworkError/DnsError, one retry for Timeout, and Prometheus-style counters for the Tauri dashboard.

  7. Q4Writing

    HTTP/HTTPS measurement lifecycle write-up published

    Long-form article on how Voidly probe tests work at each protocol layer: DNS resolution with CDN-aware addr_match comparison, TCP RST injection detection (fast RST timing vs. expected RTT), TLS handshake capture with certificate chain fingerprinting and MITM detection, HTTP GET with body_sha256 fingerprinting against the 2,300-entry block page library, response timing anomaly detection (TTFB z-score, body truncation), and the ControlDelta struct mapping each layer to the anomaly classifier's 47 input features.

  8. Q4Writing

    TCP measurement write-up published

    Long-form article on how Voidly measures TCP-layer censorship: TcpResult struct (connected, connect_time_ms, rst_received, rst_timing_ms, is_injected_rst, null_routed, icmp_unreachable, control_connect_time_ms, connect_time_delta_ms), RST injection classification threshold of 15ms derived from 1.2M confirmed injection events vs 800K clean connections, null-routing detection (5-second timeout with no RST and no ICMP), connect_time_delta >200ms triggering ROUTE_ANOMALY for transparent-proxy detection, dual-IP probing to 8.8.8.8 for RstSource classification (SniBased/IpBased/Unknown), and p50/p99 timings of 28ms/320ms.
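
    The three thresholds in this entry can be combined into a single decision function. This is a hedged sketch: it assumes an RST arriving faster than 15ms counts as injected, and every label other than ROUTE_ANOMALY, along with the simplified observation type, is hypothetical rather than taken from the TcpResult struct.

```python
from dataclasses import dataclass
from typing import Optional

RST_INJECTION_THRESHOLD_MS = 15.0
NULL_ROUTE_TIMEOUT_MS = 5000.0
ROUTE_ANOMALY_DELTA_MS = 200.0


@dataclass
class TcpObservation:
    """Simplified stand-in for a TCP measurement (hypothetical fields)."""
    connected: bool
    elapsed_ms: float
    rst_timing_ms: Optional[float] = None
    icmp_unreachable: bool = False
    control_connect_time_ms: Optional[float] = None


def classify_tcp(obs: TcpObservation) -> str:
    if obs.rst_timing_ms is not None:
        # Assumption: a reset faster than the threshold came from an
        # on-path injector rather than the distant endpoint.
        if obs.rst_timing_ms < RST_INJECTION_THRESHOLD_MS:
            return "INJECTED_RST"
        return "RST"
    if not obs.connected:
        if obs.icmp_unreachable:
            return "ICMP_UNREACHABLE"
        # Silence for the full timeout, with no RST and no ICMP.
        if obs.elapsed_ms >= NULL_ROUTE_TIMEOUT_MS:
            return "NULL_ROUTED"
        return "FAILED"
    if obs.control_connect_time_ms is not None:
        delta = obs.elapsed_ms - obs.control_connect_time_ms
        if delta > ROUTE_ANOMALY_DELTA_MS:
            return "ROUTE_ANOMALY"
    return "OK"
```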

  9. Q4Writing

    Control server methodology write-up published

    Long-form article on how Voidly distinguishes censorship from network errors using distributed control servers: DNS comparison with CDN split-horizon handling, TLS MITM interception detection via cert chain analysis, TCP RST injection vs. null-routing identification, HTTP block page fingerprint matching (~2,300 known block-page hashes across 80 countries), control failure handling across US-East/EU-West/AP-East nodes, and the mapping from ControlComparison fields to anomaly classifier input features.

  10. Q4Writing

    Bandwidth throttling measurement write-up published

    Long-form article on how Voidly detects bandwidth throttling — the hardest interference class because it shares timing signatures with ordinary congestion: the TimingFeatures Rust struct (dns_latency_zscore, tcp_connect_zscore, ttfb_zscore, body_truncated, rst_during_body, throughput_bps, throughput_ratio_vs_control), compute_ttfb_zscore() Python function computing Z-score against the per-domain rolling control distribution, classify_throttling_signals() Rust function combining signal evidence, the calibration problem distinguishing Russia's TSPU from legitimate congestion, cross-probe corroboration (throttling_corroboration_score requires 3+ probes seeing consistent signals), and country patterns across Russia TSPU deep-packet inspection, Iran ARRS rate limiters, India WhatsApp throttling, and China video-platform rate limiting.

  11. Q4Writing

    DNS injection detection write-up published

    Long-form article on how Voidly detects DNS injection in censored networks: DnsTestResult dataclass, three control resolvers (Cloudflare DoT, Google DoH, Voidly authoritative), four weighted detection signals (IP divergence +0.4, TTL anomaly +0.3, source IP divergence +0.25, response timing +0.05), per-country injection rates (China 94%, Iran 61%, Russia 12%, Turkey 8%), CAP_NET_RAW privilege handling for raw socket capture, anycast false-positive calibration (4.2% → 0.8%), and pipeline integration with the anomaly classifier.
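
    The four weighted signals sum to 1.0, so one plausible reading is an additive score where each signal that fires contributes its weight. The sketch assumes exactly that; the dictionary keys and the function are hypothetical, and the real classifier may combine the signals differently.

```python
# Weights from the entry above; key names are illustrative.
SIGNAL_WEIGHTS = {
    "ip_divergence": 0.40,
    "ttl_anomaly": 0.30,
    "source_ip_divergence": 0.25,
    "response_timing": 0.05,
}


def injection_score(signals: dict[str, bool]) -> float:
    """Sum the weights of every detection signal that fired."""
    unknown = set(signals) - set(SIGNAL_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown signals: {sorted(unknown)}")
    return sum(weight for name, weight in SIGNAL_WEIGHTS.items()
               if signals.get(name, False))
```

    Under this reading, IP divergence plus a TTL anomaly scores about 0.7 on its own, while response timing alone can never exceed 0.05.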

  12. Q4Writing

    Probe health monitoring write-up published

    Long-form article on how Voidly monitors 37+ probe nodes: 60-second heartbeat on separate HTTPS transport, DEGRADED/OFFLINE state machine (5-minute → DEGRADED, 15-minute → OFFLINE), measurement quality scoring via compute_quality_score() (measurement_rate, error_rate, control_reachability, dns_response_distribution), ASN coverage SLOs (≥2 standard, ≥4 high-risk countries), flapping detection (>3 transitions in 2-hour window → confidence capped at CORROBORATED), classify_offline_cause() algorithm for distinguishing probe failure from ISP-level censorship, and automated replacement from standby operator waitlist.
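
    The DEGRADED/OFFLINE transitions reduce to a comparison against the time since the last heartbeat. A minimal sketch, assuming the thresholds are inclusive and using a hypothetical HEALTHY label for the normal state:

```python
DEGRADED_AFTER_S = 5 * 60   # 5 minutes without a heartbeat
OFFLINE_AFTER_S = 15 * 60   # 15 minutes without a heartbeat


def probe_state(seconds_since_heartbeat: float) -> str:
    """Classify a probe from the age of its last heartbeat."""
    if seconds_since_heartbeat < 0:
        raise ValueError("heartbeat age cannot be negative")
    if seconds_since_heartbeat >= OFFLINE_AFTER_S:
        return "OFFLINE"
    if seconds_since_heartbeat >= DEGRADED_AFTER_S:
        return "DEGRADED"
    return "HEALTHY"  # assumed label; the entry names only the other two
```

    With a 60-second heartbeat, a probe has to miss roughly five consecutive beats before it is marked DEGRADED.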

  13. Q4Writing

    Incident clustering and deduplication write-up published

    Long-form article on how Voidly deduplicates thousands of probe measurements into discrete censorship incidents: the four-tuple clustering key (country_code, domain, interference_type, probe_type_group), the 6-hour gap rule and its calibration against 400 ground-truth incidents from 2023–2024, the full incident lifecycle (ANOMALY → CORROBORATED → VERIFIED → RESOLVED), incident_id assignment stable under retroactive reprocessing, the 12-hour re-open window, retroactive CensoredPlanet batch alignment, and edge cases (BGP 24h gap, flapping detection, false resolution recovery).
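
    The gap rule can be sketched as a single pass over each key's sorted timestamps: a new incident starts whenever the silence since the previous measurement exceeds six hours. The function and its tuple-based input are assumptions for illustration, and whether a gap of exactly six hours splits an incident is not stated (here it does not).

```python
from collections import defaultdict

GAP_SECONDS = 6 * 3600  # the 6-hour gap rule


def cluster_incidents(measurements):
    """Group (key, unix_seconds) pairs into per-key incident spans.

    `key` is the four-tuple (country_code, domain, interference_type,
    probe_type_group). Returns {key: [(start, end), ...]}.
    """
    by_key = defaultdict(list)
    for key, timestamp in measurements:
        by_key[key].append(timestamp)

    incidents = {}
    for key, stamps in by_key.items():
        stamps.sort()
        spans = [[stamps[0], stamps[0]]]
        for ts in stamps[1:]:
            if ts - spans[-1][1] > GAP_SECONDS:
                spans.append([ts, ts])   # quiet period: open a new incident
            else:
                spans[-1][1] = ts        # extend the current incident
        incidents[key] = [tuple(span) for span in spans]
    return incidents
```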

  14. Q4Writing

    Incident timeline reconstruction write-up published

    Long-form article on how Voidly reconstructs the authoritative timeline of a censorship incident from distributed probe measurements: IncidentEvent sourcing model (FIRST_DETECTED, PROBE_CONFIRMED, ESCALATING, PEAK, DEESCALATING, RESOLVED, RETROACTIVE_START), temporal alignment using per-measurement probe_local_offset_secs, confidence weighting (≥3 probes in ≥2 ASNs = CONFIRMED), retroactive revision from CensoredPlanet batch data (median −23-minute adjustment), duration statistics by type, and the /incidents/{id}/timeline REST API endpoint.

  15. Q4Writing

    Swarm SDK architecture overview published

    Long-form article covering the three-layer Swarm SDK architecture: Gossip Mesh (k=3 fanout, 1000-ID VecDeque dedup, Lamport clocks, TTL=7, anti-entropy reconciliation), Cryptographic Core (X3DH + ML-KEM-768 hybrid, Double Ratchet, Sealed Sender, Sender Keys, Deniable HMAC), and HAL/Transport (MAVLink v2, 237-byte payload, ARQ sliding window, no_std embedded). Target hardware: STM32H7 Cortex-M7 (primary) and Jetson Nano. Binary size 284KB (opt-level="z", lto=true). Five formal security properties including forward secrecy and post-compromise security.

  16. Q4Writing

    Incident resolution methodology write-up published

    Long-form article on how Voidly determines that a censorship incident has ended: RESOLUTION_THRESHOLDS by interference type (dns_tamper=4, http_blocking=4, tls_interference=3, throttling=6, bgp_withdrawal=1 consecutive passing measurements with p_blocked < 0.3), the 12-hour RESOLVED_PENDING re-open window, FLAPPING state (≥4 alternations in 2-hour window, 90-minute calm exit, confidence capped at CORROBORATED), cross-source resolution requirements for VERIFIED incidents, and observed resolution time distributions (BGP median 4.2h, DNS 8.4 days, HTTP 12.1 days, TLS 3.1 days, throttling 6.2h).
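
    The resolution rule amounts to counting the trailing run of passing measurements and comparing it with the per-type threshold. A minimal sketch using the thresholds above; the function name and the list-of-floats input are assumptions, and the re-open window and FLAPPING state are left out.

```python
# Consecutive passing measurements required, by interference type.
RESOLUTION_THRESHOLDS = {
    "dns_tamper": 4,
    "http_blocking": 4,
    "tls_interference": 3,
    "throttling": 6,
    "bgp_withdrawal": 1,
}
PASSING_P_BLOCKED = 0.3  # a measurement passes when p_blocked < 0.3


def is_resolved(interference_type: str, p_blocked_history: list[float]) -> bool:
    """True when the most recent measurements form a long enough passing run."""
    needed = RESOLUTION_THRESHOLDS[interference_type]
    streak = 0
    for p_blocked in reversed(p_blocked_history):  # newest first
        if p_blocked < PASSING_P_BLOCKED:
            streak += 1
        else:
            break  # a blocked measurement ends the run
    return streak >= needed
```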

  17. Q4Writing

    Voidly real-time anomaly scorer write-up published

    Long-form article on embedding ONNX Runtime inside an Apache Flink streaming job for 50K events/sec anomaly scoring: AnomalyScorerJob with 16 task slots and keyBy(country:asn), OnnxScoringOperator using Java ThreadLocal<OrtSession> with intraOpNumThreads=1 and OptLevel.ALL_OPT, OnnxTensor per-feature inputs and probs[0][1] censorProb extraction, Kafka partition alignment (64 partitions / 16 slots = 4 per slot using murmur2 on country_code:asn key), 5ms/16-record mini-batch accumulation for 4,000 events/sec per slot throughput, and native Flink backpressure that throttles the Kafka consumer on overload. Peak: 47K events/sec at 97ms p99 end-to-end.

  18. Q4Data hub

    Regulatory data pipeline — D1 ingestion for 130+ federal datasets

    Built Cloudflare D1 + Workers daily ingest pipeline: EDGAR FTP, openFDA API, OFAC XML, EPA ECHO, SAM.gov, USAspending, FinCEN, FDIC BankFind, CMS provider files, NIST NVD, CISA KEV, NTSB, NHTSA, FAA, MSHA, OSHA. Entity normalization across CIK, UEI, LEI, NPI, DUNS.

  19. Q4Swarm

    Swarm SDK v0.3

    Sender Keys for O(1) group encryption. Sealed Sender. Deniable HMAC mode. PKCS7 padding across all transports.

  20. Q3Swarm

    Swarm SDK v0.2

    MAVLink v2 transport adapter. PX4 / ArduPilot / MAVSDK compatibility. Message fragmentation and reassembly inside 253-byte payloads.

  21. Q2Swarm

    Swarm SDK v0.1 — initial release

    Gossip mesh routing, Double Ratchet forward secrecy, ML-KEM-768 + X25519 hybrid post-quantum key exchange.

  22. Q2Writing

    7-day shutdown forecast — model published

    Long-form technical write-up of the 7-day internet shutdown forecasting model: political calendar features, network telemetry, ARIMA + XGBoost ensemble, per-country calibration and reliability scoring. Covers 200 countries.

  23. Q2Writing

    Classifier-to-forecast aggregation write-up published

    Long-form article bridging the per-measurement anomaly classifier and the 7-day shutdown forecast: three-stage risk score aggregation (ASN-domain hourly → domain-level → country-level), TimescaleDB measurement_scores hypertable with asn_domain_hourly continuous aggregate (15-minute refresh), HALF_LIFE_HOURS=48 exponential decay over WINDOW_HOURS=336, category_weights (news/human_rights 2.5×, circumvention 2.0×, social_media 1.5×, general 1.0×), sigmoid normalization with log count weight and ASN diversity weight, and the 28-feature ForecastFeatureVector published to voidly.forecast.features Kafka topic.
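
    The decay parameters translate into a simple weight function: a score loses half its influence every 48 hours and is dropped once it is older than the 336-hour (14-day) window. The sketch below is illustrative only; the weighted mean is an assumed aggregation and omits the category weights and sigmoid normalization described in the entry.

```python
HALF_LIFE_HOURS = 48.0
WINDOW_HOURS = 336.0  # 14 days


def decay_weight(age_hours: float) -> float:
    """Exponential half-life weight; zero outside the window."""
    if age_hours < 0 or age_hours > WINDOW_HOURS:
        return 0.0
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def decayed_mean(scored) -> float:
    """Weighted mean over (score, age_hours) pairs (assumed aggregation)."""
    numerator = denominator = 0.0
    for score, age_hours in scored:
        weight = decay_weight(age_hours)
        numerator += weight * score
        denominator += weight
    return numerator / denominator if denominator else 0.0
```

    At the edge of the window a measurement is seven half-lives old, so it still carries a weight of 1/128 before being cut off entirely.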

  24. Q2Writing

    Classifier calibration methodology write-up published

    Long-form article on how Voidly calibrates the anomaly classifier separately for each country: Platt scaling logistic regression fitted on per-country 30-day holdout predictions, F2-weighted threshold tuning per interference class, fallback to regional groupings for data-sparse countries, and the monthly re-calibration pipeline with change-detection alerting. Case studies: Iran DNS tampering threshold 0.62 (single-authority consistent signal); China DNS threshold 0.74 (CDN split-horizon noise requires conservative setting).

  25. Q2Writing

    Classifier retraining pipeline write-up published

    Long-form article on Voidly's weekly classifier retraining cadence: rolling 6-month training window with time-based splits (weeks 1–20 train, 21–23 validation, 24–26 test), SMOTE k=5 on training data only, fixed XGB_PARAMS (n_estimators=800, max_depth=6, lr=0.05, subsample=0.8, colsample_bytree=0.7) with quarterly Optuna re-tuning (30 trials), compute_psi() PSI thresholds (<0.1 stable, 0.1–0.25 warning, >0.25 alert), ChallengerEvaluation with three promotion criteria (F2 ≥ champion, precision ≥ 0.94, KS p > 0.05), 48-hour shadow deployment, canary rollout 5% → 25% → 100% over 6 hours, and onnxmltools.convert_xgboost() ONNX export.
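
    The Population Stability Index behind those thresholds is the sum over bins of (actual − expected) × ln(actual / expected). The sketch reuses the compute_psi name from the entry, but its signature (pre-binned fractions plus an epsilon floor) and the classify_psi helper are assumptions.

```python
import math


def compute_psi(expected, actual, eps: float = 1e-6) -> float:
    """PSI between two binned distributions given as fractions per bin."""
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same number of bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # floor empty bins so the logarithm stays finite
        a = max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


def classify_psi(psi: float) -> str:
    """Apply the thresholds: <0.1 stable, 0.1-0.25 warning, >0.25 alert."""
    if psi < 0.1:
        return "stable"
    if psi <= 0.25:
        return "warning"
    return "alert"
```

    For example, a feature that was uniform over four bins at training time and now sits at 40/30/20/10% gives a PSI of roughly 0.23, which lands in the warning band.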

  26. Q2Writing

    OFAC SDN integration write-up published

    Long-form article on how the Federal Regulatory Data Hub ingests and queries the OFAC SDN list: hourly conditional GET with ETag + SHA256 two-step delta detection, sdn_advanced.xml parsing with alias explosion (44,891 aliases from 12,247 entries), name normalization pipeline (NFKD Unicode, legal suffix stripping, punctuation collapsing), FTS5 virtual table with unicode61 tokenizer, three-pass screening (exact normalized → FTS5 BM25 → Jaro-Winkler fuzzy), and p50 8ms / p99 28ms screening latency for the SDN list alone.

  27. Q2Writing

    Shutdown forecast feature engineering write-up published

    Long-form write-up on the 47-feature engineering pipeline powering the 7-day internet shutdown forecasting model: political calendar features (days_to_nearest_election signed-distance, election_competitiveness_score, protest_intensity_7d via GDELT CAMEO 14x events), OFAC sanctions timelines (ofac_designation_30d, diplomatic_isolation_score), network telemetry (bgp_withdrawal_rate_7d, probe_measurement_rate_delta, blocking_rate_trend, throttling_incident_count_7d), historical shutdown cycle features (sin/cos annual encoding, days_since_last_shutdown), and XGBoost SHAP analysis ranking days_to_nearest_election (0.41) and shutdown_history_3y (0.38) as top predictors across 200 countries.

  28. Q2Writing

    Voidly probe vantage selection write-up published

    Long-form write-up on how Voidly selects and distributes its probe vantage network: the ASN diversity requirement (residential/mobile preferred over data-center), the VANTAGE_QUALITY_MULTIPLIER scoring function, operator recruitment and deliberately lightweight vetting (no KYC), per-country safety tiers with key rotation and measurement delays for high-risk countries (CN, RU, IR, BY), and three approaches for reaching countries where most people connect on mobile-only networks.

  29. Q2Writing

    Voidly probe configuration delivery write-up published

    Long-form article on how Voidly delivers signed configuration bundles to probe operators: CBOR+gzip ConfigBundle (GlobalConfig + CountryConfig overlays + TestListRef + ModelRef), anonymous country token via BLAKE3(country_iso + "\x00" + salt)[:16], Ed25519 signature with three trusted verifying keys verified before decompression, 72-hour freshness window with ConfigUpdater (6-hour fetch interval, 5min base backoff, 24h max), ETag-based conditional KV writes via putIfMatch for atomic two-snapshot rollover, SnapshotStore with rollback to previous bundle, resolve_config() merging overlay onto GlobalConfig, and five-step CDN publish pipeline with 2% canary rollout.

  30. Q2Writing

    Voidly operator privacy architecture write-up published

    Long-form article on how Voidly protects probe operator identity while publishing full measurement data: probe_id derived as SHA-256(public_key_bytes) with zero-second IP log retention, human-readable codename system (adjective-noun-number, 450K+ combinations, no join table linking codenames to probe_id), measurement anonymization pipeline stripping IP before Kafka write while preserving probe_cc + probe_asn + network_type for analysis, per-probe Ed25519 signing keys stored in an isolated two-column key store with no foreign keys to operator tables, and 12-country extra protections (4–48 hour publication delay, 90-day probe_id rotation, Tor/VPN recommendation for CN/RU/IR/BY). Explains why Voidly cannot comply with operator identification demands even if legally compelled.

  31. Q1Writing

    Regulatory entity alias table write-up published

    Long-form article on entity alias management across 197 federal datasets: five alias types (AKA/FKA/NFE/PHONETIC/VESSEL), entity_aliases DDL with FTS5 virtual table (unicode61 tokenizer) and covering indexes idx_aliases_norm and idx_aliases_phonetic, NFKD alias normalization with iterative legal-suffix stripping, compute_alias_id() as sha256(entity_id + "\x00" + alias_type + "\x00" + alias_norm)[:16], Double-Metaphone phonetic alias generation for AKA entries, build_alias_rows_for_entity() bulk construction, four-pass resolution (exact 71.4% → phonetic 88.2% → FTS5 96.1% → edit-distance 98.7%), TypeScript resolveAlias() Workers function, and expired: true flag for deactivated aliases.
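    The alias ID construction quoted above translates almost directly into Python. This sketch assumes the [:16] slice is taken over the hex digest (it could equally mean raw bytes) and that all three inputs are UTF-8 strings.

```python
import hashlib


def compute_alias_id(entity_id: str, alias_type: str, alias_norm: str) -> str:
    """Derive a deterministic 16-character alias ID.

    The NUL separator keeps field boundaries unambiguous, so that
    ("ab", "c") and ("a", "bc") cannot hash to the same payload.
    """
    payload = "\x00".join((entity_id, alias_type, alias_norm)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]
```

    Because the ID depends only on its inputs, re-running a bulk build over the same entity produces the same rows, which makes upserts idempotent.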

  32. Q2Writing

    Voidly test list curation write-up published

    Long-form write-up on how Voidly selects and maintains its 80-domain probe test list: Citizen Lab's global list as the starting point, 12 OONI category codes, per-country supplemental lists for 37 high-risk countries, the measurement budget problem that limits probed URLs per run, and quarterly curation cadence including emergency additions (Myanmar 2021, Iran 2022).

  33. Q2Writing

    Probe operator safety write-up published

    Long-form article on the threat model and data minimization design behind Voidly probe operator safety: five adversary classes (ISP passive monitoring, government warrant, network active probing, physical device seizure, operator compromise), Tor upload path as optional layer, measurement scrubbing pipeline removing operator fingerprints before Kafka write, emergency stop procedure with local SQLite deletion, and the policy reasons Voidly cannot comply with operator identification demands even under legal compulsion.

  34. Q2Writing

    Voidly probe commissioning write-up published

    Long-form article on how a new operator joins the Voidly censorship measurement network: X25519 key generation with x25519-dalek (StaticSecret::random_from_rng(OsRng)), probe_id derivation via SHA-256 of the public key bytes, POST /v1/probes/register with CAIDA AS-Rank ASN type classification and MaxMind GeoIP country verification, 48-hour warmup period (measurements tagged warmup=true), PROMOTION_THRESHOLD=0.72 quality gate via compute_quality_score() (measurement_rate 0.35×, error_rate 0.30×, control_reachability 0.25×, dns_diversity 0.10×), calibration deviation check against peer-median per domain (mean_deviation > 0.25 or max_deviation > 0.60 triggers review), pioneer_asn=true flag for unrepresented autonomous systems, and per-country threshold adjustments (Iran 0.60).
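    The quality gate can be sketched with the four weights and thresholds listed above. Two things here are assumptions: every input is taken to be a fraction in [0, 1], and error_rate is inverted so that fewer errors raise the score. The is_promoted() helper and the COUNTRY_THRESHOLDS table name are hypothetical.

```python
PROMOTION_THRESHOLD = 0.72
# Per-country override quoted in the summary (Iran 0.60); table name is hypothetical.
COUNTRY_THRESHOLDS = {"IR": 0.60}


def compute_quality_score(measurement_rate: float, error_rate: float,
                          control_reachability: float, dns_diversity: float) -> float:
    """Weighted sum of four probe-quality signals, each assumed to lie in [0, 1]."""
    return (
        0.35 * measurement_rate
        + 0.30 * (1.0 - error_rate)  # assumption: lower error rate scores higher
        + 0.25 * control_reachability
        + 0.10 * dns_diversity
    )


def is_promoted(score: float, country_code: str) -> bool:
    """Check the score against the country-specific or default threshold."""
    threshold = COUNTRY_THRESHOLDS.get(country_code, PROMOTION_THRESHOLD)
    return score >= threshold
```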

  35. Q2Writing

    Regulatory API rate limiting write-up published

    Long-form article on the two-layer token-bucket rate limiting system for the Federal Regulatory Data Hub: five quota tiers (from free at 2 req/s, burst bucket of 5, 200 req/day up to vendor at 200 req/s, burst bucket of 1,000, 1,000,000 req/day), checkBurstLimit() with BucketState in Cloudflare KV and ETag conditional writes for lock-free updates via conditionalPut() with 3-retry loop, checkDailyQuota() with per-minute KV buckets over a 24-hour rolling window (WINDOW_SECONDS=86400, BUCKET_SIZE_SECONDS=60), fail-open semantics on KV timeout, and X-RateLimit-Remaining / X-Quota-Used response headers in rateLimitMiddleware().
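    The burst layer follows the usual token-bucket pattern, sketched here in memory. The KV storage, ETag conditional writes, and retry loop are left out; the BucketState field names and the pure-function shape are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BucketState:
    tokens: float       # tokens currently available
    last_refill: float  # timestamp (seconds) of the last update


def check_burst_limit(state: BucketState, now: float,
                      rate_per_second: float, burst: float):
    """Refill by elapsed time, then try to spend one token.

    Returns (allowed, new_state). The caller would persist new_state with a
    conditional write and retry on conflict.
    """
    elapsed = max(0.0, now - state.last_refill)
    tokens = min(burst, state.tokens + elapsed * rate_per_second)
    if tokens >= 1.0:
        return True, BucketState(tokens - 1.0, now)
    return False, BucketState(tokens, now)
```

    Returning a new state instead of mutating in place fits optimistic concurrency: if the conditional write fails, the caller re-reads the state and recomputes.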

  36. Q2Writing

    Real-time inference API write-up published

    Long-form article on how Voidly serves the anomaly classifier as a live inference API: ONNX Runtime for CPU-efficient model serving, 47-feature extraction pipeline with in-memory LRU control cache, three regional bare-metal inference nodes (US-East/EU-West/AP-East) routed by Cloudflare Workers, msgpack request encoding, per-country Platt scaling calibration at inference time, champion/challenger shadow mode deployment, and the full latency budget showing p50/p99 across all pipeline stages (inference service p99: 44ms).

  37. Q2Writing

    ONNX inference in Rust write-up published

    Long-form article on exporting the censorship classifier to ONNX and serving it in a Rust binary: export_pipeline_to_onnx() with 12-feature FEATURE_SCHEMA (TARGET_OPSET=17, zipmap=False), onnx.checker validation, SUPPORTED_OPSET_MIN/MAX=13/17, validate_model_metadata() with proto decode, build_session() with SessionOptions (intra/inter threads=1, Level3 graph optimization, disable_mem_arena=true), thread_local! SESSION backed by OnceCell, run_batch() with ndarray Array2<f32> inputs and float32 column-1 extraction, and benchmarks achieving p50 15.2ms / p99 49.8ms at batch size 200 (68K items/s).

  38. Q2Writing

    Classifier feature extraction write-up published

    Long-form article on how Voidly transforms raw probe measurements into the 47-feature vector that feeds the anomaly classifier: the ControlDelta struct (dns_match, dns_ip_in_control_set, tls_cert_is_mitm, http_blockpage_score), 12 DNS features (NXDOMAIN, bogon injection, known injection IP database, TTL anomaly, DNS response time), 8 TCP features (RST timing <15ms injection threshold, SYN-ACK count, RTT delta), 10 TLS features (MITM cert library with 831 certs, government CA list, alert code encoding), 12 HTTP features (blockpage SimHash distance, body length ratio, TTFB delta), 5 cross-layer metadata features, feature versioning with schema_version field, and the LRU control cache (10,000 entries, 15-minute TTL) that avoids doubling probe cost.

  39. Q2Writing

    Probe scheduling constraints write-up published

    Long-form article on how the Voidly probe respects device resource constraints before launching measurements: ResourceSnapshot struct (battery_pct, is_charging, is_thermal_limited, NetworkType enum, cellular_bytes_today/wifi_bytes_today), check_constraints() with ConstraintViolation early exit (BatteryTooLow/ThermalThrottled/CellularDailyCapExceeded/UnknownNetwork), per-minute SQLite cellular_usage table with 24-hour rolling VIEW, record_cycle_usage() ON CONFLICT DO UPDATE, BYTES_PER_MEASUREMENT_ESTIMATE=28000 constant, compute_cycle_params() adaptive cycle length, and score_domain() three-axis priority (staleness 0.50, priority_flag 0.35, anomaly_recency 0.15).

  40. Q2Writing

    Federal Regulatory Data Hub query layer write-up published

    Long-form article on how the Federal Regulatory Data Hub routes queries across 35M records at the Cloudflare edge: 8 vertical D1 shards (securities, financial_crimes, healthcare, labor_safety, environment, transportation, enforcement, infrastructure), single-shard direct routing vs. cross-agency Promise.all fan-out, entity bridge joins across CIK/UEI/LEI/DUNS/NPI, FTS5 full-text search with unicode61 tokenizer for DOJ press releases, FDA warning letters, and CFPB narratives, cache TTLs by endpoint type, cross-agency entity query p50 38ms / p99 120ms, and partial-response fallback on shard failure.

  41. Q2Writing

    Regulatory data versioning write-up published

    Long-form article on bitemporal regulatory record management: half-open valid_from/valid_until intervals in sdn_entries DDL, idx_sdn_current partial index (WHERE valid_until IS NULL) for zero-overhead live queries, record_versions audit table with append-only previous_payload JSONB and change_reason check constraint, two-statement UPDATE+INSERT transaction for version close and create, buildSdnQuery() TypeScript AS-OF rewriting with strict inequality (valid_until > asOfDate), three screening modes (current/as-of/historical), and keyset-paginated NDJSON snapshot export using (sdn_id, version_seq) pair.
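    The AS-OF predicate over half-open intervals can be demonstrated with an in-memory SQLite table. The column set is reduced to the ones named above, the sample rows are invented for illustration, and the as_of() helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sdn_entries ("
    " sdn_id INTEGER, version_seq INTEGER, name TEXT,"
    " valid_from TEXT NOT NULL, valid_until TEXT)"  # NULL valid_until = current version
)
conn.executemany(
    "INSERT INTO sdn_entries VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "ACME TRADING", "2023-01-01", "2024-03-01"),  # closed version
        (1, 2, "ACME TRADING LLC", "2024-03-01", None),      # live version
    ],
)


def as_of(connection, sdn_id: int, as_of_date: str):
    """Return the version valid at as_of_date under [valid_from, valid_until)."""
    return connection.execute(
        "SELECT name FROM sdn_entries"
        " WHERE sdn_id = ? AND valid_from <= ?"
        " AND (valid_until IS NULL OR valid_until > ?)",
        (sdn_id, as_of_date, as_of_date),
    ).fetchone()
```

    The strict valid_until > as_of_date comparison is what makes the interval half-open: on the changeover date exactly one version matches, never two.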

  42. Q2Writing

    Regulatory Data Hub staleness monitoring write-up published

    Long-form article on how the Federal Regulatory Data Hub monitors freshness of 197 federal datasets: per-source FRESHNESS_CONFIG (expected_cadence + max_staleness_hours), D1 dataset_ingests table with strftime staleness query, Cloudflare Cron */5 * * * * Worker handler, multi-channel alerting (Slack webhook, email MIME, PagerDuty) with Workers KV deduplication to prevent repeat alerts, OFAC ETag HEAD polling with 90-minute publish-delay alert, five ingest error classes (source_unavailable, schema_drift, auth_failure, rate_limit, parse_error), and a public /status endpoint returning per-dataset staleness for downstream consumers.

  43. Q3Writing

    Geoblocking vs. censorship methodology write-up published

    Long-form article on how Voidly distinguishes commercial geoblocking from government censorship: HTTP 451 detection, streaming and GDPR block page fingerprints tagged geoblock_commercial (not censorship), multi-country probe pattern classification (SINGLE_COUNTRY vs. MULTI_COUNTRY_SELECTIVE vs. GLOBAL_OUTAGE), CDN split-horizon false positive mitigation via ASN group mapping, domain-level 30-day unavailability baselines, and the p_geoblock score (>0.70 suppresses censorship classification, 0.40–0.70 applies 0.5× weight reduction, <0.40 rejects the geoblock hypothesis). Domain category geoblock priors: streaming 0.82, news_media 0.08.
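    The three p_geoblock bands map naturally onto a weight multiplier. In this sketch the lowest band is read as "geoblock hypothesis rejected, censorship evidence kept at full weight", which is an interpretation; the function name and the handling of the exact boundary values are also assumptions.

```python
def censorship_weight(p_geoblock: float) -> float:
    """Multiplier applied to censorship evidence given the geoblock probability."""
    if p_geoblock > 0.70:
        return 0.0  # likely commercial geoblock: suppress censorship classification
    if p_geoblock >= 0.40:
        return 0.5  # ambiguous: halve the weight
    return 1.0      # geoblock hypothesis rejected: keep full weight
```

    Under this reading, a streaming domain starting from the 0.82 prior would be suppressed unless other evidence lowers its score, while a news_media domain at 0.08 keeps full weight.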

  44. Q3Writing

    Internet interference taxonomy write-up published

    Long-form article on Voidly's seven-type InterferenceType enum (DNS_TAMPERING, TLS_INTERFERENCE, TCP_RST_INJECTION, HTTP_BLOCKING, THROTTLING, BGP_WITHDRAWAL, APPLICATION_LAYER) with a protocol-layer priority ordering (BGP first, DNS second), classification decision tree, per-type confidence scoring, and country distribution breakdown across 2.2B probe measurements.

  45. Q3Writing

    Block page fingerprint library write-up published

    Long-form article on how Voidly built and maintains the 2,300-entry block page fingerprint library used by the anomaly classifier: four detection strategies (exact SHA-256 body hash, structural normalization to strip dynamic fields, SimHash locality-sensitive hashing with 8-band index, TLS certificate fingerprinting for MITM block pages), the match pipeline cascade in Python, block page collection from OONI confirmed events and direct probe captures, per-country library composition (Turkey 47, Iran 312, Russia 189, China 8, Germany 12 legitimate), false positive mitigation for CDN error pages and captive portals, and integration with the lf_http_blockpage_hash Snorkel label function (weight: 0.97).
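    SimHash, the third detection strategy, can be sketched in a few lines. This is a generic 64-bit variant using whitespace tokens and MD5-derived token hashes; the tokenization, hash choice, and function names are assumptions, and the 8-band index is omitted.

```python
import hashlib


def simhash64(text: str) -> int:
    """64-bit SimHash: each token votes +1/-1 on every bit position."""
    counts = [0] * 64
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        for bit in range(64):
            counts[bit] += 1 if (token_hash >> bit) & 1 else -1
    # A bit is set in the fingerprint when the positive votes win.
    return sum(1 << bit for bit in range(64) if counts[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

    Pages that share most tokens tend to produce fingerprints a small Hamming distance apart, which is why block pages with dynamic fields can still match a stored entry when the exact SHA-256 hash does not.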

  46. Q3Writing

    Voidly measurement protocol stack write-up published

    Long-form article on how the ProbeResult struct encodes all five measurement layers and how the probe executes them: sequential DNS→TCP→TLS→HTTP execution with parallel tokio-spawned control task, Option<Layer> failure propagation (None=not attempted, Some(failed)=attempted-and-failed), six layer-outcome combinations (DNS NXDOMAIN + TCP open → DNS tampering; all pass + body mismatch → HTTP blocking; etc.), five CONTROL_VANTAGE_ENDPOINTS across US-East/EU-West/AP-East/AP-South/SA-East, and deterministic vantage selection via domain_hash_u64 % 5 for consistent cross-probe comparison.
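    Deterministic vantage selection reduces to hashing the domain and taking the result modulo 5. The summary does not say which hash backs domain_hash_u64, so this sketch assumes the first 8 bytes of SHA-256 over the lowercased domain; the endpoint labels are placeholders for the five regions listed.

```python
import hashlib

# Placeholder labels for the five regions named in the summary.
CONTROL_VANTAGE_ENDPOINTS = ["us-east", "eu-west", "ap-east", "ap-south", "sa-east"]


def domain_hash_u64(domain: str) -> int:
    """Stable 64-bit hash of a domain (assumed: first 8 bytes of SHA-256)."""
    digest = hashlib.sha256(domain.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_control_vantage(domain: str) -> str:
    """Every probe maps a given domain to the same control vantage."""
    index = domain_hash_u64(domain) % len(CONTROL_VANTAGE_ENDPOINTS)
    return CONTROL_VANTAGE_ENDPOINTS[index]
```

    A cryptographic hash is used rather than Python's built-in hash(), which is salted per process and would break cross-probe consistency.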

  47. Q3Writing

    DNS censorship measurement deep-dive published

    Long-form article on the DNS layer of Voidly's censorship detection: dual-resolver design querying both the ISP resolver and a neutral control (8.8.8.8 / 1.1.1.1 / 9.9.9.9 rotation), four interference types (NXDOMAIN injection most common in Turkey and Pakistan; IP spoofing with GFW returning 127.0.0.1 or 8.7.198.45 for blocked domains in China; empty answer / SERVFAIL from Iran's IRGC filtering), compare_dns_results() Python function, curated known injection IP database (China 18 IPs, Iran 3, Turkey 2), CDN geofencing false positive mitigation via is_cdn_expected_difference(), DNSSEC validation limitations (ISP resolvers disable validation before injecting), and DoH/DoT diagnostic queries that confirm ISP resolver tampering vs. domain non-existence.

  48. Q3Writing

    Voidly measurement API export write-up published

    Long-form article on the Voidly bulk data API and nightly Parquet export: GET /v1/measurements/export keyset-paginated by (ts, measurement_id), SSE streaming mode with 2-second poll and ": ping" keepalive, 17-field PARQUET_SCHEMA (domain as pa.dictionary int16, body_sha256 as pa.binary(32)), Zstandard level 3 with 1MB pages and 500K row groups, sort by (domain, ts) for 60% I/O reduction on domain-filtered reads, HuggingFace dataset card regeneration on push, and classifier_version tagging table with AUC-ROC 0.883/0.911/0.924 across v1.0/v2.0/v2.1.
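    Keyset pagination over (ts, measurement_id) can be shown against an in-memory SQLite table. The table layout, sample rows, and helper names are hypothetical; the technique is the row-value comparison that resumes strictly after the last key seen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    " ts INTEGER, measurement_id TEXT, domain TEXT,"
    " PRIMARY KEY (ts, measurement_id))"
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(100, "a", "example.org"), (100, "b", "example.org"),
     (101, "a", "example.net"), (102, "c", "example.com"),
     (102, "d", "example.org")],
)


def fetch_page(connection, after, limit):
    """Return up to `limit` rows strictly after the (ts, measurement_id) cursor."""
    return connection.execute(
        "SELECT ts, measurement_id, domain FROM measurements"
        " WHERE (ts, measurement_id) > (?, ?)"
        " ORDER BY ts, measurement_id LIMIT ?",
        (after[0], after[1], limit),
    ).fetchall()


def export_all(connection, page_size=2):
    """Walk the whole table page by page, carrying the last key as the cursor."""
    cursor, rows = (-1, ""), []
    while True:
        page = fetch_page(connection, cursor, page_size)
        if not page:
            return rows
        rows.extend(page)
        cursor = (page[-1][0], page[-1][1])
```

    Unlike OFFSET-based paging, the cost of each page does not grow with depth, and rows inserted behind the cursor during an export cannot shift later pages.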

  49. Q3Writing

    Voidly dataset schema reference published

    Field-by-field reference article for the CC BY 4.0 Voidly measurement dataset: probe identity fields, DNS/TCP/TLS/HTTP measurement layers, control comparison outputs, ML classifier fields (interference_type, per-class probabilities, tier), cross-source corroboration fields, BGP/outage signals, and filtering recipes for journalists, ML researchers, and infrastructure teams.

  50. Q3Writing

    TimescaleDB continuous aggregates write-up published

    Long-form article on the three-level continuous aggregate hierarchy behind Voidly's sub-10ms query latency: measurement_hourly (15-minute refresh, 45 materialized columns), country_daily_summary (1-hour refresh, hierarchical cagg from hourly), and country_monthly_stats (daily refresh for HuggingFace export), plus a separate asn_hourly_summary (30-minute refresh from raw hypertable). Covers add_continuous_aggregate_policy configuration, start_offset settings for late-arriving probe data (94.2% within 1 hour, 98.7% within 24h), compression interplay after 7 days, and query benchmarks (7-day country query: 4.1s → 4ms).

  51. Q3Writing

    Voidly probe ingest pipeline write-up published

    Long-form article on the full path from probe bytes to TimescaleDB record: protobuf over QUIC/443, Cloudflare Worker Ed25519 authentication and rate limiting, Kafka fan-out to normalization and raw-archival consumers, Rust normalization with probe-version schema drift handling, GeoIP/ASN enrichment, quality filtering (3.2% drop rate, control_unreachable dominant), bulk COPY ingest at 45K rows/sec, EC2 Auto Scaling on consumer lag, and nightly Parquet export to HuggingFace.

  52. Q3Writing

    BGP shutdown detection write-up published

    Long-form write-up on how Voidly uses BGP routing data from IODA (RIPE NCC RIS, RouteViews, bgp.tools) alongside HTTP, DNS, and TLS signals to detect internet shutdowns: prefix withdrawal patterns, 90-day per-country baselines, the distinction between BGP silence and withdrawal, and per-country calibration for Iran, China, Russia, and Belarus.

  53. Q3Writing

    Voidly AS path analysis write-up published

    Long-form article on how Voidly uses CAIDA AS-Rank, RIPE NCC RIS route collector data, and PeeringDB to build AS-level topology graphs: Gao-Rexford valley-free inference for AS relationship classification (CUSTOMER_PROVIDER, PEER_TO_PEER, SIBLING), ChokepointClassification enum (IXP_LEVEL, TRANSIT_AS, EDGE_ISP) with real-world examples (Iran TICT/IRGC, Russia TSPU, Pakistan PTCL), per-country as_path_diversity_score with coverage targets (≥3 distinct upstream ASes for Tier 1 countries), as_path_length_delta anomaly classifier feature (SHAP 0.03), and three data-freshness limitations (5-minute BGP snapshot staleness, monthly CAIDA AS-Rank updates, IPv4/IPv6 topology mismatch).

  54. Q3Writing

    Voidly BGP data ingestion write-up published

    Long-form article on how Voidly ingests BGP routing data from three sources: RIPE NCC RIS (MRT format, 5-minute update intervals, 8-hour full dump), RouteViews (MRT daily + 15-minute snapshots), and bgp.tools WebSocket for real-time events. Covers bgpkit-parser MRT parsing (type 13 TABLE_DUMP_V2, type 16/17 BGP4MP), CountryBgpBaseline 90-day rolling median with p5/p95 bounds, WITHDRAWAL_THRESHOLDS (LOW 10%, MEDIUM 25%, HIGH 50%, CRITICAL 80%), BgpEvent records in TimescaleDB, bgp_outage_score SQL join with 6-hour lookback, and five false-positive mitigations (planned maintenance calendar, single-ASN filter, symmetric multi-country withdrawal, duration <5min, confirmation requirement). Latency: p50 ~90s from RIPE NCC; ~35s from bgp.tools during active events.

  55. Q3Writing

    Voidly ASN-level blocking analysis write-up published

    Long-form article on how Voidly uses per-ASN probe vantages to distinguish nationwide censorship orders from selective ISP-level enforcement: CAIDA AS-Rank tier classification (state-owned vs. private carriers), per-ASN blocking rate queries against TimescaleDB, ISP-level interference type fingerprinting (Rostelecom TLS interception vs. MTS DNS injection vs. ER-Telecom HTTP redirect), differential blocking detection algorithm, propagation speed analysis for new block orders (same-hour = automated TSPU push; multi-day = ISP-discretionary), and ASN coverage targets for the top-20 censorship countries.

  56. Q3Writing

    Domain censorship history API write-up published

    Long-form article on the DomainMeasurementSummary continuous aggregate and the /v1/domains/{domain}/history endpoint: first-seen and last-seen tracking per country, rolling 7-day/30-day blocking rate columns, TimescaleDB continuous aggregate cascade from hourly to daily summaries, and cursor-paginated query API with country_code and date_range filtering.

  57. Q3Writing

    Cross-source verification engine — technical write-up published

    Long-form write-up on the OONI / CensoredPlanet / IODA reconciler: data format normalization, 4-hour sliding window alignment, independence-weighted confidence scoring, and handling source disagreements. Explains what it takes for an anomaly to reach the "Verified incident" tier.

  58. Q3Writing

    Voidly middlebox detection write-up published

    Long-form article on fingerprinting network middleboxes: EchoTestResult struct detecting Via headers, X-Forwarded-For, and injected headers, RST packet heuristics with four weighted features (arrival time 0.40, TTL mismatch 0.30, zero TCP window 0.20, absent TCP options 0.10), 47-vendor DPI signature library covering TSPU Russia (12 fingerprints), Sandvine (4 countries, 9 fingerprints), Huawei Hi-SEC Iran/Pakistan/Cuba (8), GFW China (7), Cisco Saudi/UAE (5), TimescaleDB middlebox_events hypertable, and SQL correlation showing 18-hour median lead time before confirmed censorship events.
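    The four-feature RST heuristic can be sketched as a weighted sum of boolean indicators. The weights are the ones listed above; the RstObservation fields, the way each feature is derived, and the 15 ms arrival cut-off (borrowed from the "<15ms" RST timing threshold mentioned in other entries) are assumptions.

```python
from dataclasses import dataclass

RST_FEATURE_WEIGHTS = {
    "fast_arrival": 0.40,
    "ttl_mismatch": 0.30,
    "zero_window": 0.20,
    "no_tcp_options": 0.10,
}
FAST_ARRIVAL_MS = 15.0  # assumption: reuses the RST timing threshold cited elsewhere


@dataclass
class RstObservation:
    arrival_ms: float      # time from request to RST
    ttl: int               # TTL on the RST packet
    expected_ttl: int      # TTL seen on earlier packets from the server
    tcp_window: int
    has_tcp_options: bool


def rst_injection_score(obs: RstObservation) -> float:
    """Sum the weights of every heuristic that fires; 1.0 means all four fired."""
    fired = {
        "fast_arrival": obs.arrival_ms < FAST_ARRIVAL_MS,
        "ttl_mismatch": obs.ttl != obs.expected_ttl,
        "zero_window": obs.tcp_window == 0,
        "no_tcp_options": not obs.has_tcp_options,
    }
    return sum(weight for name, weight in RST_FEATURE_WEIGHTS.items() if fired[name])
```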

  59. Q3Writing

    TLS censorship measurement deep-dive published

    Long-form article on the TLS layer of Voidly's censorship detection: full certificate chain extraction with rustls (TlsResult struct capturing cert chain, alert codes 20/40/42/47/50/70), government CA list (China MoI CA, Iran MICT CA, Kazakhstan NCA), detect_tls_mitm() comparing Subject/Issuer/SPKI fingerprint against control, TLS alert timing analysis (RST < 15ms = injected TCP reset; alert < 30ms = SNI-based blocking), dual-SNI probing for Russia's TSPU infrastructure, ECH/ESNI support detection via DNS HTTPS record type 65, and decision tree mapping TLS vs TCP evidence to interference_type outputs. Country performance: Kazakhstan 100% MITM, China GFW RST 97%, Russia TSPU SNI 60%, Iran alert-based 85%.

  60. Q3Writing

    Sanctions-shutdown correlation write-up published

    Long-form article on how Voidly correlates OFAC sanctions packages with internet shutdown events: SanctionsEvent dataclass, ofac_designation_30d SQL feature, diplomatic_isolation_score (OFAC=1.0, UN=0.9, EU=0.8, UK=0.7, exponential decay half-life 30 days), four country case studies (Iran 2019 isolation score 8.7, Russia 2022 score 9.4, Myanmar 2021 shutdown preceded OFAC by 10 days, Belarus 2020 throttling preceded OFAC by 8 months), Pearson r=0.42 across 87 verified shutdowns, event-driven vs administrative shutdown classification, and nightly feature recomputation pipeline.
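    One plausible reading of diplomatic_isolation_score is a sum of source-weighted sanctions events, each decayed with a 30-day half-life. The additive aggregation and the (source, age_days) event layout are assumptions; the source weights and half-life come from the summary above.

```python
SOURCE_WEIGHTS = {"OFAC": 1.0, "UN": 0.9, "EU": 0.8, "UK": 0.7}
HALF_LIFE_DAYS = 30


def diplomatic_isolation_score(events) -> float:
    """Sum of decayed source weights.

    events: iterable of (source, age_days) pairs (hypothetical layout).
    Events dated in the future (negative age) are ignored.
    """
    return sum(
        SOURCE_WEIGHTS[source] * 0.5 ** (age_days / HALF_LIFE_DAYS)
        for source, age_days in events
        if age_days >= 0
    )
```

    A sum, unlike a mean, lets the score exceed 1 when designations cluster, which would be consistent with case-study values such as 8.7 and 9.4.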

  61. Q3Writing

    Country-level censorship scoring write-up published

    Long-form article on how Voidly aggregates per-measurement ML classifier outputs into per-country censorship scores: exponential recency decay (30-day half-life, 90-day window), ASN diversity weighting (1/√K per-ASN weight cap to prevent ISP-concentration bias), domain category weighting (news_media 2.0×, social_media 1.8×, gaming 0.5×), cross-source corroboration multiplier (1 + corroboration_score), Gaussian temporal smoothing (σ=3 days), bootstrap 90% confidence bands, and per-country baseline calibration for coverage disparity between China (12M measurements/month) and Eritrea (200 measurements/month).

  62. Q1Writing

    Voidly REST API write-up published

    Long-form article on the Voidly REST API at api.voidly.ai/v1: core endpoints (/incidents, /measurements, /countries/{cc}/summary, /domains/{domain}/history, /bgp/events, /forecast/{cc}), cursor-based pagination, filtering by country_code / confidence_tier / interference_type / domain / date range, streaming NDJSON export, RFC 7807 error format, rate limits (120 req/min unauthenticated, 5× for authenticated), and code samples in curl, Python, and TypeScript.

  63. Q1Writing

    Voidly API authentication write-up published

    Long-form article on Voidly's two access tiers (unauthenticated read-only vs. authenticated full access), the voidly_{env}_{base58} key format, PBKDF2-HMAC-SHA256 key storage in D1, the D1+KV auth flow with 1-hour KV cache, four plan tiers (free 120 req/min, researcher 600 req/min, journalist 1200 req/min, enterprise 6000 req/min), and HMAC-SHA256 webhook verification.

  64. Q1Writing

    SSE streaming API write-up published

    Long-form article on the Voidly Server-Sent Events streaming endpoint: GET /v1/stream with country/tier/type filtering, four event types (incident_created, incident_updated, incident_resolved, country_status_change), SSE wire format with id/event/data fields, Last-Event-ID checkpoint replay with 24-hour ring buffer (10,000 events per filter), Python httpx.Client and JavaScript EventSource reconnect examples, connection limits by tier, and a comparison of SSE vs. HMAC-signed webhooks for different consumer types.

  65. Q1Writing

    Voidly MCP server write-up published

    Long-form article on the Voidly MCP server: 83 tools across six categories (incident lookup, measurement queries, country summaries, domain/test-list tools, BGP/network infrastructure, dataset metadata), JSON-RPC over Streamable HTTP, wiring into Claude Code via mcp-remote, rate limits, and three example agent workflows for journalists, researchers, and human rights organizations.

  66. Q1Writing

    Real-time corroboration engine write-up published

    Long-form article on the CorroborationEngine Rust struct that fetches and aligns data from OONI, CensoredPlanet, and IODA in near-real-time: tokio::join! parallel fetches with per-source timeouts (OONI 10s, CP 30s, IODA 15s), CorroborationResult struct with per-source confirmed flags and adaptive next_check intervals, OONI polling on a 15m/60m/3h/6h adaptive schedule keyed to anomaly tier, in-memory CensoredPlanet daily dump index loaded via load_cp_daily_dump() (NDJSON streaming, country×domain×date hashmap), independence weight table penalizing shared probe network (OONI+CP weight 0.6 vs. OONI+IODA 0.9), real-time check latency 2–8s, throughput 800/hr normal and 4,000+/hr during surge, and the nightly retroactive pass that reprocesses anomalies against the previous day's full CP dump.

  67. Q1Writing

    Parquet export pipeline write-up published

    Long-form article on the nightly Voidly export job: 30 0 * * * cron after TimescaleDB compression, PyArrow schema with pa.dictionary for low-cardinality columns, server-side named cursor streaming (50K rows/round-trip), Zstandard level 3 compression, country+year_month Parquet partitioning, atomic HuggingFace commit with CommitOperationAdd, SHA-256 post-push verification, and the incremental vs. monthly full-snapshot strategy.

  68. Q1Writing

    Voidly HuggingFace datasets write-up published

    Long-form article on how the Voidly CC BY 4.0 measurement dataset (global-censorship-index) and the OONI historical corpus are hosted on HuggingFace: Parquet partitioning by country_code and year_month, daily incremental append cadence, git-lfs versioning for point-in-time reproducibility, DuckDB/pandas/R access recipes, multi-country query patterns, confidence tier filtering guidance for journalism vs. ML vs. infrastructure monitoring, probe quality filtering, time-based train/val/test split methodology, and CC BY 4.0 citation format.

  69. Q1Writing

    Censorship incident lifecycle write-up published

    Long-form article on how a Voidly censorship incident progresses through six states (Anomaly, MultiSourceAnomaly, Corroborated, VerifiedIncident, Resolved, FalsePositive): exact transition thresholds (3 measurements from 2 ASNs for MultiSourceAnomaly; corroboration_score ≥ 0.80 for VerifiedIncident; 80% probe recovery for Resolved), timing data from 847 incidents in 2024 (67% stuck at Anomaly, 18% reach VerifiedIncident; median Anomaly→VerifiedIncident ~68 minutes), IncidentRecord struct with incident_id stability across state changes, publication timing by tier (Corroborated within 5 minutes, VerifiedIncident triggers journalist alerts), and how lifecycle state encodes into HuggingFace dataset fields (confidence_tier, is_active, resolved_at, corroboration_score).

  70. Q1Writing

    Voidly measurement retention policy write-up published

    Long-form article on Voidly's three-tier TimescaleDB retention policy for 2.2B probe measurements: hot tier (0-30 days, full resolution, ~144GB), warm tier (31-365 days, native compression at 6.2× ratio, ~55GB), and cold tier (>365 days, continuous-aggregate hourly/daily/monthly only, ~12GB). Covers compress_chunk DDL with segmentby=['country_code'] and orderby=['measured_at'], pg_cron compliance verification queries, continuous aggregate cascade, and R2 tiered storage for cold-to-archive offload planned for Q3 2026.

  71. Q1Writing

    Voidly TimescaleDB measurement store write-up published

    Long-form article on Voidly's 2.2B-row TimescaleDB architecture: hypertable with 1-day chunk intervals and space partitioning by country (4 space partitions), 6.2× compression using delta encoding for timestamps and dictionary encoding for country/protocol enums, continuous aggregates (country_daily_summary, asn_hourly_summary) with 30-day start_offset for late-arriving probe uploads, three-tier retention (7-day hot, 1-year compressed, S3 Parquet cold), query benchmarks (country window 4ms, 90-day domain history 18ms, raw measurement detail 140ms), r6g.2xlarge at 64GB RAM, 91% cache hit rate.

  72. Q1Writing

    Regulatory Data Hub infrastructure deep-dive published

    Long-form write-up on building the hub on Cloudflare D1: per-vertical SQLite tables across 197 datasets, daily cron ingest, FTS5 for free-text datasets (DOJ press releases, FDA warning letters), vertical sharding past the 10GB per-database limit, and the Workers query-routing layer.

  73. Q1Writing

    Entity ID normalization write-up published

    Long-form article on how the Federal Regulatory Data Hub resolves company identity across five incompatible federal identifier schemes: CIK (SEC EDGAR), UEI (SAM.gov), LEI (FinCEN/CFTC), DUNS (legacy contractor), NPI (CMS healthcare). Three-pass resolution strategy: exact ID join (99.9% true positive), alias table lookup for DBA/former names (99.2%), TF-IDF fuzzy name matching with Jaro-Winkler similarity for the remainder (confidence-gated at 0.7). entity_master bridge table schema, company name normalization (strip Co/Inc/LLC/Ltd), false positive rate table by method, special cases for healthcare NPI arrays (json_each) and foreign entities, and how the bridge achieves p50 38ms for cross-agency entity queries.

  74. Q1Writing

    Regulatory Data Hub schema design write-up published

    Long-form article on D1 schema design across eight shards: per-vertical DDL with OFAC SDN FTS5 virtual table (trigger-based shadow maintenance), EPA enforcement covering index (idx_epa_entity_date), entity_master bridge with shard_presence bitmask INTEGER column and four partial identifier indexes for CIK/UEI/LEI/NPI, TypeScript SHARD_MAP and queryEntityAllShards() with parallel Promise.all fan-out, and measured query latency (p50 4ms entity_master lookup, p50 38ms 3-shard fan-out, p99 91ms 3-shard). Covers migration strategy for the 10GB per-database D1 limit.

  75. Q1Writing

    Federal Regulatory FTS5 full-text search write-up published

    Long-form article on implementing full-text search across 35M federal records using SQLite FTS5 in Cloudflare D1: CREATE VIRTUAL TABLE with unicode61 tokenizer, content= shadow-table pattern (content_rowid= linking to physical table), BM25 scoring via bm25(table, 10.0, 5.0, 1.0, 1.0) weighting entity_name 10× and description 5× over narrative text, highlight() and snippet() SQL functions for context extraction, buildFts5Query() TypeScript function stripping legal suffixes and double-quoting tokens, alias expansion from entity_aliases table, Promise.all cross-shard fan-out across 5 D1 shards with response merging, three triggers (AFTER INSERT/DELETE/UPDATE) for index maintenance, and weekly optimize() via Cloudflare Cron to merge FTS5 b-tree segments.

  76. Q1Writing

    Voidly confidence tier system documented

    Long-form write-up explaining the three-tier confidence system (Anomaly → Corroborated → Verified Incident): independence weighting across OONI, CensoredPlanet, and IODA sources, why external confirmation is required for Verified tier, and what each tier means for journalists, ML researchers, and infrastructure monitoring teams.

2024

24 milestones
  1. Q4Writing

    Election cycle coverage

    Statistical anomaly detection across 47 races. Benford’s Law, ARIMA time-series, turnout modeling.

  2. Q4Writing

    OSINT entity extraction write-up published

    Long-form article on the NER and disambiguation pipeline running on 58M social posts per day: en_core_web_trf + 5 language-specific spaCy models, A100 GPU at 2,800 posts/sec (22,400/sec across 8 workers), EntityType taxonomy (PER/ORG/GPE/LOC/LAW/PRODUCT/EVENT/FAUX), Wikidata QID disambiguation with 1M Redis LRU cache (78% hit rate), cross-language transliteration for Arabic/Cyrillic, person co-reference resolution (edit-distance + 24h author context), org hierarchy P749 table, and 15,000 entity mentions per hour output rate.

  3. Q4Writing

    Distributed VPN routing post

    ML-driven path selection across 142 entry-node IPs; traffic-morphing layer for DPI evasion.

  4. Q4Writing

    Voidly measurement scheduler write-up published

    Long-form article on how Voidly schedules 80-domain probe runs across 37+ nodes: MeasurementTask dataclass with domain/protocols/priority/jitter_ms, OONI category-code priority table (NEWS=8, SMG=8, HUMR=7, POLR=6), compute_domain_priority() with anomaly_boost (max 3) and recency_boost (max 2), high-priority domains probed every 5 minutes, +/-15% jitter for anti-detection with 10-15% skip rate in high-risk countries (CN/RU/IR/BY/VN), per-domain ASN distribution ensuring cross-ASN coverage, HighPrioritySignal urgent injection for 6 windows on anomaly detection, Unix epoch window alignment across probes, and per-country task budgets (CN:68, RU:72, IR:74, global avg:49).
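
    A minimal sketch of the priority and jitter arithmetic described above. The category values and boost caps come from the summary; the additive combination, the `DEFAULT_PRIORITY` fallback, and the helper `jittered_interval` are assumptions for illustration.

```python
import random

# Category-code priorities listed in the summary.
CATEGORY_PRIORITY = {"NEWS": 8, "SMG": 8, "HUMR": 7, "POLR": 6}
DEFAULT_PRIORITY = 3      # assumption: fallback for unlisted categories
MAX_ANOMALY_BOOST = 3     # cap stated in the summary
MAX_RECENCY_BOOST = 2     # cap stated in the summary
JITTER_FRACTION = 0.15    # +/-15% jitter


def compute_domain_priority(category_code: str,
                            anomaly_boost: int = 0,
                            recency_boost: int = 0) -> int:
    """Base category priority plus clamped boosts (additive form is assumed)."""
    base = CATEGORY_PRIORITY.get(category_code, DEFAULT_PRIORITY)
    anomaly = min(max(anomaly_boost, 0), MAX_ANOMALY_BOOST)
    recency = min(max(recency_boost, 0), MAX_RECENCY_BOOST)
    return base + anomaly + recency


def jittered_interval(base_seconds: float, rng=random) -> float:
    """Scale an interval by a uniform factor in [0.85, 1.15]."""
    return base_seconds * (1.0 + rng.uniform(-JITTER_FRACTION, JITTER_FRACTION))
```

    For a 5-minute probe interval, `jittered_interval(300)` falls between 255 and 345 seconds.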

  5. Q4Writing

    OONI training data write-up published

    Long-form article on how Voidly ingests 200M+ OONI Explorer web_connectivity measurements for classifier training: OoniMeasurement dataclass (measurement_uid, test_name, is_confirmed, anomaly, failure), S3 bucket ooni-data-eu-fra (~40 GB/day), web_connectivity filter (95.3% of records), +/-12-hour alignment window against Voidly probes (67% alignment rate), five Snorkel label functions (ooni_confirmed 0.95, ooni_anomaly_no_failure 0.60, blockpage_hash_match 0.97, dns_injection_ip 0.92 covering 18 GFW + 3 Iran + 2 Turkey IPs, rst_injection_timing 0.88 for events <15ms), LabelModel generative training, 34.2% OONI confirmed coverage, 71.8% LF coverage, 2.1% conflict rate, ~4.8% label noise, and pseudo-labels weighted at 0.5x during ML training.

  6. Q4Writing

    Alert delivery system write-up published

    Long-form article on how Voidly delivers censorship incident alerts to journalists, researchers, and compliance monitoring systems: Subscriber filter model (country_codes, interference_types, domain_categories, min_confidence), HMAC-SHA256-signed webhook delivery with exponential-backoff retry (30s/5m/20m, dead-letter after 4 failures), PGP/MIME encrypted email for VERIFIED incidents, per-country and per-confidence-tier RSS/Atom feeds updated within 60 seconds, alert deduplication via (subscriber_id, incident_id, event_type) unique index, rate limiting per subscriber and per subscriber × country (20/hr and 5/hr defaults), BGP withdrawal escalation with 2-minute country-level cool-down, and end-to-end latency budget (p50 ~1.6s webhook, ~8–10 min probe-to-inbox for email).
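
    Webhook signing and the retry schedule can be sketched with the Python standard library. The function names and the hex encoding of the signature are assumptions; the summary states only HMAC-SHA256, the 30s/5m/20m delays, and dead-lettering after 4 failures.

```python
import hashlib
import hmac
from typing import Optional

# Delays before retries 1-3, from the summary (30s / 5m / 20m).
RETRY_DELAYS_S = (30, 300, 1200)


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison, as a receiver would do."""
    return hmac.compare_digest(sign_payload(secret, body), signature_hex)


def next_retry_delay(failures: int) -> Optional[int]:
    """Delay after `failures` failed attempts; None means dead-letter."""
    if failures < 1:
        raise ValueError("failures must be >= 1")
    if failures > len(RETRY_DELAYS_S):
        return None  # the 4th failure goes to the dead-letter queue
    return RETRY_DELAYS_S[failures - 1]
```

    Signing the raw bytes rather than a re-serialized object avoids mismatches caused by key ordering or whitespace differences on the receiving side.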

  7. Q4Writing

    Voidly incident publication write-up published

    Long-form article on how Voidly models censorship incident state transitions and publishes them to Kafka: five-state Rust IncidentState enum (Anomaly → MultiSourceAnomaly → Corroborated → Verified → Resolved), per-type transition threshold table, ResolutionMethod and StateTransition enums, compute_incident_id() SHA-256 keyed on "{country}:{domain_hash8}:{type}:{epoch_day}", PostgreSQL upsert with idempotency_key ON CONFLICT DO UPDATE, TimescaleDB incident_events hypertable, and three Kafka topics (voidly.incidents.state-changes, voidly.incidents.verified, voidly.cache.invalidations) with their consumer group purposes.
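
    The deterministic incident ID can be sketched as below. The article's compute_incident_id() is Rust; this Python version is illustrative only. Deriving `domain_hash8` as the first 8 hex characters of the domain's SHA-256, and `epoch_day` as Unix seconds divided by 86,400, are assumptions about fields the summary names but does not define.

```python
import hashlib


def compute_incident_id(country: str, domain: str,
                        interference_type: str, unix_ts: int) -> str:
    """SHA-256 over "{country}:{domain_hash8}:{type}:{epoch_day}"."""
    # Assumption: domain_hash8 is an 8-hex-char prefix of the domain's SHA-256.
    domain_hash8 = hashlib.sha256(domain.lower().encode("utf-8")).hexdigest()[:8]
    # Assumption: epoch_day counts whole UTC days since the Unix epoch.
    epoch_day = unix_ts // 86_400
    key = f"{country}:{domain_hash8}:{interference_type}:{epoch_day}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

    Because the ID depends only on these four inputs, repeated detections of the same event on the same day map to one row, which is what lets an upsert keyed on it stay idempotent.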

  8. Q4Writing

    Voidly anomaly classifier deep-dive published

    Long-form write-up on the ML classifier: five per-class binary models (DNS tampering, TLS interference, HTTP blocking, BGP withdrawal, throttling), XGBoost with per-country calibration, and why optimizing for recall rather than precision is the correct tradeoff when cross-source corroboration handles false positives downstream.

  9. Q4Writing

    Classifier offline test harness write-up published

    Long-form article on the offline evaluation framework for the Voidly anomaly classifier: per-country EvaluationSplit dataclass with stratified 80/10/10 train/val/test partitioning, AUC-PR rationale over AUC-ROC for imbalanced censorship data (random baseline equals class prevalence, e.g. 0.03), F2 score (β=2) to weight recall above precision, ECE before and after Platt scaling (0.14–0.22 before, 0.03–0.06 after), per-class F2 results (BGP 0.96, DNS 0.91, HTTP 0.88, TLS 0.84, throttling 0.79), country case studies (Iran 0.97 DNS recall, China 0.78 precision with CDN noise, Russia 0.83 throttling recall), and promotion criteria: macro AUC-PR ≥ 0.82, F2 ≥ 0.85, ECE ≤ 0.07 for ≥90% of countries, and 48h in shadow mode.
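
    For readers unfamiliar with the F2 score, the standard F-beta formula is short enough to show directly. The function name `f_beta` is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

    With β=2, a model with precision 0.5 and recall 1.0 scores higher than one with precision 1.0 and recall 0.5, which is the asymmetry the evaluation relies on.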

  10. Q4Writing

    Voidly active learning loop write-up published

    Long-form article on how Voidly grows its censorship anomaly training set beyond the 127K bootstrap labels: uncertainty sampling (least-confidence heuristic 1 − |2P − 1|), diversity filter (≤20 examples per country × protocol cell per batch), three-annotator review with Cohen's kappa ≥ 0.82 acceptance threshold, 500 examples/week annotation cadence (87 person-hours/week), DVC data versioning linking every model version to its exact training set hash, PSI-based feature distribution drift detection (threshold 0.2), and labeling criteria review workflow on drift trigger.
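
    The sampling rule and diversity cap can be sketched as follows. The uncertainty formula and the cap of 20 per country × protocol cell come from the summary; the tuple layout of `candidates` and the greedy selection order are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Candidate = Tuple[str, str, str, float]  # (example_id, country, protocol, P)


def uncertainty(p: float) -> float:
    """Least-confidence score 1 - |2P - 1|: 1.0 at P=0.5, 0.0 at P=0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_batch(candidates: Iterable[Candidate], batch_size: int,
                 per_cell_cap: int = 20) -> List[str]:
    """Pick the most uncertain examples, capped per country x protocol cell."""
    ranked = sorted(candidates, key=lambda c: uncertainty(c[3]), reverse=True)
    counts: Counter = Counter()
    picked: List[str] = []
    for example_id, country, protocol, _p in ranked:
        if len(picked) >= batch_size:
            break
        if counts[(country, protocol)] >= per_cell_cap:
            continue  # diversity filter: this cell is already full
        counts[(country, protocol)] += 1
        picked.append(example_id)
    return picked
```

    Without the cap, a single noisy country and protocol pair could fill an entire weekly batch.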

  11. Q4Writing

    ML training pipeline write-up published

    Long-form article on how Voidly builds its labeled censorship training dataset from 200M+ OONI measurements: Snorkel-style weak supervision with 5 label functions (ooni_confirmed, dns_nxdomain_blockpage_asn, tls_reset_no_control_failure, http_blockpage_hash, bgp_outage_corroborated), 47-feature schema, SMOTE class imbalance handling, time-based train/val/test splits to prevent temporal leakage, per-country Platt scaling calibration, and weekly incremental retraining with champion/challenger promotion.

  12. Q4Writing

    Measurement quality filter write-up published

    Long-form article on the quality_filter() function that gates raw probe measurements before ML feature extraction: five rejection criteria (probe version <2.5.0, missing required fields, control_failure, protocol layer incompleteness, deduplication), drop rates by reason (control_failure 1.9%, missing_fields 0.8%, old_probe 0.3%, duplicate 0.2%, total 3.2%), the FilterResult dataclass with drop_reason + metadata, and the to_feature_input() schema transformation that maps the 96.8% of measurements that pass the filter to the 47-feature ML input vector.

  13. Q4Writing

    OONI data normalization write-up published

    Long-form article on normalizing OONI web_connectivity measurements across five schema versions (v0.2–v0.6) for ML training: WebConnectivityVersion enum with detect_web_connectivity_version() field-presence inference, AnomalyType and ConfidenceTier enums, full OoniMeasurementNormalized dataclass, FLAG_* bitmask constants (FLAG_DNS=0x01, FLAG_TCP=0x02, FLAG_TLS=0x04, FLAG_HTTP=0x08, FLAG_CTRL_FAIL=0x10), normalize_v05() and normalize_v06() version-specific paths with structural diff table, and a drop reason table achieving 95.3% pass-through (schema_unknown 1.8%, missing_control 1.4%, dns_error_no_control 1.5%).
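
    The FLAG_* constants pack into a single integer. A minimal sketch of setting and decoding them, using the values listed above; the helper `decode_flags` is hypothetical.

```python
# Bit values as listed in the summary.
FLAG_DNS = 0x01
FLAG_TCP = 0x02
FLAG_TLS = 0x04
FLAG_HTTP = 0x08
FLAG_CTRL_FAIL = 0x10

_FLAG_NAMES = (
    (FLAG_DNS, "DNS"),
    (FLAG_TCP, "TCP"),
    (FLAG_TLS, "TLS"),
    (FLAG_HTTP, "HTTP"),
    (FLAG_CTRL_FAIL, "CTRL_FAIL"),
)


def decode_flags(mask: int) -> list:
    """Return the names of all bits set in `mask`, lowest bit first."""
    return [name for bit, name in _FLAG_NAMES if mask & bit]
```

    A measurement flagged at both the DNS and TLS layers would carry `FLAG_DNS | FLAG_TLS`, which is 0x05.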

  14. Q4Writing

    OONI historical corpus write-up published

    Long-form article on building the OONI historical corpus, now downloaded 1.66M+ times on HuggingFace: probe version schema drift, test_keys normalization across 20 measurement types, streaming 200M+ records, and the three decisions that reduced corpus size while improving ML usability.

  15. Q4Writing

    Censorship attribution OSINT write-up published

    Long-form article on attributing censorship to specific DPI vendors via OSINT: DpiVendorSignature dataclass with TSPU_SIGNATURE example, score_signature_match() combining RST timing (0.35), block page (0.30), injection IP (0.25), CA SPKI (0.10), PROCUREMENT_SOURCES dict with five government tender portals (zakupki.gov.ru, ihalesorgu.gov.tr, etc.), extract_vendor_from_tender() regex pipeline, BGP TTL-hop attribution for middlebox distance estimation, and country case studies (Russia TSPU 2019–2023 rollout, Iran ARRS autonomous system, China GFW distributed injection).
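
    The weighted combination in score_signature_match() can be sketched as a plain weighted sum. The four weights are from the summary; the dictionary keys, the assumption that each component score lies in [0, 1], and the treatment of missing components as 0 are illustrative choices.

```python
# Component weights from the summary; they sum to 1.0.
SIGNATURE_WEIGHTS = {
    "rst_timing": 0.35,
    "block_page": 0.30,
    "injection_ip": 0.25,
    "ca_spki": 0.10,
}


def score_signature_match(component_scores: dict) -> float:
    """Weighted sum of per-component match scores, each clamped to [0, 1]."""
    total = 0.0
    for name, weight in SIGNATURE_WEIGHTS.items():
        score = component_scores.get(name, 0.0)  # missing evidence counts as 0
        total += weight * min(max(score, 0.0), 1.0)
    return total
```

    Under these weights, a match on RST timing and injection IP alone reaches 0.60, so no single component can produce a high-confidence attribution by itself.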

  16. Q4Writing

    OSINT digital-footprint pipeline write-up published

    Long-form article on the internal entity reconnaissance tool: Entity/Attribute data model, 40+ source connectors, graph-based identity disambiguation with calibrated edge weights (PGP fingerprint 0.92, shared IP 0.40, stylometric 0.55), Certificate Transparency monitoring via CertStream for new domain registration detection, BGP RIB diff tracking for ASN prefix changes, compartmentalized codename-based storage architecture, and integration with Voidly censorship event attribution for ISP and government agency profiling.

  17. Q4Writing

    Censorship infrastructure mapping write-up published

    Long-form article on mapping the physical and logical infrastructure behind government internet censorship: L3/L4/L7 blocking taxonomy (BGP withdrawal, TCP RST injection <15ms, NXDOMAIN injection, transparent proxy), DPI vendor signatures (Russia TSPU RST <3ms, Iran ARRS resolver 10.10.34.35, GFW injection IPs including 8.7.198.45, NetClean Turkey 47 block-page fingerprints), Rust DpiSignature struct with classify_dpi_vendor(), infer_middlebox_distance() TTL hop-count analysis, Python IspBlockingFingerprint dataclass with match_score(), OSINT cross-referencing of zakupki.gov.ru TSPU contracts and BTK Turkey tenders, censorship_infrastructure controlled vocabulary (0.80 confidence threshold, 67% coverage), and Russia TSPU rollout timeline 2019–2023.

  18. Q4Writing

    Social media ingestion pipeline write-up published

    Long-form article on the three-tier social media collection architecture ingesting 58M posts per day across 47 platform schemas: Tier 1 official APIs (Twitter/X v2, Reddit PRAW) ~31M posts/day; Tier 2 ActivityPub inbox crawling across 26 Mastodon/Pleroma instances ~4M/day; Tier 3 RSS feeds and scraping across 8,400 Telegram channels, Truth Social, and GETTR ~23M/day. CanonicalPost schema (post_id, content_hash, is_repost, collection_strategy), token-bucket rate limiting with circuit breaker on HTTP 429, FastText lid.176 language detection at 2.1μs/post, Kafka partitioned by language_code (64 partitions, msgpack + lz4, acks='all'), and Redis-based content deduplication via SHA-256 with 24-hour TTL.

  19. Q3Writing

    NLP at 2.4M posts/hr

    Real-time pipeline goes into production: Apache Kafka, TimescaleDB, 80 GPU NLP workers, MinHash deduplication.

  20. Q3Writing

    Multilingual bot detection write-up published

    Long-form article on detecting coordinated bot accounts across 14 languages: BotFeatureVector dataclass (8 features: posting_interval_entropy, reply_outdegree_ratio, content_cluster_density, age_velocity_zscore, quote_to_original_ratio, url_recycling_rate, cross_platform_correlation, bio_change_count_90d), compute_posting_interval_entropy() with Shannon formula, Redis-bucketed perceptual hash matching (pHash Hamming distance ≤ 8 threshold), XGBClassifier with StratifiedGroupKFold on language groups to prevent cross-language leakage, per-language Platt scaling calibration, and F1 0.883–0.908 across all 14 languages (Arabic 0.883, Turkish 0.901, English 0.908).
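
    A minimal sketch of the posting-interval entropy feature, using the Shannon formula named above. The 60-second bin width and the handling of accounts with fewer than two posts are assumptions.

```python
import math
from collections import Counter
from typing import Sequence


def compute_posting_interval_entropy(timestamps: Sequence[float],
                                     bin_seconds: int = 60) -> float:
    """Shannon entropy (bits) of binned gaps between consecutive posts."""
    ordered = sorted(timestamps)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    if not intervals:
        return 0.0  # assumption: fewer than two posts yields zero entropy
    bins = Counter(int(gap // bin_seconds) for gap in intervals)
    n = len(intervals)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())
```

    An account posting exactly every 10 minutes has all gaps in one bin and scores zero, while irregular human posting spreads across bins and scores higher.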

  21. Q3Writing

    Coordinated campaign detection write-up published

    Long-form article on detecting coordinated inauthentic behavior across 58M daily social media posts: MinHash LSH (128 hash functions, 16 bands, Jaccard threshold 0.80) for content similarity clustering, Redis sorted-set temporal burst detection (≥5 accounts posting from the same content cluster within 15 minutes, with inverse-sqrt account age weighting), seven account behavioral features (age, posting rate, reply ratio, domain diversity, follower ratio, profile completeness, hourly entropy), network amplification ring detection via Johnson cycle enumeration, cross-platform timing joins on content_hash, and a 0–100 coordination score assembling all signals (70+ → human review queue, 90+ → automatic flagging for the election anomaly detection pipeline).
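
    The banding parameters determine which pairs become LSH candidates. With 128 hash functions split into 16 bands, each band has 8 rows, and the standard S-curve gives the candidate probability for a pair with Jaccard similarity s. The function names are illustrative.

```python
def candidate_probability(s: float, bands: int = 16, rows: int = 8) -> float:
    """P(pair shares at least one band) = 1 - (1 - s^rows)^bands."""
    return 1.0 - (1.0 - s ** rows) ** bands


def approx_threshold(bands: int = 16, rows: int = 8) -> float:
    """Similarity where the S-curve rises steeply: (1 / bands)^(1 / rows)."""
    return (1.0 / bands) ** (1.0 / rows)
```

    For 16 bands of 8 rows the steep part of the curve sits near 0.71, below the stated 0.80 threshold. One plausible reading, which is an assumption here, is that banding produces candidates and the 0.80 Jaccard cutoff is then applied to each candidate pair.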

  22. Q3Writing

    Election finance entity resolution write-up published

    Long-form article on resolving FEC committee identities across joint fundraising committees, disbursement schedules, and multi-cycle filings: FEC committee type taxonomy (H/S/P/X/Y/N/Q/O/I/U codes), JointFundraisingCommittee dataclass with JFCAllocation list, resolve_jfc_participants() parsing Form 99 MEMO entries, normalize_entity_name() with iterative legal-suffix stripping (10 patterns, 3 passes), four-pass resolution table (exact C-code 71.5% → alias 84.2% → TF-IDF char 3-gram cosine 92.1% → Jaro-Winkler fuzzy 95.5%), and FECEntityMatcher class with cosine similarity threshold 0.72.

  23. Q3Writing

    Election statistical methods write-up published

    Long-form article on the statistical tests underpinning election anomaly detection: BenfordTest (chi-squared df=8, KS test, applicability gate requiring >=2 log10 range and >=200 precincts), last-digit uniformity test (chi-squared + zero/five clustering >30%), TurnoutAnomalyTest with linear regression baseline and z-score >3.5 flag, non-monotonic reporting detection, ElectionAnomalySignal composite severity scorer (weighted by type: turnout 0.4, last-digit 0.35, Benford 0.2), cross-validation requirement (social media + media + analyst sign-off before alert), and 2024 calibration results: 0.7% false positive rate across all applicable precincts.
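
    The first-digit Benford test and its applicability gate can be sketched as below. The gate values (at least 200 precincts, at least 2 orders of magnitude) and df=8 come from the summary; the function names and the use of positive integer vote counts as input are assumptions. The sketch computes only the chi-squared statistic and leaves the significance threshold to the caller.

```python
import math
from collections import Counter
from typing import Sequence

# Expected first-digit frequencies under Benford's Law.
BENFORD = {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}


def benford_applicable(counts: Sequence[int]) -> bool:
    """Gate: >= 200 precincts spanning >= 2 orders of magnitude."""
    positive = [c for c in counts if c > 0]
    if len(positive) < 200:
        return False
    return math.log10(max(positive) / min(positive)) >= 2.0


def benford_chi_squared(counts: Sequence[int]) -> float:
    """Chi-squared statistic over first digits 1-9 (8 degrees of freedom)."""
    digits = Counter(int(str(c)[0]) for c in counts if c > 0)
    n = sum(digits.values())
    return sum((digits.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in BENFORD.items())
```

    The gate matters because precinct counts confined to a narrow range, such as 100 to 999, deviate from Benford's Law for purely structural reasons.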

  24. Q3Writing

    Election data pipeline write-up published

    Long-form article on the data engineering backbone for election anomaly detection: AP Election API long-polling (30–60s cadence), Kafka election.precinct_results topic (50 partitions by state FIPS), PrecinctResult protobuf schema (fips_code, race_id, candidate_results, precincts_reporting, source enum), state authority feed scrapers with CSV/HTML normalization, FIPS normalization edge cases (Connecticut 8→9 planning regions 2022, Alaska house districts to borough FIPS, NYC borough consolidation), ElectionSentimentConsumer aggregating DistilBERT outputs by state × hour to election.social_sentiment, ElectionClaimConsumer computing narrative divergence via cosine similarity, and end-to-end latency budget (AP feed p50 45s / p99 120s; state scraper p50 6min; social signal p50 8min).

2023

2 milestones
  1. Q4Open data

    OONI corpus mirrored on HuggingFace

    ooni-censorship-historical dataset published; passes 1.66M cumulative downloads over the next two years.

  2. Q2Voidly

    Voidly cross-source verification online

    Reconciler ships across OONI / CensoredPlanet / IODA. Verified-incident tier becomes the default surface.

2022

2 milestones
  1. Q3Coverage

    Probe network → 100 countries

    Vantage selection rules formalized: presence inside affected jurisdictions, ASN diversity, operator safety. Test list grows to 60 domains.

  2. Q1Voidly

    ML anomaly classifier in production

    Five interference classes (DNS, TLS, HTTP, BGP, throttling) graded with confidence scores; corroborated tier introduced.

2021

2 milestones
  1. Q3Open data

    CC BY 4.0 dataset published

    First public release of the Voidly measurement archive — committing to open data as a permanent operating principle.

  2. Q2Tooling

    Voidly probe v1

    Cross-platform Tauri desktop probe with boringtun + tun-rs. Anyone with a network can run a probe; keys never leave the device.

2020

2 milestones
  1. Q4Voidly

    First measurements collected

    Initial 6-country probe network active. The censorship dataset begins.

  2. Q2Org

    AI Analytics founded

    Operator-led collective forms around a shared belief: internet censorship should be measurable, verifiable, and citable.

Cited one of these events in a publication or research paper? Tell us at info@ai-analytics.org and we'll add it to the timeline.