SBA Loan Programs: The Federal Database Behind $50 Billion in Annual Small Business Financing

· AI Analytics
Tags: SBA, Small Business, Loans, 7(a), Federal Data

The Small Business Administration's 7(a) and 504 loan guarantee programs back over $50 billion in small business financing per year. Every loan is disclosed in a public dataset covering borrower name, location, loan amount, lender, industry, and jobs supported, which makes the SBA the most transparent source of small business capital data in the United States.

What the SBA is

The Small Business Administration was created by the Small Business Act of 1953, signed by President Eisenhower. Its congressional mandate is to “aid, counsel, assist, and protect, insofar as is possible, the interests of small business concerns.” The SBA is a cabinet-level independent agency with approximately 3,000 permanent staff and 68 district offices spread across every state and territory. Unlike most federal agencies, the SBA does not deliver most of its services through its own staff. It works mainly through partner institutions: banks, certified development companies, small business development centers, SCORE mentors, and women's business centers.

The agency runs two broad categories of programs. The financial assistance programs provide capital through loan guarantees, direct loans, and equity investment: the 7(a) loan guarantee program, the 504 Certified Development Company program, the Microloan program, the Small Business Investment Company (SBIC) equity investment program, and disaster loan programs that are legally and operationally separate from the commercial programs. The technical assistance programs provide counseling, training, and market access: Small Business Development Centers, SCORE volunteer mentors, Women's Business Centers, and Veteran Business Outreach Centers.

The SBA's disaster loan program deserves a brief mention because it is frequently confused with the commercial programs. The SBA Office of Disaster Recovery and Resilience makes direct loans, not guarantees, to businesses, nonprofits, homeowners, and renters after presidentially declared disasters. The Economic Injury Disaster Loan (EIDL) program operates through this authority. The COVID-era Paycheck Protection Program (PPP) is often grouped with it, but PPP loans were made by private lenders under a temporary extension of 7(a) authority rather than as direct disaster loans. Disaster loans appear in a separate database from 7(a) and 504 data and are not analyzed in this article.

By dollar volume, the 7(a) program is the flagship. In FY2023, the SBA approved roughly $27.5 billion in 7(a) loans across approximately 57,000 loans. The 504 program approved approximately $10 billion across roughly 5,000 loans. The Microloan program is far smaller—approximately $84 million annually across some 5,000 loans averaging under $17,000 each—and targets the very smallest businesses and entrepreneurs who cannot qualify even for 7(a). All three programs feed public datasets, with 7(a) and 504 providing the richest loan-level disclosure.

The 7(a) loan guarantee program

The 7(a) program is a guarantee mechanism, not a direct lending program. The SBA does not originate or fund 7(a) loans. Private lenders—banks, credit unions, and SBA-licensed non-bank lenders—originate loans to qualifying borrowers using their own capital, subject to SBA eligibility rules. The SBA then guarantees a portion of each loan: if the borrower defaults, the SBA pays the lender the guaranteed percentage of the outstanding principal. The lender retains the unguaranteed slice and absorbs loss on that portion, which is why lenders still underwrite rather than originating any loan that technically qualifies.

The standard guarantee percentages: for loans of $150,000 or less, the SBA guarantees 85% of principal. For loans above $150,000 up to the $5 million statutory maximum, the guarantee is 75%, capped at $3.75 million of guaranteed exposure per loan. These percentages make 7(a) loans substantially less risky to a bank than unguaranteed commercial loans of comparable size, which is the mechanism that expands credit access—lenders accept thinner collateral and weaker credit histories when the federal government absorbs most of the downside.
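The guarantee arithmetic above can be sketched in a few lines. This is a minimal illustration that uses only the thresholds quoted in this section; the function name `guaranteed_amount` is hypothetical, and the sketch ignores sub-programs that carry different percentages.

```python
def guaranteed_amount(loan_amount: float) -> float:
    """Return the SBA-guaranteed dollars for a standard 7(a) loan.

    Uses only the figures quoted in the text: 85% up to $150,000,
    75% above that, a $5 million loan maximum, and a $3.75 million
    ceiling on guaranteed exposure.
    """
    if loan_amount <= 0 or loan_amount > 5_000_000:
        raise ValueError("loan amount must be above $0 and at most $5 million")
    rate = 0.85 if loan_amount <= 150_000 else 0.75
    return min(loan_amount * rate, 3_750_000)


for amount in (100_000, 150_000, 400_000, 5_000_000):
    g = guaranteed_amount(amount)
    print(f"${amount:>9,} loan: ${g:>12,.0f} guaranteed, "
          f"${amount - g:>12,.0f} retained by lender")
```

The retained column is the lender's own exposure, which is the reason lenders continue to underwrite each loan.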

Eligible uses of 7(a) proceeds are deliberately broad: working capital, equipment purchase, leasehold improvements, furniture and fixtures, business acquisition (including buying out a partner), commercial real estate purchase and renovation, and refinancing of existing business debt that is not on “reasonable” terms. Real estate loans can run up to 25 years; equipment and working capital loans run up to 10 years.

Interest rates on 7(a) loans may be fixed or variable, and variable rates are typically tied to the Wall Street Journal Prime Rate. The SBA sets the maximum allowable spreads above prime, which vary by loan size and term: for fixed-rate loans above $50,000, lenders may charge prime plus 2.25% to 4.75% depending on maturity; for variable-rate loans the spread is narrower. In practice, most 7(a) loans are variable rate, priced at prime plus 2.25% to 2.75% for larger loans. Smaller loans carry wider spreads. When the rate environment is elevated, as it was from 2022 through 2024, 7(a) borrowers face materially higher debt service than in zero-rate environments, which is visible as elevated early charge-off rates in loan vintage cohorts from those years.
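To make the debt-service effect concrete, the sketch below applies the standard level-payment amortization formula at two prime-rate levels. The loan amount, term, and the two prime values are illustrative assumptions rather than figures from SBA data, and the calculation treats the rate as constant over the term, which a variable-rate loan would not be.

```python
def monthly_payment(principal: float, annual_rate_pct: float, term_months: int) -> float:
    """Level monthly payment on a fully amortizing loan (standard annuity formula)."""
    r = annual_rate_pct / 100 / 12          # periodic (monthly) rate
    if r == 0:
        return principal / term_months
    return principal * r / (1 - (1 + r) ** -term_months)


SPREAD = 2.75      # percentage points over prime, top of the range quoted above
LOAN = 500_000     # hypothetical loan amount
TERM = 120         # hypothetical 10-year term, in months

for label, prime in (("low-rate year", 3.25), ("high-rate year", 8.50)):
    pay = monthly_payment(LOAN, prime + SPREAD, TERM)
    print(f"{label}: prime {prime:.2f}% + {SPREAD:.2f}% -> ${pay:,.2f} per month")
```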

Eligibility requires the borrower to be a for-profit business, organized and operating in the United States, meeting SBA size standards for its NAICS code, and unable to obtain credit on reasonable terms elsewhere. The “unable to obtain credit elsewhere” test is a certification by the borrower and lender—it is not independently verified by the SBA in most cases. Businesses in certain industries are ineligible regardless of size: financial businesses that primarily lend money (banks, finance companies), life insurance companies, businesses primarily engaged in political or lobbying activities, and businesses engaged in illegal activity under federal law.

Sub-programs within 7(a)

Several variants of 7(a) operate under the same statutory authority with different processing rules and guarantee structures.

SBA Express is the highest-volume sub-program by loan count. Lenders approved for the Express program may use their own underwriting procedures rather than SBA standard procedures, receiving an SBA decision within 36 hours rather than the standard multi-week timeline. The tradeoff is a reduced guarantee: Express loans carry a 50% SBA guarantee rather than 75% or 85%. The maximum Express loan is $500,000. Express is popular with experienced SBA lenders because it allows them to move quickly on smaller loans without waiting for SBA approval.

Export Express and Export Working Capital Program extend 7(a) to businesses with international sales. Export Express mirrors the Express structure but is restricted to exporters. The Export Working Capital Program provides revolving lines of credit to fund export transactions, with a 90% guarantee, making it the highest-guarantee product in the 7(a) family.

CAPLines addresses revolving credit needs for businesses with cyclical or contract-based cash flows. Four variants exist: Seasonal CAPLine (for businesses with predictable seasonal peaks), Contract CAPLine (advances against specific contracts), Builders CAPLine (construction and substantial renovation), and Working Capital CAPLine (general-purpose revolving credit).

Community Advantage, now operating under the Community Advantage Small Business Lending Company (SBLC) structure after a 2023 program redesign, allows mission-driven lenders such as Community Development Financial Institutions (CDFIs), nonprofit lenders, and microloan intermediaries to make 7(a) loans up to $350,000. Community Advantage is intended to serve markets that mainstream SBA lenders underserve: rural areas, low-to-moderate income communities, minority-owned businesses, and businesses too small to attract conventional SBA lender interest.

The 504 program: fixed assets and the three-party structure

The 504 program exists for a specific purpose that 7(a) cannot efficiently serve: providing long-term, fixed-rate financing for major fixed assets, primarily owner-occupied commercial real estate and heavy equipment. It is not available for working capital, inventory, or debt refinancing (with limited exceptions), which sharply distinguishes it from 7(a). The program operates through Certified Development Companies (CDCs)—SBA-licensed nonprofits, approximately 200 of which operate regionally across the country.

The financing structure involves three parties rather than the two-party structure of 7(a). For a typical project costing $1 million: a private bank provides a first mortgage covering 50% of total project cost ($500,000); the CDC provides a second mortgage covering 40% of project cost ($400,000), funded by selling a debenture to investors with an SBA guarantee backing that debenture; and the borrower provides a 10% equity down payment ($100,000). The borrower equity requirement rises to 15% for startups (in business less than two years) and for “special purpose” properties—buildings not easily converted to general use, including hotels, gas stations, car washes, and golf courses.
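The three-party split can be expressed as a small function. This is a sketch under one assumption the text does not state: that the bank share stays at 50% and any additional borrower equity comes out of the CDC share. The function and key names are hypothetical.

```python
def split_504_project(total_cost: float, startup: bool = False,
                      special_purpose: bool = False) -> dict:
    """Split a 504 project into bank, CDC, and borrower tranches.

    Assumption (not from the text): the bank share is fixed at 50% and
    extra borrower equity reduces the CDC share. Equity is 15% when
    either flag is set, otherwise 10%, as described above.
    """
    equity_pct = 0.15 if (startup or special_purpose) else 0.10
    bank_pct = 0.50
    cdc_pct = 1.0 - bank_pct - equity_pct
    return {
        "bank_first_mortgage": total_cost * bank_pct,
        "cdc_debenture": total_cost * cdc_pct,
        "borrower_equity": total_cost * equity_pct,
    }


for label, kwargs in (("standard", {}), ("startup", {"startup": True})):
    parts = split_504_project(1_000_000, **kwargs)
    summary = ", ".join(f"{k} ${v:,.0f}" for k, v in parts.items())
    print(f"{label}: {summary}")
```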

The maximum SBA debenture is $5 million for most projects, including those meeting “public policy goals” such as location in designated areas of high unemployment or labor surplus. It rises to $5.5 million per project for small manufacturers and for energy-related projects: energy efficiency improvements, use of renewable energy, or reduction in energy consumption by at least 10%. Because the energy-related limit applies per project, a borrower with several qualifying projects could receive up to $16.5 million in CDC debentures in aggregate.

The CDC debenture carries a fixed rate set monthly by SBA based on the yield of ten-year or twenty-year Treasury notes plus a spread for guarantee and servicing fees. Because it is backed by an SBA guarantee and funded through the bond market rather than a bank's cost of funds, the CDC debenture rate is typically below what a bank would charge for a second mortgage of equivalent risk. This below-market fixed rate on the 40% CDC tranche is the program's primary value to borrowers—they effectively get the large fixed-rate anchor of their project financing at a below-conventional rate, while the bank's first mortgage at 50% LTV is a conservative credit with strong collateral coverage.

For owner-occupied commercial real estate, 504 is often the optimal financing structure for qualifying businesses. The effective combined interest cost of the three-tranche stack is often lower than a conventional commercial real estate loan, and the 20-year fixed rate on the CDC debenture provides duration certainty that a floating-rate bank loan does not. The program has historically generated roughly 5,000 loans per year at approximately $10 billion in total project financing.
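The claim about combined interest cost is a dollar-weighted average across the debt tranches. The sketch below shows that calculation; both interest rates are hypothetical values chosen only to illustrate the mechanics, and the comparison ignores fees and differing amortization schedules.

```python
def blended_rate(tranches: list[tuple[float, float]]) -> float:
    """Dollar-weighted average interest rate across debt tranches.

    Each tranche is (principal, annual_rate_pct). This is a simple
    weighted average at origination and nothing more.
    """
    total = sum(principal for principal, _ in tranches)
    return sum(principal * rate for principal, rate in tranches) / total


# Hypothetical rates, for illustration only.
stack_504 = [(500_000, 7.50),      # bank first mortgage
             (400_000, 6.25)]      # CDC debenture, fixed
conventional = [(900_000, 7.50)]   # one bank loan for the same debt amount

print(f"504 stack blended rate: {blended_rate(stack_504):.3f}%")
print(f"Conventional loan rate: {blended_rate(conventional):.3f}%")
```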

The PPP: the largest public disclosure of small business financial data ever

The Paycheck Protection Program was created by the CARES Act in March 2020. It was structured as a temporary extension of 7(a) authority: private lenders made the loans, the SBA guaranteed them in full, and the federal government repaid the lender when a loan was forgiven. Businesses with under 500 employees, sole proprietors, independent contractors, and self-employed individuals could borrow up to 2.5 times their average monthly payroll costs, with the loan forgiven in full if at least 60% of the proceeds were used on payroll costs and the rest on rent, mortgage interest, and utilities.
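The sizing and forgiveness rules described above reduce to two small calculations. The sketch below encodes only the 2.5x payroll multiple and the 60% payroll test as stated here; the function names are hypothetical, and the sketch deliberately omits any dollar caps or partial-forgiveness rules.

```python
def ppp_max_loan(average_monthly_payroll: float) -> float:
    """Maximum loan under the 2.5x average-monthly-payroll rule."""
    return 2.5 * average_monthly_payroll


def meets_payroll_test(loan_amount: float, spent_on_payroll: float) -> bool:
    """True if at least 60% of loan proceeds went to payroll costs."""
    return spent_on_payroll >= 0.60 * loan_amount


# Hypothetical borrower with $20,000 in average monthly payroll.
loan = ppp_max_loan(20_000)
print(f"Maximum loan: ${loan:,.0f}")
print("Payroll test met with $32,000 on payroll:", meets_payroll_test(loan, 32_000))
print("Payroll test met with $25,000 on payroll:", meets_payroll_test(loan, 25_000))
```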

The PPP ran from April 2020 through May 2021 across two rounds. Total disbursements exceeded $800 billion across approximately 12 million loans. The program was intentionally designed for speed rather than verification—borrowers self-certified eligibility, and lenders processed applications on a first-come-first-served basis with minimal underwriting. The SBA instructed lenders to rely on borrower certifications and not independently verify payroll data.

The PPP database is the largest public disclosure of small business financial data in American history. The full release covers borrower name, address, lender name, loan amount, NAICS code, business type, race and gender of owner, veteran status, and the number of jobs the borrower reported the loan would retain. The SBA's initial July 2020 release was narrower: it named borrowers only for loans of $150,000 or more and reported those amounts as ranges, while smaller loans were listed with exact amounts but without borrower names. After a federal court ordered fuller disclosure in late 2020, names and exact amounts became public for all loans.

The PPP data revealed patterns that would have been invisible without it. Tens of thousands of loans went to businesses that the evidence later suggested were not operating—borrowers with no payroll records, businesses registered after the pandemic began, and duplicate applications from the same borrower at different lenders. The SBA Inspector General estimated that $100 billion or more in PPP funds may have been obtained through fraud or were otherwise improper. ProPublica built a publicly searchable PPP loan lookup tool using the full dataset. The OCCRP and dozens of newsrooms used the data to identify questionable recipients including dissolved businesses, convicted fraudsters, and duplicate recipients. The DOJ prosecuted thousands of PPP fraud cases using financial records that the public dataset made straightforward to identify.

The PPP data is permanently public and available in bulk from the SBA website. It remains the richest source of granular small business financial data ever published by any federal agency. The 7(a) and 504 programs had already set the precedent of loan-level disclosure years earlier; PPP extended that precedent to an entirely new scale.

Public data: what is disclosed and where to find it

The SBA publishes loan-level data for both 7(a) and 504 programs on a quarterly basis at sba.gov/about-sba/sba-performance/open-government/digital-sba/open-data. The same data is accessible via the Socrata API at data.sba.gov. Historical bulk files going back to fiscal year 1991 are available for download; the most analytically complete period is FY2010 through the present, as older records have inconsistent field coverage and missing demographic flags.

The 7(a) annual files are organized by fiscal year (SBA fiscal year runs October through September). Each row is a single approved loan. The key fields:

Field: Notes
LoanNumber: Primary key. Unique across all fiscal years.
ApprovalDate: SBA approval date. Determines the fiscal year cohort for vintage analysis.
BorrName: Legal name of the borrowing business as submitted to the lender.
BorrCity, BorrState, BorrZip: Borrower location. State enables geographic aggregation; zip enables census tract matching.
GrossApproval: Total loan amount approved in dollars.
SBAGuaranteedApproval: Dollar amount of the SBA guarantee on this loan.
TermInMonths: Loan maturity in months. Up to 300 (25 years) for real estate loans.
NaicsCode: Six-digit NAICS industry code. Primary basis for sector analysis and size standard verification.
BankName, BankState: Originating lender name and state. Enables lender-level performance and concentration analysis.
InitialInterestRate: Interest rate at origination. Enables spread-above-prime analysis by lender, sector, and vintage.
JobsSupported: Number of jobs the borrower reported the loan would support. Self-reported; not independently verified.
LoanStatus: P I F (paid in full), CHGOFF (charged off), CANCLD (cancelled), EXEMPT, DISBURSED CURRENT (active). Inconsistently formatted across file vintages.
SBA_Guaranteed_Portion_Charged_Off: Dollar amount the SBA paid out on a defaulted guarantee. Zero for non-defaulted loans.
BusinessType: Corporation, LLC, Partnership, Sole Proprietorship, etc.
BusinessAge: Age category at origination: “Existing > 2 years,” “New Business,” etc.
RuralUrbanIndicator: R (rural) or U (urban). Rural loans receive the full 85% guarantee on all loans, not just those under $150,000.
MIS_Flag, WomenOwned, VeteranStatus: Minority, women-owned, and veteran ownership flags. Self-reported by borrower; SBA does not independently verify against authoritative registries for 7(a).
FranchiseCode: SBA franchise registry code when the business is a franchisee. Enables franchise-level portfolio analysis across all participating SBA lenders.
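
Because the files are organized by an October-through-September fiscal year, assigning a loan to its cohort from ApprovalDate takes a small conversion. A minimal sketch follows; the function name is hypothetical.

```python
from datetime import date


def sba_fiscal_year(approval: date) -> int:
    """Fiscal year for an approval date, with the year running October through September."""
    return approval.year + 1 if approval.month >= 10 else approval.year


for d in (date(2021, 9, 30), date(2021, 10, 1), date(2022, 9, 30)):
    print(d.isoformat(), "-> FY", sba_fiscal_year(d))
```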

The 504 files follow a similar structure, substituting CDC-specific fields: the lender is identified as the CDC rather than the originating bank, and the loan amount reflects the CDC debenture portion (40% of project cost) rather than the total project financing. A separate field records the bank first mortgage amount for project-cost reconstruction. The 504 data also includes the debenture rate at origination—the fixed rate locked at closing for the term of the SBA debenture—which is valuable for tracking the rate environment at the time of each project.
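
Reconstructing total project cost from the two recorded amounts might look like the sketch below. It assumes the standard 50% bank, 40% CDC, 10% equity split described earlier and flags records where the two implied totals disagree. The function name, parameter names, and tolerance are hypothetical; the actual column names in the 504 files should be checked before use.

```python
def estimate_project_cost(debenture: float, bank_first_mortgage: float,
                          tolerance: float = 0.05) -> dict:
    """Estimate total 504 project cost from the two recorded loan amounts.

    Assumes the standard 50% bank / 40% CDC / 10% equity split. If the
    two implied totals differ by more than `tolerance`, the project
    probably used a different split, such as higher borrower equity.
    """
    from_cdc = debenture / 0.40
    from_bank = bank_first_mortgage / 0.50
    gap = abs(from_cdc - from_bank) / max(from_cdc, from_bank)
    return {
        "implied_by_debenture": from_cdc,
        "implied_by_bank_loan": from_bank,
        "standard_split": gap <= tolerance,
    }


print(estimate_project_cost(400_000, 500_000))
print(estimate_project_cost(350_000, 500_000))
```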

Access via the Socrata API at data.sba.gov supports SQL-style filtering via the $where query parameter, making it practical to pull subsets by state, lender, NAICS code, approval date range, or loan status without downloading entire annual files. Socrata dataset IDs change when the SBA republishes data; check data.sba.gov for current identifiers before building automated download pipelines.
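
A filtered query can be assembled as a URL before any request is made. The sketch below follows the general Socrata (SODA) convention of a `/resource/<id>.json` endpoint with `$where` and `$limit` parameters. The dataset ID is a placeholder, the lowercase column names are assumptions, and whether a given SBA dataset is exposed at this endpoint should be confirmed on data.sba.gov.

```python
from urllib.parse import urlencode


def build_soda_url(domain: str, dataset_id: str, where: str, limit: int = 1000) -> str:
    """Build a SODA-style query URL with a $where filter and a row limit."""
    params = {"$where": where, "$limit": limit}
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


url = build_soda_url(
    "data.sba.gov",
    "xxxx-xxxx",  # placeholder, not a real dataset ID
    "borrstate = 'TX' AND grossapproval > 1000000",  # assumed column names
)
print(url)
```

Building the URL separately from fetching it makes the filter easy to inspect and log before a pipeline starts pulling data.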

For FOIA requests covering fields not in the public dataset, the SBA's FOIA office processes requests within the standard statutory timeframe. Historically, researchers have successfully obtained credit score data and lender-reported collateral valuations through FOIA, though individual loan-level requests for non-public fields are subject to exemptions protecting proprietary business financial information.

SBA size standards: what “small business” means

The SBA defines “small business” by industry using NAICS codes, and the thresholds are larger than most people expect. The size standards are codified at 13 CFR Part 121 and published as the Table of Small Business Size Standards, which the SBA updates periodically. Two measurement bases are used: employee count and average annual receipts.

Manufacturing firms are generally sized by employee count, with thresholds ranging from 500 to 1,500 depending on the specific manufacturing NAICS code. A manufacturer of turbine engines (NAICS 336412) with 1,400 employees is a small business under SBA definitions. A computer systems design firm (NAICS 541512) is sized by annual receipts and qualifies as small if it generates under $30 million per year. Retail trade businesses are sized by annual receipts, with thresholds from $8 million to $47 million. Wholesale trade firms are sized by employee count at 100 or 250 employees depending on the sub-sector.

For 7(a) and 504 purposes, the lender certifies that the borrower meets the applicable size standard at the time of application. The size standard determination uses a three-year average of annual receipts for revenue-based standards and a payroll-period average for employee-based standards. Affiliated businesses—those under common ownership or control—are counted together for size determination, which can disqualify firms that appear small in isolation but are part of a larger corporate family.
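
The receipts-based test with affiliation can be sketched as follows. The example uses the $30 million threshold quoted above for NAICS 541512; the firm names and receipt figures are hypothetical, and the sketch assumes every firm supplies the same three years of data.

```python
def average_annual_receipts(receipts_by_firm: dict[str, list[float]]) -> float:
    """Average of combined annual receipts for an applicant and its affiliates."""
    combined_by_year = [sum(year) for year in zip(*receipts_by_firm.values())]
    return sum(combined_by_year) / len(combined_by_year)


def is_small(receipts_by_firm: dict[str, list[float]], threshold: float) -> bool:
    """True if average combined receipts fall under the size standard."""
    return average_annual_receipts(receipts_by_firm) < threshold


THRESHOLD = 30_000_000  # receipts standard quoted in the text for NAICS 541512

applicant = {"Applicant LLC": [22_000_000, 25_000_000, 28_000_000]}
with_affiliate = {**applicant, "Sister Co": [8_000_000, 9_000_000, 10_000_000]}

print("Applicant alone is small:", is_small(applicant, THRESHOLD))
print("With affiliate is small: ", is_small(with_affiliate, THRESHOLD))
```

The second case shows the affiliation effect: the applicant passes alone but fails once the sister company's receipts are added.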

The practical consequence of broad size standards is that the SBA programs serve a far larger universe of businesses than is commonly understood. A regional hospital management company, a medium-sized construction firm, a chain of ten restaurants, or a technology consulting practice with 200 employees may all qualify. The NAICS code breakdown in the public 7(a) data shows this breadth directly: the most common NAICS codes are food service (722), healthcare (621), construction (236, 238), retail (44-45), and professional services (54).

Lender rankings and market concentration

The 7(a) program is highly concentrated among a small number of lenders. Live Oak Bank, headquartered in Wilmington, North Carolina, has ranked as the top SBA 7(a) lender by loan count for multiple consecutive years. Live Oak built its SBA practice through sector specialization: dedicated underwriting teams for veterinary practices, dental practices, funeral homes, agribusiness operations, self-storage facilities, craft breweries, pharmacies, and other niche markets where the bank has developed deep collateral expertise and industry-specific financial benchmarks. Live Oak's model is the most successful example of building a national SBA franchise around vertical market knowledge rather than geographic branch density.

By dollar volume, JPMorgan Chase and Wells Fargo periodically hold top positions, reflecting their large average loan sizes. JPMorgan's SBA practice skews toward larger loans in commercial real estate and business acquisition; Live Oak's top position by count reflects a higher volume of smaller professional practice loans. Huntington National Bank, Newtek Business Services (a non-bank SBA lender), Celtic Bank, and Byline Bank appear consistently in the top 10 by volume, each having built SBA originations as a core business line.

For the 504 program, the top CDCs by debenture volume include TMC Financing (California), Accion Opportunity Fund, and regional CDCs in Texas, Florida, and the Mid-Atlantic. CDCs are mission-driven nonprofits; their concentration by geography is more pronounced than 7(a) lenders, because each CDC has a defined service territory (though the SBA has permitted geographic expansion in some cases). The top five CDCs typically account for 20–25% of annual 504 debenture volume.

The CDFI channel within Community Advantage represents a distinct segment of the market. CDFIs originating SBA loans tend to be smaller by volume, serve higher-risk borrowers in underserved geographies, and carry higher charge-off rates that are an expected consequence of their mandate. The public data allows direct comparison of CDFI charge-off rates versus mainstream lenders, controlling for loan size and industry—a comparison the SBA Inspector General has performed and published.

The SBA publishes annual lender performance reports identifying the top 100 7(a) lenders by approval dollar volume and loan count. These reports are public and provide a useful starting point for lender analysis, though the loan-level data allows far more granular examination: lender charge-off rates by NAICS sector, average loan size over time, spread above prime by lender, and vintage performance by approval year cohort.

Industry concentration and geographic patterns

NAICS analysis of 7(a) data shows consistent patterns across fiscal years. Food service and restaurants (NAICS 722510, 722511, 722513) dominate by loan count in most years. Restaurants are capital-hungry at startup and acquisition, have limited conventional collateral (kitchen equipment depreciates quickly and is illiquid in liquidation), and turn over frequently, creating a recurring pipeline of acquisition and startup financing. The food service sector also carries the highest charge-off rates in the 7(a) portfolio, a relationship that has persisted across economic cycles.

Healthcare practices appear with high approval rates and relatively low default rates. Dental practices (NAICS 621210), veterinary practices (NAICS 541940), optometry offices (NAICS 621320), and physical therapy clinics (NAICS 621340) generate predictable cash flows, have high-value equipment that retains collateral value, and are often acquired by younger practitioners taking on practice debt, a transaction type that 7(a) is specifically designed to support.

Construction and real estate development appear prominently by dollar volume, reflecting the 25-year maturity available for real estate loans and the larger average loan sizes. Franchised businesses appear as a distinct analytical segment in the data via the FranchiseCode field: franchise-specific default rates differ substantially from non-franchise rates in the same NAICS sector, because franchise systems impose operational standards and provide central marketing that reduce individual unit failure rates compared to independent operators.

Geographic distribution shows that SBA loan volume correlates closely with the SBA district office footprint and existing lender network density. California, Texas, Florida, New York, and Illinois generate the highest absolute 7(a) volumes by both count and dollar amount, consistent with their large small business populations. On a per-establishment basis—computing SBA loans per Census County Business Patterns establishment by state—some rural states with strong agricultural lending cultures and active SBA district offices show penetration rates well above their absolute volume would suggest. The rural/urban flag in the data enables direct geographic segmentation: rural loans carry the full 85% guarantee regardless of size, making them more economically attractive to lenders than their dollar amount alone implies.
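
The per-establishment comparison is a join between loan counts and establishment counts. The sketch below uses pandas with tiny made-up numbers for both tables; in practice the establishment counts would come from Census County Business Patterns, and the column names on that side are assumptions.

```python
import pandas as pd

# Made-up sample data, for illustration only.
loans = pd.DataFrame({"BorrState": ["CA", "CA", "CA", "MT", "MT", "TX"]})
establishments = pd.DataFrame({
    "state": ["CA", "MT", "TX"],
    "establishments": [1000, 100, 800],
})

# Count loans per state, then attach each state's establishment count.
loan_counts = loans.groupby("BorrState").size().rename("loans").reset_index()
merged = loan_counts.merge(establishments, left_on="BorrState", right_on="state")

# Penetration: loans per 1,000 establishments.
merged["loans_per_1k_estab"] = merged["loans"] / merged["establishments"] * 1000
ranked = merged.sort_values("loans_per_1k_estab", ascending=False)
print(ranked[["BorrState", "loans", "loans_per_1k_estab"]].to_string(index=False))
```

In this toy data the smallest state by loan count ranks first on penetration, which is the pattern the paragraph describes.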

Charged-off loans and default analysis

The LoanStatus field distinguishes loans that have been charged off from those that are paid in full, cancelled, or still active. When a 7(a) borrower defaults and the lender exhausts standard collection procedures, the lender files a guarantee claim with the SBA. The SBA reviews the claim and, if approved, pays the lender the guaranteed percentage of the outstanding principal. This payment triggers the CHGOFF status in the public data and populates the SBA_Guaranteed_Portion_Charged_Off dollar field.

One methodological note: the charge-off in the public data reflects the date the SBA paid the lender, not the date the borrower first missed payments. The lag between initial default and SBA guarantee payout is typically six to eighteen months, as lenders are required to exhaust collection procedures before filing the guarantee claim. This means the charge-off records for a given origination cohort continue to accumulate in the data for two to three years after origination—a FY2022 vintage loan that defaults in 2023 may not appear as charged off in the public data until 2024 or 2025.

Cumulative charge-off rates for the 7(a) program across the program's history run approximately 10–15% by loan count, with substantial variation by vintage year, lender, and industry. Dollar-weighted default rates are lower because larger loans—which go to more established businesses with more collateral—default at lower rates than smaller startup and working capital loans. Vintage analysis by approval year reveals the pattern: loans originated in the years immediately preceding economic contractions (2006–2007, 2019–2020) show elevated charge-off rates that materialize two to five years after origination as business conditions deteriorate and marginal borrowers who could service debt in good times cannot do so in downturns.
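
The gap between count-weighted and dollar-weighted rates is easy to see on a toy cohort. The figures below are made up, and the sketch weights by approved amount rather than by the dollars actually charged off, which is a simplification.

```python
import pandas as pd

# Made-up cohort: small loans default, large loans do not.
cohort = pd.DataFrame({
    "GrossApproval": [50_000, 75_000, 100_000, 1_000_000, 2_000_000],
    "LoanStatus": ["CHGOFF", "PIF", "CHGOFF", "PIF", "PIF"],
})

charged_off = cohort["LoanStatus"].eq("CHGOFF")

# Share of loans charged off versus share of approved dollars charged off.
count_rate = charged_off.mean()
dollar_rate = cohort.loc[charged_off, "GrossApproval"].sum() / cohort["GrossApproval"].sum()

print(f"By loan count: {count_rate:.1%}")
print(f"By dollars:    {dollar_rate:.1%}")
```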

The SBA Inspector General has published multiple reports using publicly available charge-off data to identify high-risk lenders. The 2014 OIG report identified lenders whose cumulative charge-off rates substantially exceeded program averages and recommended enhanced oversight. Those analyses were performed entirely on data that remains publicly downloadable today. Replicating and extending them—with more recent data, additional lender controls, and sector-level decomposition—is straightforward with the Socrata API and a few hundred lines of Python.

The SBIC program

The Small Business Investment Company program operates separately from 7(a) and 504 but is administered by the SBA and addresses the equity and mezzanine capital gap that loan programs cannot fill. SBICs are private investment funds licensed and regulated by the SBA. A licensed SBIC raises private capital from limited partners, then leverages that capital with low-cost SBA-guaranteed debentures, effectively accessing government-backed debt at below-market rates to enhance returns for private investors.

The leverage ratio allowed is typically 2:1—a fund with $100 million in private capital may draw $200 million in SBA debentures, deploying $300 million into equity or subordinated debt investments in small businesses. The SBA guarantee on the debenture makes SBIC-issued debt attractive to institutional investors at rates below what an unguaranteed private fund could access. Combined SBIC investment runs approximately $5–6 billion per year, concentrated in technology, healthcare, and business services.
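
The effect of that leverage on private investors' returns can be sketched with a single-period calculation. The 10% portfolio return and 5% debenture cost below are hypothetical, and the sketch ignores fees, timing, and any program limits on total leverage.

```python
def sbic_leveraged_return(private_capital: float, leverage_ratio: float,
                          portfolio_return: float, debenture_rate: float):
    """Return (total deployable capital, return on private capital).

    Single-period illustration: gains on the whole portfolio, minus
    interest on the debentures, accrue to the private investors.
    """
    debentures = private_capital * leverage_ratio
    total = private_capital + debentures
    net_gain = total * portfolio_return - debentures * debenture_rate
    return total, net_gain / private_capital


total, levered = sbic_leveraged_return(100_000_000, 2.0, 0.10, 0.05)
_, unlevered = sbic_leveraged_return(100_000_000, 0.0, 0.10, 0.05)
print(f"Deployable capital: ${total:,.0f}")
print(f"Return on private capital: {levered:.1%} levered vs {unlevered:.1%} unlevered")
```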

Historically, the SBIC program backed some of the most consequential technology companies in their early stages: Intel and Apple both received SBIC investment in the 1970s and early 1980s, when they were still small enough to qualify. The program has since shifted toward later-stage private equity, and SBICs today more closely resemble conventional PE funds than the venture programs of the program's early decades. SBIC data is published annually by the SBA at the fund level, with less loan-level granularity than 7(a) and 504—aggregate statistics by fund rather than individual portfolio company data.

Demographic data and equity analysis

The SBA collects self-reported ownership demographic flags on every 7(a) and 504 loan: women-owned business (WomenOwned), minority-owned business (MIS_Flag, with race/ethnicity disaggregation in some data versions), veteran status (VeteranStatus), and rural location (RuralUrbanIndicator). Academic research using this data has documented consistent disparities: minority-owned businesses receive smaller average loan amounts and pay higher average interest rates than otherwise comparable non-minority businesses—even within the SBA program, which is designed explicitly to expand credit access.

Some of this disparity reflects industry concentration differences. Businesses owned by members of underrepresented groups are more concentrated in food service, personal care, and retail—industries with lower average loan amounts and higher historic default rates—than in the healthcare and professional services sectors where approval rates and average loan sizes are higher. Controlling for NAICS sector and loan size reduces but does not eliminate the rate differentials in the public data.

The rural indicator is analytically useful beyond its demographic implications. Rural 7(a) loans receive a structural advantage—the full 85% guarantee regardless of loan amount—and rural lenders operating in markets with fewer competitors can sometimes charge wider spreads above prime. Whether rural borrowers pay more or less than urban borrowers, controlling for loan size and sector, is directly computable from the InitialInterestRate field across the full public dataset.
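
One way to set up that comparison is to bucket loans by size and compare mean rates within each bucket. The sketch below uses made-up rows and arbitrary size bands, and it controls for loan size only; a fuller version would also group by NAICS sector.

```python
import pandas as pd

# Made-up sample rows using the public field names.
loans = pd.DataFrame({
    "RuralUrbanIndicator": ["R", "R", "U", "U", "R", "U"],
    "GrossApproval": [80_000, 120_000, 90_000, 140_000, 900_000, 1_200_000],
    "InitialInterestRate": [10.5, 10.25, 10.0, 9.75, 9.0, 8.75],
})

# Arbitrary size bands chosen for illustration.
bins = [0, 150_000, 500_000, 5_000_000]
labels = ["<=150k", "150k-500k", ">500k"]
loans["size_band"] = pd.cut(loans["GrossApproval"], bins=bins, labels=labels)

# Mean rate per size band, rural and urban side by side.
comparison = (
    loans.groupby(["size_band", "RuralUrbanIndicator"], observed=True)
    ["InitialInterestRate"].mean().unstack()
)
comparison["rural_minus_urban"] = comparison["R"] - comparison["U"]
print(comparison)
```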

Python workflow: downloading and analyzing FY2022 7(a) data

The following script downloads the SBA 7(a) FY2022 public CSV, summarizes the dataset, identifies the top ten lenders by loan count, computes average loan size by state, and examines the top NAICS industries by loan count. The URL reflects the direct CSV download link; check data.sba.gov for the current FY file identifiers, as these change when the SBA publishes updated releases.

import io

import pandas as pd
import requests

# SBA 7(a) loan data, public quarterly release
# Download from: https://www.sba.gov/about-sba/sba-performance/open-government/digital-sba/open-data/lending-data
# Direct CSV (FY2022); the IDs in this URL change when the SBA republishes:
url = "https://data.sba.gov/dataset/8aa276d2-6346-4753-a35e-1df51a0f95de/resource/72e04bc7-2369-4484-86c1-21a8eb27dd32/download/foia-7afy2022.csv"
resp = requests.get(url, timeout=60)
resp.raise_for_status()  # fail on a 404 or 5xx instead of parsing an error page
df = pd.read_csv(io.StringIO(resp.text), low_memory=False)

print(f"FY2022 7(a) loans: {len(df):,} rows, {len(df.columns)} columns")
print(f"Columns: {list(df.columns[:10])}")

# Top lenders by approval count
top_lenders = df["BankName"].value_counts().head(10)
print("\nTop 10 lenders by 7(a) loan count (FY2022):")
for lender, count in top_lenders.items():
    print(f"  {str(lender):<40} {count:>6,}")

# Average loan size by state. Amounts may load as strings, so coerce to numeric.
# Falls back to the guaranteed amount if GrossApproval is missing from the file.
amount_col = "GrossApproval" if "GrossApproval" in df.columns else "SBAGuaranteedApproval"
df["GrossApproval"] = pd.to_numeric(df[amount_col], errors="coerce")
by_state = df.groupby("BorrState")["GrossApproval"].agg(["count", "mean"]).sort_values("count", ascending=False)
print("\nTop 10 states by 7(a) loan count:")
print(by_state.head(10).to_string())

# NAICS industry breakdown
top_industries = df["NaicsCode"].value_counts().head(10)
print("\nTop NAICS codes (industries) by loan count:")
print(top_industries.to_string())

Running this against the FY2022 file typically returns 50,000–70,000 rows—each representing a single approved loan. The top-lender output will show Live Oak Bank near the top by count, with JPMorgan, Wells Fargo, and Huntington appearing by volume when sorted differently. The state breakdown will show California, Texas, and Florida at the top by count. The NAICS breakdown will show food service codes (722) and healthcare codes (621) dominating.

For multi-year analysis, pull five years of files, concatenate them, and compute vintage charge-off rates by filtering on LoanStatus. The LoanStatus field requires normalization: values like P I F, PIF, Paid in Full, and PAID IN FULL all represent the same status across different file versions. Use uppercase conversion and substring matching rather than exact equality for reliable cross-vintage filtering.
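
A sketch of that normalization and the vintage calculation follows. It strips whitespace and uppercases before matching, which covers the spellings listed above; the sample rows are made up, and `ApprovalFiscalYear` is an assumed column name that may need to be derived from ApprovalDate.

```python
import pandas as pd


def normalize_status(raw) -> str:
    """Map the status spellings seen across file vintages to one label."""
    if pd.isna(raw):
        return "UNKNOWN"
    s = "".join(str(raw).upper().split())  # uppercase and drop all whitespace
    if s in ("PIF", "PAIDINFULL"):
        return "PIF"
    if "CHGOFF" in s or "CHARGEDOFF" in s or "CHARGEOFF" in s:
        return "CHGOFF"
    if s.startswith("CANC"):
        return "CANCLD"
    return s or "UNKNOWN"


# Made-up rows standing in for several concatenated annual files.
loans = pd.DataFrame({
    "ApprovalFiscalYear": [2018, 2018, 2018, 2019, 2019],
    "LoanStatus": ["P I F", "CHGOFF", "Paid in Full", "PAID IN FULL", "chgoff"],
})
loans["status"] = loans["LoanStatus"].map(normalize_status)

# Share of each approval-year cohort that has reached charged-off status.
vintage = loans.groupby("ApprovalFiscalYear")["status"].apply(
    lambda s: (s == "CHGOFF").mean()
)
print(vintage)
```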

For lender charge-off analysis, join on BankName across all years, group by lender and LoanStatus, and compute the fraction of each lender's portfolio that has reached CHGOFF status. Lenders with charge-off rates more than two standard deviations above the program mean warrant closer examination: the cause is usually sector concentration (a lender heavily exposed to restaurants will have higher charge-offs in any recession) or underwriting culture (lenders that approved marginal credits to hit volume targets show elevated charge-offs within two to three years).
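
The outlier screen might be sketched as below. It assumes a `status` column already normalized to labels such as CHGOFF, and it compares each lender with the unweighted mean of lender-level rates, which is a simplification of the program mean. The function name, the `min_loans` cutoff, and the sample data are hypothetical.

```python
import pandas as pd


def flag_outlier_lenders(df: pd.DataFrame, min_loans: int = 1, z: float = 2.0) -> pd.DataFrame:
    """Flag lenders whose charge-off rate exceeds mean + z standard deviations."""
    by_lender = df.groupby("BankName").agg(
        loans=("status", "size"),
        chgoff_rate=("status", lambda s: (s == "CHGOFF").mean()),
    )
    # Drop lenders with too few loans for a meaningful rate.
    by_lender = by_lender[by_lender["loans"] >= min_loans].copy()
    cutoff = by_lender["chgoff_rate"].mean() + z * by_lender["chgoff_rate"].std()
    by_lender["flagged"] = by_lender["chgoff_rate"] > cutoff
    return by_lender


# Made-up data: seven lenders at a 10% rate, one at 60%.
rows = []
for name in "ABCDEFGH":
    n_bad = 6 if name == "H" else 1
    rows += [{"BankName": f"Bank {name}", "status": "CHGOFF"}] * n_bad
    rows += [{"BankName": f"Bank {name}", "status": "PIF"}] * (10 - n_bad)

result = flag_outlier_lenders(pd.DataFrame(rows))
print(result)
```

A flag is a prompt for closer examination of sector mix and vintage, not a conclusion about the lender.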


Related writing: FDIC Bank Failure Data: Every US Bank That Has Failed Since 1934—SBA 7(a) lenders are FDIC-regulated institutions; bank failure records can be joined to SBA lender data by institution name and state to identify cases where a lender's SBA charge-off rate preceded its FDIC supervisory problems.

Related writing: USASpending.gov: The Federal Spending Database Behind $6 Trillion in Annual Contracts, Grants, and Loans—many SBA borrowers also pursue federal contracts through the 8(a) and HUBZone small business set-aside programs; USASpending contract data is joinable to SBA loan data by business name and SAM.gov registration to examine how federal contracting and SBA lending interact for individual firms.

Related writing: HMDA Mortgage Lending Data: The Federal Database Behind 15 Million Annual Mortgage Applications—the structural parallel to SBA for commercial real estate; HMDA covers the residential mortgage market with comparable loan-level transparency, and the same lenders that dominate SBA commercial real estate financing often appear in HMDA data for residential mortgage originations.