Skip to content

We build tools to semi-automate a certain type of philanthropy. This repository contains two programs: 1) Excel VBA code to score IRS Form 990s and 2) Python to scrape Labor Department directory of scholarships. The user can compare organizations by how many scholarships they give and their 990 score.

License

Notifications You must be signed in to change notification settings

royrada/Automating-Philanthropy

Repository files navigation

Automated Philanthropy Scoring Framework

This project develops tools to semi-automate a specific style of philanthropic evaluation. The current system has two distinct components for processing different data:

  • IRS Form 990s which must be filed by 501c3s
  • Dept of Labor Scholarship Directory

The Form 990 code uses Excel VBA to score IRS Form 990s submitted by 501(c)(3) organizations. The scoring highlights entities that have endowments, award scholarships, and emphasize science education or research. This framework is at an early stage of development, and future versions are expected to be implemented in Python with relational database support and a web-based interface.

The Scholarship Directory information is scraped from the Labor Department website using Python and associated tools.

Form 990 Scorer

Getting Started

To get the system running:

  1. Download Code.xlsm and place it in your working directory (e.g., x/).
  2. Download the .txt files:
    • nodenames.txt
    • stopwords.txt
    • punctuation.txt
    • rule.txt Place them in the same directory as the Excel file.
  3. Download IRS Form 990 XML files from IRS Form 990 Series Downloads.
  4. Unzip those forms into x/testforms/, and create subdirectories:
    • x/testforms/990 for standard 990 files
    • x/testforms/errant for nonstandard or filtered-out files

Prerequisites

  • Microsoft Excel with VBA enabled (or compatible environment)
  • Basic familiarity with running macros and editing file paths

Installation Notes

  • The system is modular and can be customized to suit different evaluation criteria.
  • You may edit any of the supporting .txt files to change what data is parsed or scored.

nodenames.txt

Each line defines:

  • Data type (String, Date, Integer, AbsInt)
  • Field length
  • XML path to the node Example:
Date;10;Return/ReturnHeader/TaxPeriodBeginDt  
Integer;4;Return/ReturnHeader/TaxYr  
AbsInt;15;Return/ReturnData/IRS990/CYInvestmentIncomeAmt  
String;600;Return/ReturnData/IRS990/ActivityOrMissionDesc  

stopwords.txt and punctuation.txt

Used to clean and tokenize text fields—feel free to modify.

rule.txt

Defines scoring logic for each rule. Users can modify or add rules.

Parsed & Scored Worksheets

  • Parsed990Data contains extracted data:
    • Headers: nodenames
    • Rows: form unique IDs and their values
  • Scored990Data contains rule evaluations:
    • Headers: rule names
    • Rows: binary scores (1 or 0)

Rule Types

There are four rule types. Each uses a semicolon-delimited format:

1. Substring

Substring;RuleName;Nodename;Present;token1,token2,...
  • Checks if tokens are present (or absent) in the specified text node.

2. Trend

Trend;RuleName;Nodename1,Nodename2,...
  • Compares values across nodes for an upward/downward trend.

3. Percentile

Percentile;RuleName;Nodename;Cutoff
  • Scores 1 if a value is above the given percentile cutoff.

4. Eval

Eval;RuleName;Nodename;NumOrTxt;Expression
  • Evaluates logical expressions involving the node's value.

Sample Rules from rule.txt:

Eval;Age;IRS990_FormationYr;Num;Year(Now()) - IRS990_FormationYr > 15  
Substring;Web;IRS990_WebsiteAddressTxt;T;academy,edu  
Percentile;EndYrBal;CYEndwmtFundGrp_EndYearBalanceAmt;0.50  
Trend;YrNet;IRS990_NetAssetsOrFundBalancesBOYAmt,IRS990_NetAssetsOrFundBalancesEOYAmt  

Running the System

  1. Move Files
    In VBA module move990, run Move990Files

    Moves Form 990 files to /990/, skips Form 990EZ and others

  2. Parse XMLs
    In module Parse, run ParseXML990Files

    Extracts nodename data into Parsed990Data

  3. Clean Text
    In module Strip, run Master

    Cleans descriptions and web addresses; populates DescFiltered

  4. Score Data
    In module Score, run Score

    Evaluates rules and outputs to Scored990Data

Scraping Scholarship Directory

Prerequisites

Activate ChromeDriver.exe

If you don't have these Python extensions, then run:

  • pip install requests
  • pip install beautifulsoup4
  • pip install selenium
  • pip install pandas

Download scraper.py from GitHub repository and place it in your working directory.

Output

The output goes to a csv file with 8 columns and as many rows as scholarships. The 8 columns are labeled

  1. ID
  2. Award Name
  3. Organization
  4. Purpose
  5. Level of Study
  6. Award Type
  7. Award Amount
  8. Deadline

In the output uploaded to GitHub, the csv file (called scholarships.csv) has 10,000 rows which was the entirety of what was available from the Labor Department's CareerOneStop website on July 30, 2025.

Authors & Contributions

  • Roy Rada: Project lead, architecture, Excel VBA development, Python coding, testing, refining parameters, maintaining
  • Microsoft Copilot: Collaborative assistance in coding and system design

License

This project is licensed under the GNU General Public License v3.0.

About

We build tools to semi-automate a certain type of philanthropy. This repository contains two programs: 1) Excel VBA code to score IRS Form 990s and 2) Python to scrape Labor Department directory of scholarships. The user can compare organizations by how many scholarships they give and their 990 score.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published