Made at/for the Chicago Innovate Hackathon 2023
G-Chomp is a project to collect and analyze the collective knowledge of the Rhino and Grasshopper community on the McNeel Forum. Our current goal is to develop a dataset that can be used to tune and/or train an LLM to act as a Grasshopper co-pilot.
Cesar Hidalgo, author of "Why Information Grows", coined the term "personbyte" to refer to the maximum knowledge and know-how of an individual. He additionally refers to the "teambyte" and "firmbyte" as extensions of this concept. We propose the term “community-byte” as a measure of the limits of a communal store of knowledge, and “G-CHOMP” as a Grasshopper-specific community-byte. This project seeks to transform the collective knowledge of the Rhino/Grasshopper community, as represented in the Rhino community forum, to make it usable and accessible in new contexts.
Here are the steps we are planning:
- gather posts from the Rhino/McNeel forum
- gather only posts with Grasshopper Scripts attached
-
method 2: using Discourse OpenAPI
- gather posts from the Rhino/McNeel forum
- gather only posts with Grasshopper Scripts attached
- overcome 30 post limit
- further characterize the posts with additional attributes relating to the contents of the scripts and/or other data points
- Get the list of components in each script (using David Rutten's script parser or a .ghx XML crawler)
- convert the program to output CSV
- automate use of program and link per dataset entry
- Characterize images in posts - is it a picture of the script? of the script output? neither?
- perform analysis of the dataset with several goals in mind: Cleaning the data, qualifying it (predicting script quality), and providing insights about the data.
- Find/identify a list of Rhino/Grasshopper "stop words" - common words in posts that should be ignored when characterizing topic conversations
- Find/identify a list of Grasshopper "stop words" - the most common components that should be ignored when characterizing scripts
- estimate script quality by proxy (author? number of posts? keywords to include/exclude?)
- identify example scripts from Omid Sajedi workshop to use as bases for development
- modify example scripts with cbyte dataset (simplified or otherwise)
- Document our process and our findings
- maintain readme to reflect progress
- Share the dataset on Kaggle
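
The component-listing, stop-word, and CSV steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual parser: it assumes a `.ghx` file stores each canvas object as a `<chunk name="Object">` whose own `<items>` hold an `<item name="Name">`, and the sample XML, the function names, the `min_share` threshold, and the CSV columns are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed .ghx fragment. The chunk/item layout is an
# assumption about the file format, not a specification of it.
SAMPLE_GHX = """<Archive name="Root">
  <chunks>
    <chunk name="Definition">
      <chunks>
        <chunk name="DefinitionObjects">
          <chunks>
            <chunk name="Object" index="0">
              <items><item name="Name">Panel</item></items>
            </chunk>
            <chunk name="Object" index="1">
              <items><item name="Name">Number Slider</item></items>
            </chunk>
            <chunk name="Object" index="2">
              <items><item name="Name">Panel</item></items>
            </chunk>
          </chunks>
        </chunk>
      </chunks>
    </chunk>
  </chunks>
</Archive>"""


def component_names(ghx_text):
    """Return the Name of every chunk called "Object", in document order."""
    root = ET.fromstring(ghx_text)
    names = []
    for chunk in root.iter("chunk"):
        if chunk.get("name") != "Object":
            continue
        # Look only at the chunk's own <items>, not those of nested chunks.
        item = chunk.find("./items/item[@name='Name']")
        if item is not None and item.text:
            names.append(item.text.strip())
    return names


def component_stop_words(scripts, min_share=0.8):
    """Components that appear in at least `min_share` of all scripts."""
    if not scripts:
        return set()
    doc_freq = Counter()
    for names in scripts:
        doc_freq.update(set(names))  # count each component once per script
    return {n for n, c in doc_freq.items() if c / len(scripts) >= min_share}


def to_csv(rows):
    """Serialize (post_id, component_names) pairs as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["post_id", "components"])
    for post_id, names in rows:
        writer.writerow([post_id, ";".join(names)])
    return buf.getvalue()
```

Counting each component once per script (document frequency) rather than by total occurrences keeps a single script with hundreds of panels from dominating the stop-word list.
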
We hope/plan to pursue the following additional goals:
- Use the dataset to train/tune an LLM to understand the semantic connection between a description of what a script does and the contents of the script
- create new scripts using an LLM to automatically generate Grasshopper components on the canvas
- possibly using/in combination with ghPT
- analyze the quality of the scripts
- identify potential changes to improve results
- Expand the dataset
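
One possible way to pair a script description with its contents for tuning is to write one JSON record per dataset entry. The sketch below is only an illustration: the `prompt` and `completion` field names, the prompt wording, and the function names are assumptions, not a format the project has settled on.

```python
import json


def training_record(description, components):
    """Pair a post's description with the component list of its script."""
    return {
        "prompt": "Grasshopper script request:\n" + description.strip(),
        "completion": ", ".join(components),
    }


def to_jsonl(entries):
    """Serialize (description, components) pairs as JSON Lines text."""
    return "\n".join(
        json.dumps(training_record(desc, comps), ensure_ascii=False)
        for desc, comps in entries
    )
```

Each line of the output is an independent JSON object, so the file can be streamed, sampled, or split without parsing the whole dataset.
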
Jerry - Chieh Jui Lee - JERRRRY - IIT (Illinois Institute of Technology)
Jo Kamm - jkamm - Digital Technology Lead at Dimensional Innovations
Patryk Wozniczka - patrykwoz
Saumya Borwankar - saumyaborwankar - IIT
Siddhartha Upase - IIT
YongWon Jeong - yjeong93 - Architectural professional 1 at Wight & Company
- Discourse API
- Grasshopper file analytics tool
- gh to ghx tool
- Becoming AI Power-Users: A Hands-on Workshop on Machine Learning and Generative AI by Seyedomid Sajedi of Thornton Tomasetti during Chicago Innovate 2023 (GitHub link forthcoming)
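
For the Discourse API listed above, one way to get past a per-request limit of about 30 results is to keep requesting pages until an empty one comes back. This is a rough sketch under stated assumptions: it assumes the forum serves its topic list at `/latest.json?page=N` with zero-based pages and a `topic_list.topics` array in the response. The function names, the `max_pages` cap, and the injected `fetch` callable are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and decode its body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_topics(base_url, fetch=fetch_json, max_pages=100):
    """Yield topics page by page until an empty page or max_pages is reached.

    Assumes (not verified here) that pages are zero-based and that each
    response looks like {"topic_list": {"topics": [...]}}.
    """
    for page in range(max_pages):
        data = fetch(f"{base_url}/latest.json?page={page}")
        topics = data.get("topic_list", {}).get("topics", [])
        if not topics:
            break
        yield from topics
```

Passing `fetch` as a parameter keeps the paging logic testable without network access, and gives a single place to add rate limiting or an API key later.
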
