I am a senior software engineer focused on machine learning. I have
played a critical role in two paradigm shifts in fields of science using
machine learning: recently, in weather prediction; previously, in
materials science.
Sabbatical Projects
Writer, vaga • bon • vivants
October 2023 - Present. Remote.
My wife and I have a travel blog! We're telling the story of our
gap year, one chapter at a time. vagabonvivants.com
EarthRanger is a project funded by the Allen Institute for AI
(AI2) to support African elephant conservation, among other conservation
projects. It's a collaborative effort involving the Mara Elephant
Project and Ecoscope.io.
dask-ee. Google
Earth Engine Feature Collections via Dask Dataframes. Featured at SciPy 2024.
xarray-sql. An
experiment to join raster, vector, and point data by treating pixels as
tables.
Google Experience
Anthromet Team, Google
Research
2021 - 2023. Remote, CA
Anthromet is on a mission to make weather information universally
accessible and useful by developing state-of-the-art AI weather
forecasts and integrating them into products.
Xee: An Xarray backend
for Google Earth Engine. (Xarray, Google Earth Engine)
This connects Google Earth Engine to the scientific Python
ecosystem.
When integrated with Xarray-Beam, it only takes ~25 lines of code
and a few hours to export 20 TiB of data from Google Earth Engine to
Zarr, saving thousands of LOC and days of debugging quota limits.
Built to serve an internal weather research platform to build and
ship new weather models.
Since launch, the project has at .
weather-tools,
a set of data pipelines to make weather data universally accessible and
useful. Originally a side project (20% time), I grew the project to a
team of 8 engineers to serve ~25 research and product teams across
Google AI, Brain, X, DeepMind and Cloud. (Apache Beam, Xarray,
Google Earth Engine, Google BigQuery, MetView):
GraphCast:
Enabled DeepMind to ingest and regrid ERA5, the dataset behind their
autoregressive graph neural network. At the time of publishing, this was
the SOTA 10-day weather forecast, beating physics-based models.
GraphCast was among 2023's
top-ten biggest breakthroughs published in Science,
ushering in a new generation of AI-based weather forecasts.
MetNet
v3 & Nowcasting in Google Search: MetNet is the world's leading
nowcast: a 24-hour, minute-by-minute weather forecast at 1-4 km
resolution. weather-tools created global training,
inference, and validation datasets, enabling our team to ship to
GSRP.
Project
Contrails. I provided critical weather data and data engineering
pipelines that made this project possible. This project alone could mitigate
~1% of anthropogenic climate change by reducing the warming caused by
airplane contrails.
ARCO-ERA5
& Weatherbench2: I
ingested and published the two biggest datasets in Google Cloud's Public
Dataset program. I worked with Cloud to shape weather tools to ingest
ERA5 into BigQuery.
DeepMind's
Wind Energy optimization. By ingesting weather data into Google
BigQuery, I helped create an ML model with DeepMind and Cloud to make
wind energy more profitable in the Texas energy grid. This led to a ~3%
improvement of mean absolute zero error and ~$7 million more in revenue
over 8 months from wind power.
Weatherbench2.
The definitive benchmark to fairly compare AI-based, mid-range weather
forecasts and a cornerstone for all future ML weather model development.
(Xarray, Apache Beam, Zarr)
In addition to ingesting the fundamental datasets (see above), I
made core updates to Xarray-Beam, the
underlying engine behind the benchmark.
Contributed code to the benchmark itself, helping ship it to
production.
ARCO-ERA5 is the biggest dataset in Cloud Public Datasets, at 12+
petabytes. It represents the most accurate history of weather on Earth
from 1940 to the present.
Pangeo Forge is aiming to become the canonical open ecosystem of
data engineering recipes for producing cloud-optimized, analysis ready
data, i.e. the conda-forge of scientific datasets.
I contributed bug fixes and the Beam integration upstream to Pangeo
Forge to produce ARCO-ERA5.
I was the impetus for Pangeo Forge to transition its data
engineering system to Apache Beam.
Awards and recognition received while on Anthromet:
Google Research's Science Award for Best Collaboration,
along with my team and partner teams.
A Greenie award from Anthropocene, an internal grassroots
organization focused on climate technologies.
A spot bonus from John Platt, the head of Google Applied Sciences,
for speeding up a critical data ingestion workflow by a few orders of
magnitude.
Led a team of twelve 20%-engineers to make contributions to the
weather community, including internal changes to Google Earth Engine and
open source changes to Weather Tools.
Arcs is an experimental new programming model for
privacy-preserving computation and AI. It enables rapid compositional
development and provable privacy via data flow analysis. Before the 2023
layoffs, the project internally debuted as a core AI-safety system for
ambient computing.
Extended the Arcs data flow analysis system to verify MediaPipe graphs.
This was an important step towards provably private machine learning
applications.
Created a system for automatic claim deduction in a SQL-like subset
of the Arcs language. Claim deduction is a core routine to prove that a
program adheres to a privacy policy (Kotlin, Visitor
Pattern).
Added features to the project's domain-specific language;
specifically, type variables, maximum-valued types, and reflection.
Together, these helped ship a compile-time privacy checking system into
production on Android (TypeScript, Kotlin, Data Flow
Analysis).
Created a key compiler component to facilitate allocation of modular
programs across distributed computing environments, bridging our web
technology codebase to Android. (TypeScript, Bazel,
Kotlin).
Extended Google's build system to support Kotlin-to-Wasm
compilation; created code generators to simplify programming with Kotlin
on the web (Kotlin, Wasm, Bazel).
Developed prototypes to discover machine-learning capabilities
within the Arcs programming model (TypeScript, TensorFlow.js).
Aira helps blind and low-vision users access visual information
via remote assistance with smart glasses.
Built core dialog engine for an Android Voice-UX. This allowed our
blind and low-vision users to pair Bluetooth devices, call a remote
assistant, and rate call experiences conversationally. (Java
8)
Created a visual-question-answer research prototype for an NSF
grant. The system used real-time object detection and rule-based NLP to
investigate assistive user experiences for blind and low-vision people.
(TensorFlow, OpenCV, YOLO)
Served as technical lead for a prototype indoor navigation system using
computer vision (OpenSfM, ArcGIS).
Trained and productionized a mobile USD currency classifier model to
help blind users identify paper bills. (TensorFlow, Android,
MobileNet, Firebase ML Kit)
Created an image tagging game to label integral internal datasets.
(Spring Boot, Vue.js, TypeScript)
Led agile rituals such as daily stand-ups, sprint planning meetings,
and retrospectives. Started a machine-learning brownbag lunch
series.
Dr. Vecchio's research group focuses on advanced materials
discovery and their translation to industrial applications.
Used neural networks and boosted-tree algorithms to make
materials science discoveries (TensorFlow/Keras, ResNet, XGBoost,
scikit-learn).
Applied gradient-based techniques to explain classifications of
convolutional neural networks (Grad-CAM).
Used OOP design principles to create a framework to track
hyperparameters & ML pipelines in git.
Taught nanoengineering graduate students machine learning, focusing
on neural networks and tree algorithms.
The papers produced as a result of my contributions would go on to
be published in Science, Nature, and respected
journals within the materials science world.
de Sa Lab specializes in machine learning and brain computer
interfaces (BCI).
Developed an In-Ear EEG
prototype aiming to help people with epilepsy. (OpenBCI, MATLAB,
SVMs, Ensemble Methods)
Developed BrainTag,
an open source neurofeedback game for children with autism spectrum
disorders. Presented at UCSD's Undergrad Research Conference.
(Arduino, NeuroSky, C)
Initiated collaboration between the de Sa Lab and OpenBCI. We were one of the startup's
first university partners.
Taught hands-on, open source BCI workshops to UCSD students.
(Python, OpenBCI)
Wrote a data visualization and machine learning toolbox for an open
source brain computer interface. (Python, Java, C, JS)
Learning skillful medium-range global weather
forecasting. Remi Lam, Alvaro Sanchez-Gonzalez, Matthew
Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri,
Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander
Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn
Stott, Alexander Pritzel, Shakir Mohamed, Peter Battaglia.
Science 382 (6677), 1416-1421
Discovery of high-entropy ceramics via machine
learning. Kevin Kaufmann, Daniel Maryanovsky, William M Mellor,
Chaoyi Zhu, Alexander S Rosengarten, Tyler J
Harrington, Corey Oses, Cormac Toher, Stefano Curtarolo, Kenneth S
Vecchio. Npj Computational Materials 6 (1), 42
Crystal symmetry determination in electron diffraction using
machine learning. Kevin Kaufmann, Chaoyi Zhu°,
Alexander S Rosengarten°, Daniel Maryanovsky, Tyler J
Harrington, Eduardo Marin, Kenneth S Vecchio. Science 367
(6477), 564-568
WeatherBench 2: A benchmark for the next generation of
data-driven global weather models. Stephan Rasp, Stephan Hoyer,
Alexander Merose, Ian Langmore, Peter Battaglia, Tyler
Russell, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya
Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla
Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, Fei Sha. Journal
of Advances in Modeling Earth Systems 16 (6), e2023MS004019
Deep learning for day forecasts from sparse
observations. Marcin Andrychowicz, Lasse Espeholt, Di Li,
Samier Merchant, Alexander Merose, Fred Zyda, Shreya
Agrawal, Nal Kalchbrenner. arXiv preprint arXiv:2306.06079
Deep neural network enabled space group identification in
EBSD. K Kaufmann, C Zhu, AS Rosengarten, KS
Vecchio. Microscopy and Microanalysis 26 (3), 447-457