I am a senior software engineer focused on machine learning. I have
played a critical role in two paradigm shifts in science driven by
machine learning: first in materials science, and second in weather
prediction.
Sabbatical Projects
Writer, vaga • bon • vivants
October 2023 - Present. Remote.
My wife and I have a travel blog! We’re telling the story of our
gap year, one chapter at a time: vagabonvivants.com
EarthRanger is a project funded by the Allen Institute for AI
(AI2) to support African elephant conservation, among other conservation
projects. It’s a collaborative effort involving the Mara Elephant
Project and Ecoscope.io.
dask-ee. Google
Earth Engine Feature Collections via Dask Dataframes. Featured at SciPy 2024.
xarray-sql. What
if we could join raster, vector, and point data by treating pixels as
tables?
Google Experience
Anthromet Team, Google
Research
2021 - 2023. Remote, CA
Anthromet is on a mission to make weather information universally
accessible and useful. It does this by developing state-of-the-art AI
weather forecasts and integrating them into products.
Xee: An Xarray backend
for Google Earth Engine. (Xarray, Google Earth Engine)
weather-tools,
a set of data pipelines to make weather data universally accessible and
useful. I grew it from a 20% effort into a team of 8 engineers serving
~25 research and product teams across Google AI, Brain, X, DeepMind, and
Cloud. (Apache Beam, Xarray, Google Earth Engine, Google BigQuery,
MetView):
GraphCast:
Enabled DeepMind to ingest and regrid ERA5, the dataset used to train
their autoregressive graph neural network. At the time of publication,
GraphCast was the state-of-the-art 10-day weather forecast, beating
physics-based models. It was among Science's top ten breakthroughs of
2023, ushering in a new generation of AI-based weather forecasts.
MetNet
v3 & Nowcasting in Google Search: MetNet is the world’s leading
nowcast, a 24-hour, minute-by-minute weather forecast at 1-4 km
resolution. weather-tools created global training, inference, and
validation datasets, enabling our team to ship to GSRP.
Project
Contrails. I provided critical weather data and data engineering
pipelines that made this project possible. The project aims to address
roughly 1% of anthropogenic climate change by helping airplanes avoid
forming heat-trapping contrails.
ARCO-ERA5
& WeatherBench 2: I
ingested and published the two biggest datasets in Google Cloud’s Public
Datasets program. I worked with Cloud to shape weather-tools to ingest
ERA5 into Google BigQuery.
DeepMind’s
Wind Energy optimization. By ingesting weather data into Google
BigQuery, I helped create an ML model with DeepMind and Cloud
to make wind energy more profitable in the Texas energy grid. This led
to a ~3% improvement in mean absolute error and ~$7 million more in
revenue over 8 months from wind power.
WeatherBench 2.
The definitive benchmark for fairly comparing AI-based, medium-range
weather forecasts, and a cornerstone for ML weather model development.
(Xarray, Apache Beam, Zarr)
Besides ingesting the fundamental datasets (see above), I made core
updates to Xarray-Beam, the
underlying engine behind the benchmark.
Contributed code to the benchmark itself, helping ship it to
production.
ARCO-ERA5 is the biggest dataset in Cloud Public Datasets, at 12+
petabytes. It represents the most accurate history of weather on Earth
from 1940 to the present.
Pangeo Forge aims to become the conda-forge of scientific
datasets: an open ecosystem of data engineering recipes for producing
cloud-optimized, analysis-ready data.
I contributed bug fixes and the Beam integration upstream to Pangeo
Forge to produce ARCO-ERA5.
I
was the impetus for Pangeo Forge to transition their data
engineering system to Apache Beam.
To support Pangeo Forge, I contributed a Dask runner to Apache Beam
(presented at PyData
NYC).
Awards and recognition received while on Anthromet:
Google Research’s Science Award for Best Collaboration,
along with my team and partner teams.
A Greenie award from Anthropocene, an internal grassroots
organization focused on climate technologies.
A spot bonus from John Platt, the head of Google Applied Sciences,
for speeding up a critical data ingestion workflow by a few orders of
magnitude.
Led a team of twelve 20% engineers to make contributions to the
weather community, including internal changes to Google Earth Engine and
open source changes to weather-tools.
Arcs is an experimental new programming model for
privacy-preserving computation and AI. It enables rapid compositional
development and provable privacy via data flow analysis. Before the 2023
layoffs, the project internally debuted as a core AI-safety system for
ambient computing.
Extended Arcs data flow analysis system to verify MediaPipe graphs.
This was an important step towards provably private machine learning
applications.
Created a system for automatic claim deduction in a SQL-like subset
of the Arcs language. Claim deduction is a core routine to prove that a
program adheres to a privacy policy (Kotlin, Visitor
Pattern).
Added features to the project’s domain-specific language;
specifically, type variables, maximum-valued types, and reflection.
Together, these helped ship a compile-time privacy checking system into
production on Android (TypeScript, Kotlin, Data Flow
Analysis).
Created a key compiler component to facilitate allocation of modular
programs across distributed computing environments, bridging our web
technology codebase to Android (TypeScript, Bazel,
Kotlin).
Extended Google’s build system to support Kotlin-to-Wasm
compilation; created code generators to simplify programming with Kotlin
on the web (Kotlin, Wasm, Bazel).
Developed prototypes to explore machine-learning capabilities
within the Arcs programming model (TypeScript, TensorFlow.js).
Aira helps blind and low-vision users access visual information
via remote assistance with smart glasses.
Built the core dialog engine for an Android voice UX. This allows our
blind and low-vision users to pair Bluetooth devices, call a remote
assistant, and rate call experiences conversationally. (Java
8)
Created a visual question answering research prototype for an NSF
grant. The system used real-time object detection and rule-based NLP to
investigate assistive user experiences for blind and low-vision people.
(TensorFlow, OpenCV, YOLO)
Technical lead for a prototype indoor navigation system using
computer vision (OpenSfM, ArcGIS).
Trained and productionized a mobile USD currency classifier model to
help blind users identify paper bills. (TensorFlow, Android,
MobileNet, Firebase ML Kit)
Created an image-tagging game to label critical internal datasets.
(Spring Boot, Vue.js, TypeScript)
Led agile rituals such as daily stand-ups, sprint planning meetings,
and retrospectives; started a machine-learning brown-bag lunch
series.
2018 - 2019. UCSD Nanoengineering Department. La Jolla, CA
Dr. Vecchio's research group focuses on advanced materials
discovery and their translation to industrial applications.
Used neural networks and boosted-tree algorithms to make
materials science discoveries (TensorFlow/Keras, ResNet, XGBoost,
scikit-learn).
Applied gradient-based techniques to explain classifications of
convolutional neural networks (Grad-CAM).
Used OOP design principles to create a framework to track
hyperparameters & ML pipelines in git.
Taught nanoengineering grad students machine learning, focusing on
neural networks and tree algorithms.
The papers resulting from my contributions were published in Science,
Nature, and other respected journals in materials science.
de Sa Lab specializes in machine learning and brain computer
interfaces (BCI).
Developed an In-Ear EEG
prototype aiming to help people with epilepsy. (OpenBCI, MATLAB,
SVMs, Ensemble Methods)
Developed BrainTag,
an open source neurofeedback game for children with autism spectrum
disorders. Presented at UCSD's Undergrad Research Conference.
(Arduino, NeuroSky, C)
Initiated a collaboration between the de Sa Lab and OpenBCI. We were one
of the startup’s first university partners.
Taught hands-on open-source BCI workshops to UCSD students.
(Python, OpenBCI)
Wrote a data visualization and machine learning toolbox for an
open-source brain-computer interface. (Python, Java, C, JS)
Learning skillful medium-range global weather
forecasting. Remi Lam, Alvaro Sanchez-Gonzalez, Matthew
Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri,
Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander
Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn
Stott, Alexander Pritzel, Shakir Mohamed, Peter Battaglia.
Science 382 (6677), 1416-1421
Discovery of high-entropy ceramics via machine
learning. Kevin Kaufmann, Daniel Maryanovsky, William M Mellor,
Chaoyi Zhu, Alexander S Rosengarten, Tyler J
Harrington, Corey Oses, Cormac Toher, Stefano Curtarolo, Kenneth S
Vecchio. npj Computational Materials 6 (1), 42
Crystal symmetry determination in electron diffraction using
machine learning. Kevin Kaufmann, Chaoyi Zhu, Alexander
S Rosengarten, Daniel Maryanovsky, Tyler J Harrington, Eduardo
Marin, Kenneth S Vecchio. Science 367 (6477), 564-568
WeatherBench 2: A benchmark for the next generation of
data-driven global weather models. Stephan Rasp, Stephan Hoyer,
Alexander Merose, Ian Langmore, Peter Battaglia, Tyler
Russell, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya
Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla
Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, Fei Sha. Journal
of Advances in Modeling Earth Systems 16 (6), e2023MS004019
Deep learning for day forecasts from sparse
observations. Marcin Andrychowicz, Lasse Espeholt, Di Li,
Samier Merchant, Alexander Merose, Fred Zyda, Shreya
Agrawal, Nal Kalchbrenner. arXiv preprint arXiv:2306.06079
Deep neural network enabled space group identification in
EBSD. K Kaufmann, C Zhu, AS Rosengarten, KS
Vecchio. Microscopy and Microanalysis 26 (3), 447-457