I am a senior software engineer and machine learning researcher. I
have played a critical role in two paradigm shifts in fields of science
using machine learning: recently, in weather
prediction; previously, in materials
science.
(NB: You can get a curated, 3-page version of this CV by printing
the page.)
Experience
Founding Member of the Technical Staff, OpenAthena
January 2025 - Present. Remote.
OpenAthena is a non-profit that helps academic research labs
build foundation models for science.
EarthRanger is a project funded by the Allen Institute of AI
(AI2) to support African Elephant Conservation, among other conservation
projects. It's a collaborative effort involving the Mara Elephant
Project and Ecoscope.io.
dask-ee. Google
Earth Engine Feature Collections via Dask Dataframes. Featured at SciPy 2024.
xarray-sql. An
experiment to join raster, vector, and point data by treating pixels as
tables.
Anthromet Team, Google
Research
2021 - 2023. Remote, CA
Anthromet is on a mission to make weather information universally
accessible and useful by developing state-of-the-art AI weather
forecasts and integrating them into products.
Xee: An Xarray backend
for Google Earth Engine. (Xarray, Google Earth Engine)
Part of an internal platform for building AI weather models, this
connects GEE to the SciPy ecosystem.
Since launch, the library has had at .
weather-tools.
Pipelines to make weather data universally accessible and useful.
I grew weather-tools from a 20% project into a team of 8 engineers
serving ~25 research and product teams across Alphabet (Apache Beam,
Xarray, Google Earth Engine, Google BigQuery):
GraphCast:
Enabled DeepMind to ingest and regrid ERA5, the dataset behind their
autoregressive graph neural network. This was the first AI model to beat
physics-based models at predicting the weather. GraphCast was among 2023's
top-ten biggest breakthroughs published in Science,
ushering in a new generation of AI-based weather forecasts.
MetNet
v3 & Nowcasting in Google Search: MetNet is the world's leading
Nowcast, or 24-hour, minute-by-minute weather forecast at 1-4 km
resolution. weather-tools created global training, inference, and
validation datasets, enabling our team to ship to Google Search.
Project
Contrails. I provided critical weather datasets and pipelines that
made this project possible. Project Contrails will help address 1% of
anthropogenic climate change by reducing solar irradiance from
airplanes.
Weatherbench2.
The definitive benchmark to fairly compare physics-based and AI
mid-range weather forecasts, and a cornerstone for all future ML
weather model development. (Xarray, Apache Beam, Zarr)
I acquired and cloud-optimized the benchmark's datasets.
I improved Xarray-Beam, the
underlying engine behind the benchmark.
Contributed core benchmark code and helped design the scorecard (a
Plotly Dash app!).
ARCO-ERA5. An
analysis-ready, cloud-optimized history of weather on Earth.
(Pangeo-Forge, Apache Beam, Xarray, Zarr)
ARCO-ERA5 is the biggest dataset in Cloud Public Datasets, at 12+
petabytes, and likely includes the biggest single
Zarr ever created.
Awards and recognition received while on Anthromet:
Google Research's Science Award for Best Collaboration,
along with my team and partner teams.
A Greenie award from Anthropocene, an internal grassroots
organization focused on climate technologies.
Led a team of twelve 20%-engineers to make contributions to the
weather community, including internal changes to Google Earth Engine and
open source changes to Weather Tools.
Arcs is an experimental new programming model for
privacy-preserving computation and AI. It enables rapid compositional
development and provable privacy via data flow analysis. Before the 2023
layoffs, the project internally debuted as a core AI-safety system for
ambient computing.
Extended Arcs data flow analysis system to verify MediaPipe graphs.
This was an important step towards provably private machine learning
applications.
Created a system for automatic claim deduction in a SQL-like subset
of the Arcs language. Claim deduction is a core routine to prove that a
program adheres to a privacy policy (Kotlin, Visitor
Pattern).
Added features to the project's domain-specific language;
specifically, type variables, maximum-valued types, and reflection.
Together, this helped ship a compile-time privacy checking system into
production on Android (Typescript, Kotlin, Data Flow
Analysis).
Created a key compiler component to facilitate allocation of modular
programs across distributed computing environments, bridging our web
technology codebase to Android. (Typescript, Bazel,
Kotlin).
Extended Google's build system to support Kotlin to Wasm
compilation; created code generators to simplify programming with Kotlin
on the web (Kotlin, Wasm, Bazel).
Developed prototypes to discover machine-learning capabilities
within the Arcs programming model (Typescript, Tensorflow
JS).
Aira helps blind and low-vision users access visual information
via remote assistance with smart glasses.
Built core dialog engine for an Android Voice-UX. This allowed our
blind and low-vision users to pair Bluetooth devices, call a remote
assistant, and rate call experiences conversationally. (Java
8)
Created a visual-question-answer research prototype for an NSF
grant. The system used real-time object detection and rule-based NLP to
investigate assistive user experiences for blind and low-vision people.
(Tensorflow, OpenCV, YOLO)
Technical lead for prototype of indoor navigation system using
computer vision (OpenSfM, ArcGIS).
Trained and productionized a mobile USD currency classifier model to
help blind users identify paper bills. (Tensorflow, Android,
MobileNet, Firebase MLKit)
Created an image tagging game to label integral internal datasets.
(Spring Boot, Vue.js, Typescript)
Led agile rituals such as daily stand-ups, sprint planning meetings,
and retrospectives. Started a machine-learning brownbag lunch
series.
Dr. Vecchio's research group focuses on advanced materials
discovery and their translation to industrial applications.
Used neural networks and boosted-tree algorithms to make
materials science discoveries (Tensorflow/Keras, ResNet, XGBoost,
Sklearn).
Applied gradient-based techniques to explain classifications of
convolutional neural networks (GradCam).
Used OOP design principles to create a framework to track
hyperparameters & ML pipelines in git.
Taught nanoengineering graduate students machine learning, focusing
on neural networks and tree algorithms.
The papers produced as a result of my contributions would go on to
be published in Science, Nature, and respected
journals within the materials science world.
de Sa Lab specializes in machine learning and brain computer
interfaces (BCI).
Developed an In-Ear EEG
prototype aiming to help people with epilepsy. (OpenBCI, Matlab,
SVMs, Ensemble Methods)
Developed BrainTag,
an open source neurofeedback game for children with autism spectrum
disorders. Presented at UCSD's Undergrad Research Conference.
(Arduino, Neurosky, C)
Initiated collaboration between de Sa lab and OpenBCI. We were one of the startup's
first university partners.
Taught hands-on, open source BCI workshops to UCSD students.
(Python, OpenBCI)
Wrote data visualization and machine learning toolbox for an open
source brain computer interface. (Python, Java, C, JS)
Learning skillful medium-range global weather
forecasting. Remi Lam, Alvaro Sanchez-Gonzalez, Matthew
Willson, Peter Wirnsberger, Meire Fortunato, Ferran Alet, Suman Ravuri,
Timo Ewalds, Zach Eaton-Rosen, Weihua Hu, Alexander
Merose, Stephan Hoyer, George Holland, Oriol Vinyals, Jacklynn
Stott, Alexander Pritzel, Shakir Mohamed, Peter Battaglia.
Science 382 (6677), 1416-1421
Discovery of high-entropy ceramics via machine
learning. Kevin Kaufmann, Daniel Maryanovsky, William M Mellor,
Chaoyi Zhu, Alexander S Rosengarten, Tyler J
Harrington, Corey Oses, Cormac Toher, Stefano Curtarolo, Kenneth S
Vecchio. Npj Computational Materials 6 (1), 42
Crystal symmetry determination in electron diffraction using
machine learning. Kevin Kaufmann, Chaoyi Zhu°,
Alexander S Rosengarten°, Daniel Maryanovsky, Tyler J
Harrington, Eduardo Marin, Kenneth S Vecchio. Science 367
(6477), 564-568
WeatherBench 2: A benchmark for the next generation of
data-driven global weather models. Stephan Rasp, Stephan Hoyer,
Alexander Merose, Ian Langmore, Peter Battaglia, Tyler
Russell, Alvaro Sanchez-Gonzalez, Vivian Yang, Rob Carver, Shreya
Agrawal, Matthew Chantry, Zied Ben Bouallegue, Peter Dueben, Carla
Bromberg, Jared Sisk, Luke Barrington, Aaron Bell, Fei Sha. Journal
of Advances in Modeling Earth Systems 16 (6), e2023MS004019
Deep learning for day forecasts from sparse
observations. Marcin Andrychowicz, Lasse Espeholt, Di Li,
Samier Merchant, Alexander Merose, Fred Zyda, Shreya
Agrawal, Nal Kalchbrenner. arXiv preprint arXiv:2306.06079
Deep neural network enabled space group identification in
EBSD. K Kaufmann, C Zhu, AS Rosengarten, KS
Vecchio. Microscopy and Microanalysis 26 (3), 447-457
°Equal contribution
Technical Skills
Language. Expert: Python, Typescript, JS, Kotlin, Java. Proficient: C++, C, SQL, Wasm.
Data. Expert: Tensorflow, Xarray, Zarr, NumPy, Dask, Beam, GEE. Proficient: JAX, PyTorch, Spark, Parquet, BQ, Postgres.
Product. Expert: GCP, HTML, CSS, Android, REST, Serverless, Docker. Proficient: AWS, FastAPI, Express, Spring, VueJS.
Soft. Leadership, Collaboration, Public Speaking, Empathy.