Data Engineering

Geospatial ML Pipelines

Scalable workflows for geospatial imagery and LiDAR, from preprocessing through inference and deployment.

Geospatial processing background

Many machine-learning projects fail not because of the model, but because the data pipeline is brittle. A core part of my work is building geospatial ML workflows that are reliable at scale, especially for imagery and LiDAR tasks where preprocessing, normalization, and data movement can dominate the overall system design.

My stack in this area includes tools such as PDAL, GDAL/OGR, GeoPandas, Rasterio, PostGIS, and Earth Engine, combined with Python services, Docker packaging, and cloud execution. In practice, that means designing tiling workflows, normalization stages, feature engineering steps, coordinate-system handling, and inference-ready data products that can support both research and production use cases.
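As an illustration of the tiling stage described above, here is a minimal sketch of computing overlapping tile windows for windowed raster reads (the kind of windows rasterio consumes). The function name, tile size, and overlap are illustrative assumptions, not the actual pipeline code:

```python
# Illustrative sketch: emit overlapping (col_off, row_off, width, height)
# windows covering a raster, suitable for windowed reads (e.g. with
# rasterio.windows.Window). Parameters and names are assumptions.

def tile_windows(width, height, tile=512, overlap=64):
    """Yield (col_off, row_off, w, h) windows covering the full raster."""
    step = tile - overlap
    for row in range(0, height, step):
        for col in range(0, width, step):
            # Clamp edge tiles so windows never read past the raster bounds.
            w = min(tile, width - col)
            h = min(tile, height - row)
            yield (col, row, w, h)
            if col + tile >= width:
                break
        if row + tile >= height:
            break

windows = list(tile_windows(1024, 768, tile=512, overlap=64))
```

The overlap lets per-tile inference avoid edge artifacts, at the cost of some redundant computation that is blended or cropped away when tiles are mosaicked back together.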

This work also connects directly to product engineering. In both internal systems and public-facing tools, I have built pipelines that support large geospatial inputs, automated job execution, monitoring, reproducibility, and downstream integration with visualization or ML services. The technical value is not just correctness, but the ability to operationalize geospatial computation in a form that teams can actually use.
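One pattern behind the reproducibility and automated job execution mentioned above can be sketched as deriving a deterministic job ID from the pipeline configuration, so reruns with identical inputs resolve to the same artifact key. This is a hypothetical sketch, not the actual system:

```python
import hashlib
import json

# Illustrative sketch (hypothetical, not the production code): hash a
# canonical JSON form of a job's configuration so identical configs
# always map to the same ID, making job execution idempotent.

def job_id(config: dict) -> str:
    """Return a short deterministic ID for a pipeline job config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Key order in the input dict does not affect the resulting ID.
a = job_id({"src": "tiles/2024/", "crs": "EPSG:32633", "tile": 512})
b = job_id({"tile": 512, "crs": "EPSG:32633", "src": "tiles/2024/"})
```

Keying outputs by such an ID lets the scheduler skip already-computed stages and lets downstream consumers trace any data product back to the exact configuration that produced it.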

This case study represents the broader systems layer behind my portfolio: the engineering discipline required to move from models and datasets to dependable geospatial ML products.