Behzad Valipour Sh.

Science • Engineering • Design

About

I am a Senior Software Engineer in the Research Software Engineering (RSE) team at Newcastle University. I lead a team of four RSEs building scalable geospatial data pipelines and tools to support research in public health, urban planning, and environmental science for the Imago (Imagery Smart Data Service) project, part of the Smart Data Research UK programme funded by the ESRC. Our work involves developing reproducible workflows for processing large geospatial datasets on a high-performance computing system (Comet), integrating diverse data sources, and creating a user-friendly interface (Data Catalogue) for data exploration and analysis. We collaborate closely with researchers to understand their needs and deliver tailored solutions that facilitate cutting-edge research.

Profile Picture for Behzad Valipour Sh.

Before joining Newcastle University, I worked as a GIS Data Scientist on the MAGENTA (Maternal And preGnancy hEalth aNd elevaTed heAt) project at Swansea University, funded by the Wellcome Trust. I led WP1 (Environmental exposure modelling for Wales and London), developed a heat-exposure model at 1 × 1 km resolution, and managed environmental data pipelines for the UK. My work involved integrating satellite data, weather station data, and land-use information to create high-resolution exposure maps. I collaborated with epidemiologists to analyze the impact of heat exposure on maternal and pregnancy health outcomes, contributing to several publications in high-impact journals.

I completed my PhD in Environmental Epidemiology at Swiss TPH (Swiss Tropical and Public Health Institute), affiliated with the University of Basel, where I worked on the PoCHAS (Effects of Airborne Pollen on Cardiorespiratory Health and Allergic Symptoms) project. My work focused on integrating machine learning with environmental data to develop the first spatio-temporal model for predicting pollen concentrations at a 1 × 1 km scale across Switzerland.


I also have industry experience as a Geospatial Data Scientist at CollectiveCrunch Oy, where I developed geospatial data processing pipelines and tools for remote sensing applications. My work involved processing large satellite datasets to create national-level cloudless mosaics and developing algorithms for forest damage detection. I also worked at Regio OÜ as a GIS specialist, where I developed an automated process to generate internet connections to houses using GIS data, improving the efficiency of the planning process for network expansion projects.

I have a strong background in geospatial data science, with expertise in remote sensing, GIS, and spatial statistics. I am proficient in programming languages such as Python and R, and have experience working with big data technologies such as Hadoop and Spark. I am passionate about using data to solve real-world problems and am always looking for new challenges and opportunities to learn and grow. I design and operate scalable cloud infrastructure on Azure and AWS, including VMs/EC2, Blob Storage/S3, managed databases, Kubernetes (AKS/EKS), and serverless services (Azure Functions/Lambda). I build reproducible geospatial data workflows (Prefect/Airflow), containers, and CI/CD pipelines, and integrate these with on-prem HPC to enable reliable, scalable processing and delivery of research outputs.

Projects

I am passionate about leveraging AI, big data, and cloud technologies to solve complex geoscience challenges and create meaningful real-world impact. I continuously seek opportunities to learn, build, and grow at the intersection of geospatial science, data engineering, and analytics. This page highlights a selection of projects I am involved in, spanning academic research, open-source development, community building, and educational initiatives.



PoCHAS (Pollen, Cardiorespiratory Health and Allergic Symptoms)

Developed the Python package pochas-geoutils to support data engineering and machine learning pipelines, enabling extraction, loading, and transformation of large spatiotemporal satellite (MODIS, Landsat) and meteorological (ERA5) datasets; implemented and evaluated Random Forest, XGBoost, ANN, statistical (Lasso, Ridge, Elastic Net), ensemble, and dispersion models to predict concentrations of multiple allergenic pollen types across Switzerland; and built a GIS-based web application to visualize location-specific, time-series pollen forecasts.

MAGENTA (Maternal And preGnancy hEalth aNd elevaTed heAt)

Created longitudinal, household-level machine learning models of heat stress exposure for all household locations in Wales and London by integrating earth observation, meteorological, qualitative, and housing data; implemented reproducible analytical pipelines to document exposure data creation; and conducted robust data management and complex statistical analyses, including handling duplicate, missing, and erroneous data while promoting best practices among collaborators.

Wind Damage Detection for Precision Forest Management

Extracted and processed country-wide radar satellite imagery and designed a CNN to detect wind-damaged stands from Earth observation images, achieving 91% AUC, which led to the successful launch of the wind damage product, increased sales, and acquisition of new customers.


Dead Tree Detection for Forestry Management

Developed a machine learning model to detect dead trees from aerial imagery, achieving 99% test accuracy, which led to the successful launch of the bark beetle product, increased sales, and acquisition of new customers.

Cloudless Satellite Mosaic from Sentinel-2 Imagery

A cloudless satellite mosaic is a single high-resolution composite image of a region with no cloud cover, built by combining multiple satellite images taken at different times. The process involves selecting the best-quality images from a time series, masking out cloud cover, and stitching the remaining pixels together into a seamless mosaic. I designed this process for the Imago, CollectiveCrunch Oy, and PoCHAS projects to generate cloudless mosaics for large areas using Sentinel-2 and Landsat satellite imagery.
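To make the compositing idea concrete, here is a minimal sketch of one common approach: taking the per-pixel median of the cloud-free observations across a stack of co-registered scenes. This is an illustrative stand-in, not the process used in the projects named above. The function name `cloudless_composite`, the list-of-lists image layout, and the sample values are all assumptions, and cloud masks are taken as given rather than derived from the imagery.

```python
from statistics import median
from typing import List, Optional

Image = List[List[float]]  # one band, rows x cols (hypothetical layout)
Mask = List[List[bool]]    # True where the pixel is cloudy


def cloudless_composite(
    images: List[Image],
    cloud_masks: List[Mask],
    nodata: Optional[float] = None,
) -> List[List[Optional[float]]]:
    """Per-pixel median of cloud-free observations across a time series.

    Assumes all scenes are already co-registered on the same grid.
    Pixels that are cloudy in every scene are set to `nodata`.
    """
    if not images or len(images) != len(cloud_masks):
        raise ValueError("need one cloud mask per image, and at least one image")

    rows, cols = len(images[0]), len(images[0][0])
    composite: List[List[Optional[float]]] = []
    for r in range(rows):
        out_row: List[Optional[float]] = []
        for c in range(cols):
            # Keep only the observations not flagged as cloud at this pixel.
            clear = [
                img[r][c]
                for img, mask in zip(images, cloud_masks)
                if not mask[r][c]
            ]
            out_row.append(median(clear) if clear else nodata)
        composite.append(out_row)
    return composite


# Three made-up 2x2 scenes (integer reflectance-like values) and their masks.
scenes = [
    [[1000, 9000], [3000, 9500]],
    [[1200, 2000], [8500, 9700]],
    [[1100, 2200], [3200, 9900]],
]
clouds = [
    [[False, True], [False, True]],
    [[False, False], [True, True]],
    [[False, False], [False, True]],
]
mosaic = cloudless_composite(scenes, clouds)
```

The bottom-right pixel is cloudy in every scene, so it stays empty (`None`) rather than being filled with a cloudy value. In practice the same logic would run on array libraries over tiled rasters rather than nested lists.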

Scalable Weather Data Pipeline Using Cloud-Native Technologies

Developed an end-to-end weather data pipeline that extracts data from the Open-Meteo API, transforms it using Oracle, and loads it into an Aiven-managed PostgreSQL database. Implemented workflow automation with Prefect to ensure reliable, repeatable pipeline runs. Integrated CI/CD through GitHub Actions and deployed the FastAPI service on Vercel (weather-api-beta-flax.vercel.app).
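As a rough illustration of the transform stage, the sketch below reshapes an Open-Meteo-style hourly response (parallel arrays under an `hourly` key) into one record per timestamp, ready for loading into a relational table. It is a plain-Python stand-in under stated assumptions, not the project's implementation, which the description above says performs the transformation with Oracle. The function name `hourly_to_rows` and the sample payload are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List


def hourly_to_rows(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn column-oriented hourly data into row-oriented records.

    Assumes the payload has `latitude`, `longitude`, and an `hourly`
    object whose `time` array is parallel to each variable array.
    """
    hourly = payload["hourly"]
    times = hourly["time"]
    variables = {name: vals for name, vals in hourly.items() if name != "time"}

    # Guard against misaligned arrays before building any rows.
    for name, values in variables.items():
        if len(values) != len(times):
            raise ValueError(f"variable {name!r} does not match the time axis")

    rows: List[Dict[str, Any]] = []
    for i, stamp in enumerate(times):
        row: Dict[str, Any] = {
            "latitude": payload["latitude"],
            "longitude": payload["longitude"],
            "time": datetime.fromisoformat(stamp),
        }
        for name, values in variables.items():
            row[name] = values[i]  # missing readings stay as None
        rows.append(row)
    return rows


# Made-up sample shaped like an hourly forecast response.
sample = {
    "latitude": 54.98,
    "longitude": -1.61,
    "hourly": {
        "time": ["2024-01-01T00:00", "2024-01-01T01:00"],
        "temperature_2m": [4.2, None],
    },
}
rows = hourly_to_rows(sample)
```

Keeping the reshape step as a pure function with no network or database access makes it easy to unit-test in CI and to wrap as a single task in a workflow orchestrator.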



Blog

I write to explore and share my work at the intersection of geospatial data science, machine learning, and environmental analytics. Through technical tutorials, project write-ups, and applied research insights, I document practical approaches to spatial data processing, remote sensing, and scalable geospatial workflows. These writings reflect both my professional experience and my broader interest in building open, reproducible, and impact-driven geospatial technologies.

Medium

Latest posts from my Medium. Click any title to open the full article on Medium.


Contact

behzad.valipour-shokouhi@newcastle.ac.uk

b.valipour-shokouhi@liverpool.ac.uk