Data Engineer

World Bank Group - WBG

Staff · Closes 03 Jul 2026

Overview

The Data Engineer is responsible for designing, building, and maintaining the data infrastructure that supports the organization's data-driven decision-making processes. This role develops ETL processes, optimizes data retrieval performance, and collaborates with stakeholders to gather and understand data requirements.


Key Responsibilities
  • Design, develop, and maintain data pipelines for ingestion, transformation, and serving across batch and streaming workloads
  • Build ETL/ELT workflows to integrate data from diverse sources into enterprise data platforms
  • Develop data transformation logic using Apache Spark, PySpark, SparkSQL, and SQL
  • Implement change data capture (CDC) patterns for real-time and near-real-time data synchronization
  • Build streaming data pipelines for real-time analytics and operational use cases
  • Optimize pipeline performance, resource utilization, and cost efficiency
  • Support federated data pipeline architecture that enables Line of Business (LOB) teams to own and manage their domain data
  • Contribute to self-serve data infrastructure that abstracts complexity and allows domain teams to build pipelines independently
  • Develop standardized pipeline deployment patterns that LOB teams can adopt while maintaining autonomy
  • Support domain teams in building data products that are discoverable, interoperable, and compliant with enterprise standards
  • Enable distributed data processing across domains while ensuring consistency through federated governance
  • Assist in establishing data contracts and interoperability standards that allow seamless data sharing across domains
  • Support the balance between domain autonomy and enterprise-wide governance requirements
  • Develop reusable pipeline templates and Infrastructure as Code (IaC) patterns for common data product types
  • Create blueprints for data ingestion, transformation, quality validation, and serving that LOB teams can customize
  • Build standardized patterns for batch pipelines, streaming pipelines, CDC implementations, and API-based integrations
  • Contribute to a pattern library covering medallion architecture, dimensional modeling, and data product packaging
  • Document best practices and reference architectures that guide LOB teams in building compliant, high-quality pipelines
  • Develop starter kits and accelerators that reduce time-to-value for domain teams building new data products
  • Create cookbooks and implementation guides that translate enterprise standards into actionable steps
  • Support LOB teams in adopting templates while allowing appropriate customization for domain-specific needs
  • Integrate data from multiple internal and external sources into unified data assets
  • Build reusable data integration patterns and connectors for enterprise data sources
  • Implement data ingestion using Auto Loader, COPY INTO, and other ingestion frameworks
  • Develop API-based data integrations and file-based data processing workflows
  • Ensure data consistency and reliability across integrated sources
  • Support data migration efforts and legacy system integrations
  • Implement medallion architecture patterns (bronze, silver, gold) for data organization and quality progression
  • Develop dimensional models, fact tables, and aggregations for analytics use cases
  • Build data transformation logic that ensures accuracy, consistency, and business alignment
  • Create reusable transformation components and modular pipeline designs
  • Optimize data models for query performance and consumption patterns
  • Support schema evolution and data versioning requirements
  • Implement data quality checks, validation rules, and automated testing within pipelines
  • Develop data profiling and anomaly detection to identify quality issues
  • Build data reconciliation processes to ensure accuracy across systems
  • Implement unit testing, integration testing, and regression testing for pipelines
  • Monitor data quality metrics and remediate issues proactively
  • Document data quality rules and thresholds for pipeline outputs
  • Implement logging, monitoring, and alerting for pipeline health and performance
  • Build dashboards to track pipeline execution, data freshness, and quality metrics
  • Develop automated error handling, retry logic, and failure notifications
  • Support incident response and troubleshooting for pipeline failures
  • Implement data lineage tracking to support auditability and impact analysis
  • Ensure pipelines meet SLAs for data availability and freshness
  • Build data pipelines that enable analytics, reporting, and business intelligence use cases
  • Prepare and serve data for machine learning and AI workloads
  • Develop feature engineering pipelines for ML model development
  • Create semantic layers and curated datasets that enable self-service analytics
  • Support integration with analytics tools including Power BI and Tableau
  • Build data products with clear documentation and consumption guidance
  • Partner with data architects to align pipeline development with architectural standards
  • Collaborate with business analysts and data scientists to understand data requirements
  • Work with platform engineers to leverage platform capabilities effectively
  • Contribute to technical documentation, runbooks, and knowledge sharing
  • Support data consumers in understanding and accessing data assets
  • Participate in code reviews and follow engineering best practices
  • Support data engineering delivery with contractor and consultant teams under guidance from senior team members
  • Contribute to knowledge-sharing sessions and workshops to build data engineering capability across LOB teams
  • Document best practices, lessons learned, and technical standards for data engineering
  • Stay current with industry trends in data mesh, federated architectures, and cloud data services
  • Share insights and learnings with the broader team to foster continuous improvement
  • Assist in evaluating emerging data engineering technologies, frameworks, and tools
  • Identify opportunities to enhance pipeline performance, reliability, and cost efficiency
  • Contribute to the evolution of best practices and standards for data engineering
  • Propose automation opportunities to reduce manual effort and improve consistency
Required Experience
  • Typically requires a master's degree with 5 years of experience or a bachelor's degree with a minimum of 7 years of relevant experience, or an equivalent combination of education and experience.
  • Demonstrated expertise in Data Engineering, including the design, development, and optimization of scalable data pipelines, data platforms, and data processing solutions.
  • Strong knowledge of data modeling, data structures and algorithms, and data integration techniques to support efficient and reliable data management.
  • Advanced experience designing and implementing modern data lake architectures and leveraging Databricks to build and maintain data engineering solutions.
  • Proven experience applying DevOps principles and practices, including automation, deployment, monitoring, and continuous improvement of data products and platforms.
  • Strong understanding of workflow management and orchestration tools to support complex data processing and integration workflows.
  • Experience managing and supporting the Product Development Life Cycle (PDLC), from requirements gathering and solution design through deployment and operational support.
  • Demonstrated ability to leverage business intelligence concepts and tools to deliver actionable insights and support data-driven decision-making.
  • Strong business acumen with the ability to understand organizational priorities and translate business requirements into effective technical solutions.
  • Experience working within Agile environments, including the Scaled Agile Framework (SAFe), and collaborating effectively across cross-functional teams.
Qualifications
  • SAFe Product Owner/Product Manager (PO/PM) certification or other relevant Agile certifications.
  • Industry-recognized certifications in Data Engineering, Data Analytics, Platform Architecture, Data Integration, Cloud Technologies, or related disciplines.
Other Details
  • Languages Required: English
  • Languages Preferred: Not specified
  • Contract Duration: 3 years 0 months
  • Work Modality: Not specified
  • Remuneration: Not specified