Data Engineer

World Bank Group - WBG

Staff | Closes 23 Mar 2026

Overview

The Data Engineer is responsible for designing, building, and maintaining the data infrastructure that supports the organization's data-driven decision-making processes. This role develops ETL processes, optimizes data retrieval performance, and collaborates with stakeholders to gather and understand data requirements.


Key Responsibilities
  • Design, develop, and maintain data pipelines for ingestion, transformation, and serving across batch and streaming workloads
  • Build ETL/ELT workflows to integrate data from diverse sources into enterprise data platforms
  • Develop data transformation logic using Apache Spark, PySpark, SparkSQL, and SQL
  • Implement change data capture (CDC) patterns for real-time and near-real-time data synchronization
  • Build streaming data pipelines for real-time analytics and operational use cases
  • Optimize pipeline performance, resource utilization, and cost efficiency
  • Support federated data pipeline architecture that enables Line of Business (LOB) teams to own and manage their domain data
  • Contribute to self-serve data infrastructure that abstracts complexity and allows domain teams to build pipelines independently
  • Develop standardized pipeline deployment patterns that LOB teams can adopt while maintaining autonomy
  • Support domain teams in building data products that are discoverable, interoperable, and compliant with enterprise standards
  • Enable distributed data processing across domains while ensuring consistency through federated governance
  • Assist in establishing data contracts and interoperability standards that allow seamless data sharing across domains
  • Support the balance between domain autonomy and enterprise-wide governance requirements
  • Develop reusable pipeline templates and Infrastructure as Code (IaC) patterns for common data product types
  • Create blueprints for data ingestion, transformation, quality validation, and serving that LOB teams can customize
  • Build standardized patterns for batch pipelines, streaming pipelines, CDC implementations, and API-based integrations
  • Contribute to a pattern library covering medallion architecture, dimensional modeling, and data product packaging
  • Document best practices and reference architectures that guide LOB teams in building compliant, high-quality pipelines
  • Develop starter kits and accelerators that reduce time-to-value for domain teams building new data products
  • Create cookbooks and implementation guides that translate enterprise standards into actionable steps
  • Support LOB teams in adopting templates while allowing appropriate customization for domain-specific needs
  • Integrate data from multiple internal and external sources into unified data assets
  • Build reusable data integration patterns and connectors for enterprise data sources
  • Implement data ingestion using Auto Loader, COPY INTO, and other ingestion frameworks
  • Develop API-based data integrations and file-based data processing workflows
  • Ensure data consistency and reliability across integrated sources
  • Support data migration efforts and legacy system integrations
  • Implement medallion architecture patterns (bronze, silver, gold) for data organization and quality progression
  • Develop dimensional models, fact tables, and aggregations for analytics use cases
  • Build data transformation logic that ensures accuracy, consistency, and business alignment
  • Create reusable transformation components and modular pipeline designs
  • Optimize data models for query performance and consumption patterns
  • Support schema evolution and data versioning requirements
  • Implement data quality checks, validation rules, and automated testing within pipelines
  • Develop data profiling and anomaly detection to identify quality issues
  • Build data reconciliation processes to ensure accuracy across systems
  • Implement unit testing, integration testing, and regression testing for pipelines
  • Monitor data quality metrics and remediate issues proactively
  • Document data quality rules and thresholds for pipeline outputs
  • Implement logging, monitoring, and alerting for pipeline health and performance
  • Build dashboards to track pipeline execution, data freshness, and quality metrics
  • Develop automated error handling, retry logic, and failure notifications
  • Support incident response and troubleshooting for pipeline failures
  • Implement data lineage tracking to support auditability and impact analysis
  • Ensure pipelines meet SLAs for data availability and freshness
  • Build data pipelines that enable analytics, reporting, and business intelligence use cases
  • Prepare and serve data for machine learning and AI workloads
  • Develop feature engineering pipelines for ML model development
  • Create semantic layers and curated datasets that enable self-service analytics
  • Support integration with analytics tools including Power BI and Tableau
  • Build data products with clear documentation and consumption guidance
  • Partner with data architects to align pipeline development with architectural standards
  • Collaborate with business analysts and data scientists to understand data requirements
  • Work with platform engineers to leverage platform capabilities effectively
  • Contribute to technical documentation, runbooks, and knowledge sharing
  • Support data consumers in understanding and accessing data assets
  • Participate in code reviews and follow engineering best practices
  • Support data engineering delivery with contractor and consultant teams under guidance from senior team members
  • Contribute to knowledge-sharing sessions and workshops to build data engineering capability across LOB teams
  • Document best practices, lessons learned, and technical standards for data engineering
  • Stay current with industry trends in data mesh, federated architectures, and cloud data services
  • Share insights and learnings with the broader team to foster continuous improvement
  • Assist in evaluating emerging data engineering technologies, frameworks, and tools
  • Identify opportunities to enhance pipeline performance, reliability, and cost efficiency
  • Contribute to the evolution of best practices and standards for data engineering
  • Propose automation opportunities to reduce manual effort and improve consistency
  • Other duties as assigned
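One of the responsibilities above is implementing change data capture (CDC) patterns for data synchronization. As a minimal sketch of the idea (not the team's actual implementation — in practice this would typically be a `MERGE INTO` against a Delta table), the hypothetical function below applies an ordered batch of insert/update/delete events to an in-memory table keyed by primary key:

```python
from typing import Any

# Minimal CDC "apply" sketch: each event carries an operation
# ("insert", "update", "delete"), a primary key, and a data payload.
# Illustrative only -- real pipelines would merge into the target table.

def apply_cdc_events(table: dict[str, dict[str, Any]],
                     events: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Apply CDC events in order; later events win for the same key."""
    for event in events:
        key, op = event["key"], event["op"]
        if op == "delete":
            table.pop(key, None)  # idempotent: deleting a missing key is a no-op
        elif op in ("insert", "update"):
            # Merge the payload over any existing row for this key.
            table[key] = {**table.get(key, {}), **event["data"]}
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return table
```

Applying an insert, an update to the same key, then an insert and delete of a second key leaves only the updated first row — the same ordering semantics a streaming CDC consumer must preserve.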
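The list also calls for automated error handling and retry logic. A common pattern is retry with exponential backoff; the sketch below (function name and parameters are illustrative, not from the posting) makes the sleep function injectable so it can be tested without real delays:

```python
import time

# Retry-with-exponential-backoff sketch for transient pipeline failures.
# Illustrative only; production pipelines often delegate this to the
# orchestrator (e.g. workflow task retries) or a library such as tenacity.

def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `task()`, retrying on exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure for alerting
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
```

A task that fails twice and then succeeds returns normally on the third attempt; a task that keeps failing re-raises after `max_attempts`, which is where failure notifications would hook in.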
Required Experience
  • Typically requires a master's degree with a minimum of 5 years of relevant experience, or a bachelor's degree with a minimum of 7 years of relevant experience, or an equivalent combination of education and experience.
  • Data Modeling – Skilled
  • Data Structure and Algorithms – Skilled
  • Business Intelligence – Skilled
  • DevOps – Advanced
  • Data Integration – Skilled
  • Business Acumen – Skilled
  • Product Development Life Cycle – Skilled
  • Influencing Others – Skilled
  • Scaled Agile Framework (SAFe) – Advanced
  • Data Lake Architecture – Advanced
  • Databricks – Skilled
  • Data Engineering – Advanced
  • Workflow Management – Advanced
Other Details
Languages Required
• English
Languages Preferred
Not specified
Contract Duration
4 years
Work Modality
Not specified
Remuneration
Not specified