Overview
The Data Engineer is responsible for designing, building, and maintaining the data infrastructure that supports the organization's data-driven decision-making processes. This role develops ETL processes, optimizes data retrieval performance, and collaborates with stakeholders to gather and understand data requirements.
Key Responsibilities
Data Pipeline Development
- Design, develop, and maintain data pipelines for ingestion, transformation, and serving across batch and streaming workloads
- Build ETL/ELT workflows to integrate data from diverse sources into enterprise data platforms
- Develop data transformation logic using Apache Spark, PySpark, SparkSQL, and SQL
- Implement change data capture (CDC) patterns for real-time and near-real-time data synchronization (sketched below)
- Build streaming data pipelines for real-time analytics and operational use cases
- Optimize pipeline performance, resource utilization, and cost efficiency
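As an illustration of the CDC responsibility above, a minimal sketch of a MERGE-based upsert using the delta-spark API; the table names and the change-feed schema (customer_id, op) are hypothetical:

```python
# Minimal CDC upsert sketch: merge a batch of captured change records
# into a Delta target table. All table names are hypothetical.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical staging table holding change records from the source system.
changes = spark.read.table("staging.customer_changes")

# Hypothetical silver table kept in sync with the source.
target = DeltaTable.forName(spark, "silver.customers")

(target.alias("t")
    .merge(changes.alias("c"), "t.customer_id = c.customer_id")
    .whenMatchedDelete(condition="c.op = 'DELETE'")       # tombstones remove rows
    .whenMatchedUpdateAll(condition="c.op = 'UPDATE'")    # updates overwrite rows
    .whenNotMatchedInsertAll(condition="c.op = 'INSERT'") # new keys are inserted
    .execute())
```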
Federated Data Architecture
- Support federated data pipeline architecture that enables Line of Business (LOB) teams to own and manage their domain data
- Contribute to self-serve data infrastructure that abstracts complexity and allows domain teams to build pipelines independently
- Develop standardized pipeline deployment patterns that LOB teams can adopt while maintaining autonomy
- Support domain teams in building data products that are discoverable, interoperable, and compliant with enterprise standards
- Enable distributed data processing across domains while ensuring consistency through federated governance
- Assist in establishing data contracts and interoperability standards that allow seamless data sharing across domains (sketched below)
- Support the balance between domain autonomy and enterprise-wide governance requirements
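A minimal sketch of the kind of data-contract check referenced above, verifying a published dataset against an agreed schema before it is shared across domains; the contract fields and table name are hypothetical, and a real contract would also cover types, semantics, and SLAs:

```python
# Illustrative data-contract check: fail the pipeline if the published
# dataset is missing columns the contract promises to consumers.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.getOrCreate()

# Hypothetical agreed schema for a cross-domain data product.
contract = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("order_ts", TimestampType(), nullable=False),
])

df = spark.read.table("gold.daily_orders")  # hypothetical published dataset

missing = set(contract.fieldNames()) - set(df.columns)
if missing:
    raise ValueError(f"Data contract violation: missing columns {missing}")
```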
Reusable Patterns and Templates
- Develop reusable pipeline templates and Infrastructure as Code (IaC) patterns for common data product types (sketched below)
- Create blueprints for data ingestion, transformation, quality validation, and serving that LOB teams can customize
- Build standardized patterns for batch pipelines, streaming pipelines, CDC implementations, and API-based integrations
- Contribute to a pattern library covering medallion architecture, dimensional modeling, and data product packaging
- Document best practices and reference architectures that guide LOB teams in building compliant, high-quality pipelines
- Develop starter kits and accelerators that reduce time-to-value for domain teams building new data products
- Create cookbooks and implementation guides that translate enterprise standards into actionable steps
- Support LOB teams in adopting templates while allowing appropriate customization for domain-specific needs
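A minimal sketch of a reusable ingestion template of the kind described above, assuming a Databricks environment with Auto Loader; the config keys, paths, and table names are hypothetical:

```python
# Illustrative shared ingestion pattern: an LOB team supplies only the
# config dict and inherits checkpointing and schema tracking.
from pyspark.sql import SparkSession

def run_bronze_ingestion(spark: SparkSession, config: dict) -> None:
    """Load raw files into a bronze table using a shared Auto Loader pattern."""
    query = (spark.readStream.format("cloudFiles")               # Auto Loader
        .option("cloudFiles.format", config["file_format"])      # e.g. "json"
        .option("cloudFiles.schemaLocation", config["schema_path"])
        .load(config["source_path"])
        .writeStream
        .option("checkpointLocation", config["checkpoint_path"])
        .trigger(availableNow=True)   # process available files, then stop
        .toTable(config["bronze_table"]))
    query.awaitTermination()
```

Packaged this way, the template keeps the operational details (checkpoints, schema evolution) consistent across domains while leaving source- and table-level choices to the LOB team.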
Data Integration
- Integrate data from multiple internal and external sources into unified data assets
- Build reusable data integration patterns and connectors for enterprise data sources
- Implement data ingestion using Auto Loader, COPY INTO, and other ingestion frameworks (sketched below)
- Develop API-based data integrations and file-based data processing workflows
- Ensure data consistency and reliability across integrated sources
- Support data migration efforts and legacy system integrations
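A minimal sketch of file-based ingestion with COPY INTO, assuming Databricks SQL; the target table and source path are hypothetical:

```python
# Illustrative file-based ingestion. COPY INTO is idempotent: files that
# were already loaded are skipped on re-run, which simplifies retries.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided as `spark` on Databricks

spark.sql("""
    COPY INTO bronze.orders
    FROM 's3://example-bucket/landing/orders/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")
```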
Data Modeling and Transformation
- Implement medallion architecture patterns (bronze, silver, gold) for data organization and quality progression (sketched below)
- Develop dimensional models, fact tables, and aggregations for analytics use cases
- Build data transformation logic that ensures accuracy, consistency, and business alignment
- Create reusable transformation components and modular pipeline designs
- Optimize data models for query performance and consumption patterns
- Support schema evolution and data versioning requirements
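A minimal sketch of medallion-style progression from bronze to silver to gold, assuming a Spark environment with table-backed storage; table and column names are hypothetical:

```python
# Illustrative bronze -> silver -> gold progression.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Silver: standardize types, drop obviously bad rows, deduplicate.
bronze = spark.read.table("bronze.orders")
silver = (bronze
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"]))
silver.write.mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregate shaped for analytics consumption.
gold = (silver
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("daily_revenue"),
         F.countDistinct("customer_id").alias("daily_customers")))
gold.write.mode("overwrite").saveAsTable("gold.daily_order_metrics")
```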
Data Quality and Testing
- Implement data quality checks, validation rules, and automated testing within pipelines (sketched below)
- Develop data profiling and anomaly detection to identify quality issues
- Build data reconciliation processes to ensure accuracy across systems
- Implement unit testing, integration testing, and regression testing for pipelines
- Monitor data quality metrics and remediate issues proactively
- Document data quality rules and thresholds for pipeline outputs
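A minimal sketch of in-pipeline quality gates of the kind described above, failing fast on rule violations and quarantining bad rows; the rules, thresholds, and table names are hypothetical:

```python
# Illustrative quality gate: validate key integrity and value ranges
# before data is promoted downstream.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("silver.orders")

# Rule 1: the primary key must be non-null and unique.
null_keys = df.filter(F.col("order_id").isNull()).count()
dupe_keys = df.count() - df.dropDuplicates(["order_id"]).count()

# Rule 2: amounts must be non-negative; route violations to quarantine
# so they can be inspected and reconciled rather than silently dropped.
bad_rows = df.filter(F.col("amount") < 0)
bad_rows.write.mode("append").saveAsTable("quality.orders_quarantine")

if null_keys > 0 or dupe_keys > 0:
    raise ValueError(
        f"Quality gate failed: {null_keys} null keys, {dupe_keys} duplicate keys")
```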
Pipeline Operations and Observability
- Implement logging, monitoring, and alerting for pipeline health and performance
- Build dashboards to track pipeline execution, data freshness, and quality metrics
- Develop automated error handling, retry logic, and failure notifications (sketched below)
- Support incident response and troubleshooting for pipeline failures
- Implement data lineage tracking to support auditability and impact analysis
- Ensure pipelines meet SLAs for data availability and freshness
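A minimal sketch of retry logic with a failure-notification hook; the notify target and the task callable are placeholders:

```python
# Illustrative retry wrapper: re-run a pipeline task with backoff and
# alert on final failure.
import time

def run_with_retries(task, retries: int = 3, backoff_s: float = 30.0) -> None:
    for attempt in range(1, retries + 1):
        try:
            task()
            return
        except Exception as exc:  # in practice, catch narrower error types
            if attempt == retries:
                notify(f"Pipeline failed after {retries} attempts: {exc}")
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts

def notify(message: str) -> None:
    # Placeholder: post to the team's alerting channel (e.g. PagerDuty, Slack).
    print(message)
```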
Analytics and AI Enablement
- Build data pipelines that enable analytics, reporting, and business intelligence use cases
- Prepare and serve data for machine learning and AI workloads
- Develop feature engineering pipelines for ML model development (sketched below)
- Create semantic layers and curated datasets that enable self-service analytics
- Support integration with analytics tools including Power BI and Tableau
- Build data products with clear documentation and consumption guidance
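A minimal sketch of a feature-engineering step that serves curated data for ML, assuming a Spark environment; the feature definitions and table names are hypothetical and deliberately simple:

```python
# Illustrative feature pipeline: derive per-customer behavioral features
# from curated orders and publish them for model training.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.table("silver.orders")

features = (orders.groupBy("customer_id")
    .agg(F.sum("amount").alias("lifetime_spend"),
         F.count("order_id").alias("order_count"),
         F.max("order_ts").alias("last_order_ts")))

features.write.mode("overwrite").saveAsTable("ml.customer_order_features")
```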
Collaboration
- Partner with data architects to align pipeline development with architectural standards
- Collaborate with business analysts and data scientists to understand data requirements
- Work with platform engineers to leverage platform capabilities effectively
- Contribute to technical documentation, runbooks, and knowledge sharing
- Support data consumers in understanding and accessing data assets
- Participate in code reviews and follow engineering best practices
Capability Building and Knowledge Sharing
- Support data engineering delivery alongside contractor and consultant teams, under guidance from senior team members
- Contribute to knowledge-sharing sessions and workshops to build data engineering capability across LOB teams
- Document best practices, lessons learned, and technical standards for data engineering
- Stay current with industry trends in data mesh, federated architectures, and cloud data services
- Share insights and learnings with the broader team to foster continuous improvement
Continuous Improvement
- Assist in evaluating emerging data engineering technologies, frameworks, and tools
- Identify opportunities to enhance pipeline performance, reliability, and cost efficiency
- Contribute to the evolution of best practices and standards for data engineering
- Propose automation opportunities to reduce manual effort and improve consistency
- Other duties as assigned
Required Experience
- Typically requires a master's degree and 5 years of relevant experience, a bachelor's degree and a minimum of 7 years of relevant experience, or an equivalent combination of education and experience.
- Data Modeling – Skilled
- Data Structures and Algorithms – Skilled
- Business Intelligence – Skilled
- DevOps – Advanced
- Data Integration – Skilled
- Business Acumen – Skilled
- Product Development Life Cycle – Skilled
- Influencing Others – Skilled
- Scaled Agile Framework (SAFe) – Advanced
- Data Lake Architecture – Advanced
- Databricks – Skilled
- Data Engineering – Advanced
- Workflow Management – Advanced