Overview
The Data Engineer is responsible for designing, building, and maintaining the data infrastructure that supports the organization's data-driven decision-making processes. This role develops ETL processes, optimizes data retrieval performance, and collaborates with stakeholders to gather and understand data requirements.
Key Responsibilities
Data Pipeline Development
- Design, develop, and maintain data pipelines for ingestion, transformation, and serving across batch and streaming workloads
- Build ETL/ELT workflows to integrate data from diverse sources into enterprise data platforms
- Develop data transformation logic using Apache Spark, PySpark, SparkSQL, and SQL
- Implement change data capture (CDC) patterns for real-time and near-real-time data synchronization (sketched below)
- Build streaming data pipelines for real-time analytics and operational use cases
- Optimize pipeline performance, resource utilization, and cost efficiency
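As an illustration of the CDC responsibility above, a minimal sketch of a MERGE-based upsert using the delta-spark API; the table names and the change-feed schema (customer_id, op) are hypothetical:

```python
# Minimal CDC upsert sketch: merge a batch of captured change records
# into a Delta target table. All table names are hypothetical.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical staging table holding change records from the source system.
changes = spark.read.table("staging.customer_changes")

# Hypothetical silver table kept in sync with the source.
target = DeltaTable.forName(spark, "silver.customers")

(target.alias("t")
    .merge(changes.alias("c"), "t.customer_id = c.customer_id")
    .whenMatchedDelete(condition="c.op = 'DELETE'")       # tombstones remove rows
    .whenMatchedUpdateAll(condition="c.op = 'UPDATE'")    # updates overwrite rows
    .whenNotMatchedInsertAll(condition="c.op = 'INSERT'") # new keys are inserted
    .execute())
```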
Federated Data Architecture
- Support federated data pipeline architecture that enables Line of Business (LOB) teams to own and manage their domain data
- Contribute to self-serve data infrastructure that abstracts complexity and allows domain teams to build pipelines independently
- Develop standardized pipeline deployment patterns that LOB teams can adopt while maintaining autonomy
- Support domain teams in building data products that are discoverable, interoperable, and compliant with enterprise standards
- Enable distributed data processing across domains while ensuring consistency through federated governance
- Assist in establishing data contracts and interoperability standards that allow seamless data sharing across domains (sketched below)
- Support the balance between domain autonomy and enterprise-wide governance requirements
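A minimal sketch of the kind of data-contract check referenced above, verifying a published dataset against an agreed schema before it is shared across domains; the contract fields and table name are hypothetical, and a real contract would also cover types, semantics, and SLAs:

```python
# Illustrative data-contract check: fail the pipeline if the published
# dataset is missing columns the contract promises to consumers.
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.getOrCreate()

# Hypothetical agreed schema for a cross-domain data product.
contract = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("order_ts", TimestampType(), nullable=False),
])

df = spark.read.table("gold.daily_orders")  # hypothetical published dataset

missing = set(contract.fieldNames()) - set(df.columns)
if missing:
    raise ValueError(f"Data contract violation: missing columns {missing}")
```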
Reusable Patterns and Templates
- Develop reusable pipeline templates and Infrastructure as Code (IaC) patterns for common data product types (sketched below)
- Create blueprints for data ingestion, transformation, quality validation, and serving that LOB teams can customize
- Build standardized patterns for batch pipelines, streaming pipelines, CDC implementations, and API-based integrations
- Contribute to a pattern library covering medallion architecture, dimensional modeling, and data product packaging
- Document best practices and reference architectures that guide LOB teams in building compliant, high-quality pipelines
- Develop starter kits and accelerators that reduce time-to-value for domain teams building new data products
- Create cookbooks and implementation guides that translate enterprise standards into actionable steps
- Support LOB teams in adopting templates while allowing appropriate customization for domain-specific needs
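A minimal sketch of a reusable ingestion template of the kind described above, assuming a Databricks environment with Auto Loader; the config keys, paths, and table names are hypothetical:

```python
# Illustrative shared ingestion pattern: an LOB team supplies only the
# config dict and inherits checkpointing and schema tracking.
from pyspark.sql import SparkSession

def run_bronze_ingestion(spark: SparkSession, config: dict) -> None:
    """Load raw files into a bronze table using a shared Auto Loader pattern."""
    query = (spark.readStream.format("cloudFiles")               # Auto Loader
        .option("cloudFiles.format", config["file_format"])      # e.g. "json"
        .option("cloudFiles.schemaLocation", config["schema_path"])
        .load(config["source_path"])
        .writeStream
        .option("checkpointLocation", config["checkpoint_path"])
        .trigger(availableNow=True)   # process available files, then stop
        .toTable(config["bronze_table"]))
    query.awaitTermination()
```

Packaged this way, the template keeps the operational details (checkpoints, schema evolution) consistent across domains while leaving source- and table-level choices to the LOB team.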
Data Integration
- Integrate data from multiple internal and external sources into unified data assets
- Build reusable data integration patterns and connectors for enterprise data sources
- Implement data ingestion using Auto Loader, COPY INTO, and other ingestion frameworks (sketched below)
- Develop API-based data integrations and file-based data processing workflows
- Ensure data consistency and reliability across integrated sources
- Support data migration efforts and legacy system integrations
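A minimal sketch of file-based ingestion with COPY INTO, assuming Databricks SQL; the target table and source path are hypothetical:

```python
# Illustrative file-based ingestion. COPY INTO is idempotent: files that
# were already loaded are skipped on re-run, which simplifies retries.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided as `spark` on Databricks

spark.sql("""
    COPY INTO bronze.orders
    FROM 's3://example-bucket/landing/orders/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
""")
```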
Data Modeling and Transformation
- Implement medallion architecture patterns (bronze, silver, gold) for data organization and quality progression (sketched below)
- Develop dimensional models, fact tables, and aggregations for analytics use cases
- Build data transformation logic that ensures accuracy, consistency, and business alignment
- Create reusable transformation components and modular pipeline designs
- Optimize data models for query performance and consumption patterns
- Support schema evolution and data versioning requirements
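A minimal sketch of medallion-style progression from bronze to silver to gold, assuming a Spark environment with table-backed storage; table and column names are hypothetical:

```python
# Illustrative bronze -> silver -> gold progression.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Silver: standardize types, drop obviously bad rows, deduplicate.
bronze = spark.read.table("bronze.orders")
silver = (bronze
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"]))
silver.write.mode("overwrite").saveAsTable("silver.orders")

# Gold: business-level aggregate shaped for analytics consumption.
gold = (silver
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("daily_revenue"),
         F.countDistinct("customer_id").alias("daily_customers")))
gold.write.mode("overwrite").saveAsTable("gold.daily_order_metrics")
```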
Data Quality and Testing
- Implement data quality checks, validation rules, and automated testing within pipelines (sketched below)
- Develop data profiling and anomaly detection to identify quality issues
- Build data reconciliation processes to ensure accuracy across systems
- Implement unit testing, integration testing, and regression testing for pipelines
- Monitor data quality metrics and remediate issues proactively
- Document data quality rules and thresholds for pipeline outputs
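A minimal sketch of in-pipeline quality gates of the kind described above, failing fast on rule violations and quarantining bad rows; the rules, thresholds, and table names are hypothetical:

```python
# Illustrative quality gate: validate key integrity and value ranges
# before data is promoted downstream.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("silver.orders")

# Rule 1: the primary key must be non-null and unique.
null_keys = df.filter(F.col("order_id").isNull()).count()
dupe_keys = df.count() - df.dropDuplicates(["order_id"]).count()

# Rule 2: amounts must be non-negative; route violations to quarantine
# so they can be inspected and reconciled rather than silently dropped.
bad_rows = df.filter(F.col("amount") < 0)
bad_rows.write.mode("append").saveAsTable("quality.orders_quarantine")

if null_keys > 0 or dupe_keys > 0:
    raise ValueError(
        f"Quality gate failed: {null_keys} null keys, {dupe_keys} duplicate keys")
```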
Pipeline Operations and Observability
- Implement logging, monitoring, and alerting for pipeline health and performance
- Build dashboards to track pipeline execution, data freshness, and quality metrics
- Develop automated error handling, retry logic, and failure notifications (sketched below)
- Support incident response and troubleshooting for pipeline failures
- Implement data lineage tracking to support auditability and impact analysis
- Ensure pipelines meet SLAs for data availability and freshness
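A minimal sketch of retry logic with a failure-notification hook; the notify target and the task callable are placeholders:

```python
# Illustrative retry wrapper: re-run a pipeline task with backoff and
# alert on final failure.
import time

def run_with_retries(task, retries: int = 3, backoff_s: float = 30.0) -> None:
    for attempt in range(1, retries + 1):
        try:
            task()
            return
        except Exception as exc:  # in practice, catch narrower error types
            if attempt == retries:
                notify(f"Pipeline failed after {retries} attempts: {exc}")
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts

def notify(message: str) -> None:
    # Placeholder: post to the team's alerting channel (e.g. PagerDuty, Slack).
    print(message)
```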
Analytics and AI Enablement
- Build data pipelines that enable analytics, reporting, and business intelligence use cases
- Prepare and serve data for machine learning and AI workloads
- Develop feature engineering pipelines for ML model development (sketched below)
- Create semantic layers and curated datasets that enable self-service analytics
- Support integration with analytics tools including Power BI and Tableau
- Build data products with clear documentation and consumption guidance
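A minimal sketch of a feature-engineering step that serves curated data for ML, assuming a Spark environment; the feature definitions and table names are hypothetical and deliberately simple:

```python
# Illustrative feature pipeline: derive per-customer behavioral features
# from curated orders and publish them for model training.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.read.table("silver.orders")

features = (orders.groupBy("customer_id")
    .agg(F.sum("amount").alias("lifetime_spend"),
         F.count("order_id").alias("order_count"),
         F.max("order_ts").alias("last_order_ts")))

features.write.mode("overwrite").saveAsTable("ml.customer_order_features")
```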
Collaboration
- Partner with data architects to align pipeline development with architectural standards
- Collaborate with business analysts and data scientists to understand data requirements
- Work with platform engineers to leverage platform capabilities effectively
- Contribute to technical documentation, runbooks, and knowledge sharing
- Support data consumers in understanding and accessing data assets
- Participate in code reviews and follow engineering best practices
Capability Building and Knowledge Sharing
- Support data engineering delivery alongside contractor and consultant teams, under guidance from senior team members
- Contribute to knowledge-sharing sessions and workshops to build data engineering capability across LOB teams
- Document best practices, lessons learned, and technical standards for data engineering
- Stay current with industry trends in data mesh, federated architectures, and cloud data services
- Share insights and learnings with the broader team to foster continuous improvement
Continuous Improvement
- Assist in evaluating emerging data engineering technologies, frameworks, and tools
- Identify opportunities to enhance pipeline performance, reliability, and cost efficiency
- Contribute to the evolution of best practices and standards for data engineering
- Propose automation opportunities to reduce manual effort and improve consistency
- Other duties as assigned
Required Experience
- Typically requires a master's degree and 5 years of relevant experience, a bachelor's degree and a minimum of 7 years of relevant experience, or an equivalent combination of education and experience.
- Data Modeling – Skilled
- Data Structures and Algorithms – Skilled
- Business Intelligence – Skilled
- DevOps – Advanced
- Data Integration – Skilled
- Business Acumen – Skilled
- Product Development Life Cycle – Skilled
- Influencing Others – Skilled
- Scaled Agile Framework (SAFe) – Advanced
- Data Lake Architecture – Advanced
- Databricks – Skilled
- Data Engineering – Advanced
- Workflow Management – Advanced