Professional Experience:
• Minimum 5 years of experience in data engineering and data architecture.
• Proven expertise in open-source data warehouse and data lake solutions and big data tools, e.g. Hadoop, Spark, Kafka, Flink.
• Proven expertise and rich experience in cloud-based lakehouse solutions and related toolkits, e.g. AWS Redshift, AWS Lambda, AWS S3, AWS Timestream, AWS Kinesis, AWS EMR, AWS Aurora, Azure Databricks, Azure Data Factory, Azure Cosmos DB.
• Proven expertise in designing and developing ETL pipelines and ensuring their stability (see the sketch after this list).
• Adept at manipulating, processing, and extracting data from large, disconnected datasets.
• Advanced knowledge of data architecture design and familiarity with common data architecture design patterns.
• Advanced SQL knowledge and experience with both relational and non-relational databases, e.g. PostgreSQL, MySQL, HBase, MongoDB, Elasticsearch/ELK.
• Expert in object-oriented and scripting programming languages, e.g. Java, Scala, Python.
• In-depth knowledge of DevOps methodologies and tools, e.g. Git, CI/CD, Docker.
• Strong skills in performance optimization and in analyzing and solving problems in both on-premises and cloud environments.
• Excellent communication skills in cross-functional collaboration.
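For illustration, the ETL skill set above might be exercised in a minimal PySpark sketch such as the following; all paths, table names, and column names are hypothetical placeholders, not part of any actual system:

```python
# Minimal PySpark ETL sketch (hypothetical paths and column names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Extract: read raw JSON events from object storage.
raw = spark.read.json("s3://example-bucket/raw/events/")

# Transform: drop malformed rows, normalize timestamps, deduplicate,
# and derive a partition column.
clean = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Load: write partitioned Parquet for downstream analytics.
clean.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/events/"
)
```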
Key areas of responsibility:
Build solutions: Build and optimize systems for data collection, storage, access, and analytics at scale. Create data pipelines used by data scientists, data-centric applications, and other data consumers.
Maintain solutions: Maintain sustainable data systems for the organization that keep data easy to search and retrieve.
Implement architecture: Collaborate with cross-functional teams and the central data architect to realize the architectural vision, building and maintaining the data systems the data architecture specifies.
Data governance: During execution, ensure adherence to the data governance defined by the central data architect.
Main Job Tasks:
1) Data enablement and management:
• Identify and map corresponding data objects across various enterprise strategic data sources, including databases within locally established systems.
• Establish and maintain robust data pipelines on the Databricks platform, which serves as the future data core. This includes ETL processes for Site Tianjin-specific data assets, encompassing data ingestion, processing, storage, transformation, and analytics at scale (see the sketch after this list).
• Leverage domain-specific expertise to assemble large, complex data sets within Databricks to meet critical use-case-driven data consumption requirements.
• Perform comprehensive data enablement through OTDHL (an infrastructure that ingests, harmonizes, and delivers data from production sources to the cloud for analysis within Product Supply) and Databricks to support the realization of Site Tianjin-specific data sets.
• Automate manual processes, optimize data delivery systems, and redesign infrastructure to enhance scalability, efficiency, and reliability.
• Design and manage databases for locally developed solutions at Site Tianjin, ensuring robust and scalable architecture.
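As a sketch only, one step of such a Databricks pipeline could be an upsert of newly ingested data into a curated Delta table; the table names and the `batch_id` business key below are hypothetical, and the actual OTDHL interfaces are not shown:

```python
# Hypothetical Databricks pipeline step: upsert newly ingested site data
# into a curated Delta table, keyed on a business identifier.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

updates = spark.table("staging.site_tianjin_batches")  # newly ingested rows
target = DeltaTable.forName(spark, "curated.site_tianjin_batches")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.batch_id = s.batch_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```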
2) Data architecture implementation and performance optimization:
• Ensure adherence to data governance principles defined by the OTDHL team and align with the data architecture and processes established by the Product Supply (PS) Data Foundation during the execution of data enablement tasks.
• Automate data processes and implement continuous integration practices using DevOps methodologies and tools to enhance data processing efficiency and streamline workflows.
• Conduct performance tuning and optimization of data systems, analyze and resolve bottlenecks, and ensure high performance and stability in data processing operations (see the sketch below).
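As one illustrative example of such tuning, a shuffle-heavy join of a large fact table against a small dimension table can often be rewritten as a broadcast join; the table and column names here are hypothetical:

```python
# Hypothetical tuning example: broadcast the small dimension table so the
# large fact table is joined without a full shuffle.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

facts = spark.table("curated.process_measurements")  # large fact table
dims = spark.table("curated.equipment_master")       # small dimension table

joined = facts.join(F.broadcast(dims), "equipment_id")

# Inspect the physical plan to confirm a BroadcastHashJoin was selected.
joined.explain()
```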
3) Cross-departmental collaboration:
• Partner with Business Analysts to understand the needs of the Line of Business, ensuring data assets address meaningful business challenges and drive data-driven decision-making.
• Actively work with the OTDHL and Product Supply (PS) Data Foundation teams to deliver Site Tianjin-specific data sets. Challenge existing processes and ways of working with a continuous-improvement mindset to enhance efficiency and effectiveness.
• Integrate daily challenges, optimization suggestions, and local use-case-driven requirements into the design and implementation of data governance frameworks and the data lakehouse.
• Efficiently leverage resources and insights from the global network, navigate the enterprise data marketplace, and assist the MI team in raising their awareness and adoption of data readiness initiatives.
4) Technical research and frontier exploration:
• Continuously explore and evaluate emerging big data tools and technologies, implementing solutions that align with and address evolving business needs.
• Lead the selection and evaluation of technologies and tools to ensure they effectively support Site Tianjin's data management objectives and broader business requirements.
Education Background:
• Bachelor’s degree in Life Sciences, Data Science, Computer Science, or a related field is required. Advanced degree preferred.
• Excellent command of spoken and written English.