Berkeley Lab’s (LBNL) Environmental Genomics and Systems Biology (EGSB) Division is looking for a Data Engineer to join the US Department of Energy’s (DOE) Systems Biology Knowledgebase (KBase) team!
In this exciting role, you will develop the Central Data Model (CDM), a groundbreaking initiative set to redefine the approach to biology-focused data management. You will be responsible for the design, management, and optimization of large-scale data pipelines, ensuring the robustness and scalability of these systems. Collaborating with Scientists, Data Scientists, Analysts, and Engineers, you will drive efficiency and reliability across data operations, making a significant impact on the scientific capabilities and overall mission.
This position has an anticipated start date of March 3, 2025.
What You Will Do:
- Develop and maintain scalable ETL processes that support data integration and processing from multiple sources.
- Utilize Apache Spark to process large datasets, ensuring data is clean, well-organized, and easily accessible.
- Implement and manage data pipelines for both batch and stream processing using tools like Apache Spark, Kafka, and other relevant technologies.
- Develop and optimize data architectures within a Data Lakehouse environment, ensuring seamless data storage and retrieval.
- Monitor, troubleshoot, and resolve issues of diverse scope within data pipelines and the broader data infrastructure.
- Ensure data quality and integrity across the entire data lifecycle.
In addition to the above, the Data Engineer 3 will also:
- Contribute to the decision-making, setting of technical direction, and establishing best practices that ensure scalability, reliability, and efficiency within the data ecosystem.
- Design, develop, and maintain scalable ETL processes that support data integration and processing from multiple sources.
- Play a critical role in the design and architecture of the data platform subsystems and pipelines.
- Collaborate with Scientists, Analysts, and subject matter experts (SMEs) to understand their data requirements and deliver solutions that enhance data accessibility and performance.
- Provide strategic insights that enhance the overall quality and functionality of the data platform.
- Establish and maintain Agile development lifecycle ceremonies to ensure measurable progress.
- Stay informed about the latest industry trends and best practices in Big Data processing and Data Engineering.
What is Required:
- A Bachelor’s Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related field and a minimum of 5 years of relevant work experience in Data Engineering or an equivalent combination of education and experience.
- Experience working with Extract, Transform, and Load (ETL) processes.
- Experience with both relational and NoSQL databases.
- Demonstrated proficiency with Python and with Git and/or other version control technologies.
- Demonstrated proficiency with Apache Spark and its ecosystem.
- Excellent oral and written communication skills including experience organizing and presenting information to technical and non-technical audiences.
- Strong analytical skills including experience identifying and solving complex technical problems.
- Demonstrated interpersonal skills including experience collaborating with diverse teams of scientific, operations, and technical staff.
Additional Qualifications for the Data Engineer 3:
- A Bachelor’s Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related field and a minimum of 8 years of relevant work experience in Data Engineering or an equivalent combination of education and experience.
- Experience with Big Data processing technologies, including Hadoop, Kafka, and real-time data processing using Spark Structured Streaming.
- Hands-on experience with ETL processes, including the design and implementation of data pipelines.
- Experience with PySpark, Spark SQL, and object storage solutions such as MinIO and the AWS S3 API.
Desired Qualifications:
- Familiarity with Data Lakehouse architectures and related technologies such as Delta Lake, Apache Hudi, or Iceberg.
- Deep knowledge of troubleshooting and tuning Spark applications.
- Demonstrated ability to translate technical requirements into functional code.
- Familiarity with bioinformatics and biological data.
Notes:
- Application Deadline: For full consideration, please apply with a resume and cover letter describing your interest in this position by January 6, 2025.
- Appointment Type: This is a full-time, exempt from overtime pay (monthly paid), 2-year (benefits eligible) Term appointment with the possibility of extension or conversion to a Career appointment based upon satisfactory job performance, continuing availability of funds, and ongoing operational needs.
- Salary Information: It is not typical for an individual to be offered a salary at or near the top of the range for a position. Salary will be commensurate with the final candidate’s qualifications and experience, including skills, knowledge, relevant education, and certifications, and will be aligned with the internal peer group.
- Level 2: This position is expected to pay $109,152 - $136,428 per year for job code C70.2.
- Level 3: This position is expected to pay $129,948 - $162,432 per year for job code C70.3.
- As a condition of employment, the finalist will be required to disclose if they are subject to any final administrative or judicial decisions within the last seven years determining that they committed any misconduct, are currently being investigated for misconduct, left a position during an investigation for alleged misconduct, or have filed an appeal with a previous employer.
- Background Check: This position may be subject to a background check. Any convictions will be evaluated to determine if they directly relate to the responsibilities and requirements of the position. Having a conviction history will not automatically disqualify an applicant from being considered for employment.
- Work Modality: This position is eligible for onsite, hybrid, or remote work. Remote workers are defined as individuals who reside within the United States but 150 miles or more away from Berkeley Lab. Work schedules depend on business needs, and work may need to be performed during traditional business hours in Pacific Standard Time (PST). There may be an expectation to intermittently conduct work, attend meetings, and train on site at Lawrence Berkeley National Lab, located at 1 Cyclotron Road, Berkeley, CA 94720.
- Relocation Assistance: This position is not eligible for relocation assistance.
- Eligibility: This position is not eligible for visa sponsorship now or in the future (e.g., H-1B, TN, STEM OPT, etc.). You must be legally authorized to work in the United States to be considered for this position.
Learn About Us:
Berkeley Lab (LBNL) addresses the world’s most urgent scientific challenges by advancing sustainable energy, protecting human health, creating new materials, and revealing the origin and fate of the universe. Founded in 1931, Berkeley Lab’s scientific expertise has been recognized with 16 Nobel Prizes. The University of California manages Berkeley Lab for the U.S. Department of Energy’s Office of Science.
Working at Berkeley Lab has many rewards including a competitive compensation program, excellent health and welfare programs, a retirement program that is second to none, and outstanding development opportunities.
Berkeley Lab is committed to Inclusion, Diversity, Equity and Accountability (IDEA) and strives to continue building community with these shared values and commitments.
Berkeley Lab is an Equal Opportunity and Affirmative Action Employer. We heartily welcome applications from women, minorities, veterans, and all who would contribute to the Lab’s mission of leading scientific discovery, inclusion, and professionalism. In support of our diverse global community, all qualified applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, or protected veteran status.
Equal Opportunity and IDEA Information Links:
Know your rights: "Equal Employment Opportunity is the Law" and the Pay Transparency Nondiscrimination Provision under 41 CFR 60-1.4.