4-7 years of experience in data engineering and architecture.
Proficiency in PySpark, SQL, and AWS cloud services.
Experience with batch and real-time data processing systems.
Strong understanding of data storage architectures such as Data Warehouse and Data Lake.
Key responsibilities:
Design and develop data pipeline solutions based on requirements.
Provide technical leadership and mentorship to the data engineering team.
Ensure compliance with security and data governance policies.
Collaborate in an Agile environment to deliver optimized data solutions.
Trinity Life Sciences is a trusted strategic partner, providing evidence-based solutions for the life sciences. With over 25 years of experience, we are committed to solving our clients’ most challenging problems through exceptional levels of service, powerful tools, and data-driven insights.
Trinity’s range of products and services includes industry-leading benchmarking solutions, powered by TGaS. To learn more about how we are elevating life sciences and driving from evidence to action, visit trinitylifesciences.com.
Design and develop data pipeline solutions based on requirements, incorporating optimization techniques appropriate to the data sources involved and the data volume.
Understanding of storage architectures such as Data Warehouse, Data Lake, and Lakehouse.
Decide the tech stack and development standards, propose technical solutions and architectural patterns, and recommend best practices for the big data solution.
Provide thought leadership and mentoring to the data engineering team on how data should be stored and processed more efficiently at scale.
Ensure adherence to security and compliance policies for the products.
Stay up to date with evolving cloud technologies and development best practices including open-source software.
Work in an Agile environment to provide optimized solutions to customers, using JIRA for project management.
Proven problem-solving skills with the ability to anticipate roadblocks, diagnose problems and generate effective solutions
Analyze market segments and customer base to develop market solutions
Experience working with batch processing / real-time systems using a variety of big data technologies.
Enhance and support solutions using PySpark/EMR, SQL databases, AWS Athena, S3, Redshift, Lambda, AWS Glue, and other data engineering technologies (see the illustrative sketch after this list).
Proficiency in SQL writing, SQL concepts, data modelling techniques, data validation, data quality checks, and data engineering concepts.
Proficiency in the design, creation, deployment, and review of existing and new products, following SDLC best practices and obtaining final sign-off from the client.
Experience in technologies like Databricks, HDFS, Redshift, Hadoop, S3, Athena, RDS, Elastic MapReduce on AWS or similar services in GCP/Azure
Scheduling and monitoring of Spark jobs using tools like Airflow, Oozie
Familiar with version control and CI/CD tools like Git, CodeCommit, Jenkins, and CodePipeline.
Work in a cross-functional team along with other Data Engineers, QA Engineers, and DevOps Engineers.
Develop, test, and implement data solutions based on finalized design documents.
Familiar with Unix/Linux and Shell Scripting
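For illustration only: a minimal sketch of the kind of PySpark-on-AWS work described above, reading raw data from S3, applying a basic data quality check, and writing partitioned Parquet that Athena or Redshift Spectrum can query. The application name, S3 paths, and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical app name, buckets, and columns, for illustration only.
    spark = SparkSession.builder.appName("sales_pipeline_sketch").getOrCreate()

    # Read raw CSV files landed in S3.
    raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/sales/")

    # Basic data quality check: drop rows missing the key, normalise the date column.
    clean = (raw.dropna(subset=["order_id"])
                .withColumn("order_date", F.to_date("order_date")))

    # Write partitioned Parquet so the dataset can be queried from Athena / Redshift Spectrum.
    (clean.write.mode("overwrite")
          .partitionBy("order_date")
          .parquet("s3://example-curated-bucket/sales/"))

On EMR or Glue, a job like this would typically be scheduled and monitored with a tool such as Airflow, as noted above.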
Qualifications
Experience: 4-7 years
Excellent communication and problem-solving skills.
Highly proficient in Project Management principles, methods, techniques, and tools
Minimum of 2 to 4 years of working experience in PySpark, SQL, and AWS development
Experience working as a mentor to junior team members
Hands-on experience with ETL processes and performance optimization techniques is a must
Candidates should have taken part in architecture design and discussions
Minimum of 4 years of experience working with batch processing / real-time systems using technologies such as Databricks, HDFS, Redshift, Hadoop, Elastic MapReduce on AWS, Apache Spark, Hive/Impala, and NoSQL databases, or similar services in Azure or GCP
Minimum of 4 years of experience working on Data Warehouse or Data Lake projects in a role beyond just data consumption
Minimum of 4 years of extensive working knowledge of AWS, building scalable solutions; an equivalent level of experience in Azure or Google Cloud is also acceptable
Minimum of 3 years of experience in programming languages (preferably Python)
Experience in the pharma domain will be a big plus.
Familiar with tools like Git, CodeCommit, Jenkins, and CodePipeline
Familiar with Unix/Linux and Shell Scripting
Additional Skills:
Exposure to Pharma and life sciences would be an added advantage.
Certification in any cloud technology such as AWS, GCP, or Azure.
Required profile
Experience
Spoken language(s):
English