In any organization that depends on continuous batches of data for decision-making analytics, streamlining and automating data processing workflows is essential. Larger teams typically include a Data Architect, who designs the blueprints of the data infrastructure; a Data Engineer, who writes the code to build that infrastructure; a Data Analyst, who gathers and assesses data needs across functional teams and ensures the reliability of the data; and a Data Scientist, who uses the data to create business value through machine learning. For a data science team to work cohesively, I believe every member should have some knowledge of the other roles and their functions. It is also one of the best ways to elevate yourself as a team player and become a well-rounded data professional.
In this blog post, I aim to show how a Data Scientist can expand their data engineering knowledge and skills by building simple data pipelines with Apache Airflow. Alongside Airflow, the technology stack includes Amazon S3, Snowflake, and Slack, illustrating how versatile a Data Scientist's toolkit can be. I hope to show how powerful these tools are for improving your data products and data science projects.
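As a quick preview, here is a minimal sketch of the kind of Airflow DAG this stack makes possible. The DAG name, task names, and schedule are illustrative placeholders I chose for this sketch, and the S3, Snowflake, and Slack steps are stubbed out with plain Python callables rather than any specific provider operators.

```python
# A minimal, illustrative Airflow DAG: extract from S3 -> load into Snowflake -> notify Slack.
# All names and the schedule are hypothetical placeholders for this sketch.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    # Placeholder: a real pipeline would pull the latest file from an S3 bucket here.
    print("Extracting data from S3...")


def load_into_snowflake(**context):
    # Placeholder: a real pipeline would load the extracted data into a Snowflake table here.
    print("Loading data into Snowflake...")


def notify_slack(**context):
    # Placeholder: a real pipeline would post a success/failure message to a Slack channel here.
    print("Sending pipeline status to Slack...")


with DAG(
    dag_id="s3_to_snowflake_pipeline",   # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",          # run once per day
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    load = PythonOperator(task_id="load_into_snowflake", python_callable=load_into_snowflake)
    notify = PythonOperator(task_id="notify_slack", python_callable=notify_slack)

    # Define the task order: extract, then load, then notify.
    extract >> load >> notify
```

Even at this level of detail, you can see the appeal: the pipeline is ordinary Python, the task dependencies are explicit, and swapping the placeholder callables for real S3, Snowflake, and Slack steps does not change the overall structure.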