Data Engineer

Remote Tech Position

 

Tech Team

As a data engineer, you will work alongside our data team to improve our pure Python ETL pipeline. You will work with large amounts of unstructured (text) data, transform and enrich it, and tune functions for speed and accuracy at scale. You will work on socio-economic and firmographic data and collaborate with both Product and Business stakeholders to prioritize the new features and fixes needed in the data processing pipeline. You will be part of a team that follows agile practices with three-week sprints.

Responsibilities

  • Analyze and organize raw data 
  • Combine raw information from different sources
  • Build, improve and maintain data systems and pipelines
  • Build new algorithms and prototypes

See additional requirements below.

Company Description

Steppingblocks brings big data analytics to higher education with rich data and interactive visualizations. In addition to our student and administrative platforms, we collaborated with the National Science Foundation to build a recruiting platform based on our unique student data. These three platforms use a hub-and-spoke data model to offer different flavors of our data for specific use cases. If you love change, fixing things, uncovering and building new processes, a great culture, and amazing teamwork, reach out to us today!

Additional Responsibilities

  • Write efficient algorithms to run on dataframes and sequences at scale
  • Optimize the efficiency of the ETL pipeline
  • Explore ways to enhance data quality and reliability
  • Identify opportunities for new data acquisition
  • Develop analytical tools and programs
  • Work with business stakeholders to create new custom reports and new ETL features
  • Collaborate with data scientists and architects to engineer new features

Qualifications

  • Experienced programmer with strong Python knowledge
  • Mastery of the Pandas ecosystem
  • Experience with distributed computation systems (Spark, Dask, etc.)
  • Familiarity with Dask
  • Test-driven development
  • Great numerical and analytical skills
  • Familiarity with the Machine Learning Lifecycle and processes

Preferred Experience

  • Econometrics or Social Sciences
  • Strong mathematical skills and statistical background
  • Knowledge of other programming languages and performance tools (C, Cython, Numba, Rust)
  • Data engineering certification a plus
  • Web scraping
  • Backend API knowledge (e.g. FastAPI)
  • Familiarity with Elasticsearch (or another document-based NoSQL store)
  • Familiarity with graph databases and algorithms
  • Bachelor's degree in Computer Science, Engineering, or another scientific field, or equivalent experience
