Salary: $200,907 per year
MacroX is building the next generation of AI-powered macroeconomic intelligence. We combine alternative data, machine learning, and large language models to help investors, businesses, and policymakers understand economic trends in real time.
We are looking for a Staff / Lead Data Engineer to architect and scale our end-to-end data platform. You will lead the development of the infrastructure powering our AI, machine learning, and macroeconomic forecasting systems.
This role is ideal for an experienced engineer who enjoys building large-scale data systems, working with cutting-edge AI technologies, and shaping the technical direction of a rapidly growing company.
Responsibilities:
Design and manage scalable data ingestion, transformation, and consumption pipelines.
Build infrastructure supporting high-frequency economic and financial datasets.
Ensure compatibility with downstream AI, ML, and LLM applications.
Define and implement MLOps standards and best practices.
Develop automated feature stores and model input pipelines.
Build CI/CD workflows for model retraining and deployment.
Collaborate with Data Scientists and ML Engineers to productionize AI models.
Implement AI-powered data validation and anomaly detection systems.
Develop monitoring solutions for schema drift and data quality issues.
Create observability and lineage frameworks across the data stack.
Build intelligent systems that automatically detect and report pipeline issues.
Design and maintain vector database infrastructure.
Develop Retrieval-Augmented Generation (RAG) architectures.
Support internal research workflows and customer-facing AI products.
Recruit, mentor, and develop Data Engineering and ML Infrastructure teams.
Partner with Product, Legal, and Go-to-Market teams on AI and data strategy.
Drive technical excellence and responsible AI practices across the organization.
Requirements:
Master's degree (or foreign equivalent) in Data Science, Computer Science, Engineering, or a related field.
Two (2) years of experience as a Senior Data Engineer or in a related occupation.
Python
SQL
R
Pandas
NumPy
Spark (PySpark)
Airflow
dbt
scikit-learn
XGBoost
LightGBM
TensorFlow
PyTorch
MLflow
Weights & Biases
PostgreSQL
MySQL
BigQuery
Redshift
MongoDB
Cassandra
Google Cloud Platform (GCP)
Amazon Web Services (AWS)
Git
GitHub
GitLab
Docker
REST APIs
Jupyter Notebook
JupyterLab
Hybrid Schedule:
Four (4) days per week in our San Francisco office
One (1) day per week remote
Interested candidates should submit their resume to: