Stephen Golub

Staff/Principal Software Engineer, Leader, and Mentor

Senior engineering leader with 13+ years building scalable systems processing trillions of events daily. Expert in Python, Go, and cloud-native architectures (AWS/GCP). Proven track record of migrating legacy systems to modern microservices, achieving 10x performance improvements. Former CrowdStrike and Rackspace engineer specializing in high-throughput data platforms, event-driven architectures, and technical team leadership. Active IT disaster responder with real-world crisis management experience deploying critical infrastructure under pressure.

Technical Skills

Languages
  • Expert: Python, Go/Golang, TypeScript/JavaScript, PHP, SQL
Cloud & Infrastructure
  • Expert: AWS (EC2, S3, Lambda, ECS), GCP (Compute Engine, BigQuery, Pub/Sub), Kubernetes, Docker, Terraform, CI/CD Pipelines
Frameworks & Libraries
  • Expert: React, FastAPI, Flask, Django, Angular, Laravel
Data & Messaging
  • Advanced: Kafka, Redis, PostgreSQL, MongoDB, Elasticsearch, RabbitMQ, Event-Driven Architecture
Monitoring & Observability
  • Advanced: Datadog, New Relic, Prometheus, Grafana, ELK Stack, Distributed Tracing
Engineering Practices
  • Expert: Microservices Architecture, RESTful APIs, GraphQL, gRPC, Test-Driven Development, Agile/Scrum, Code Review, System Design
Leadership & Special Skills
  • Advanced: Technical Team Leadership, Mentoring & Career Development, Cross-functional Collaboration, Disaster Response IT Operations, Crisis Management, Remote Team Management

Career Experience

OPSWAT, Remote

- Present
Lead Software Engineer
Architected and built high-performance network detection and response platform processing millions of security events per second. Led full-stack development using Python, Go, and React to deliver real-time threat detection capabilities.
  • Designed unified data ontology and microservice framework reducing new service deployment time from weeks to hours
  • Built event-driven architecture using Kafka and gRPC processing 10M+ events/second with sub-second latency
  • Led optimization initiative achieving 75% reduction in detection latency (from 4s to <1s) for critical security events
  • Implemented AI-assisted development workflows increasing team velocity by 40% while maintaining code quality standards
  • Established comprehensive observability using Datadog and Prometheus, reducing MTTR by 60%

Sojern, Remote

Manager, Software Engineering
Managed engineering team responsible for customer-facing portal serving 5,000+ enterprise clients. Led technical transformation of reporting infrastructure from legacy systems to modern cloud-native architecture on GCP.
  • Achieved 10x performance improvement in dashboard load times (from 12s to 1.2s) by migrating from an iframe-based architecture to a React-based, configuration-driven one
  • Led migration to GCP-native solutions (BigQuery, Pub/Sub) reducing infrastructure costs by 35% while improving reliability to 99.9% uptime
  • Implemented comprehensive automated testing increasing code coverage from 20% to 85%, reducing production incidents by 70%
  • Managed team of 8 engineers plus 4 contractors across 3 time zones, delivering 15+ major features on schedule
  • Established Frontend Guild best practices adopted across 5 engineering teams, eliminating duplicated effort and saving 200+ engineering hours per quarter

NexHealth, Remote

Senior Platform Engineer
Built critical platform infrastructure processing 100M+ healthcare transactions monthly. Architected unified API layer supporting both legacy and modern microservices, enabling seamless migration without service disruption.
  • Designed backward-compatible API serving 50+ integrations, reducing technical debt by 40% while maintaining 100% uptime during migration
  • Automated documentation generation directly from code, reducing documentation drift by 95% and cutting developer onboarding time by 50%
  • Reverse-engineered and optimized 20+ legacy integrations, reducing error rates from 8% to 0.3% and improving throughput by 3x
  • Mentored 5 junior engineers toward senior level, with 100% retention and 3 promotions within 18 months
  • Built comprehensive observability stack capturing 2B+ events/day, enabling proactive issue detection and 65% reduction in customer-reported incidents

CrowdStrike, Remote

Senior Engineer
Developed core data ingestion platform processing trillions of endpoint security events daily. Key contributor to cloud infrastructure modernization supporting 17,000+ enterprise customers globally.
  • Built data pipeline processing 5T+ events/day with 99.99% reliability using Python, Go, and Kafka on AWS
  • Led migration from AWS ECS to Kubernetes reducing deployment time by 80% and infrastructure costs by 25%
  • Architected multi-cloud marketplace integration framework, accelerating partner onboarding from weeks to hours
  • Optimized critical path algorithms reducing processing latency by 60% and increasing throughput by 4x
  • Active Handshake Ambassador: recruited 12 university interns, with a 75% conversion rate to full-time offers

Mannapov, LLC, Boerne, TX

PHP Developer
Hired as one of several new, more senior developers to help a fast-moving software company keep its pace while shipping with confidence.
  • Worked to make a piecemeal PHP codebase more predictable and reliable
  • Worked with Java worker engines that communicate over ZeroMQ using the Majordomo Protocol

GolubVentures, LLC w/ Rackspace, Inc., San Antonio, TX

Owner/Developer
Provided expertise to transition Python projects into maintenance mode, ensuring smooth handover to offshore development teams for ongoing support. Contributed to development of Rapid Provisioning Applications designed to integrate with Account and Data Center systems, enhancing operational efficiency.
  • Collaborated with previous developers to understand code structure and key design decisions, facilitating a seamless transition and knowledge transfer for future maintenance

ThreatQuotient, Remote

Manager, Software Development (Promoted from Senior Engineer)
Led development of threat intelligence platform processing security data from 100+ sources. Managed team of 6 engineers building integrations and platform SDKs.
  • Architected plugin SDK adopted by 50+ third-party integrators, expanding platform ecosystem by 300%
  • Built configurable Python ETL pipeline processing 1M+ threat indicators daily with 99.8% accuracy
  • Reduced manual QA effort by 80% through comprehensive test automation framework
  • Scaled platform from 10 to 200+ enterprise customers while maintaining sub-second response times

Rackspace Hosting, San Antonio, TX

Software Developer II (Converted from Contractor)
Built high-performance ticketing system UI serving 10,000+ support agents. Developed using AngularJS, Python Tornado, and OpenStack infrastructure.
  • Created API handling 300K+ tickets with <2 second response time using Python 3.4/Tornado
  • Achieved 99.95% uptime through comprehensive monitoring with Scout, New Relic, and custom metrics
  • Led migration to microservices architecture reducing deployment time from hours to minutes


Education

Bachelor of Science in Technology Management

Texas A&M University, College Station, TX

Volunteer

Vice President, Therapy Animals San Antonio

Work with the President and the board to achieve the organization's goals, supporting its growth and success.

Webmaster/IT Support, Therapy Animals San Antonio

Maintain and update the organization's website with current events, teams, and other information. Managed the migration of email services to Google Workspace, saving $200/year, and transitioned two websites (crisisanimalresponse.org and therapyanimalssa.org) from GoDaddy's Managed WordPress to Gatsby static sites, saving $168/year each. Built a tech "Go Bag" for the CARE Operations crew to use in crisis response situations.

Tech Ops Volunteer, Information Technology Disaster Resource Center

Deploy to disaster areas to support IT needs, such as Internet connectivity, Wi-Fi coverage, and devices. Deployed to multiple locations, including Eastern Kentucky (2022), Southwestern Florida (2022), Texas Panhandle (2023), Oregon (2023), Houston (2024), and North Carolina (2024) to assist with communication and recovery efforts. Contributed to training new teams at Texas A&M's Disaster City in March 2024.

Texas State Coordinator, Information Technology Disaster Resource Center

Develop volunteers and promote the organization within local communities. Coordinate with regional and national leadership to ensure Texas volunteers are engaged and educated.