Welcome back to OCN! This time I chat with the CEO of Raygun, JD Trask. One of the cool parts of this podcast is meeting people from all over the world who have had some experience on-call, and JD does his thing in New Zealand!
John-Daniel is the CEO and co-founder of Raygun.com, an application monitoring company that helps teams identify hidden performance bottlenecks and software bugs. With over 25 years of experience in software development, JD is a programmer at heart with unique insights into scaling software businesses and software team leadership, and he has a deep understanding of building healthy software that gives great customer experiences. He is known to enjoy a glass of whiskey now and then.
Welcome back to another podcast about downtime! Once again we meet with another technologist who's building a new product and getting it out to the world. This time we meet Damian of Auth0 who's been working with his team to ensure identity services.
Damian is a software engineer who loves to solve hard problems of any type, especially those related to making software and teams scale. He is a Director of Engineering at Auth0, helping make identity simple for developers. Before Auth0, Damian spent many years working for and at Microsoft on Azure, Media, and patterns & practices related initiatives. He spends his spare time with family and friends, exercising, and catching up on all things NBA.
Content Warning: This episode does contain some graphic description of the work done by an EMT - if you find this troubling you may want to check out another episode!
On this episode, I speak with the CTO and founder of VividCortex on his life down on the farm and as an EMT. Baron gives us some insight into how that prepared him for his time on-call in different roles to ensure databases are fast and reliable.
Baron is the CTO and founder of VividCortex, the best way to see what your production database servers are doing. Baron has written a lot of open source software, and several books including High Performance MySQL. He’s focused his career on learning and teaching about scalability, performance, and observability of systems generally (including the view that teams are systems and culture influences their performance), and databases specifically.
On this edition, Sam shares with me some scary moments from his time at DigitalOcean. Sam tells the tale of a database table that was dropped.
Sam Phippen is a Developer Advocate at Google, and previously an Engineering Manager at DigitalOcean. He's seen his fair share of deep, complex incidents. He has strong opinions about incident management, postmortem culture, and on-call practices. He's sad that he can't hug every cat.
In this episode, Jay and J. Paul Reed discuss the need for on-call practices and incident response in the world of software release engineering. Paul shares some great stories, including how the World Series can depend on a single line of code.
J. Paul Reed has over twenty years of experience in the trenches as a build/release engineer, working with such companies as VMware, Mozilla, Postbox, Symantec, and Salesforce.
In 2012, he founded Release Engineering Approaches, a consultancy incorporating a host of tools and techniques to help organizations "Simply Ship. Every time." He's worked across a number of industries, from financial services to cloud-based infrastructure to health care, with teams ranging from 2 to 2,500 on everything from tooling, operational analysis and improvement, cultural transformation, and business value optimization.
He speaks internationally on release engineering, DevOps, operational complexity, and human factors, and is a Master of Science candidate in Human Factors & Systems Safety at Lund University.
Infrastructure Week, Episode 2!
Charity and Jay sit down for a discussion on her career and a deep dive into a database incident. You'll get some interesting thoughts on how monitoring has changed in operations.
Charity is cofounder and CEO of Honeycomb.io, a startup aimed at debugging complex systems. (“It’s like strace for systems!”) Previously, Charity ran infrastructure at Parse and was an engineering manager at Facebook. She also worked with the RocksDB team to build and deploy the world’s first Mongo + Rocks in production. She likes single malt scotch.
Does this VM bring me joy?
Melissa is a Product Strategy Technologist at Veeam and an information technology infrastructure enthusiast, with a focus on virtualization, security, and emerging technologies. Melissa is a VMware Certified Design Expert (VCDX #236), and has held roles such as VMware Engineer, Systems Engineer, Solutions Architect, and Technical Marketing Engineer prior to joining Veeam. You can find Melissa on Twitter @vMiss33 or at her blog https://vMiss.net.
Jamesha "Jam" Fisher is an infrastructure engineer at Splice. Jamesha has worked in the tech industry for over 15(!) years, with a special interest in security. Graduating with a degree in information assurance and security engineering, they lent their experience to operations and systems engineering at companies like Google and GitHub. In their spare time, Jamesha queers it up, along with being a maker of things musical or delicious and objects that use binary numbers.
Ride The On-Call Lightning with Adam Jacob
Adam Jacob is a Board Member, CTO and founder of Chef. Adam joins us this week to discuss his world as an on-call engineer. Find out what happens when they call in the "Mr. Wolf" of Oracle on a private jet to get the database back online. Learn about Adam's passion for Open Source while we interject our mutual interest in heavy metal.
Fear, Chaos and Pain
Common subjects in the Christopher Nolan Batman films, especially when the Joker appears. How do we avoid the moments of fear, chaos and pain in real time? By preparing for them. Today we talk with Gremlin Inc founder and CEO Kolton Andrus.
Kolton is co-founder and CEO of Gremlin. Previously, he was a Chaos Engineer at Netflix improving streaming reliability and operating the Edge services. He designed and built F.I.T., Netflix's failure injection service. Prior to that, he improved the performance and reliability of the Amazon Retail website. At both companies he has served as a 'Call Leader', managing the resolution of company-wide incidents.