DevOps Engineer, Observability Tools - Remote

Tucows has been working on the Internet since the days when people unironically called it the Information Superhighway.

Today, we’re the second-largest domain wholesaler in the world, with tens of millions of domains under management (OpenSRS / Enom). We’re doing all kinds of interesting things, including running an MVNO cell phone service (Ting Mobile) and building true fiber-to-the-premises networks in towns and cities across the US (Ting Internet). We also offer individual and small business domains and integration with various popular platforms (Hover / Ascio).

We’re a team of over 600 people serving tens of millions of customers around the world. Our growth has been incredible, smart and measured (NASDAQ: TCX, TSX: TC). Our success is built on a solid technical and financial foundation.

About the team

Observability is crucial to maintaining complex systems. Whether it's logs, application performance monitoring, or health and readiness checks, without that data complex systems become much harder to manage. Just as important as the data itself is the ability to present it in useful forms.

The mission of the observability team is to create and run the platform that enables all teams across the company to observe the health, performance, and trends of our systems.

On this team, you'll gain experience working with enterprise-level applications and the challenges of handling high volumes of logs and near real-time metrics.

Duties and Responsibilities

  • Define standards for message/log formats and tools that will be used throughout the company
  • Automate the deployment of the tools to our on-prem OpenStack cloud
  • Test and optimize the performance of those tools for large datasets and high volumes of messages

Qualifications and experience

  • Strong Python coding skills (3+ years of proven experience)
  • Shell scripting (bash)
  • Experience with deployment tools such as Terraform and SaltStack, and the ability to extend these frameworks with new modules
  • Experience with Infrastructure as Code principles and standard methodologies
  • Experience with Grafana, Graylog, Elastic Stack, Prometheus, APM tools, Zabbix, Alerta, Grafana Loki
  • Excellent verbal and written communication skills
  • Good understanding of Unix systems, including Docker and virtualization
  • Ability to learn quickly and comprehend new technologies
  • Strong organizational and interpersonal skills, with an ability to build relationships
  • Ability to work effectively within a team environment

Nice to have skills

  • Matomo
  • Ansible
  • OpenStack
  • InfluxDB

We believe diversity drives innovation. We are committed to inclusion across race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status or disability status. We celebrate multiple approaches and diverse points of view.

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
