Earlier this week, the unthinkable happened. Facebook suffered its longest and largest outage in years. For 30 minutes, users across the globe were unable to access their accounts.
The outage underscored the fact that no provider, no server, and no data center is immune to outages. Two years ago, severe storms knocked out the primary and backup power generators at Amazon’s East Coast data center, bringing down service from Netflix, Pinterest, and Instagram in a widely publicized outage.
So when bad things happen – whether it’s a power failure, a network outage, a hardware crash, a blue screen, or a misbehaving driver – what can you do, and what should you expect of your service provider?
I sat down with Eric Burns yesterday to discuss this from the perspective of a video platform provider that was affected by the 2012 Amazon outage, and that is responsible for the streaming of hundreds of thousands of hours of business and education video each month. In the conversation, Eric discusses Panopto’s investments in video platform reliability and data integrity, new availability features implemented after the Amazon outage, and unique Panopto functionality built to minimize downtime and data loss.
How does Panopto minimize downtime and data loss when data center outages occur?
A cloud-hosted video platform can’t rely solely on the availability of its hosting provider to ensure uptime.
After the Amazon outage in 2012, the engineering team at Panopto rolled out an update that protects our customers against data center outages and reduces the chance of data loss. The feature that we deployed is called “cross-availability zone failover.” It involves the continual replication of our entire video platform across multiple Amazon data centers in different geographic locations and on different electrical grids.
For our customers, this means that as new recordings are created and uploaded to our servers, or as existing videos are imported into Panopto’s video content management system (VCMS), we create copies of the files and all associated metadata, and keep them at the ready on a standby system. The standby system replicates all elements of our video platform topology, including front end servers, load balancing, encoding servers, video search servers, and the master database. If connectivity to the primary system is interrupted at any point, Panopto automatically redirects traffic to the standby system. The result is that our customers can still access the Panopto VCMS website, view videos, record new content, run live broadcasts, and administer their systems.
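As a rough illustration of the redirect step, here is a minimal Python sketch of choosing between a primary and a standby deployment. It is not Panopto's actual implementation: the `Deployment` class, the `choose_active` function, the zone names, and the simulated reachability check are all hypothetical, and a real system would probe health over the network and redirect traffic at the DNS or load balancer level.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Deployment:
    """One full copy of the platform (hypothetical model for illustration)."""
    name: str
    zone: str


def choose_active(
    deployments: Sequence[Deployment],
    is_reachable: Callable[[Deployment], bool],
) -> Deployment:
    """Return the first reachable deployment, checked in priority order."""
    for deployment in deployments:
        if is_reachable(deployment):
            return deployment
    raise RuntimeError("no deployment is reachable")


primary = Deployment("primary", "zone-a")
standby = Deployment("standby", "zone-b")

# Simulate an outage in the primary's zone (assumption: a real check
# would be a network health probe, not a set lookup).
down_zones = {"zone-a"}
active = choose_active([primary, standby], lambda d: d.zone not in down_zones)
print(active.name)  # prints "standby"
```

Listing the primary first means traffic returns to it as soon as it is reachable again, which is one possible policy among several.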
Outages don’t just happen on servers. Local hardware and operating systems can fail during recordings or live broadcasts as well. What does Panopto do to protect against these “local outages”?
Local outages can take several forms. The laptop managing a recording or live broadcast could have a hardware failure. The operating system could experience a kernel error. Sometimes it’s as simple as someone tripping over the power cord.
Whatever the cause, there are two critical steps that every video platform must take in these situations:
- Automatically restore the recording or live broadcast as quickly as possible
- Automatically repair the video files that were impacted by the outage
In 2013, Panopto rolled out a feature called Failsafe Recording that does just this.
First, Panopto’s video capture software recognizes when a recording or live broadcast has been interrupted due to a power outage, hardware failure, or operating system crash. When the machine comes back online, or even if a separate replacement machine is brought online, Panopto gives the user the opportunity to simply pick up the recording or live stream where they left off.
Then, when the recording or live broadcast wraps up, Panopto recovers the original video file that was interrupted and automatically stitches it together with the second part of the recording as though the outage never occurred. This continuous monitoring and auto-recovery helps protect against data loss, and requires no additional work on the part of our customers to repair and splice together potentially damaged video files.
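The stitching step can be pictured with a small Python sketch. This is only a conceptual model under stated assumptions, not how Failsafe Recording is built: the `Segment` class, the `stitch` function, and the string "frames" are hypothetical stand-ins for real video data, and real recovery would also have to repair a file that was cut off mid-write.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """A piece of one recording captured between interruptions (hypothetical)."""
    session_id: str
    started_at: float  # seconds since the session began
    frames: List[str]  # placeholder for real video frames


def stitch(segments: List[Segment]) -> List[str]:
    """Join the segments of one session into a single sequence, by start time."""
    if not segments:
        return []
    if len({s.session_id for s in segments}) != 1:
        raise ValueError("segments belong to different sessions")
    ordered = sorted(segments, key=lambda s: s.started_at)
    return [frame for segment in ordered for frame in segment.frames]


before_crash = Segment("lecture-42", 0.0, ["f1", "f2", "f3"])
after_resume = Segment("lecture-42", 95.0, ["f4", "f5"])

# Order of arrival does not matter; segments are sorted by start time.
combined = stitch([after_resume, before_crash])
print(combined)  # prints ['f1', 'f2', 'f3', 'f4', 'f5']
```

Tagging each segment with a session identifier is what would let a replacement machine contribute the second half of the same recording.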
Many of Panopto’s customers schedule presentation and lecture recording in advance. How are these scheduled or “automated” recordings impacted by outages?
When customers schedule recordings in advance, or automate the capture of recurring lectures, Panopto’s video capture software downloads a schedule that resides locally on the recording machine. This enables scheduled recordings to take place as expected even if the server is unresponsive for any reason. Recordings are stored on the local hard drive, and when connectivity to the server is restored, Panopto automatically uploads the offline recordings and begins processing them for playback.
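The pattern described here, a locally cached schedule plus a queue of recordings waiting to upload, can be sketched in a few lines of Python. The `LocalRecorder` class, its method names, and the `.rec` file suffix are hypothetical and chosen for illustration; this is a sketch of the general approach, not Panopto's capture software.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LocalRecorder:
    """Records from a cached schedule and uploads when the server allows."""
    schedule: List[str]  # session names downloaded earlier from the server
    pending: List[str] = field(default_factory=list)  # files awaiting upload

    def record(self, session: str) -> None:
        """Capture a scheduled session to local disk (simulated by a file name)."""
        if session not in self.schedule:
            raise KeyError(f"{session} is not in the cached schedule")
        self.pending.append(f"{session}.rec")

    def sync(self, upload: Callable[[str], bool]) -> int:
        """Try to upload every pending file; keep the ones that fail."""
        still_pending = []
        uploaded = 0
        for path in self.pending:
            if upload(path):
                uploaded += 1
            else:
                still_pending.append(path)
        self.pending = still_pending
        return uploaded


recorder = LocalRecorder(schedule=["mon-lecture", "tue-lecture"])
recorder.record("mon-lecture")
recorder.record("tue-lecture")

# Server is down: nothing uploads, and nothing is lost.
print(recorder.sync(lambda path: False), recorder.pending)
# Server is back: both recordings upload and the queue empties.
print(recorder.sync(lambda path: True), recorder.pending)
```

Because recording never calls the server, an outage only delays playback; it does not prevent capture.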
Recently, one of our customers experienced this situation in production. In February, just before midterm exams, Thomas Jefferson University suffered a three-day server outage. During this time, Panopto’s automated recording software continued capturing all of the school’s lectures even though there was no connectivity to the server, and as a result, none of the valuable information covered in the lectures was lost. You can find out more about TJU’s experience in our case study.
Find out more
If your business or university is considering the use of a video platform, or you have questions about Panopto’s reliability and uptime features, we’d love to chat. Contact our team to request a free trial of our video software.