User:Senya/Account Backup And Restore: Difference between revisions

From diaspora* project wiki
 
(26 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Work in progress}}
{{Note|This specification [https://the-federation.info/specs/backup-restore/ has a new home]. Please consider information here outdated since 7th February 2016 and only kept for historical purposes.}}
{{Note|This is a proposal, not an accepted specification.}}


== Description ==  
== Description ==  
Line 8: Line 7:
* Dying pods (ie losing your account due to pod disappearing)
* Dying pods (ie losing your account due to pod disappearing)
* Pod migration (ie not being able to move pods)
* Pod migration (ie not being able to move pods)
Subproblem solved:
* User wants to migrate from one pod to another for some reasons other than the pod's death


== High level concept ==
== High level concept ==


* Users are backed up automatically to pods who opt-in to receive backup data
* Users can be backed up automatically to pods who opt-in to receive backup data
* Users can download their backup manually, for importing to another pod
* The restore process migrates the user to the pod he/she is restoring on
* The restore process migrates the user to the pod he/she is restoring on
* Backup data is encrypted to protect the private key contained
* Backup data is encrypted to protect the private key contained
Line 24: Line 28:
* Pass phrase is stored in user table in plain text (so it can be used to encrypt).
* Pass phrase is stored in user table in plain text (so it can be used to encrypt).
* Backup pod is chosen from available pods automatically and stored in user data. In the case of no available backup pods, exit here.
* Backup pod is chosen from available pods automatically and stored in user data. In the case of no available backup pods, exit here.
* Jump to 'Initializing a backup'
=== Initializing a backup ===
* User is sent an email containing information regarding: Where to restore in case of problems? What information to store (the pass phrase)
* User is sent an email containing information regarding: Where to restore in case of problems? What information to store (the pass phrase)
* Schedule initial sync immediately.
* Schedule initial sync immediately.
Line 32: Line 40:
* User chooses to a new backup pod from a list of available pods.
* User chooses to a new backup pod from a list of available pods.
* User chooses a pass phrase.
* User chooses a pass phrase.
* User is sent an email containing information regarding: Where to restore in case of problems? What information to store (the pass phrase)
* Jump to 'Initializing a backup'
* Schedule initial sync immediately.
 
=== Manually downloading a backup archive ===
 
* User goes to settings
* User chooses 'Request my profile data'
 
=== User requests restore (automatically backed up archive) ===
 
* User follows link to restore page on backup pod (for example /restore_backup)
* User gives handle and pass phrase.
 
=== User requests restore (manually downloaded backup archive) ===
 
* User follow link to restore page on backup pod (for example /restore_backup)
* User uploads their backup archive
* Pod checks that user is known. Restore can only be made for known remote profiles.
* Jump to use case 'User data restore'


=== User data restore ===
=== User data restore ===


* User follows link to restore page on backup pod (for example /restore_backup)
* (arrival from 'User requests restore')
* User gives email and pass phrase.
* Private key in backup archive is verified against the currently stored profile public key.
* User is sent an email which needs to be verified by opening the link.
* User is sent an email (which is contained in the backup archive) which needs to be verified by opening the link.
* User is asked to provide a password and username for their account.
* User is asked to provide a password and username for their account.
* A local account is created on the backup pod and linked to a possible profile.
* A local account is created on the backup pod and linked to a possible profile.
* Backup pod sends out "I've moved" federation messages to all known pods, signed with the users private key.
* A new key pair is generated for the user on the backup pod, which becomes now the users home pod.
* Backup pod sends out "I've moved" federation messages to all known pods, signed with the users ''old'' private key. The moved message should contain the users new public key.
* Backup pod receives a reply from every pod that the moved message was proccessed well. If there is no reply, schedule repetition of the message. ''Probably we need a retry limit.''


=== Receiving a moved message - remote pod ===
=== Receiving a moved message - remote pod ===


* Pod verifies message against user public key (as normal)
* Pod verifies message against user public key (as normal)
* Pod stores new public key of user which is contained in the moved message
* Pod fetches new profile as per moved message
* Pod fetches new profile as per moved message
* Pod replaces local profile with new information
* Pod replaces local profile with new information
* Pod makes sure all locally stored content to this person points to the changed profile (should only require change of profile data)
* Pod makes sure all locally stored content to this person points to the changed profile (should only require change of profile data)
* Send a message to a backup pod that the moving was well.


=== Receiving a moved message - old home pod ===
=== Receiving a moved message - old home pod ===


* Pod verifies message against user public key (as normal)
* Handle use case 'Receiving a moved message - remote pod'
* Pod fetches new profile as per moved message
* Pod replaces local profile with new information
* Pod removes local user, making the user now a remote user
* Pod removes local user, making the user now a remote user
* Pod makes sure all locally stored content to this person points to the changed profile (should only require change of profile data)
* Send a message to a backup pod that the moving was well.


=== Backup pod receiving a backup for the first time ===
=== Backup pod receiving a backup for the first time ===


* Fetch user profile and public posts from source pod
* Verify backup message
* Send email to user that backups are being received and instructions on how to proceed if migration is wanted.
* Send email to user that backups are being received and instructions on how to proceed if migration is wanted.


Line 66: Line 94:


* Podmins enable "allow_backups_and_migration" in pod settings.
* Podmins enable "allow_backups_and_migration" in pod settings.
* NodeInfo gets a new metadata entry to indicate pod accepts backups.
* New endpoint .well-known/x-acc-backup-restore returns a hash that includes a setting indicating that pod accepts backups.


=== Receiving backups ===
=== Receiving backups ===
Line 75: Line 103:
=== Automatic sync ===
=== Automatic sync ===


<strike>
* Pods should schedule backups of user data once a week to the backup pods. This probably should not be configurable to avoid easy abuse. Do this as a sidetiq job.
* Pods should schedule backups of user data once a week to the backup pods. This probably should not be configurable to avoid easy abuse. Do this as a sidetiq job.
* When sidetiq job triggers, schedule a job backup sidekiq job for each individual user. Check when scheduling a job for a user that queue doesn't already have a job for this user (to avoid creating endless amounts of jobs is sidekiq processing has stopped for a long time).
* When sidetiq job triggers, schedule a job backup sidekiq job for each individual user. Check when scheduling a job for a user that queue doesn't already have a job for this user (to avoid creating endless amounts of jobs is sidekiq processing has stopped for a long time).


Rationale: one sidetiq job as master to trigger lots of small regular sidekiq jobs. Doing all backups through one job would be too heavy on large pods. Scheduling all jobs individually in sidetiq would be error prone and bloat the sidetiq scheduling.
Rationale: one sidetiq job as master to trigger lots of small regular sidekiq jobs. Doing all backups through one job would be too heavy on large pods. Scheduling all jobs individually in sidetiq would be error prone and bloat the sidetiq scheduling.
</strike>
''Note: Sidetiq is being removed from diaspora*. Actual implementation details of how and how often pods backup users could be left out of high level backup/restore spec. Pods should however aim to back up users weekly at minimum.''


=== Sync to backup pod fails ===
=== Sync to backup pod fails ===
Line 84: Line 116:
* Increase failed count in user data.
* Increase failed count in user data.
* If failed count is over threshold (default 3), choose a new backup pod from available pods.
* If failed count is over threshold (default 3), choose a new backup pod from available pods.
* User is sent an email containing information regarding: Where to restore in case of problems? What information to store (the pass phrase)
* Jump to 'Initializing a backup'
* Schedule initial sync immediately.


=== Pod reads NodeInfo from other pods ===
=== Pod reads .well-known/x-acc-backup-restore from other pods ===


* Update information of pods wanting to receive backups
* Update information of pods wanting to receive backups
* If no backup pods previously existed, add backup pod to all existing users who didn't opt out, send them a notification and schedule initial sync (see User creation -story).
* If no backup pods previously existed, add backup pod to all existing users who didn't opt out, send them a notification and schedule initial sync (see User creation -story).
* If a pod that was previously backup ready becomes not backup ready, choose a new backup pod to all existing users who have that pod currently, send them a notification and schedule initial sync (see User creation -story).
* If a pod that was previously backup ready becomes not backup ready, choose a new backup pod to all existing users who have that pod currently, send them a notification and schedule initial sync (see User creation -story).
* A backup ready pod may return status within .well-known/x-acc-backup-restore that new backups are not accepted anymore. In this case all existing backups work as usual, but no backups for new accounts will be accepted.
=== Maintenance of stale backup data ===
* Sidetiq job running every month to purge backup data that has not been updated in the last year.


== Pod schema changes ==
== Pod schema changes ==
Line 97: Line 133:
=== Backup data ===
=== Backup data ===


Pods need a new model "backups" to contain backups from remote pods. This should be uniqued on email and any new backups should replace the current one.
Pods need a new model "backups" to contain backups from remote pods. This should be uniqued on backed up diaspora handle and any new backups should replace the current one.


* email
* diaspora handle
* content (encrypted)
* content (encrypted)
Pods don't need to even know where the data is from - just email and the actual encrypted backup.


=== User data ===
=== User data ===
Line 112: Line 146:
* pass phrase
* pass phrase


Backup pod is the domain.tld where the backups are sent. If it changes, it should be updated.
Backup pod is the domain.tld where the backups are sent. If it changes, it should be updated. Passphrase should be ''unencrypted'' because it is used to encrypt the archives sent out.


=== Pod data ===
=== Pod data ===
Line 118: Line 152:
* backups allowed flag
* backups allowed flag


== Backup contents ==
== Backup package ==
 
=== Contents ===
 
Minimal version for automatic backups:


* Profile
* Profile
Line 125: Line 163:
* Followed tags
* Followed tags
* Public and private key
* Public and private key
Profile data export package contains also posts and comments<strike> which can be ignored during the restore process</strike>. Posts and comments are created on the backup pod if they don't exist there yet.
''Current diaspora* user export does not contain the private and public keys. These should be added there in order for the manual export process to work.''
=== Encryption for automatic backup process ===
* Encrypt the whole backup package using the given user pass phrase.
== Backup delivery message ==
To protect from arbitrary storage of data and to validate backup ownership, the backup delivery needs to be signed with the user private key, as the federation messages are signed. A receiving pod should check the signature against user public key before storing the backup.
The message thus needs to contain:
* Handle being backed up
* Encrypted backup content
This creates the following initial message schema:
    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore",
      "type": "object",
      "properties": {
        "handle": {
          "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/handle",
          "type": "string"
        },
        "backup": {
          "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup",
          "type": "string"
        }
      },
      "required": [
        "handle",
        "backup"
      ]
    }
"backup" is signed and encrypted using the user private key. Once opened, it should contain the following schema:
    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup",
      "type": "object",
      "properties": {
        "email": {
          "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup/email",
          "type": "string"
        },
        "content": {
          "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup/content",
          "type": "string"
        }
      },
      "required": [
        "email",
        "content"
      ]
    }
A pod stores "handle" and "content" to the user backups table. "content" is encrypted with the user pass phrase.
== Improvement ideas ==
For additions after initial implementation. These should not be considered as required in an initial implementation.
=== Moved flag ===
When a restore is done and moved message is received by the old pod, a 'moved' flag in the person profile is updated on the old pod. The moved flag should contain the new handle of the user, ie full username@domain.tld.
When receiving federation payloads directed at the user directly, respond to the caller with a "moved to" message, giving the users new handle.
Sender should not trust these messages as is, instead it should send a "confirm moved" message to the new handle. The response should be a signed with old private key message that the caller can use the verify move. If the response is ok, the new profile should be fetched and generally "Receiving a moved message - remote pod" story followed.
Extra database schema changes due to the moved flag:
User:
* moved_to - for storing the new handle
* moved_from - for storing the old handle (needed when receiving "confirm moved" -messages)
* old_private_key - assuming private keys are regenerated after restore, needed for generating responses for "confirm moved" -messages


[[Category:Proposals]]
[[Category:Proposals]]

Latest revision as of 02:08, 23 March 2017

Note: This specification has a new home (https://the-federation.info/specs/backup-restore/). Please consider the information here outdated since 7th February 2016 and kept only for historical purposes.

Description

Proposal specification to deal with two problems:

  • Dying pods (ie losing your account due to pod disappearing)
  • Pod migration (ie not being able to move pods)

Subproblem solved:

  • User wants to migrate from one pod to another for reasons other than the pod's death

High level concept

  • Users can be backed up automatically to pods that opt in to receive backup data
  • Users can download their backup manually, for importing to another pod
  • The restore process migrates the user to the pod he/she is restoring on
  • Backup data is encrypted to protect the private key contained
  • Backup is automatic: after choosing a pass phrase, the user doesn't have to do anything else except store a piece of information

User stories

User creation

  • User arrives at user creation.
  • User is asked to give a pass phrase for backups. User can opt-out, which ends this user story.
  • Pass phrase is stored in user table in plain text (so it can be used to encrypt).
  • Backup pod is chosen from available pods automatically and stored in user data. In the case of no available backup pods, exit here.
  • Jump to 'Initializing a backup'

Initializing a backup

  • User is sent an email explaining where to restore in case of problems and what information to store (the pass phrase).
  • Schedule initial sync immediately.

Manually choosing a backup pod

  • User goes to settings.
  • User chooses a new backup pod from a list of available pods.
  • User chooses a pass phrase.
  • Jump to 'Initializing a backup'

Manually downloading a backup archive

  • User goes to settings
  • User chooses 'Request my profile data'

User requests restore (automatically backed up archive)

  • User follows link to restore page on backup pod (for example /restore_backup)
  • User gives handle and pass phrase.

User requests restore (manually downloaded backup archive)

  • User follows link to restore page on backup pod (for example /restore_backup)
  • User uploads their backup archive
  • Pod checks that user is known. Restore can only be made for known remote profiles.
  • Jump to use case 'User data restore'

User data restore

  • (arrival from 'User requests restore')
  • Private key in backup archive is verified against the currently stored profile public key.
  • User is sent an email (which is contained in the backup archive) which needs to be verified by opening the link.
  • User is asked to provide a password and username for their account.
  • A local account is created on the backup pod and linked to a possible profile.
  • A new key pair is generated for the user on the backup pod, which now becomes the user's home pod.
  • Backup pod sends out "I've moved" federation messages to all known pods, signed with the user's old private key. The moved message should contain the user's new public key.
  • Backup pod receives a reply from every pod confirming that the moved message was processed successfully. If there is no reply, schedule a repetition of the message. A retry limit is probably needed.
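
The acknowledgement and retry step above can be sketched as follows. This is only an illustration in Python, not part of the proposal: the names `MAX_RETRIES`, `MovedDelivery`, `next_action` and `record_attempt` are hypothetical, and the limit of 5 is an assumed value, since the text only says that a retry limit is probably needed.

```python
from dataclasses import dataclass

MAX_RETRIES = 5  # assumed value; the proposal only says a limit is probably needed


@dataclass
class MovedDelivery:
    """Tracks delivery of one "I've moved" message to one remote pod."""
    pod: str
    attempts: int = 0
    acknowledged: bool = False


def next_action(delivery: MovedDelivery) -> str:
    """Decide what to do with a delivery after an attempt has been made."""
    if delivery.acknowledged:
        return "done"
    if delivery.attempts >= MAX_RETRIES:
        return "give_up"
    return "retry"


def record_attempt(delivery: MovedDelivery, got_reply: bool) -> str:
    """Record one send attempt and return the follow-up action."""
    delivery.attempts += 1
    delivery.acknowledged = got_reply
    return next_action(delivery)
```

A pod that never replies is retried until the assumed limit is reached; what happens after giving up is left open here, as it is in the text.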

Receiving a moved message - remote pod

  • Pod verifies message against user public key (as normal)
  • Pod stores new public key of user which is contained in the moved message
  • Pod fetches new profile as per moved message
  • Pod replaces local profile with new information
  • Pod makes sure all locally stored content to this person points to the changed profile (should only require change of profile data)
  • Send a message to the backup pod confirming that the move was processed successfully.

Receiving a moved message - old home pod

  • Handle use case 'Receiving a moved message - remote pod'
  • Pod removes local user, making the user now a remote user
  • Send a message to the backup pod confirming that the move was processed successfully.

Backup pod receiving a backup for the first time

  • Fetch user profile and public posts from source pod
  • Verify backup message
  • Send email to user that backups are being received and instructions on how to proceed if migration is wanted.

Becoming a backup pod

  • Podmins enable "allow_backups_and_migration" in pod settings.
  • New endpoint .well-known/x-acc-backup-restore returns a hash that includes a setting indicating that pod accepts backups.

Receiving backups

  • Pod receives a backup message from another pod and stores it.
  • If a previous backup for the user already exists, replace it.
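
The store-or-replace behaviour above could look roughly like this. It is a minimal sketch in Python using an in-memory dictionary as a stand-in for the "backups" table; `BackupStore` and its methods are hypothetical names, and signature verification of the incoming message is assumed to have happened before `store` is called.

```python
from typing import Dict, Optional


class BackupStore:
    """In-memory stand-in for the "backups" table, keyed by diaspora handle."""

    def __init__(self) -> None:
        self._by_handle: Dict[str, str] = {}

    def store(self, handle: str, encrypted_content: str) -> bool:
        """Store a backup, replacing any previous one for the same handle.

        Returns True if an earlier backup was replaced.
        """
        replaced = handle in self._by_handle
        self._by_handle[handle] = encrypted_content
        return replaced

    def get(self, handle: str) -> Optional[str]:
        """Return the stored encrypted content, or None if there is none."""
        return self._by_handle.get(handle)
```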

Automatic sync

  • Pods should schedule backups of user data once a week to the backup pods. This probably should not be configurable to avoid easy abuse. Do this as a sidetiq job.
  • When the sidetiq job triggers, schedule a backup sidekiq job for each individual user. When scheduling a job for a user, check that the queue doesn't already have a job for this user (to avoid creating endless amounts of jobs if sidekiq processing has stopped for a long time).

Rationale: one sidetiq job as master to trigger lots of small regular sidekiq jobs. Doing all backups through one job would be too heavy on large pods. Scheduling all jobs individually in sidetiq would be error prone and bloat the sidetiq scheduling.

Note: Sidetiq is being removed from diaspora*. Actual implementation details of how and how often pods back up users could be left out of the high level backup/restore spec. Pods should however aim to back up users weekly at minimum.
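
Whichever scheduler ends up being used, the duplicate check described in this section can be sketched independently of it. The following Python sketch is an illustration only; `schedule_backup_jobs` and the `queued` set are hypothetical stand-ins for the real job queue.

```python
from typing import Iterable, List, Set


def schedule_backup_jobs(user_ids: Iterable[int], queued: Set[int]) -> List[int]:
    """Enqueue one backup job per user, skipping users already in the queue.

    `queued` holds the ids of users with a pending job and is updated in place.
    Returns the ids that were newly enqueued.
    """
    newly_enqueued = []
    for user_id in user_ids:
        if user_id in queued:
            continue  # a job is still pending; avoid piling up duplicates
        queued.add(user_id)
        newly_enqueued.append(user_id)
    return newly_enqueued
```

If job processing has stalled, repeated runs of the master job add nothing new, so the queue cannot grow beyond one job per user.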

Sync to backup pod fails

  • Increase failed count in user data.
  • If failed count is over threshold (default 3), choose a new backup pod from available pods.
  • Jump to 'Initializing a backup'
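
The failure handling above can be sketched as follows. This Python sketch is not part of the proposal: `UserBackupState` and `handle_sync_failure` are hypothetical names, picking the first other available pod is an assumed selection strategy, and resetting the counter after a switch is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

FAILED_SYNC_THRESHOLD = 3  # default from the proposal


@dataclass
class UserBackupState:
    backup_pod: Optional[str]
    failed_sync_count: int = 0


def handle_sync_failure(state: UserBackupState, available_pods: List[str]) -> bool:
    """Record a failed sync; switch backup pod once the count is over the threshold.

    Returns True if a new backup pod was chosen, in which case the caller
    should continue with 'Initializing a backup'.
    """
    state.failed_sync_count += 1
    if state.failed_sync_count <= FAILED_SYNC_THRESHOLD:
        return False
    candidates = [pod for pod in available_pods if pod != state.backup_pod]
    if not candidates:
        return False  # nothing to switch to; keep the current pod
    state.backup_pod = candidates[0]  # selection strategy is an assumption
    state.failed_sync_count = 0  # assumption: the counter restarts for the new pod
    return True
```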

Pod reads .well-known/x-acc-backup-restore from other pods

  • Update information of pods wanting to receive backups
  • If no backup pods previously existed, add a backup pod to all existing users who didn't opt out, send them a notification and schedule an initial sync (see the 'User creation' story).
  • If a pod that was previously backup ready becomes not backup ready, choose a new backup pod for all existing users who currently have that pod, send them a notification and schedule an initial sync (see the 'User creation' story).
  • A backup ready pod may return status within .well-known/x-acc-backup-restore that new backups are not accepted anymore. In this case all existing backups work as usual, but no backups for new accounts will be accepted.
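
The three states a pod can be in according to the list above can be sketched as a small classifier. This is a Python illustration under stated assumptions: the proposal does not name the fields of the hash returned by .well-known/x-acc-backup-restore, so the keys `accepts_backups` and `accepts_new_backups` and the function `backup_status` are hypothetical.

```python
import json


def backup_status(well_known_body: str) -> str:
    """Classify a pod from its .well-known/x-acc-backup-restore response.

    The keys "accepts_backups" and "accepts_new_backups" are hypothetical;
    the proposal does not name the fields of the returned hash.
    """
    data = json.loads(well_known_body)
    if not data.get("accepts_backups", False):
        return "not_backup_ready"
    if not data.get("accepts_new_backups", True):
        return "existing_only"  # keeps current backups, refuses new accounts
    return "backup_ready"
```

A pod reported as `existing_only` would keep receiving syncs for users it already backs up, but would be left out when a backup pod is chosen for a new user.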

Maintenance of stale backup data

  • Sidetiq job running every month to purge backup data that has not been updated in the last year.
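
The purge rule can be sketched as a filter over last-update times. This is a minimal Python sketch; `purge_stale_backups` is a hypothetical name, and treating "the last year" as 365 days is an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict

MAX_AGE = timedelta(days=365)  # "not updated in the last year", taken as 365 days


def purge_stale_backups(last_updated: Dict[str, datetime], now: datetime) -> Dict[str, datetime]:
    """Return only the backups updated within the last year.

    `last_updated` maps a diaspora handle to the time its backup was last replaced.
    """
    cutoff = now - MAX_AGE
    return {handle: ts for handle, ts in last_updated.items() if ts >= cutoff}
```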

Pod schema changes

Backup data

Pods need a new model "backups" to contain backups from remote pods. It should be unique on the backed-up diaspora handle, and any new backup should replace the current one.

  • diaspora handle
  • content (encrypted)

User data

Users need some extra flags:

  • backup pod
  • backup pod failed sync count
  • pass phrase

Backup pod is the domain.tld where the backups are sent. If it changes, it should be updated. The pass phrase should be stored unencrypted because it is used to encrypt the archives sent out.

Pod data

  • backups allowed flag

Backup package

Contents

Minimal version for automatic backups:

  • Profile
  • Aspects
  • Contacts
  • Followed tags
  • Public and private key

The profile data export package also contains posts and comments. Posts and comments are created on the backup pod if they don't exist there yet.

The current diaspora* user export does not contain the private and public keys. These should be added for the manual export process to work.

Encryption for automatic backup process

  • Encrypt the whole backup package using the given user pass phrase.
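
The proposal does not say how the pass phrase is turned into an encryption key or which cipher is used. As one possible first step, the sketch below derives a fixed-length key from the pass phrase with PBKDF2 from the Python standard library. The function names, the iteration count, the salt size and the use of PBKDF2 at all are assumptions; the actual encryption of the package is left out.

```python
import hashlib
import os

PBKDF2_ITERATIONS = 200_000  # assumed work factor, not taken from the proposal
KEY_LENGTH = 32  # bytes; enough for a 256-bit symmetric key


def new_salt() -> bytes:
    """Generate a random salt to store alongside the encrypted package."""
    return os.urandom(16)


def derive_backup_key(pass_phrase: str, salt: bytes) -> bytes:
    """Derive a symmetric key from the user's pass phrase (assumed scheme)."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        pass_phrase.encode("utf-8"),
        salt,
        PBKDF2_ITERATIONS,
        dklen=KEY_LENGTH,
    )
```

The same pass phrase and salt always give the same key, so a restore on the backup pod could re-derive the key from what the user types in, provided the salt travels with the backup.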

Backup delivery message

To protect from arbitrary storage of data and to validate backup ownership, the backup delivery needs to be signed with the user private key, as the federation messages are signed. A receiving pod should check the signature against user public key before storing the backup.

The message thus needs to contain:

  • Handle being backed up
  • Encrypted backup content

This creates the following initial message schema:

   {
     "$schema": "http://json-schema.org/draft-04/schema#",
     "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore",
     "type": "object",
     "properties": {
       "handle": {
         "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/handle",
         "type": "string"
       },
       "backup": {
         "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup",
         "type": "string"
       }
     },
     "required": [
       "handle",
       "backup"
     ]
   }

"backup" is signed and encrypted using the user private key. Once opened, it should contain the following schema:

   {
     "$schema": "http://json-schema.org/draft-04/schema#",
     "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup",
     "type": "object",
     "properties": {
       "email": {
         "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup/email",
         "type": "string"
       },
       "content": {
         "id": "https://wiki.diasporafoundation.org/Account_Backup_And_Restore/backup/content",
         "type": "string"
       }
     },
     "required": [
       "email",
       "content"
     ]
   }

A pod stores "handle" and "content" to the user backups table. "content" is encrypted with the user pass phrase.
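
A receiving pod would need to check the structure of an incoming delivery message before going further. The Python sketch below checks only what the outer schema above states (a JSON object with string "handle" and "backup" fields); `parse_delivery_message` is a hypothetical name, and signature verification and decryption are deliberately out of scope.

```python
import json
from typing import Dict

REQUIRED_FIELDS = ("handle", "backup")


def parse_delivery_message(raw: str) -> Dict[str, str]:
    """Parse a backup delivery message and check it against the outer schema.

    Only structural rules are checked here (an object with string "handle"
    and "backup"); verifying the signature is a separate step.
    """
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    for field in REQUIRED_FIELDS:
        if not isinstance(message.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    return message
```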

Improvement ideas

For additions after initial implementation. These should not be considered as required in an initial implementation.

Moved flag

When a restore is done and a moved message is received by the old pod, a 'moved' flag in the person profile is updated on the old pod. The moved flag should contain the new handle of the user, i.e. the full username@domain.tld.

When receiving federation payloads directed at the user directly, respond to the caller with a "moved to" message, giving the user's new handle.

The sender should not trust these messages as is; instead, it should send a "confirm moved" message to the new handle. The response should be a message signed with the old private key that the caller can use to verify the move. If the response is ok, the new profile should be fetched and the "Receiving a moved message - remote pod" story generally followed.

Extra database schema changes due to the moved flag:

User:

  • moved_to - for storing the new handle
  • moved_from - for storing the old handle (needed when receiving "confirm moved" messages)
  • old_private_key - assuming private keys are regenerated after restore, needed for generating responses to "confirm moved" messages