JSON Data Ferret

Introduction

This is a Django application that manages JSON data with a full history of changes and moderation facilities.

What this provides

This Django application provides models and helper code to help a Django app manage some data.

There can be several types of data stored.

Each type can have:

  • A JSON Schema file against which data will be validated, and which is also used to provide a web form to edit the data.
  • A spreadsheet guide form file that lets users download spreadsheets with the existing data, edit them and import them.

Each type can hold a number of Records. Each record has a public ID (slug type) and one block of JSON.

The system keeps a history of all changes to the data, by way of Events. Each event has one or more Edits attached. Each event can optionally be linked to a user account and have a comment attached.

The system also provides a way for edits to be suggested, and for a moderator to approve or reject these edits.

Each Edit can provide a whole new block of JSON to replace the current value with, or a smaller JSON value that will be merged into the current JSON value.
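The difference between the two kinds of edit can be sketched in plain Python. This is only an illustration: the mode strings and the function names apply_edit and deep_merge are hypothetical, and it assumes that a merge combines dictionaries key by key, recursively, which may differ from what the library actually does.

```python
import copy


def deep_merge(base, patch):
    """Merge patch into base without modifying either argument.

    Assumption: dictionaries are merged key by key, recursively;
    any other kind of value in patch simply overwrites the old value.
    """
    if not (isinstance(base, dict) and isinstance(patch, dict)):
        return copy.deepcopy(patch)
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        merged[key] = deep_merge(merged.get(key), value)
    return merged


def apply_edit(current, mode, data):
    """Return the new JSON value after applying one edit.

    mode is "replace" or "merge" (illustrative strings, not the
    library's own constants).
    """
    if mode == "replace":
        return copy.deepcopy(data)
    if mode == "merge":
        return deep_merge(current, data)
    raise ValueError(f"unknown mode: {mode!r}")


current = {"name": {"value": "Old"}, "status": "open"}
edit_data = {"name": {"value": "New"}}

replaced = apply_edit(current, "replace", edit_data)  # "status" is dropped
merged = apply_edit(current, "merge", edit_data)      # "status" is kept
```

A replace discards anything not present in the new block, while a merge only touches the keys that the edit mentions.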

Edits can be approved straight away, or a future event can approve or reject them.

Provided via a web interface

The application provides an admin web interface which any Django user with the correct permission can access.

This lets them run any operation on the data and gives them a handy way to look at its current state.

Provided via Python APIs

However, it is anticipated that most applications will include this code as a library and then provide their own user-friendly interfaces, building meaning on top of the JSON data.

In this repository an example application is included to illustrate this.

Explanation

Models

Type

The app can process data of several different types.

Types should be created as records in the database.

However, most of the configuration of types happens in the Django configuration.

Each type is identified by its public_id, which should be unique.

Record

Each type can have multiple records.

Each record is identified by its public_id, which should be unique within its type.

Records are not created manually, but are instead created for you as soon as an Edit happens on a new public_id.

A record contains only basic information; it is an object/database row for Edits to link to.

However, it also contains some columns of cached information, designed to make it easy for other apps to get data from the system.

These are:

  • cached_exists - boolean. A record is said to exist if there has ever been any data approved for it.
  • cached_data - JSON. The latest version of the actual data.
  • cached_jsonschema_validation_errors - If a JSON Schema is specified, this will contain JSON Schema errors for the latest data.

Event

Changes happen to data by way of Events. Think of them as a commit in git.

Each event can be linked to one or more edits in several different ways. See below.

Note:

  • an event may only contain new data to be moderated, and thus may not always change the actual data in the system.
  • an event may contain edits that are created and approved in the same event - data that was not moderated but approved instantly, in other words.

Each event is identified by its public_id, which should be unique - but these are set automatically for you.

Edit

Each edit contains actual data:

  • mode - Replace or Merge. This is how the edit is applied to the record.
  • data_key - Not currently used - leave as the default of /
  • data - The actual JSON data to merge or replace.

Each edit is linked to at least one Event by:

  • creation_event - The event that created it. This must be set.

Each edit can also have one (but not both) of these links set:

  • approval_event - The event that approved this data. (Note this can be the same event as the creation_event.)
  • refusal_event - The event that rejected this data.
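The three links above imply a simple moderation state for each edit. The sketch below uses a hypothetical stand-in class (EditLinks) and made-up state names rather than the library's Edit model, purely to show how the state follows from which links are set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditLinks:
    """Hypothetical stand-in holding only the three event links of an edit."""

    creation_event: str                   # always set
    approval_event: Optional[str] = None  # set once the edit is approved
    refusal_event: Optional[str] = None   # set once the edit is rejected


def moderation_state(edit: EditLinks) -> str:
    """Derive "pending", "approved" or "refused" from the links."""
    if edit.approval_event is not None and edit.refusal_event is not None:
        raise ValueError("an edit cannot be both approved and refused")
    if edit.approval_event is not None:
        return "approved"
    if edit.refusal_event is not None:
        return "refused"
    return "pending"


# Suggested and still waiting for a moderator.
suggested = EditLinks(creation_event="event-1")
# Created and approved in the same event (not moderated).
instant = EditLinks(creation_event="event-2", approval_event="event-2")
# Created in one event, rejected by a later one.
rejected = EditLinks(creation_event="event-3", refusal_event="event-4")
```

Here moderation_state returns "pending" for suggested, "approved" for instant and "refused" for rejected.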

Each edit is identified by its public_id, which should be unique - but these are set automatically for you.

How to Guides

Use as a library in your app

Include library

Add as a requirement:

-e git+https://github.com/OpenDataServices/json-data-ferret.git@v0.3.0#egg=jsondataferret

(Choosing the version you want)

In your Django settings file, add this to INSTALLED_APPS:

INSTALLED_APPS = [
    ...
    "jsondataferret.apps.JsondataferretConfig",
    ...
]

Set up Types

Now you need to set up the types you want to use. You do this by creating Type models. You can do this by any usual Django means - logging in to the admin interface as a superuser is probably easiest.

In your Django settings file you may also want to add a JSONDATAFERRET_TYPE_INFORMATION setting with extra information. See the Configuration reference for more.

Use from your custom code

You can now use the Python API and read the models of this library as you require. See the Python API reference for more.

Web UI

If you want people to be able to use the Web UI, you must first enable it and give the relevant user accounts permission.

In your Django app’s urls file add:

urlpatterns = [
    ...
    path("jsondataferret/", include("jsondataferret.urls")),
    ...
]

You need to set the correct permissions for each user of the Web UI. You can do this by any means Django allows - e.g. logging into the admin interface as a superuser. See the User Accounts reference for more.

Reference

Configuration

Types

Some configuration options for each different type can be set in the normal Django configuration.

JSONDATAFERRET_TYPE_INFORMATION = {
    "type1": {
        "json_schema": {...},
        "spreadsheet_form_guide": "full filename",
    },
    "type2": {
        "json_schema": {...},
        "spreadsheet_form_guide": "full filename",
        "fields": [...],
    },
}

The key for each type (e.g. type1, type2) should match the public_id field in the Type record in the database.
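Because a misspelt key would be easy to miss, a small check like the following may help. It is a sketch only: the function name is hypothetical, and the list of public IDs is hard-coded here, whereas in a real project it would come from the Type records in the database.

```python
def unmatched_type_keys(type_information, type_public_ids):
    """Return configured keys that have no Type with the same public_id."""
    return sorted(set(type_information) - set(type_public_ids))


# Example settings value; "projct" is a deliberate typo.
type_information = {"org": {}, "projct": {}}
# Public IDs of the Type records (hard-coded for this illustration).
type_public_ids = ["org", "project"]

missing = unmatched_type_keys(type_information, type_public_ids)
```

In this example missing contains only "projct", the key that matches no Type record.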

JSON Schema

If this is set,

  • Any data will be validated against the JSON Schema and any errors or success will be shown to the user in the Web UI.
  • The user will be able to edit the data in a web browser using a JSON Schema widget.

The value should be the actual JSON Schema as a Python dictionary. You will probably load it in the settings module.

import json

# Load the schema once, when the settings module is imported.
with open(...) as fp:
    org_json_schema = json.load(fp)

JSONDATAFERRET_TYPE_INFORMATION = {
    "org": {
        "json_schema": org_json_schema,
    },
}

This is optional; if not set basic operations will still work.

Spreadsheet Guide Form

If this is set,

  • The user will be able to download and import a spreadsheet in the Web UI.

The value should be the filename of the spreadsheet. Ideally make it an absolute filename.

See the documentation for the Spreadsheet forms library for more on guide forms.

This is optional; if not set basic operations will still work.

Fields

If this is set,

  • The user will see a list of data from the JSON data presented as fields.

The value should be a list of Python dictionaries.

If there is only one value in the data, the dictionary should look like:

{"key": "/project_name/value", "title": "Project Name (value)"},

The key should be the JSON path to the value and the title is what is shown to the user.

Where the data contains a list of dictionaries, you can also specify that. In this mode, you specify where the list is in the data then specify fields for each item in the list.

{
    "type": "list",
    "key": "/outcomes",
    "title": "Outcomes",
    "fields": [
        {"key": "/outcome", "title": "Outcome"},
        {"key": "/definition", "title": "Definition"},
    ],
},

This is optional; if not set basic operations will still work.
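To make the two kinds of field definition concrete, the sketch below resolves them against some sample data. The functions resolve_key and render_fields are hypothetical helpers written for this illustration, and the sample data is invented; the library's own lookup logic may differ, for example in how it treats missing keys.

```python
def resolve_key(data, key):
    """Follow a slash-separated key such as "/project_name/value"
    through nested dictionaries. Returns None if any part is missing."""
    value = data
    for part in key.strip("/").split("/"):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def render_fields(data, fields):
    """Return (title, value) pairs for a list of field definitions.

    For a "list" field the value is a list with one entry per item,
    each entry being the (title, value) pairs of that item.
    """
    rows = []
    for field in fields:
        if field.get("type") == "list":
            items = resolve_key(data, field["key"]) or []
            rows.append(
                (field["title"], [render_fields(i, field["fields"]) for i in items])
            )
        else:
            rows.append((field["title"], resolve_key(data, field["key"])))
    return rows


fields = [
    {"key": "/project_name/value", "title": "Project Name (value)"},
    {
        "type": "list",
        "key": "/outcomes",
        "title": "Outcomes",
        "fields": [
            {"key": "/outcome", "title": "Outcome"},
            {"key": "/definition", "title": "Definition"},
        ],
    },
]

# Invented sample data for one record.
data = {
    "project_name": {"value": "Bridge repair"},
    "outcomes": [{"outcome": "Safer crossing", "definition": "Fewer accidents"}],
}

rows = render_fields(data, fields)
```

Keys inside a list field are resolved relative to each item of the list, not to the top of the record's data.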

User Accounts

Normal Django user accounts are used.

The permission Admin - Can Admin All Data Managed by JSON Data Ferret is required for a user to access the admin web interface for the app. This will give them full permissions to change and moderate data.

This permission is not needed if a user calls some custom code in another app that calls one of the library’s Python APIs. In that case, it’s up to the calling code to check any user permissions as required.

Python APIs

Read from models directly

There are Django models with information, and currently you are encouraged to read from them to look up certain information.

In particular, the Record model contains some cached columns of the latest data.

Write to models directly

You can write to the Type model, to set up new types.

Do NOT write to the Record, Event or Edit models directly. Instead, use the Python APIs below (in particular jsondataferret.pythonapi.newevent) to write new data to the system.

jsondataferret.pythonapi.newevent

The function newEvent is used to write new data to the system.

It should be passed:

  • datas - an array of objects, described below.
  • user - A Django User
  • comment - A text comment

The items in the datas array should be instances of one of the following classes:

  • NewEventData - used to add new data to the system.
  • NewEventApproval - used to approve an edit that has previously been written to the system (moderate it successfully)
  • NewEventRejection - used to reject an edit that has previously been written to the system (moderate it and fail)

jsondataferret.pythonapi.purge

The function purge_record is used to delete a record and all associated data from the system permanently. There is no undo.

jsondataferret.pythonapi.runevents

The function clear_data_and_run_all_events clears all caches on Records, then tries to update them all to the latest value.

This should not have to be run in normal operations, but may be needed to clear a problem.

Vagrant for Developers

A vagrant box exists to help developers.

Simply run vagrant up.

After vagrant ssh, run cd /vagrant and source .ve/bin/activate.

Run Web Server

python manage.py runserver 0:8000

Go to http://localhost:8000

Set up app for the first time

Run normal Django database migrations.

Create a superuser via the normal Django command line tool:

python manage.py createsuperuser

Run the webserver.

Log into /admin.

Add some Type records in the Jsondataferret section, for use with the example app:

  • public id: project, title: Project
  • public id: org, title: Organisation

Python Packages Upgrade

pip-compile --upgrade
pip-compile --upgrade requirements_dev.in

Tests

Run tests (with Vagrant DB credentials):

JSONDATAFERRET_DATABASE_NAME=test JSONDATAFERRET_DATABASE_USER=test JSONDATAFERRET_DATABASE_PASSWORD=test python manage.py test

Code Quality

Clean up code before commit:

isort --recursive djangoproject/ jsondataferret jsondataferretexampleapp/ setup.py docs/
black djangoproject/ jsondataferret jsondataferretexampleapp/ setup.py docs/
flake8 djangoproject/ jsondataferret jsondataferretexampleapp/ setup.py docs/

Reset Database

sudo su postgres
psql -c "DROP DATABASE app"
psql -c "CREATE DATABASE app WITH OWNER app ENCODING 'UTF8' LC_COLLATE='en_GB.UTF-8' LC_CTYPE='en_GB.UTF-8' TEMPLATE=template0 "
exit
python manage.py migrate