Data W(e)ave On The Beach: Synchronize a list

For my comeback, we return to a problem already covered in a previous post. The solution proposed here differs from that first version only in form: this time we use the "set-style" operators on DataWeave objects (which add attributes to an object or remove them from it). The resulting code is slightly simpler.

As a reminder, you can add an attribute to an object using the “++” operator:

a ++ (key):value

where “key” contains the name of the field and “value” the value to assign to it. To delete a field from an object, write:

a - (key)

Don't forget the parentheses around “key”: they tell DataWeave that key is a variable holding the attribute name (otherwise, DataWeave uses the literal text “key” as the attribute name!).
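To make the role of the parentheses concrete, here is a minimal sketch. The variables a, key and toRemove and their values are hypothetical, chosen only for illustration, and the key/value pair is written inside explicit braces, which is a slightly more verbose spelling than the one used above.

```dataweave
%dw 2.0
output application/json
// Hypothetical sample data, for illustration only
var a = { id: 1, value: "1" }
var key = "label"
var toRemove = "value"
---
{
    // Parentheses: the field name is the content of the variable, i.e. "label"
    dynamicName: a ++ { (key): "one" },
    // No parentheses: the field is literally named "key"
    literalName: a ++ { key: "one" },
    // Removes the field whose name is held in toRemove, leaving only "id"
    removedField: a - (toRemove)
}
```

With these values, dynamicName gains a "label" field, literalName gains a "key" field, and removedField keeps only "id".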

Even more interestingly, the “--” operator removes several attributes from an object at once, where the variable keys is an array of strings containing the names of the attributes to remove:

a -- (keys)

Finally, it is possible to access the value of an attribute whose name is given dynamically, using the [] operator:

a[key]
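The last two operators can be tried together in a short sketch. The object and the variable contents below are hypothetical sample values, not taken from the problem that follows.

```dataweave
%dw 2.0
output application/json
// Hypothetical sample data, for illustration only
var a = { id: 1, value: "1", label: "one" }
var keys = ["value", "label"]
var key = "label"
---
{
    // Removes every attribute named in keys, leaving { "id": 1 }
    stripped: a -- keys,
    // Dynamic selector: reads the attribute named by key, here "one"
    selected: a[key]
}
```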

Let's return to our problem. Take the following pair of object lists (stored in the “vars” variable of the DataWeave Playground, so that the script can easily be moved to a “Transform” component of a real application). “L1” is the initial version of the list and “L2” is the subsequent version:

{
    "l1": [
        {
            "id": 1,
            "value": "1"
        },
        {
            "id": 2,
            "value": "2"
        },
        {
            "id": 3,
            "value": "3"
        }
    ],
    "l2": [
        {
            "id": 1,
            "value": "A"
        },
        {
            "id": 4,
            "value": "4"
        },
        {
            "id": 3,
            "value": "3"
        }
    ]
}

An object present in L1 and absent from L2 is considered deleted. Conversely, an object present in L2 and absent from L1 is considered created. An object present in both L1 and L2 is considered modified if its content has changed between the two versions. Note that objects are identified by an “id” attribute, which must obviously be treated as invariant (otherwise it wouldn't be an identifier!). The expected result is as follows:

{
  "created": [
    {
      "id": 4,
      "value": "4"
    }
  ],
  "removed": [
    {
      "id": 2,
      "value": "2"
    }
  ],
  "modified": [
    {
      "id": 1,
      "value": "A"
    }
  ]
}

The following script produces it:

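%dw 2.0
import dw::Crypto
output application/java
// Index a list by id: { "<id>": object, ... }
fun toObject(l) = l default [] reduce (i, a={})-> (a ++ (i.id):i)
// Objects of la whose id does not appear in lb
fun minus(la, lb) = (toObject(la) -- lb..id) pluck $
// Map each id to the MD5 checksum of the serialized object
fun canonize(l) = toObject(l) mapObject (i)->(i.id): 
    Crypto::MD5(write(i) as Binary)
---
do {
    var created = minus(vars.l2, vars.l1)
    var removed = minus(vars.l1, vars.l2)
    // Retained objects, in their L2 (up-to-date) version
    var updated = minus(vars.l2, created)
    // Checksums of the L1 (initial) versions, keyed by id
    var previous = canonize(vars.l1)
    ---
    {
        created: created,
        removed: removed,
        // Keep only the retained objects whose checksum differs from the L1 version
        modified: updated filter (i)->(previous[i.id as String] != 
            Crypto::MD5(write(i) as Binary))
    }
}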

The essential idea is to create a function (minus) that “subtracts” one list from another. It isolates the objects that exist in one list but not in the other.

Subtracting L2 from L1 yields the deleted objects.
Subtracting L1 from L2 yields the created objects.
Subtracting the newly created objects from L2 yields the retained objects.
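The three subtractions can be checked in isolation with the sample lists. The sketch below is a variant of the script's helpers, not the script itself: as an assumption of mine, the ids are converted to String explicitly and the ids of lb are collected with map, so that nothing depends on implicit coercion or on lb being non-empty. The lists are declared inline instead of being read from vars.

```dataweave
%dw 2.0
output application/json
// The sample lists from above, declared inline for this sketch
var l1 = [{ id: 1, value: "1" }, { id: 2, value: "2" }, { id: 3, value: "3" }]
var l2 = [{ id: 1, value: "A" }, { id: 4, value: "4" }, { id: 3, value: "3" }]
// Index a list by id; ids are turned into String keys explicitly (assumption of this sketch)
fun toObject(l) = (l default []) reduce (i, a = {}) -> a ++ { (i.id as String): i }
// Objects of la whose id does not appear in lb
fun minus(la, lb) = (toObject(la) -- ((lb default []) map ($.id as String))) pluck $
---
{
    // L2 minus L1: only id 4
    created: minus(l2, l1),
    // L1 minus L2: only id 2
    removed: minus(l1, l2),
    // L2 minus the created objects: ids 1 and 3, in their L2 version
    kept: minus(l2, minus(l2, l1))
}
```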

To obtain the retained objects, we could instead have removed the deleted objects from L1. But that would give us the initial version of those objects, not their final (and therefore up-to-date) version. All that remains is to check whether the content of each retained object has changed. The simplest and most robust way to do this (see my previous articles) is to compare checksums.
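As a small illustration of the checksum comparison, here is a sketch with a hypothetical helper named checksum and three sample objects. Unlike the script above, it serializes to JSON explicitly; that is my choice for the example, not something the script requires.

```dataweave
%dw 2.0
import dw::Crypto
output application/json
// Hypothetical helper: MD5 of the JSON serialization of an object
fun checksum(o) = Crypto::MD5(write(o, "application/json") as Binary)
// Sample objects, for illustration only
var initial = { id: 1, value: "1" }
var edited = { id: 1, value: "A" }
var copy = { id: 1, value: "1" }
---
{
    // Different content gives different checksums
    changed: checksum(initial) != checksum(edited),
    // Identical content gives identical checksums
    unchanged: checksum(initial) == checksum(copy)
}
```

Both fields evaluate to true here. Because the checksum is computed over the serialized text, two objects holding the same fields in a different order would also produce different checksums.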

Is this solution better than the one presented in the previous post? The answer is... no, quite the contrary. Because of the way DataWeave works internally, the solution presented here has quadratic, O(N²), complexity, which makes it catastrophic for large lists. The very subtle reason for this will be the subject of the next post.

Thursday, January 2, 2025
