Data W(e)ave On The Beach: July 2024

The purpose of this article is to explain how to administer a Mulesoft application or a set of Mulesoft applications. Although the Control Plane provided by CloudHub offers a number of configuration and consultation pages - particularly for logs - it does not give a quick answer to questions tied closely to the business domain, such as "Why has my order CI234325 not been imported into SAP?" or "Has the file MVTOPT_1243.csv been processed?". Without dedicated development, answering this type of question requires investigation - and archaeology in the logs - that is rarely negligible in terms of time spent.

It is therefore unreasonable to settle for the functions the tool provides as standard. They must be complemented by specific administration functions, developed in parallel with the functions that deliver business value. Some will argue that careful use of logs can solve the problem, and they would be right... and wrong. Right, because all the information needed to feed an investigation tool can indeed be found there; wrong, because even the best logs in the world still need a sophisticated tool to be exploited effectively. Logs (as we shall see later) are indeed a technical solution to our problem, but only in part.

Let's take a step back: who is an investigation tool for, and what is it for?

Who? Essentially two types of stakeholder:

Level 2 Support (i.e. the person who has to locate WHERE the problem occurred)
the operator, who must ensure that the inter-application flows function correctly.

What for?

To answer the very concrete questions posed above,
to check that the system is working properly,
to issue commands that keep it working properly.

For such a tool to be truly useful and usable, it must meet two conditions:

It must be a single tool: if level 2 support has to rummage through the logs of ten different tools, it risks spending a lot of time consulting those that had nothing to do with the incident. With only one tool, it will know immediately where the incident took place.

It must be effective: in other words, it must offer:

a keyword filter, generally on the identifier of a functional object (the order, the product, the customer) or a technical object (the MVTOPT_1243.csv import file),
ordering of the lines retrieved,
highlighting of the most relevant information,
alerts in the event of a serious malfunction,
monitoring of the use of certain resources, or detection of the presence or absence of artefacts or services,
simple intervention methods to restart a flow or make a correction.

We can see here that the functions required are more or less those offered by a database. Note that the more powerful the search functions, the easier the investigation work becomes.
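
To make the comparison with a database concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `trace`, its columns and the sample rows are assumptions made for illustration; nothing here is part of a Mulesoft API. It only shows the two most basic functions from the list above: filtering on a keyword and ordering the lines retrieved.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the illustrative store with one table of trace lines."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace ("
        " emitted_at TEXT, object_id TEXT, message TEXT)"
    )
    return conn

def add_line(conn, emitted_at, object_id, message):
    """Store one line; same-format ISO 8601 strings sort chronologically as text."""
    conn.execute("INSERT INTO trace VALUES (?, ?, ?)", (emitted_at, object_id, message))

def search(conn, keyword):
    """Return the lines recorded for one object identifier, oldest first."""
    cursor = conn.execute(
        "SELECT emitted_at, message FROM trace WHERE object_id = ? ORDER BY emitted_at",
        (keyword,),
    )
    return cursor.fetchall()

# Sample data (hypothetical), inserted out of order on purpose.
conn = open_store()
add_line(conn, "2024-07-11T09:30:02.500", "MVTOPT_1243.csv", "file processed")
add_line(conn, "2024-07-11T09:30:00.120", "MVTOPT_1243.csv", "file received")
add_line(conn, "2024-07-11T09:30:01.000", "CI234325", "order rejected")

for emitted_at, message in search(conn, "MVTOPT_1243.csv"):
    print(emitted_at, message)
```

With such a store, the question "Has the file MVTOPT_1243.csv been processed?" becomes a single lookup instead of a search through raw logs.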

In short, an administration solution is made up of the three elements shown in the diagram below:

the Mulesoft applications that feed the system with notifications,
a database (in the general sense: a tool capable of storing information and structuring it, not necessarily a DBMS)
a tool for consulting the notifications collected, also capable of issuing commands to the Mulesoft servers

Mulesoft applications feed the database as APIs and flows are executed. Level 2 support and operations use the consultation tool to check the status of the system or identify the cause of an incident. Note that such a system can be used by IS components other than Mulesoft, and it is even very useful for this to be the case: it lets a business process be followed across all the tools.

What information should be kept and how should it be structured? I suggest the following principle (seen at one of our customers, where it has proved particularly effective). The structure of a notification is as follows:

A session identifier (or transaction identifier)
The type and identifier of the technical/functional object concerned by the notification
The date (timestamp to the millisecond) at which the notification was issued
The source (= Mulesoft instance or third-party tool) of the notification
The current operation (bearing in mind that a session/transaction can be a succession of these operations)
A more general information area.
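
As an illustration, the six fields above could be carried by a small record type. This is only a sketch: the class name, the field names and the JSON serialisation are assumptions, not a format imposed by Mulesoft or by the customer mentioned above.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class Notification:
    session_id: str    # session (or transaction) identifier
    object_type: str   # type of the technical/functional object, e.g. "order", "file"
    object_id: str     # identifier of that object, e.g. "CI234325"
    source: str        # Mulesoft instance or third-party tool emitting the notification
    operation: str     # operation in progress within the session
    info: dict = field(default_factory=dict)  # more general information area
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def to_json(self) -> str:
        """Serialise the notification, with the timestamp kept to the millisecond."""
        record = asdict(self)
        record["timestamp"] = self.timestamp.isoformat(timespec="milliseconds")
        return json.dumps(record)

# Hypothetical example: an order rejected while being imported into SAP.
example = Notification(
    session_id="S-42",
    object_type="order",
    object_id="CI234325",
    source="mule-orders-api",
    operation="import-to-sap",
    info={"status": "rejected"},
)
print(example.to_json())
```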

The structure of the notifications is reminiscent of that of logs, and indeed some logs can be used to implement notifications. We will see later in this document that several existing monitoring solutions use this type of implementation (ELK, Titanium).

The session identifier (which generally identifies a business process) is the link between the various objects involved in the process (for example, the CSV file and the orders created from it). For this reason it is an essential element for reconstructing the history of complex transactions.
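
The role of the session identifier can be sketched in a few lines. Starting from one object (an order), we look up the session(s) it belongs to, then return every notification of those sessions in chronological order, which brings back the CSV file the order came from. The dictionary keys and sample values are assumptions made for the example.

```python
def session_history(notifications, object_id):
    """Return every notification of the session(s) that touched object_id, oldest first."""
    sessions = {n["session_id"] for n in notifications if n["object_id"] == object_id}
    related = [n for n in notifications if n["session_id"] in sessions]
    return sorted(related, key=lambda n: n["timestamp"])

# Hypothetical notifications: session S-42 reads a file, then creates an order from it.
notifications = [
    {"session_id": "S-42", "object_id": "CI234325",
     "timestamp": "2024-07-11T09:30:01.000", "operation": "create-order"},
    {"session_id": "S-42", "object_id": "MVTOPT_1243.csv",
     "timestamp": "2024-07-11T09:30:00.120", "operation": "read-file"},
    {"session_id": "S-43", "object_id": "MVTOPT_1244.csv",
     "timestamp": "2024-07-11T09:31:00.000", "operation": "read-file"},
]

for n in session_history(notifications, "CI234325"):
    print(n["timestamp"], n["object_id"], n["operation"])
```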

The information part can contain any type of information, but it is to our advantage if this information is structured (XML, JSON) as this allows us to build more sophisticated filters.
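
To show why a structured information area helps, here is a small sketch that filters notifications on keys found inside a JSON information area. Free-text entries are simply skipped, since nothing reliable can be matched in them. The key names (`status`, `reason`) and the sample data are hypothetical.

```python
import json

def filter_by_info(notifications, **criteria):
    """Keep notifications whose JSON info area matches every key=value criterion."""
    matches = []
    for n in notifications:
        try:
            info = json.loads(n.get("info"))
        except (TypeError, ValueError):
            continue  # missing or free-text info cannot be filtered on
        if isinstance(info, dict) and all(info.get(k) == v for k, v in criteria.items()):
            matches.append(n)
    return matches

notifications = [
    {"object_id": "CI234325", "info": '{"status": "rejected", "reason": "unknown customer"}'},
    {"object_id": "CI234326", "info": '{"status": "imported"}'},
    {"object_id": "CI234327", "info": "import failed, see SAP"},
]

rejected = filter_by_info(notifications, status="rejected")
print([n["object_id"] for n in rejected])
```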

Here's a list of the most common implementations:

ELK

This is more or less the default solution. It involves collecting the logs from Mulesoft (and other tools) with Logstash, storing them in an Elasticsearch database and then using Kibana to explore them. These three tools are offered as Open Source.
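
For illustration, the question "Has the file MVTOPT_1243.csv been processed?" could translate into an Elasticsearch search body like the one built below, to be sent to the `_search` endpoint of the index holding the logs. This is a sketch under assumptions: it supposes that the Logstash pipeline has extracted the object identifier into a keyword field named `object_id` (a hypothetical name) and that events carry the usual `@timestamp` field.

```python
import json

def object_history_query(object_id, size=100):
    """Build an Elasticsearch search body: exact match on one field, oldest first."""
    return {
        "size": size,
        "query": {"bool": {"filter": [{"term": {"object_id": object_id}}]}},
        "sort": [{"@timestamp": {"order": "asc"}}],
    }

print(json.dumps(object_history_query("MVTOPT_1243.csv"), indent=2))
```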

The advantages of the solution are:

Reference solution on the market
Reference solution for the vendor (MULESOFT)
Skills available on the market
Available and proven Open Source tools
Uses Mulesoft logs
Uses the logs of other IS tools
Can be useful even if no support/monitoring function has been developed
Integration with DEBUG/TRACE logs

However, it does have a few drawbacks:

ELK skills are highly technical and expensive
The maintenance of an ELK solution is out of reach for a non-technical customer
The cost of an ELK installation is not negligible
Expertise in Kibana is essential to offer a suitable service
Dedicated ELK infrastructure required (Cloud or OnPremise)
Does not address intervention/correction needs (these require developing an API and a dedicated Postman collection)
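
The intervention side mentioned in the last point could look like the sketch below: a request asking an administration API to replay a flow for a given object. The route `/admin/flows/<flow>/relaunch`, the `objectId` field and the host name are all hypothetical; such an API would have to be developed on the Mulesoft side, and a Postman collection would typically hold the same call. The request is only prepared here, not sent.

```python
import json
from urllib.request import Request

def build_relaunch_request(base_url, flow_name, object_id):
    """Prepare (without sending) a POST asking a hypothetical admin API to replay a flow."""
    body = json.dumps({"objectId": object_id}).encode("utf-8")
    return Request(
        url=f"{base_url}/admin/flows/{flow_name}/relaunch",  # hypothetical route
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_relaunch_request("https://mule.example.com", "import-mvtopt", "MVTOPT_1243.csv")
print(request.get_method(), request.full_url)
# To actually send it, pass the request to urllib.request.urlopen.
```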

This is generally the solution chosen by IT Departments with truly technical staff who have already installed ELK for other parts of their IS.

Titanium

Titanium is an optional offer from Mulesoft that provides various services and more resources (disk space in particular). It comes with additional log exploitation services available from the Control Plane, which largely cover the needs described above.

Benefits include:

Turnkey vendor solution
Uses Mulesoft logs
No installation or configuration costs
No dedicated infrastructure
No maintenance, so the customer stays autonomous
Perfectly integrated into the Control Plane
Can be useful even if no support/monitoring function has been developed
Integration with DEBUG/INFO logs

And the disadvantages:

Cost of the Titanium extension
Very generic, non-configurable service
Does not address intervention/correction needs (these require developing an API and a dedicated Postman collection)
Cannot use the logs of other parts of the IS
May require developing a log API on Mulesoft, made available to the other parts of the IS

In the end, it's a solution that's very easy to access for those who don't have the resources to build/install/maintain a solution, i.e. many small and medium-sized businesses whose core business isn't IT.

Home-made solutions

It is of course possible to build your own solution from off-the-shelf software tools (RDBMS, Postman, etc.). At first sight this may seem an option to avoid, since developing and maintaining an infrastructure solution yourself is rarely advisable. However, in my experience, the customer of ours that is best "equipped" for effective monitoring owes it to a solution of this type. Provided configuration and development are kept to a minimum, this can make sense.

The benefits:

Responds to all issues (monitoring, investigation, intervention)
Allows the implementation of highly adapted and effective monitoring solutions
Requires only "basic" MULESOFT (and POSTMAN) technical skills
Can be used in conjunction with another solution (particularly for intervention)

Disadvantages:

Cost of implementation (everything has to be done...)
Maintenance of the solution (generally out of reach for a customer)
Requires a dedicated infrastructure (Cloud or onPremise)
Specific instrumentation for Mulesoft code
Cannot be "automatically" integrated with the logs of Mulesoft and of the other parts of the IS
May require the development of a LOG API on Mulesoft to be made available to other parts of the IS

This is the solution for those with a good technical command who prefer to invest in building a tool rather than spend time afterwards on log archaeology. Its adaptability to your context makes it a truly effective solution, provided you stay reasonable about the implementation effort.

A monitoring solution on the market

There are monitoring/operations solutions on the market that range from zero to infinity, in terms of both functionality and price. Most large companies are equipped with one and require new solutions to integrate with it.

Here are the advantages:

Allows highly integrated management of monitoring and operational actions
Highly adapted to the needs of (very) large customers
Offers a rich range of functions

And the disadvantages:

Very structuring for the customer's IS (in fact, this solution is only possible - and becomes compulsory - if the customer has ALREADY implemented a solution of this type)
Cost (although it should be noted that there are Open Source solutions)
Functionality and experience entirely dependent on the tool
Installation and configuration entirely dependent on the tool (it is sometimes difficult to find skills on the market)

In conclusion

The type of solution you implement will depend very much on your context (is your company's business close to IT? How big is your company? Do you already have tools of this type? etc.). Generally, the choice will be dictated by your environment. The main point to remember is that a solution of some kind is essential in the medium or long term. You may be able to get through your Hypercare phase by consulting logs and watching over your system with a mother's attention, but you will quickly wear yourself out if you do not equip yourself with tools.


Thursday, 11 July 2024

Administering a Mulesoft solution
