
Beyond Dataverse 5.0: make it pluggable (on a code level) #7050

@poikilotherm


tl;dr: we should discuss how to make Dataverse more pluggable and open to third-party extensions.

While digging into why our WAR file is so big (it will be ~135MB for Dataverse 5.0, down from ~205MB for 4.20), I stumbled over a lot of things that could use a hand. This is mostly about refactoring old code, libraries and dependencies. Great, I like that ⛏️. (See #5360 for an older list that still needs updating.)

But one thing is unlikely to shrink: Apache Tika. It powers the great full-text indexing component contributed by @qqmyers (thank you! 👍). But that's a 45MB increase in WAR size for a feature that is completely optional, and I have no figures on how many installations have actually enabled it. For "us" (Jülich DATA), it makes images bigger and adds to deployment time 😐.
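To make the idea concrete, here is a minimal sketch of how an optional feature like full-text extraction could sit behind an extension point, using only the JDK's `java.util.ServiceLoader`. The names `FullTextExtractor`, `FullTextExtractors` and `PlainTextExtractor` are hypothetical and not existing Dataverse classes. The assumption is that a Tika-based extractor would ship as a separate plugin JAR, so installations that don't want full-text indexing never carry its weight in the WAR.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical extension point: the core would only know this interface, not Tika.
interface FullTextExtractor {
    /** Returns true if this extractor can handle the given MIME type. */
    boolean supports(String mimeType);

    /** Extracts plain text from the raw file content. */
    String extract(byte[] content);
}

// Hypothetical registry in the core that discovers extractors at runtime.
final class FullTextExtractors {
    private final List<FullTextExtractor> extractors = new ArrayList<>();

    /** Finds implementations listed in META-INF/services files visible to the loader. */
    static FullTextExtractors discover(ClassLoader loader) {
        FullTextExtractors registry = new FullTextExtractors();
        for (FullTextExtractor extractor : ServiceLoader.load(FullTextExtractor.class, loader)) {
            registry.register(extractor);
        }
        return registry;
    }

    void register(FullTextExtractor extractor) {
        extractors.add(extractor);
    }

    /** Empty when no matching plugin is installed: indexing would then skip full text. */
    Optional<FullTextExtractor> forMimeType(String mimeType) {
        return extractors.stream().filter(e -> e.supports(mimeType)).findFirst();
    }
}

// Hypothetical example of what a plugin JAR could contribute (a Tika plugin would look similar).
final class PlainTextExtractor implements FullTextExtractor {
    @Override
    public boolean supports(String mimeType) {
        return mimeType.startsWith("text/");
    }

    @Override
    public String extract(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}
```

The core stays compilable and runnable without any extractor present; whether full text gets indexed depends only on which plugins an admin chose to install.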

In many cases, great new features are accepted and merged by IQSS (yeah! 🎉), even though IQSS might not use them for Harvard. But is this a good approach? It puts maintenance effort onto people who focus on other features. Testing is necessary, and I'm not the only one around who has some grey hair from the status quo 🤕 (looking at you @4tikhonov, @skasberger, @kcondon, @donsizemore, @pdurbin and others).

Back in the days of #4106, people started to fork Dataverse just to add support for functionality that was rejected (for good) by IQSS. But it's a pity to be forced into forking, which is always a big tradeoff. ⚖️

Now we have great new working groups! ❤️ Metadata! ❤️ More of that! Yes please! 👍 And more ingest work is likely coming down the road. That means lots of new features that don't fit under "External Tool" or "Integration" the way the shiny previewers do, but are code that needs access at the "Java level".

Those "extensions" or "plugins" should be easy for admins to install (no compiling and fiddling with Maven, as with DSpace) and be developed independently of IQSS, offloading maintenance, testing and development. Ideally, people would start to share their plugins, or even sell them. Lots of new options.
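As a rough illustration of "no compiling" installs, the sketch below loads plugins from a directory of JAR files using only `java.net.URLClassLoader` and `java.util.ServiceLoader`. The `DataversePlugin` interface and `PluginDirectoryLoader` class are hypothetical names chosen for this example. It assumes each plugin JAR declares its implementation class in a `META-INF/services/<interface name>` file, which is how `ServiceLoader` discovers providers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical minimal extension point that every plugin would implement. */
interface DataversePlugin {
    String name();
}

final class PluginDirectoryLoader {
    private PluginDirectoryLoader() {}

    /** Collects the URLs of all *.jar files in the plugin directory (none if it does not exist). */
    static URL[] jarUrls(Path pluginDir) {
        if (!Files.isDirectory(pluginDir)) {
            return new URL[0];
        }
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(pluginDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read plugin directory " + pluginDir, e);
        }
        return urls.toArray(new URL[0]);
    }

    /** Instantiates every implementation of the extension point declared in the plugin JARs. */
    static <T> List<T> load(Class<T> extensionPoint, Path pluginDir) {
        // Parent is the loader of the core API, so plugin classes can see the interface.
        // The loader is deliberately left open: plugin classes remain in use after loading.
        URLClassLoader loader = new URLClassLoader(jarUrls(pluginDir), extensionPoint.getClassLoader());
        List<T> plugins = new ArrayList<>();
        for (T plugin : ServiceLoader.load(extensionPoint, loader)) {
            plugins.add(plugin);
        }
        return plugins;
    }
}
```

Under this scheme, an admin would install a plugin by copying its JAR into the directory and restarting the application. Frameworks like pf4j build on the same basic idea and add things such as plugin lifecycle management, dependencies between plugins and a separate class loader per plugin.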

There are a few ways to do this 🧠. One is to use a small framework like https://pf4j.org; there are others as well.
I am eager for more input. What do y'all think? Especially: what does our beloved architect @scolapasta think about this? 🙇

This should start small, in places where we all see fit. It can grow as we go, trying to find the sweet spot between the best community support and the refactoring burden. Is there a chance to find some funding to enable this next-generation repository technology? (Yeah, I know this is nothing new from a technical perspective, but it seems to be new from a community perspective.)

Labels: Type: Suggestion, User Role: Sysadmin
