Single Package vs Multiple (-lib -doc -test) & Convenience

Convenience Authoring vs Using Packages

When I author a simple package with only a little bit of functionality, I really like the convenience of creating a single package and putting everything inside it.

When I want to use a simple package, I prefer it to be split up into (-lib -doc -test), because that lets me depend only on the -lib part of the package and thus avoid slow installation caused, for example, by the docs being built unnecessarily for that dependency.

These opposing "forces of convenience" irk me about the package system.
It seems silly to have 3+ packages for (-lib -doc -test etc.) +1 that combines those into the suffix-less version.

If I put on my devil's advocate hat, I would say:
Looking at the package site and package system, there is a missing concept: instead of having one standardized way to define and split up packages into nicely usable "chunks", there are different user-dependent ad hoc ways to work around that missing piece.

I am aware that it is possible to put those packages in one git repository, but it still seems like a lot of work to create and register 4 packages when you would expect to create 1 package.

Maybe this missing concept is having package variants?

Suppose a package could have N variants.
If you don't specify which variants you want to use with a dependency, then all are used.
If you specify them, only the specified ones are used.
Maybe there could even be a syntax to say "use all except those 2".
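
The selection rule described above can be modeled in a few lines of Racket. This is only a sketch of the idea: the variant names, the shape of the `only`/`except` spec, and `select-variants` itself are all made up for illustration and are not part of the current package system.

```racket
#lang racket/base

;; Hypothetical model of the proposal; nothing here exists in raco pkg.
(define all-variants '(lib doc test))

;; spec is one of:
;;   #f                     no variants specified, so all are used
;;   (list 'only v ...)     only the listed variants are used
;;   (list 'except v ...)   all variants except the listed ones are used
(define (select-variants spec [available all-variants])
  (cond
    [(not spec) available]
    [(eq? (car spec) 'only)
     (filter (lambda (v) (memq v (cdr spec))) available)]
    [(eq? (car spec) 'except)
     (remove* (cdr spec) available)]
    [else (error 'select-variants "unknown spec: ~e" spec)]))
```

With this model, `(select-variants #f)` yields all three variants, while `(select-variants '(only lib))` and `(select-variants '(except doc test))` both yield just `lib`.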

When you register a package in "variant-format", the old-style multiple packages are implicitly created for all the variants.


Brainstorming

None of this is very well thought through; my aim is to get the conversation about this going.
I am unsure about what is really needed to improve the situation. Maybe the package system can mostly stay as it is and we need a newer package site?

For example, the site could allow all the "variants" to be defined implicitly.
I would also prefer it if all those -lib -doc -test etc. packages were grouped together.
Another thing is that the package website doesn't understand that -lib or -test aren't supposed to have documentation and displays "This package needs documentation" for those.

Overall, I would like to have a discussion about things that could be improved, and also about concrete ideas for how we could go about implementing something. I also propose (and if this is wrong, I ask anyone to point it out) that it would probably be welcomed if we were to create some new system or package website that can do something about these things.

I think at first we should have some general discussion to have some common ground about what the problems and possible solutions are (also to allow people to point out previous related discussions). Then we could try to come up with possible prototypes/experiments.

> When I want to use a simple package, I prefer it to be split up into (-lib -doc -test), because that lets me depend only on the -lib part of the package and thus avoid slow installation caused, for example, by the docs being built unnecessarily for that dependency.

I think that is a bad approach. It is like putting comments into another file to make the code file smaller.
Perhaps raco pkg should have a mode to get only the code, without docs and tests, or vice versa, but in any case they should be inside one package.

We would save a couple of megabytes of traffic and disk space at the cost of having a hard time getting the documentation.

I think it would be great if you could create a package that has everything in it.
Then somehow different "artifacts" are auto-generated from that, so that you can download either everything or only the implementation, or also the tests etc.

I agree, I would prefer a solution where these manual splits of packages aren't needed.
"Chunking"/variant handling should happen internally; ideally you would not really care about what happens internally most of the time.

Do I understand you correctly that you don't like these explicit -lib -test packages and would prefer if it was just one package that declares what is what? (And the details of how to install it or depend on it, with or without documentation/tests etc. should be handled automatically?)

Just as a side note: raco pkg install already has

  -D, --no-docs
     Do not compile .scrbl files and do not build documentation

But I believe this still downloads the documentation.

IIUC, binary packages are an automated solution to this problem, but I don’t really see anyone using them…

> Do I understand you correctly ...

Yes. raco pkg already has not only --no-docs, but --binary and --binary-lib. Maybe there should be more options.

> don't like these explicit -lib -test packages

Additional packages may have manuals or complex test suites (these test suites, of course, should have their own documentation). Essential documentation (like javadoc) and unit tests must always be with the code.

> But I believe this still downloads the documentation.

Because it downloads the whole archive of the git repository. It could use the git protocol instead, but I think that in most cases the download size would increase.

Wouldn't it also be necessary to be able to depend on a package with e.g. --no-docs or --binary?
If I depend on a package, maybe I use its documentation to generate a picture of that documentation and include it in my output. If I don't do that, I should have a way of saying that my package depends only on the code of the package, not on the various other stuff associated with it.

Said another way: I think command line flags for raco aren't enough; don't we also need more fine-grained dependency declarations?

> Wouldn't it also be necessary to be able to depend on a package with e.g. --no-docs or --binary?

I think the library user should have the option to do that, because one wants either all docs for all used packages or docs only for the top-level packages. So the choice is up to him/her.

> we also need more fine grained dependency declarations?

I would like to have an upper bound for versions and something like Suggests/Recommends/Enhances from dpkg. But it's OK without them.

My $0.02 on the status quo: I think it's sufficient to split project xyz into two packages, one xyz-lib with minimal dependencies and one xyz that includes tests and docs. I don't find it overly burdensome to deal with two directories. (To clarify: I think it would be great if we had a better solution, but until it arrives, that's what I recommend.)
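
A minimal sketch of what the two info.rkt files for such a split might look like, assuming both packages contribute to a single xyz collection. The directory layout, the scribblings path, and the exact dependency lists are assumptions for illustration, not taken from any real package.

```racket
#lang info
;; xyz-lib/info.rkt (hypothetical): implementation only, minimal dependencies.
(define collection "xyz")
(define deps '("base"))
```

```racket
#lang info
;; xyz/info.rkt (hypothetical): tests and docs, layered on top of xyz-lib.
(define collection "xyz")
(define deps '("base" "xyz-lib"))
;; Only needed to build the docs and run the tests.
(define build-deps '("rackunit-lib" "scribble-lib" "racket-doc"))
;; A dependency on xyz also counts as a dependency on xyz-lib.
(define implies '("xyz-lib"))
;; Path is relative to the collection directory; the file name is an assumption.
(define scribblings '(("scribblings/xyz.scrbl" ())))
```

A package that needs only the code would then list "xyz-lib" in its own deps, while "xyz" pulls in everything.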

I vehemently dislike mixing documentation and code in the same file, and I mildly dislike mixing tests and code; when I use test submodules, they're usually at the end of the file or at least the major section. I personally think the benefits of mixing them together are small, and I think they are overwhelmed by two downsides. First, I think that tying the organization of the documentation to the organization of the implementation is a bad idea; they aren't the same, and doing so is likely to harm one or the other. Second, I think that people who have to fit the docs and/or tests in with the implementation are likely to skimp on them. One outcome that I've seen is documentation that dutifully covers every individual function but fails to describe how they fit together. I understand that other people feel differently, of course.

I am curious here - is your dislike of mixing documentation and code in the same file based just on experience that you describe (covering individual procedures/forms but not documenting the package at all) or is there some other reasoning as well?

Because what I really like about scribble/srcdoc is that it enforces inter-module contracts on the programmer. Basically, if you write (provide (proc-doc ...)), the contract is there, it has to be valid, and it forces the programmer to write at least some description of the given procedure.

My point is that srcdoc nicely complements separate scribblings which I find better suited for prose-like part of the documentation. Although I wholeheartedly agree that using in-source documentation as a replacement is problematic at best. The only counter-argument to that would be McFly[1]. Having include-extracted extended to include some kinds of non-form-related prose would be nice...

For tests I would probably go with a similar argument - certain tests fall nicely into the same module (to ensure the module procedures work as designed), but others definitely don't.

[1] McFly Runtime: Embedded Package Documentation for Racket

I think there can be a useful distinction between "reference" and "guide" documentation (and in fact Racket documentation is a good example). Pretty clearly the latter wants to be in dedicated files, not mixed.

There's a similar distinction between "unit" and "integration" tests (although I might be misusing those terms).

In both cases where the "specificity" is high -- the reference docs and the unit tests -- I think there's potentially more value to mixing everything in the same file as the implementation. Most likely you will want to (or want to be reminded to) add/change all three (code, docs, tests) at once.

There's definitely a risk of information overload (although that can be mitigated by tools knowing how to selectively hide facets, and/or definer macros that can DRY parts of all three aspects).

Another downside is that the test/doc submodules will come along for the ride in a -lib package (defeating the purpose), given how packages/modules/compilation works today.

I have a hard time imagining a situation where you truly want to install a package and not install its documentation. Could someone provide specific examples where they find that valuable or wish they had it?

Even the .github/workflows file from raco pkg new has raco pkg install --no-docs, which is one case I can see that being relevant, but here explicitly requesting it makes sense to me. I want the default to be to include documentation because I use it. This is also true with private code—my coworkers can't rely on http://docs.racket-lang.org to have our internal documentation.

Two situations come to mind:

  1. Deployment on a web-server
  2. Machines with little resources (Raspberry Pi and the like)

Yes, my dislike started as the result of reading some especially bad javadoc- (and later oxygen-, I think) generated "documentation". But I do think the abstract point is valid too, about tying the structure of one to the structure of the other being bad.

I do think that the approach you describe (inline docs plus external prose) fixes some problems, but it scatters the sources of the documentation, and I prefer keeping that as one cohesive whole. The topology of text files doesn't permit everything related to be close together.

So I agree that there are pros and cons on both sides. It's a partial order that requires personal value judgments to map to a total order to make a decision :)

> Another downside is that the test/doc submodules will come along for the ride in a -lib package (defeating the purpose), given how packages/modules/compilation works today.

For tests, that would usually just add a dependency on rackunit-lib, which I'm willing to accept. For inline documentation, I think (but haven't tested), that you could get away with just a dependency on scribble-lib (also acceptable, I think), as long as the main Scribble document is in the other package. For example, suppose that project xyz uses project abc. Then xyz-lib just needs scribble-lib and abc-lib to express its documentation, but xyz needs the whole abc (or abc-doc) and racket-doc and all that to build and link the documentation.

So I think separate -lib packages are mostly compatible with inline tests and docs.
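
For reference, the inline-test case looks roughly like this. The module and the function name are made up; the point is that the require of rackunit inside the test submodule is what makes the package holding this file need rackunit-lib among its declared dependencies.

```racket
#lang racket/base

(provide double)

;; The implementation: part of the library proper.
(define (double x)
  (* 2 x))

;; Inline tests live in a test submodule. They run under `raco test`,
;; not when the module is merely required by another module.
(module+ test
  (require rackunit)
  (check-equal? (double 2) 4)
  (check-equal? (double 0) 0))
```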

> I have a hard time imagining a situation where you truly want to install a package and not install its documentation. Could someone provide specific examples where they find that valuable or wish they had it?

I seldom want the documentation of transitive dependencies. If I'm using some package, and that package decided to use some embedding of COBOL (to be hyperbolic) in its implementation (but not its interface), then I do not want to install the documentation for the COBOL embedding.

As for what is reasonable with the existing status-quo system, I agree.

As for what I would like it to be:
I would want to write a single package with inline tests and maybe inline docs.
Someone else could then use it as if I had put them in separate -lib, -doc, -test packages.
(Those might not be generated as separate status-quo packages, and you might depend on them with a different syntax to make it work.)
The -lib part would not be polluted by -doc or -test requirements.
So there would be no scribble-lib or rackunit-lib in -lib unless I actually use them there too, e.g. because I am writing an extension to scribble or rackunit.
For "imagining a better future", think: 2 explicit packages seem like 1 too many.

Especially because the line between what is acceptable to put in which of those 2 packages probably depends a lot on who ends up using them.

So I think that line separating the pieces shouldn't be drawn to begin with, at least not manually by a person making an arbitrary (possibly wrong) choice.

I know some people apparently do want this for those reasons, but it hasn't completely made sense to me. For deployment, if you make an executable with raco exe/raco dist, IIUC the unneeded modules will be stripped away anyway. For a low-resource machine, if you install a built package, you shouldn't need to render the documentation locally, so the only cost is a few files on disk.

There probably are some cases when explicitly split packages make sense, but my intuition is that they are rare scenarios relative to the pervasiveness of using -lib/-doc/-test splits.

(On the other hand, I was wrangling with NPM a bit recently, and I would not like to have so muddled a notion as that of which dependencies are needed for which phase.)

One problem is that binary packages are only built for the current release.

Echoing what @soegaard said, there are environments where you don't need the docs.

As things are currently, as soon as you have a dependency on scribble, most (if not all) of DrRacket gets installed. This is not so bad on your workstation, where you probably already have DrRacket, but it is not great in a server environment.

For distribution you can (and I do) use raco exe and raco dist, but it does feel a bit like a workaround.

Coincidentally, I've been working on a blog post to document the lib/test/doc practice. Although it is primarily intended as a how-to, it does bring up some aspects that may contribute to the meta-discussion here. I'll get around to posting it soon (and fwiw in the post I take a relatively rosy position on it, based on my recent overall-positive experience in transitioning Qi to it), but to summarize in this context: One big plus I see for lib/test/doc is that upstream failures in non-essential packages won't sabotage your core library. For example, during the recent "memoize" outage, only Qi's docs became unavailable (for several days), while the core library was unaffected during this time.

Another positive for lib/test/doc is that Racket packages include all files at the package path. So you can't keep development scripts and other utilities in the package path without them being exposed to the entire Racket ecosystem. This initially seemed like a drawback to me, coming from the Python world where you can provide your library in a single package and provide excludes at the config level via e.g. setup.py and MANIFEST.in. But I now feel that this is an advantage of Racket's way here, since it defers to an existing system for file inclusion and exclusion, viz. a filesystem, instead of inventing a new way. All you need to do is put files there or not. The configuration now shifts to simply pointing the package index to the right filesystem path, rather than pointing it to a path that includes additional ecosystem-specific configuration, and you can put your development scripts somewhere at the top level in your repo (I am now of the opinion that the top level in a repo should never be used as the package path, for this reason). In this respect, lib/test/doc could be seen as the logical conclusion of deferring to a filesystem to represent the package manifest.

The lib/test/doc thing can be tricky to set up though, and it would be great if this were improved. For one thing, the process isn't really documented (my upcoming blog post is intended to help with this). Other things I am unsure of: do tests really need to be exposed in the collection namespace? It's unusual -- but I don't know if this is a good thing or a bad thing, or just a thing that's neither good nor bad. Re: docs, I appreciate that Scribble is just another #lang and so its modules reside in a collection just like those of any other #lang such as Racket. Yet, there is already a higher-level link made somewhere between the source package and the documentation package which causes docs to be built and for these docs to be associated with the source package as "its documentation" -- so Scribble is already treated as special in some way. And the package index also runs tests and reports on their results for the corresponding source package, so tests are also already treated as slightly special. I don't know how these things happen, and there clearly are some holes here, since we see the "This package needs documentation" warnings when employing lib/test/doc. I agree that this part could be less opaque and ideally more explicitly modeled (as some have suggested above) so that it is both more convenient for users to form such associations -- possibly abstracting some parts of the lib/test/doc footwork -- as well as more explicitly understood by the package system itself, so that there are no false warnings.