I was hoping to make the Stan documentation available on https://devdocs.io/. This website is an open-source tool that lets developers search popular code documentation through a single interface. It does this by scraping open-source documentation published as HTML or other markup formats and transforming it into a uniform, searchable database.
Unfortunately, the license for the Stan documentation is CC-BY-ND 4.0 (see docs/LICENSE at master in the stan-dev/docs repository on GitHub). The “ND” part of that license (no derivatives) suggests pretty strongly to me that the mc-stan documentation is not compatible with ingestion into devdocs.io, but I’m no lawyer. Another oddity: the PDF version of the documentation is already available as CC-BY 4.0. So I suppose I could scrape the text in the PDF, but not the underlying R Markdown or HTML files?
What was the purpose of using the no-derivatives license for the Stan documentation? Could the license be changed to CC-BY 4.0 instead? Why does the PDF have a different license than the R Markdown that was used to create the HTML and PDF files? Can I scrape text from the HTML files as long as the text also exists in the PDF? Could I run the process of transforming the Markdown into a PDF myself and use the intermediate HTML, LaTeX, or other files instead of the final PDF for ingestion into devdocs.io?
The @SGB needs to weigh in here.
@Bob_Carpenter - do you remember why the “ND”?
@winni2k - devdocs looks great, but the Stan docs are versioned. How would this work on devdocs?
It supports versioning, though I don’t know how it’s deployed.
I’m not a lawyer either, but which part of the ND license is the issue? Just reading from the link below, it seems like changing the format does not create a derivative work?
Interesting point. I am just starting to get familiar with how devdocs ingests documentation, so I cannot yet speak to how much transformation actually occurs beyond reformatting. In the meantime, here is the relevant excerpt from the full license:
Section 1 – Definitions.
Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image.
What if the metadata of the doc HTML files is rewritten, or sections are merged or split for better presentation in devdocs? Is the material then being “arranged … in a manner requiring permission under the Copyright…”?
Now we need to talk to lawyers to figure it out. Sounds a bit like the license issue with Kallisto; see “I was wrong (part 2)” on Bits of DNA.
My experience has been that each time a new doc version is released, that version is scraped and added to the menu of available docs. It is up to the user to select which version(s) of the code they want to include in their personal library.