
Wikipedia:Reference desk/Archives/Computing/2024 June 2

From Wikipedia, the free encyclopedia
Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives


June 2


Fingerprint, Identification key, Recipe, Answer file, INI file, Bibliographic Record


A couple of examples for context:

  1. ccache "caches compilations so that ... the same compilation can be avoided and the results can be taken from the cache... by hashing different kinds of information that should be unique for the compilation and then using the hash sum to identify the cached output."
  2. Saving a copy of an online webpage from within a Web Browser (File > Save Page As...)


What is the word for a set of parameters which attribute to an instance/snapshot the information required for its reproduction?
In the examples above, ccache utilises "different kinds of information that should be unique for the compilation". Similarly, if I save a webpage from within a Web Browser, the only way for someone to be guaranteed to independently replicate the same file would be for the same URI to be accessed by the same version of the Web Browser with the same configuration (e.g. JavaScript enabled/disabled, identical installation and configuration of extensions which affect page retrieval/rendering), on the same Operating System with the same configuration.

In the case of ccache, the compiler version and flags are some factors of the "information that should be unique for the compilation", and during recompilation inputting the same selection of information results in an identical hash and therefore a cache match.
But what is the word to describe the information being input?
I'm not looking for a generic word like "metadata".

Some words I thought of which seemed to be candidate answers were:

  • Fingerprint (computing) - However, a fingerprint is an algorithmic output (e.g. the ccache 'hash'), whereas I want to refer to the inputs; the article simply defines fingerprinting as "a procedure that maps an arbitrarily large data item (such as a computer file) to a much shorter bit string, its fingerprint".
  • Key (cryptography) - This seemed very close, except that in the case of cryptography it is described as "a piece of information" whereas I am looking for a word to refer to a "set of information".
  • Identification key - "aids the identification of biological entities", rather than describing the parameters of the entity's creation.
  • INI file - "a text-based content with a structure and syntax comprising key–value pairs for properties, and sections that organize the properties", so what would be the name of the section?
  • Answer file - Contains the data that is essentially what I am describing, except that an answer file is context-specific to computer program installation.
  • Recipe - Are configurations equivalent to 'ingredients'? I would have thought a recipe would include much more detail than just application version numbers and parameters.
  • Bibliographic record - This seems the most relevant as a name for the set of reproduction parameters, except that it is context-specific to library science.
  • Exif - Again, very similar, but the set of parameters is just referred to as EXIF metadata or tags.
  • User Agent/Generator - This is part of the information which would be included in the set.
  • Finite-state machine/Combinational logic - Wouldn't this be referring to the method/logic, rather than the input parameters?
  • Artifact - This refers to the File, rather than the attributes which contain the information required for the File's reproduction.
  • Snapshot (computer storage) - Again the File, rather than the attributes.

Mattmill30 (talk) 16:16, 2 June 2024 (UTC)

Would the correct generic word for "a set of parameters which attribute to an instance/snapshot" be the 'profile', which is then qualified with the context-specific word 'generator'?
Therefore making the "set of parameters which attribute to an instance/snapshot the information required for its reproduction" the 'generator profile'?
If so, what would be the "different kinds of information that should be unique for the compilation" used by ccache? the 'compilation profile'?
So then the ccache article would be appropriately updated with "the next time compilation of the same project using the same compiler with the same compilation profile is attempted, the same compilation can be avoided and the results can be taken from the cache". Mattmill30 (talk) 17:27, 2 June 2024 (UTC)
The examples you give don't (as far as I understand the issue) help to reproduce an item. Is Unique identifier what you mean? It is a generic term; depending on the use, various types of unique identifiers have more specific names, such as the International mobile subscriber identity and International Standard Book Number.  --Lambiam 17:41, 2 June 2024 (UTC)
In the case of saving a copy of the same Webpage from multiple Web Browsers, a Unique identifier would be necessary in distinguishing between the multiple copies.
e.g. you could append the name of the Web Browser to the filename, or in the case of multiple copies of a webpage from different versions of the same Web Browser then using the Installation GUID, etc.
However, that wouldn't provide information specific enough to facilitate reproduction, or enable identification of other copies/instances of a particular resource which was generated using an identical system configuration.
Did my earlier response to myself provide clarity to my question?
If my question is still unclear, I can construct an example "solution" which may provide clarity. Mattmill30 (talk) 18:19, 2 June 2024 (UTC)
Wikipedia pages have a revision ID. That of the version of this page after you posted your question is 1226936970. Can I use it to reconstruct a screenshot of what you saw? No. I don't know if you used a laptop or a smartphone. Suppose I know you used a MacBook. Which of its many types? Which size of screen? Produced in which year? (This makes a difference for some types.) Which release of macOS were you using, and which version of that release? Likewise, not only which browser, but also which version? Did your browser have customizations? Was the window full-screen? If not, what were its sizes? How far up or down was the page scrolled, and at which zoom level was it being viewed? Did you watch in dark mode? Knowing all this may still not be enough for a faithful reconstruction of what you saw. Only a screenshot will do.  --Lambiam 06:08, 3 June 2024 (UTC)
One distinction I do want to make is that in my examples I didn't go as far as "reconstructing a screenshot", though the word I am trying to obtain should be capable of also labeling the set of attributes which would be required in reconstructing a screenshot.

For example, if a Web Browser saves the Webpage of a URI to a file, and then that file is reopened in the same Web Browser, with the same configuration, and a screenshot is taken, then the set of metadata attributed to the screenshot file would include:

?label profile? = ('HTTP response','Web Browser "save as" filename+parameters {User-Agent+about:plugins+env+profile_config_diff}','Web Browser+Operating System "screenprint" filename+parameters [e.g. screen resolution, window size, etc]')
With the inputs+metadata for each step in the processing sequence, it would be possible to faithfully reconstruct a screenshot.
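The idea of recording inputs and metadata for each step can be sketched as an ordered list of processing steps, each carrying the parameters recorded for it, together with a check that reports which required parameters are missing from each step. This is a minimal sketch under stated assumptions: the `Step` class, the step names and the `REQUIRED` table are hypothetical illustrations rather than an existing format, and a real list of required parameters would depend on the browser and operating system involved.

```python
from dataclasses import dataclass, field

# Hypothetical: the parameters each step must record to be reproducible.
REQUIRED = {
    "http-response": {"url", "etag"},
    "save-page-as": {"user_agent", "filename"},
    "screenprint": {"screen_resolution", "window_size"},
}


@dataclass
class Step:
    """One stage in a processing sequence and the parameters recorded for it."""
    name: str
    parameters: dict = field(default_factory=dict)


def missing_parameters(steps):
    """Map each incomplete step's name to a sorted list of its missing parameters."""
    gaps = {}
    for step in steps:
        required = REQUIRED.get(step.name, set())
        missing = required - set(step.parameters)
        if missing:
            gaps[step.name] = sorted(missing)
    return gaps


# A "Save Page As" step that did not record the output filename.
steps = [
    Step("http-response", {"url": "https://example.org/", "etag": '"abc123"'}),
    Step("save-page-as", {"user_agent": "Firefox/125.0"}),
    Step("screenprint", {"screen_resolution": "1920x1080", "window_size": "1280x800"}),
]
print(missing_parameters(steps))  # {'save-page-as': ['filename']}
```

An empty result would mean every step's record is complete with respect to the (assumed) requirements, which is one way to make "complete set of metadata" measurable.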

However, my question isn't specifically about reconstruction; it's about reproducibility using "a set of parameters which attribute to an instance/snapshot the information required for its reproduction".
So the assumption is that the file is available, and I am asking what the correct label would be for a complete set of metadata that would enable reproduction (essentially a proof? - I'm not a mathematician). Mattmill30 (talk) 12:21, 3 June 2024 (UTC)
Can you clarify the difference between reconstructibility and reproducibility? In which aspects is a reproduced item allowed to differ from the original?  --Lambiam 15:35, 3 June 2024 (UTC)
Reproduction vs Reconstruction
  • Reproduction: (1) A copy of something, as in a piece of art; a duplicate. (2) (computing) A method for reproducing a bug or problem.
  • Reconstruction: (1) A thing that has been reconstructed or restored to an earlier state. (2) A result of an attempt to understand in detail how a certain result or event occurred.
In reproduction the existence of the original production isn't necessary if a method for reproduction is known. Whereas in reconstruction, the original thing must exist in order for it to be reconstructed.

For example, let's say I and many others have archives of Webpages produced from a variety of time periods and Web Browser versions, but the file contents of my archive becomes corrupted.
I could "reproduce" my archive from the archives of others if they held copies of the same webpages with the same "Last-Modified" or "ETag" HTTP headers, saved from the same Web Browser version, thereby completely satisfying my set of reproduction metadata; but I could only "reconstruct" my archive if I had taken a backup of my archive in advance.
Mattmill30 (talk) 18:33, 3 June 2024 (UTC)
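The matching rule in the archive example above can be expressed as a small predicate. In this sketch (an assumption, not an existing tool), a copy held by someone else counts as a valid reproduction source if it has the same URL and browser version and a matching `ETag` or `Last-Modified` value; the dictionary keys `url`, `browser`, `etag` and `last_modified` are hypothetical.

```python
def can_reproduce_from(mine, theirs):
    """True if `theirs` matches the reproduction metadata recorded in `mine`.

    Assumed rule: same URL, same browser version, and a matching ETag or
    Last-Modified validator (either one is treated as sufficient).
    """
    for key in ("url", "browser"):
        if not mine.get(key) or mine.get(key) != theirs.get(key):
            return False
    for validator in ("etag", "last_modified"):
        if mine.get(validator) and mine.get(validator) == theirs.get(validator):
            return True
    return False


def find_reproductions(mine, archived_copies):
    """Return the copies from other people's archives that satisfy `mine`."""
    return [copy for copy in archived_copies if can_reproduce_from(mine, copy)]


mine = {"url": "https://example.org/a", "etag": '"v1"', "browser": "Firefox 125.0.3"}
others = [
    {"url": "https://example.org/a", "etag": '"v1"', "browser": "Firefox 125.0.3"},
    {"url": "https://example.org/a", "etag": '"v1"', "browser": "Firefox 126.0"},
    {"url": "https://example.org/a", "etag": '"v2"', "browser": "Firefox 125.0.3"},
]
print(len(find_reproductions(mine, others)))  # 1
```

Matching headers do not by themselves guarantee identical content: a site may change a page without updating its `ETag` or `Last-Modified` value, so a check like this is only as reliable as the metadata it compares.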
Not everyone makes the distinction you make. For example, one of the senses Merriam–Webster gives for reconstruct is "to re-create or reimagine (something from the past)", offering this example: "reconstructing a lost civilization".[1] A civilization that is lost has ceased to exist.
The web pages of many websites do not have enough meta information in their URI + embedded in the file itself to enable unique identification of an archived version. If they do, the combined meta information forms a unique identifier. Without knowing the operational procedures of the people putting content on these pages and assuming they adhere to them, it is not possible to be certain of the uniqueness, though. They might for example fix an obvious spelling mistake while not changing the meta information.  --Lambiam 21:24, 3 June 2024 (UTC)
I agree that combined meta information potentially forms a unique identifier. But "I'm not looking for a generic word like 'metadata'", and as you have acknowledged "[Unique identifier] is a generic term; depending on the use, various types of unique identifiers have more specific names".
I am looking for a specific term.

I've given the two examples of a fully-automated cache and a semi-automated saving of a downloaded webpage, which have varied inherent terminology.
For example, the attributes which ccache utilises are called options, arguments, information and mode, because of its command-line context.
ccache hashes "different kinds of information that should be unique for the compilation" and if the cache doesn't already hold a file named with the uniquely identifying BLAKE3 hash, then it completes the compilation and the output file is named "using the hash sum to identify the cached output"[2].
In this example the term 'unique identifier' is already used, and it applies more appropriately to the hash than to the ccache-input-* information[3]. If that information were then attributed to the cached output file, the file would carry a set of attributes containing production metadata (I previously considered the context-specific label "compilation profile"). This would differ from the webpage archive example: in addition to facilitating identification of other copies/instances of a particular resource which was generated using an identical system configuration, the production metadata would be of debug quality and so would also enable verification and validation.
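The distinction between the inputs and the hash can be illustrated with a toy content-addressed cache: a canonical encoding of the inputs is hashed to produce the key (the fingerprint), the output is stored under that key, and the inputs themselves are kept alongside it as the production metadata. This is a sketch of the general technique, not ccache's implementation: ccache's real hash inputs and storage layout differ, and `hashlib.blake2b` stands in for BLAKE3, which is not in Python's standard library. `ToyCache`, `cache_key` and the input field names are hypothetical.

```python
import hashlib
import json


def cache_key(inputs):
    """Fingerprint the inputs: hash a canonical JSON encoding (sorted keys)."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    # blake2b stands in for BLAKE3; 20 bytes -> 40 hex characters.
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=20).hexdigest()


class ToyCache:
    """A toy content-addressed cache that also keeps each output's inputs."""

    def __init__(self):
        self.outputs = {}     # key -> cached output
        self.provenance = {}  # key -> the inputs that produced the output

    def get_or_build(self, inputs, build):
        key = cache_key(inputs)
        if key not in self.outputs:  # cache miss: do the real work once
            self.outputs[key] = build(inputs)
            self.provenance[key] = dict(inputs)
        return key, self.outputs[key]


builds = []


def fake_compile(inputs):
    """Hypothetical stand-in for a compiler run; records each invocation."""
    builds.append(inputs)
    return b"object code"


cache = ToyCache()
inputs = {"compiler": "gcc 13.2.0", "flags": ["-O2", "-c"], "source": "int main(void) { return 0; }"}
key1, _ = cache.get_or_build(inputs, fake_compile)
key2, _ = cache.get_or_build(dict(inputs), fake_compile)  # same inputs: hit
key3, _ = cache.get_or_build(dict(inputs, flags=["-O0", "-c"]), fake_compile)  # miss
print(key1 == key2, key1 == key3, len(builds))  # True False 2
```

Here `cache.provenance[key]` is the record this thread is trying to name: the set of inputs from which the output under `key` was produced.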

I would expect the term for referring to the varied set of attributes to either be an umbrella term or have context-specific variability, in order to accommodate the nuances of different production procedures.

I'm unsure whether 'profile' would be an appropriate term, given its definition: "A summary or collection of information". Perhaps it could be used as the umbrella term for all the various sets of context-specific attributes, referred to as the production profile. Mattmill30 (talk) 16:57, 4 June 2024 (UTC)
Given that the focus of my question is metadata to facilitate reproduction, which is broadly covered by the fields of Science and Business, I've realised that the stages of an assembly line correspond to steps within a job stream, both of which appear to be forms of Workflow.

Since Workflow is a management term, it seems my original question imports the concepts of sequential sets of metadata, which I recognised in mentioning 'INI file' as one of my candidate answers, and hierarchy, which is inherent (URI) to the HTTP cookie variant of Magic cookies.
My reservation with the use of the word cookie, is that magic cookies are "used to identify a particular event or transaction; the data is typically not meaningful to the recipient program and not usually interpreted until the recipient passes the data back to the sender or another program at a later time"[4], whereas metadata to facilitate reproduction would have to record a particular event or transaction.
Therefore, although I recognise file metadata would likely be attributed by way of either Property Handlers/Alternate Data Streams or similar technologies, I think the universality of the INI format makes it suitable at this time to explore a solution to the concept of production workflow metadata, in order to realise the various elements.

Therefore, would the following definitions serve as terms for a "set of parameters which attribute to an instance/snapshot the information required for its reproduction"?
Section = Profile (i.e. Production)
Key = Event (timestamped production job/event identifier [e.g. Firefox 125.0.3-1, 'Save Page As...'])
Value = Attributes presented in name/value pairs from the production event parameters (e.g. HTTP-response= ; User-Agent= & about:plugins= & env= & profile_config_diff= ;)

Mattmill30 (talk) 17:05, 4 June 2024 (UTC)
Do you want to coin a term, or are you looking for an existing term of art that covers whatever it is you are trying to describe? Inasmuch as I get what it is (hardly, I must confess), it seems a pretty generic concept, so a term covering it should be expected to have a pretty generic coverage. I mean, do we have a name for, "an information record that describes something"? Yes, it is a descriptor, which is a generic term because it is a term for a generic concept. It is also not too clear to me what the issue has to do with computing; it seems perhaps more related to information science. A few well-chosen realistic use cases might (perhaps) clarify the issue. The reproduction of archived web pages seems a less realistic use case, for the reasons I have given.  --Lambiam 18:57, 4 June 2024 (UTC)
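The proposed Section/Key/Value layout can be tried out with Python's standard `configparser` module. In this sketch, which is an assumption rather than an established format, each event key is a timestamp plus a job identifier, and each value holds the event's attributes encoded as a URL query string with `urllib.parse.urlencode`, so that attribute values containing `;`, `=` or spaces (as a User-Agent string does) survive a round trip. The event key must not contain `=` or `:`, because `configparser` treats both as key/value delimiters; that is why the timestamp below uses the compact `20240604T170500Z` form. The attribute names and values are illustrative.

```python
import configparser
import io
from urllib.parse import parse_qsl, urlencode


def write_profile(events):
    """Serialise (event key, attributes) pairs into a [Production] INI section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; ConfigParser lowercases by default
    parser["Production"] = {}
    for event_key, attributes in events:
        parser["Production"][event_key] = urlencode(attributes)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def read_profile(text):
    """Parse the INI text back into {event key: {attribute name: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    return {
        event_key: dict(parse_qsl(value, keep_blank_values=True))
        for event_key, value in parser["Production"].items()
    }


events = [
    (
        "20240604T170500Z Firefox 125.0.3-1 Save Page As",
        {
            "HTTP-response": "200",
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
            "env": "LANG=en_GB.UTF-8",
        },
    ),
]
text = write_profile(events)
print(text)
print(read_profile(text) == dict(events))  # True
```

Keeping the attribute encoding separate from the INI syntax means the same records could be stored elsewhere (for example in a file's alternate data stream) without changing their content.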
Ideally, I am hoping to find an existing term, not necessarily a term of art. But if no general term can be contextualised and no term of art exists, so that a new one must be coined, then I am trying to define the scope that the term should encompass.
I agree that my question is multi-disciplinary, but I have raised it in computing because I am looking for an answer which applies to computer processes and files, rather than, for example, bibliographic records in library science. It seems to be in the vein of Version control.

The concept has general and specific elements. Similar to how MusicXML has a specific utility (Western musical notation) written in a general markup language (XML).
My question is both general, in that I am discussing standard File System and Operating System features, metadata constructs and a framework for measuring completeness of the metadata for enabling reproducibility, and specific, in that the context is specifically for the recording of production processes and parameters which contributed to the current state of a file/resource.

I will attempt to flesh out a more comprehensive solution than my INI example, and some realistic use cases. But I am still only asking for words which appropriately label the elements/fields for storing and identifying production metadata, within the Computing vocabulary or domain of discourse. Mattmill30 (talk) 06:18, 5 June 2024 (UTC)
I'm thinking environment parameters. (I'm also thinking about epigenetics, but that's the wrong domain, and I may have failed to understand your concept anyway. Still, epi- is a nice prefix.)  Card Zero  (talk) 10:17, 5 June 2024 (UTC)
Google Gemini suggested 'identical execution environment'. You can abbreviate it to iEE. manya (talk) 07:17, 7 June 2024 (UTC)
'identical execution environment' would be the state of the environment present during a particular execution of a program or thread (rather than the output), which is a subset of the information needed to record the process and environment which resulted in the production of a particular output.

Essentially, my question is:
What would be the appropriate label for the set of information which would enable the recreation of the environment and processes for the reproduction of the same output from the same input?

I think the word for "environment and processes which produce the same output from the same input" would be System.

So therefore, what would be the label for reproduction information?
Mattmill30 (talk) 12:00, 16 June 2024 (UTC)