Talk:PyTorch
This article was nominated for deletion on 11 December 2017. The result of the discussion was keep.
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: Computing, Software, Free and open-source software, and Artificial Intelligence.
Removing examples and documentation
Is it necessary to have example code? This page contains some things which are more appropriate for the documentation. In general, if people want to know this stuff they can read the PyTorch documentation; Wikipedia isn't really meant to be a hosting site for sample scripts. — Preceding unsigned comment added by Gadget142 (talk • contribs) 00:16, 1 August 2023 (UTC)
Update release
The latest version is 1.7.1, while the article says 1.7.0. I want to update it, but it looks like this is using some template referring to Wikidata. Where can I edit the original data behind this template? Thanks! Paritalo (talk) 11:43, 9 February 2021 (UTC)
Not sure if "See also" should feature Tensor
The mathematical and physical definitions of tensor (with the latter being slightly different because it usually means tensor field) are not the same as the one used in deep learning. I think the link is both needlessly complicated and misleading, and therefore should be removed. --Svennik (talk) 10:30, 5 January 2022 (UTC)
- Done. --Svennik (talk) 10:33, 5 January 2022 (UTC)
Confirm BSD License
I've been trying to figure out the license of PyTorch, and can see it's listed as BSD in this article. I am unable to find any other source that confirms this. Is there a citation for this information? 134.7.210.134 (talk) 07:56, 14 July 2022 (UTC)
is it just me, or is this output wrong/inconsistent somehow?
From the current version of the page...
> a = torch.randn(2, 3, device=device, dtype=dtype)
> print(a)  # Output of tensor A
> # Output: tensor([[-1.1884,  0.8498, -1.7129],
> #                 [-0.8816,  0.1944,  0.5847]])
[...]
> print(a.max()) # Output of the maximum value in tensor A
> # Output: tensor(-1.7129)
I think the above outputs were collected from separate runs wherein the randomized results were different. IMO this should be made internally consistent. Abe149 (talk) 03:38, 26 July 2023 (UTC)
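For what it's worth, one way to make the article's example internally consistent would be to seed PyTorch's random number generator before creating the tensor, so every run (and therefore every quoted output) agrees. A minimal sketch, assuming the same variables as the article's example:

import torch

torch.manual_seed(0)  # fix the RNG so repeated runs print identical values
device = torch.device("cpu")
dtype = torch.float32

a = torch.randn(2, 3, device=device, dtype=dtype)
print(a)        # this printed tensor...
print(a.max())  # ...is now guaranteed to contain the maximum printed here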
tensors *are* closely related to the concept in linear algebra
On this page, there is currently the following text:
> Note that the term "tensor" here does not carry the same meaning as tensor in mathematics or physics. The meaning of the word in machine learning is only tangentially related to its original meaning as a certain kind of object in linear algebra.
I am a novice in this area, but I think the phrase "only tangentially related" is misleading.
In linear algebra, the tensor product of vector spaces V_1, ..., V_n is a new vector space V_1 ⊗ ... ⊗ V_n, and a vector in the latter is often referred to as a tensor. If each vector space V_i is equipped with a basis {b_i^j}_{1 ≤ j ≤ d_i} (where d_i = dim(V_i)), then the tensor product obtains a natural basis given by the tensors of the form b_1^{j_1} ⊗ ... ⊗ b_n^{j_n} (where each j_m varies between 1 and d_m). In particular, through this basis, specifying a tensor (i.e. an element of the tensor product) becomes equivalent to specifying coefficients for these basis vectors (of which there are d_1 * ... * d_n = dim(V_1 ⊗ ... ⊗ V_n) many in total).
Meanwhile, in PyTorch, one has "tensors" that consist of multidimensional arrays of (usually) floats, i.e. real numbers. The dimensionality of a tensor is specified by a tuple of positive integers. And I believe that a (d_1,...,d_n)-dimensional tensor in this sense is literally just an element of the tensor product R^{d_1} ⊗ ... ⊗ R^{d_n} (where R denotes the field of real numbers).
Assuming I'm correct about this, I'd propose that the above-quoted text should be updated accordingly. Amazelgee (talk) 18:41, 9 November 2023 (UTC)
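To make the correspondence concrete, here is a small illustrative sketch (the specific numbers are made up): a PyTorch tensor of shape (2, 3) is exactly a 2 × 3 array of coefficients, which under the identification above are the coordinates of an element of R^2 ⊗ R^3 with respect to the induced standard basis.

import torch

# A shape-(2, 3) PyTorch tensor: 2 * 3 = 6 scalar entries.
t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

# Reading t as an element of R^2 (x) R^3: the entry t[i, j] is the
# coefficient of the induced basis tensor e_i (x) f_j, where {e_i}
# and {f_j} are the standard bases of R^2 and R^3.
print(t.shape)  # torch.Size([2, 3]); 6 coefficients = dim(R^2 (x) R^3)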
- Tensors in math and physics have multilinearity, while tensors in PyTorch are basically just higher-dimensional arrays without any extra properties.
- I believe the difference is big enough, and suggest not to change the existing text. @Amazelgee Digital27 (talk) 01:39, 10 November 2023 (UTC)
- @Digital27 Thanks for weighing in! I'm not totally following though. I'll try to explain; I'm happy for any further explanation or pointers.
- A tensor product V_1 ⊗ ... ⊗ V_n receives a multilinear map from the product V_1 × ... × V_n -- indeed, it's the universal such vector space. But its _individual elements_ don't have any special sort of "multilinearity" property, and I believe that's what's under discussion here.
- To (hopefully) clarify further: it is certainly true that we have e.g. λv⊗w = v⊗λw in a binary tensor product V⊗W, but that's referring to _presentations_ of vectors in terms of vectors in the tensor factors. In other words, inside of the set V⊗W, those are literally just two ways of describing the same element.
- And I believe that is exactly what's going on here. Again: given a basis {b_i^j}_j of V_i for each i, we get an induced basis of V_1 ⊗ ... ⊗ V_n whose elements are of the form b_1^{j_1} ⊗ ... ⊗ b_n^{j_n}. These basis vectors are enumerated by tuples (j_1,...,j_n) where each j_m ranges from 1 to d_m. So, through this basis, vectors in V_1 ⊗ ... ⊗ V_n are uniquely expressible as n-dimensional arrays of scalars of size (d_1,...,d_n). Again: there's no sense in which an _individual vector_ in V_1 ⊗ ... ⊗ V_n carries a notion of "multilinearity".
- And of course, what I've arrived at is precisely the notion of a tensor that I see e.g. in PyTorch. Amazelgee (talk) 20:32, 13 November 2023 (UTC)
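A small sketch of that last point, using torch.outer to form the induced basis tensors e_i ⊗ f_j of R^2 ⊗ R^3 and checking that an arbitrary shape-(2, 3) tensor is recovered from its coefficients in that basis:

import torch

e = torch.eye(2)  # standard basis e_0, e_1 of R^2 (as rows)
f = torch.eye(3)  # standard basis f_0, f_1, f_2 of R^3 (as rows)

t = torch.randn(2, 3)  # an arbitrary element of R^2 (x) R^3

# Rebuild t as the sum of its coefficients t[i, j] times the induced
# basis tensors e_i (x) f_j (outer products of basis vectors).
recon = sum(t[i, j] * torch.outer(e[i], f[j])
            for i in range(2) for j in range(3))
print(torch.allclose(t, recon))  # True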
- @Amazelgee Thank you for the detailed explanation! My main point is that in both math and physics the scalar entries of a (d_1,...,d_n)-shaped tensor are inter-related, but there is no such relation in PyTorch. In other words, tensors in PyTorch are higher-dimensional arrays without additional structure, while they do have structure in math/physics.
- I'm not an expert in this area but have studied differential geometry, general relativity and deep learning. Perhaps we can also get some opinions from other people. Digital27 (talk) 05:39, 14 November 2023 (UTC)
- The term seems to mean very subtly different things in mathematics and physics, and now even programming. To physicists, it's some array of numbers that, when given some invertible matrix as input, will transform according to a certain rule. Also in physics, some of the indices are superscripted or subscripted, and this is supposed to affect the transformation rule. In mathematics, a tensor over a vector space V is just the result of taking the tensor product of a tensor power of V with a tensor power of the dual space of V. These result in some differences: To a physicist, a matrix is deemed a tensor, as a physicist treats both matrices and tensors as arrays. To a mathematician, a matrix is not a tensor; instead, it is a linear operator that is a tensor. The physicist's perspective also enables a physicist to consider whether a certain array of numbers "transforms" like a tensor when changing coordinates; to a mathematician, this is gibberish.
- So, in conclusion, no one can quite agree on what a tensor is. --Svennik (talk) 07:01, 14 November 2023 (UTC)
- Can I suggest that we either take the paragraph out or change it to be less specific? Something along the lines of "they don't quite have the same meaning." Digital27 (talk) 07:05, 14 November 2023 (UTC)
- @Svennik Thanks for sharing your understanding! Perhaps I should have mentioned that I'm coming from the math side, though I have some shallow familiarity with physics and programming.
- I do still think that all of these notions are essentially the same, the only difference being whether we've chosen bases of the various vector spaces involved. This is akin to how a first course in linear algebra is typically about matrices, but the "high brow" perspective is that this is actually the study of "linear transformations among vector spaces equipped with bases". I'll explain.
- The key point (as I described above) is that **bases beget bases**. More specifically:
- - if you choose a basis of a vector space, you also get a basis for its dual (appropriately called the "dual basis");
- - if you choose bases for a (finite) set of vector spaces, then you get a basis for their tensor product.
- When you say:
- > To physicists, it's some array of numbers that when given some invertible matrix as input will transform according to a certain rule. Also in physics, some of the indices are superscripted or subscripted, and this is supposed to affect the transformation rule.
- I believe you are (perhaps implicitly) referring to a manifold $M$ and a vector in some tensor product $V := (T_p M)^{\otimes i} \otimes (T_p^* M)^{\otimes j}$ of copies of its tangent space and cotangent space at a point $p \in M$. As I described above, a basis of $T_p M$ gives a dual basis of $T_p^* M$, and together these give a basis of $V$. If $M$ is $n$-dimensional, then this basis allows us to equivalently specify a vector $v \in V$ as an $(i+j)$-dimensional array of scalars of shape $(n,\ldots,n)$. Of course, this depends on the choice of basis of $T_p M$. Given this basis, a change of basis is tantamount to an invertible matrix, and here we arrive at the "transformation rules" that govern the description of the abstract vector $v$ in terms of arrays of scalars. If I recall correctly, the standard convention in differential geometry (and hopefully also mathematical physics, e.g. relativity) is that the superscripts correspond to the "contravariant" tensor factors, i.e. the tensor powers of $T_p M$, while the subscripts correspond to the "covariant" tensor factors, i.e. the tensor powers of $T_p^* M$.
- When you say:
- > In mathematics, a tensor over a vector space V is just the result of taking the tensor product of a tensor power of V with a tensor power of the dual space of V.
- I would slightly disagree, as I typically take the term "tensor" to be more general: to me it can refer to elements of *any* tensor product $V_1 \otimes ... \otimes V_n$, not just one of the form $V^{\otimes i} \otimes (V^*)^{\otimes j}$.
- ===
- @Digital27 Given the level of confusion here, I think it would actually be worthwhile to sort this out and then precisely articulate the distinction! But I'm not a regular contributor to Wikipedia, so feel free to overrule this if you find the discussion too pedantic. Amazelgee (talk) 14:21, 14 November 2023 (UTC)
- (And above, I should have also said: the prototypical way of obtaining a basis of $T_p M$ is from a local coordinate chart of $M$ around $p$. So, when one studies such "tensor fields" on Euclidean (or Minkowski) space, it could be said that one is equivalently studying the local structure of "tensor fields" on manifolds (perhaps of the corresponding flavor, i.e. Riemannian or Lorentzian).) Amazelgee (talk) 14:26, 14 November 2023 (UTC)
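(For reference, the transformation rule alluded to above can be stated as follows, under one common convention; summation over repeated indices is implied. If the change of basis on $T_p M$ is given by an invertible matrix $A$, then the components of a tensor with $i$ contravariant and $j$ covariant indices transform as

$$T'^{\,a_1 \cdots a_i}_{\,b_1 \cdots b_j} \;=\; (A^{-1})^{a_1}{}_{c_1} \cdots (A^{-1})^{a_i}{}_{c_i}\; A^{d_1}{}_{b_1} \cdots A^{d_j}{}_{b_j}\; T^{\,c_1 \cdots c_i}_{\,d_1 \cdots d_j},$$

though conventions differ on which of $A$ and $A^{-1}$ appears on which kind of index.)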
- I think it's good that we're discussing this and I agree that we should make it as precise as possible. I know a little of everything involved here but not deep enough. Let me think about this some more and get back to you. @Amazelgee Digital27 (talk) 14:28, 14 November 2023 (UTC)
- @Digital27 Sure, please take your time! I've set myself a periodic reminder to check back here, so you can rest assured that I'll see any further messages you post even if I don't respond immediately. Amazelgee (talk) 16:07, 20 November 2023 (UTC)
- Thank you for your understanding! I'm trying to give as objective an appraisal as possible. @Amazelgee Digital27 (talk) 16:18, 20 November 2023 (UTC)
- @Digital27 Have you gotten a chance to sort out your thoughts on this? I don't know your background, but given that I am a professional mathematician and have taught more than half a dozen courses on linear algebra and other topics that rely heavily on tensor products and related notions, it seems reasonably likely to me that I am correct in my assessment of this situation (see here: https://etale.site/). Obviously I am necessarily biased, though, and welcome external input. Amazelgee (talk) 15:28, 4 December 2023 (UTC)
- I think we can simplify it to this: tensors in PyTorch are just higher-dimensional arrays; they have nothing to do with tensor products, and there is no concept of a basis in PyTorch. I'd say the resemblance is only superficial.
- If you want to know my background, I was a computer science major many years ago, math has been a hobby for me.
- @Amazelgee Digital27 (talk) 15:03, 5 December 2023 (UTC)
- @Digital27
- As I've described at length (albeit a bit circuitously) above, I definitely disagree with that summary! I believe that your assertion is a generalization of the assertion -- which I also disagree with -- that ordinary matrices have nothing to do with vectors.
- (I would argue that it is irrelevant whether the notion of a vector space basis is explicitly embedded in PyTorch. Analogously, matrices are intimately related to vectors, whether or not e.g. some particular computer program that implements matrix multiplication explicitly contains the notion of a vector space basis.)
- Thanks for sharing your background. I certainly don't intend to argue by authority per se, but I do think it's worth considering our respective backgrounds here. I'd also be grateful for someone else who's more knowledgeable about this than I am to come weigh in, but I have no idea how to summon them... Amazelgee (talk) 08:59, 12 December 2023 (UTC)
- They are definitely related, insofar as physicist-type tensors are multi-dimensional arrays which admit four operations:
- - Addition
- - Tensor product, which includes multiplying scalars by tensors
- - Contraction
- - Basis change
- The last operation is ambiguous when not distinguishing covariant and contravariant indices. However, the other three operations are easy to implement in PyTorch. I think it is ultimately subjective how important the last operation is. Suggestion: It might be a good idea to characterise physicist's "tensors" using the above four bullet points, and show how to implement each of the four operations (or not) in PyTorch. Svennik (talk) 16:08, 12 December 2023 (UTC)
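A minimal sketch of those four operations on (1,1)-tensors in PyTorch, using torch.einsum for the index bookkeeping (the matrix A below is a made-up, hopefully well-conditioned choice, purely for illustration):

import torch

d = 3
S = torch.randn(d, d)  # a (1,1)-tensor S^i_j stored as a d x d array
T = torch.randn(d, d)

# 1. Addition: componentwise.
add = S + T

# 2. Tensor product: (S (x) T)^{ik}_{jl} = S^i_j * T^k_l.
prod = torch.einsum('ij,kl->ijkl', S, T)

# 3. Contraction: pair an upper index with a lower one, e.g. the trace S^i_i.
tr = torch.einsum('ii->', S)

# 4. Basis change: with change-of-basis matrix A, a (1,1)-tensor
#    transforms as S'^i_j = (A^{-1})^i_k S^k_l A^l_j.
A = torch.randn(d, d) + d * torch.eye(d)  # illustrative; near-diagonal, so invertible
S_changed = torch.einsum('ik,kl,lj->ij', torch.linalg.inv(A), S, A)

# Sanity check: a full contraction (the trace) is basis-independent.
print(torch.allclose(tr, torch.einsum('ii->', S_changed)))  # expected: True (up to floating-point error)

As the discussion notes, the fourth operation only makes sense once one has decided which indices are covariant and which are contravariant; for a plain PyTorch array, that bookkeeping lives entirely in the user's head.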
- I've made an edit to the article along the above lines. Svennik (talk) 21:14, 12 December 2023 (UTC)
- I still disagree on two key points:
- I maintain that the relationship between PyTorch tensors and physics tensors is tenuous, the most obvious manifestation being that PyTorch users don't think about tensor operations (in the math/physics sense) at all.
- In an article on PyTorch, I believe the difference/similarity is overemphasized. This should be a minor point to most readers of the article.
- But I won't make changes at this point.
- Thanks!
- @Svennik @Amazelgee Digital27 (talk) 01:33, 13 December 2023 (UTC)
- This sounds fine to me! Thank you both for your input; I appreciate it, and it's been fun to see what goes on "behind the scenes" here at Wikipedia.
- @Svennik @Digital27 Amazelgee (talk) 21:00, 24 December 2023 (UTC)
- I just want to clarify: physical tensors are NOT necessarily multidimensional arrays. Multidimensional arrays are just an implementation detail! This is sorta like how the letter A is not necessarily the integer 65.
- In my professional experience with continuum mechanics, only some physical tensors are implemented as multidimensional arrays (stiffness and compliance, which are 4th-order tensors implemented as 6x6 matrices). The rest are implemented as ordinary arrays (stress and strain, which are 2nd-order tensors implemented as 6-element arrays).
- More importantly, a physical tensor after a rotation is still just the same tensor viewed from a different angle, even if that means that all the numbers are completely different. But you can't meaningfully rotate any old PyTorch tensor. 50.236.115.194 (talk) 19:34, 22 May 2024 (UTC)
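That point can be illustrated with a short sketch (the stress tensor and rotation below are made up for illustration): rotating a symmetric 2nd-order tensor changes its components, but basis-independent quantities such as its eigenvalues (the principal stresses) are unchanged, whereas a generic PyTorch array carries no such transformation behavior.

import math
import torch

# A symmetric 2nd-order tensor, e.g. a stress tensor (values made up).
sigma = torch.tensor([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 3.0]])

# A rotation by 30 degrees about the z-axis.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
R = torch.tensor([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])

# The same tensor viewed in rotated coordinates: sigma' = R sigma R^T.
sigma_rot = R @ sigma @ R.T

print(sigma_rot)  # the components change (in general)...
print(torch.allclose(torch.linalg.eigvalsh(sigma),
                     torch.linalg.eigvalsh(sigma_rot)))  # ...but True: same principal stresses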
To be honest, I don't think the very long explanation comparing different meanings of the word "tensor" is very useful to newcomers. I think the very brief remark from earlier could be rephrased somewhat and restored. I'm going to be bold and edit. --Svennik (talk) 18:27, 9 March 2024 (UTC)