Talk:Isotonic regression

	Mathematics portal This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.MathematicsWikipedia:WikiProject MathematicsTemplate:WikiProject Mathematicsmathematics
Low	This article has been rated as Low-priority on the project's priority scale.

Statistics High‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.StatisticsWikipedia:WikiProject StatisticsTemplate:WikiProject StatisticsStatistics
High	This article has been rated as High-importance on the importance scale.

Pool-Adjacent Violators Algorithm

I've created this here since I'm not a user. I apologize but I do not recall my sources. I found something on Google (I think I searched "pool-adjacent violators" and it was the first hit), and borrowed it for my purposes. I suppose I could claim the exact syntax of the psuedocode is somewhat original, although it is based on something I read, so that some symbol conventions are clearly borrowed. I'll let people who care about this stuff fix it up. All I personally care about is the math itself, not the people behind it. Math transcends people. I think it is silly for people to claim to "own" math. Still, I am very grateful for all who came before me and all deserve credit, but I would rather fill my brain with ideas, rather than record who created them (waste of disk space).

Suppose we have a set of $n\$ numbers $X_{1},...,X_{n}\$ with weights $W_{1},...,W_{n}\$ . The goal of the algorithm is to optimally shift these X-values so that the result, which we'll call $Y_{1}\ ,...,Y_{n}\$ , are monotone increasing; that is, $Y_{1}\leq Y_{2}\leq ...\leq Y_{n}$ . The shifting takes into account the weights, so that values with larger weights are shifted less than values with smaller weights. It is optimal since it can be shown (I don't know how) that the weighted mean squared deviation of the X's from the Y's is minimimzed.

We start with the first value and proceed right until we reach the first pair of values that are not in order. Suppose we call these $X_{i},X_{i+1}\$ . Note that $X_{i}>X_{i+1}\$ . Next, we "pool" these values and replace both with the weighted mean (call it $M\$ ) of the two values. Then, we proceed backward one value at a time, and check to see if anything that was in order before, has now become out of order. Suppose we reach a $j\$ where $X_{j}>M\$ . Then we pool again, yet this time we pool all the values $X_{j},...,X_{i+1}\$ and replace them with their weighted mean (also, we redefine $M\$ to be this new mean). Eventually this backtracking will terminate, and then we can jump back to position $i+1\$ , and proceed as we did initially. This back and forth process will terminate, leaving a monotone sequence.

A naive analysis of this algorithm is that it is $O(n^{2})$ . Is there a sharper analysis that shows it to be $O(n)$ as claimed in the main article? Eclecticos (talk) 17:17, 28 March 2021 (UTC)[reply]

Here is pseudocode

( Caution, it is UNTESTED!! )

 For i = 1 to n
 Do
   Y(i) = X(i)
 End
 For i = 1 to n-1
 Do
   If Y(i) > Y(i+1) Then
   Do
     TotY = W(i) * Y(i)  +  W(i+1) * Y(i+1)
     TotW = W(i)         +  W(i+1)
     MeanY = TotY / TotW
     j = i - 1
     While ( j > 0 and Y(j) > MeanY )
     Do
       TotY = TotY  +  W(j) * Y(j)
       TotW = TotW  +  W(j)
       MeanY = TotY / TotW
       j = j - 1
     End
     j = j + 1
     Do k = j to i+1
       Y(k) = MeanY
     End
   End
 End