Talk:V-optimal histograms

Rewrite

The article is not accessible at all to a general audience. It needs to be completely rewritten by somebody familiar with the subject matter to make it readable. -- Whpq (talk) 17:34, 7 October 2008 (UTC)

Untitled

Given the definition below:

"A v-optimal histogram is based on the concept of minimising a quantity which is called the weighted variance in this context[1]. This is defined as

W = \sum_{j=1}^{J} n_j V_j

where the histogram consists of J bins or buckets, n_j is the number of items contained in the jth bin and where V_j is the variance between the values associated with the items in the jth bin."

I can't calculate the value of [Weighted variance] in this example:

Bucket 1: Average frequency 3.25 Weighted variance 2.28

Could you show me how to obtain Weighted variance = 2.28? -- Preceding unsigned comment added by Vanconglanh (talk • contribs) 15:56, 3 May 2010 (UTC)

I have the same problem. --Schubi87 (talk) 12:57, 7 November 2011 (UTC)

I have the same problem. I cannot replicate the weighted variances. 77ii (talk) 18:05, 8 November 2015 (UTC)
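
For anyone trying to check the numbers by hand, here is a minimal sketch of how the quoted definition could be evaluated. It assumes that the weighted variance is the sum over buckets of n_j times V_j, and that V_j is the population variance (dividing by n_j, not n_j - 1) of the values in bucket j. The bucket contents below are hypothetical, and the function name `weighted_variance` is made up for illustration. This does not attempt to reproduce the 2.28 figure, because the data behind it is not given in this thread.

```python
from statistics import pvariance


def weighted_variance(buckets):
    """Return the sum of n_j * V_j over all non-empty buckets.

    Assumption: V_j is the population variance of the values in bucket j.
    """
    return sum(len(bucket) * pvariance(bucket) for bucket in buckets if bucket)


# Hypothetical frequencies, split into two buckets.
example_buckets = [[1, 3, 5], [10, 12]]

# Bucket 1: n = 3, V = 8/3, contribution 8.
# Bucket 2: n = 2, V = 1,   contribution 2.
total = weighted_variance(example_buckets)
print(total)
```

If the article used the sample variance (dividing by n_j - 1) instead, the per-bucket values would differ, which may be one reason the published figures are hard to replicate.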

Introduction

This article could use a better intro with a bit more overview. It gets too technical too quickly. --mgarde (talk) 07:03, 25 March 2011 (UTC)

Inaccurate/Inadequate description of construction methods

The editor who wrote this article implied that there is no good algorithm for constructing a V-optimal histogram and then went through several local search methods. For the 1-D case (which, I suspect, is what many people are interested in), you can construct optimal buckets with dynamic programming. You can find pseudocode for the method at Emory (you can also see the syllabus that links to the rest of his discussion of V-optimal histograms). A quick glance at the pseudocode indicates that the algorithm probably runs in O(B n^2) time, where B is the number of buckets to create and n is the number of potential divisions between buckets (the number of training data values minus 1). At the very least this algorithm should be mentioned. BrotherE (talk) 23:30, 6 January 2012 (UTC)

I have added a reference to the O(N^2 B) algorithm. However, I have not yet added a description of it. If someone has the time, I think a simple implementation of this algorithm (maybe pseudocode or Python) would be useful as well. Slaymaker1907 (talk) 19:44, 14 September 2021 (UTC)
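
As a starting point for such an implementation, here is a minimal Python sketch of a dynamic program for the 1-D case. It is not taken from the pseudocode mentioned above. It assumes the input is a sequence of frequencies already ordered by attribute value, that buckets are contiguous runs of that sequence, and that the cost of a bucket is n_j times its population variance (that is, its sum of squared deviations from the bucket mean). The name `v_optimal_buckets` is hypothetical.

```python
def v_optimal_buckets(values, num_buckets):
    """Split `values` into `num_buckets` contiguous buckets minimising
    the total sum of squared deviations from each bucket's mean.

    Returns (total_cost, bounds), where bounds is a list of half-open
    index ranges (start, end) into `values`.
    """
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("need 1 <= num_buckets <= len(values)")

    # Prefix sums of values and squared values give O(1) bucket costs.
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def bucket_cost(i, j):
        # Sum of squared deviations for values[i:j], with j > i.
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[b][j]: best cost of covering values[:j] with exactly b buckets.
    cost = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    # cut[b][j]: start index of the last bucket in that best solution.
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    cost[0][0] = 0.0

    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = cost[b - 1][i] + bucket_cost(i, j)
                if candidate < cost[b][j]:
                    cost[b][j] = candidate
                    cut[b][j] = i

    # Walk the cut table backwards to recover the bucket boundaries.
    bounds = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = cut[b][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return cost[num_buckets][n], bounds


total, bounds = v_optimal_buckets([1, 1, 1, 10, 10, 10], 2)
print(total, bounds)
```

The three nested loops give the O(B n^2) running time discussed above; the prefix sums are what keep each bucket cost at constant time, so the total does not grow to O(B n^3).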