Talk:Mean time between failures/Archive 1
This is an archive of past discussions about Mean time between failures. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Merge Alert
Another article on the same subject is at Mean time between failures. That text needs to be merged with this article (Mean time between failure is probably the best title; See Wikipedia:naming conventions). --mav
- I know we have the convention of naming articles in non-plural fashion, but how can you have a mean time between a single failure? -- Wapcaplet 19:12 3 Jun 2003 (UTC)
- Besides, "mean time between failures" is singular already; "mean times between failures" would be plural. -- John Owens 19:17 3 Jun 2003 (UTC)
I have moved this to Mean time between failures, per the discussion above. -- Wapcaplet 22:44 3 Jun 2003 (UTC)
Issue with "Common MTBF misconceptions" section
This part says that the parts will fail before their MTBF 67% of the time, which to me seems to forget the first point indicating a wear out time. Parts will fail even more often than 67% of the time before the MTBF, because that calculation assumes the constant failure rate model holds for the life of the part. I haven't thought a great deal about how to phrase this, but what I guess may be accurate is to say that, given a real known MTBF for a part, a lifetest of a number of parts would result in a calculation for MTBF that is lower than the known actual MTBF 67% of the time. 151.190.254.108 (talk) 20:37, 28 September 2010 (UTC)K. Domnitser
- The section was blanked in October 2010, apparently with no discussion other than the message above. I am going to restore it. If this was debated and I just missed it, just revert my revert of the deletion. Carewolf (talk) 18:48, 1 January 2012 (UTC)
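For reference, the figure under discussion can be checked directly. The sketch below assumes the constant-failure-rate (exponential) model holds for the whole life of the part, which is exactly the assumption questioned above; the function name is illustrative only. Under that model the share of parts failing before the MTBF comes out at about 63%, slightly below the 67% quoted.

```python
import math

def prob_fail_before(t_hours: float, mtbf_hours: float) -> float:
    """P(lifetime < t) for an exponential lifetime whose mean is mtbf_hours."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)

# Probability that a part fails before reaching its own MTBF.
# The result does not depend on the MTBF value chosen (1000 h is arbitrary).
p = prob_fail_before(1000.0, 1000.0)
print(f"{p:.3f}")  # 0.632, i.e. 1 - 1/e
```

If wear-out raises the failure rate late in life, the real fraction would be higher than this, as the original comment argues.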
What units?
It is normally measured in hours. This concept is also applied to rotary machines like compressors and gas turbines etc.
Real-world example
I have no idea how to use the information in the article. For example, Seagate's largest HDD available is quoted as having an MTBF of 1,200,000 hours. Obviously they don't mean it will last 1.2 million hours, because that is more than 136 years. It would help if someone could write an explanation of how I could work out how long I could expect an HDD to last, given an MTBF of 1.2m hours. Thanks Cagliost 21:03, 21 January 2007 (UTC)
The 1.2 million hours stated is only of limited use, mainly to people who are deploying or keeping track of many thousands of units in actual operation. The calculated MTBF is first done by putting all of the individual reliability information for all of the parts into a computer that uses the Bellcore method for summing all of the values together into a "composite" figure. In the case of the disk drive, the motors, bearings, voice coils, ASICs, etc. all have values that are calculated together. Discrete power components will bring this composite figure down. So, the calculated MTBF for the HDD is most likely 1.2 million hours. Then the field population is tracked by how many confirmed failures are counted. I think many of these for disk drives are not counted because they are not returned to repair centers; more likely they are thrown away if they fail.
The operating hours of the total field population is added up and then divided by the Failures in Time or FIT, which gives an "actual" MTBF figure. So it is this number that tells you the actual frequency of failure, in hours of run time across the field population. It is really meaningless to an individual user of a hard drive. The bottom line is that with a calculated MTBF of 1.2 million hours you should never see a functional failure of your hard drive unless it is being abused, overheated, etc. In addition, hard drives are rated differently depending on whether they are OEM (enterprise) models or consumer models. Enterprise models cost more and are rated at 100% duty cycle, whereas consumer models are rated at a much lower duty cycle.
ReliableEngineer 21:12, 22 January 2007 (UTC)
Thanks! So basically MTBF tells me, a home user, nothing about how long a HDD will last. I've been getting 5-8 years in desktops. But now I'm running a server, and I've heard claims that it's better to run an HDD constantly, because it is the spin-up and spin-down that wears the disks out. I've been unable to verify this, so have no idea if it's true or not! cagliost 15:14, 12 February 2007 (UTC)
- Well, MTBF tells you something about how many failures you can expect within the device's rated lifetime, but as used by HDD manufacturers, it doesn't tell you much about the rated lifetime. (You can argue that this is somewhat disingenuous on their part, and I don't think I'll disagree with you.) "Five years" is a typical rated lifetime, by the way.
- Meanwhile, yes, spin-ups and spin-downs are bad for the life of the drive as they cause head touchdowns, and any given disk drive is only rated for a certain number of touchdowns (or its wearout life is stated in terms of, say, "8 hour cycles with one take-off and one landing" bracketing each cycle). But then again, while the disk is stopped, the spindle bearings aren't wearing out. So it's a tradeoff. Generally speaking, server disk drives tend to like long spin cycles with fewer touchdowns, while laptop disk drives are rated for many more touchdowns so the drive can be spun down to conserve power.
- MTBF has a lot of meaning, but it does not mean the device will last 136 years. All of these misunderstandings are really surprising. And it does not mean that you will not see a failure in home use. Saying you will never see a failure is just dumb. What does it really mean? It means that one out of every 136 devices will have a mechanical failure every year. Or, equivalently, if there were 1360 devices spinning, then on average, 10 would mechanically fail every year. That is the meaning of 1.2 million hours. But, this needs to be put into context. Infant mortality is not counted, because those devices never were in use. Software failures, user failures, infrastructure failures (brownouts), heat and overheating, and viral or deliberate attacks are not counted either. So, a home user is still going to see the failure every few years. Even with enterprise drives. What the home user should not see is driving the drive to death. It should be built for continuous action of the head, and that action not fail for, on average, 1.2 million hours. Marcwiki9 (talk) 21:43, 9 February 2014 (UTC)
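The arithmetic behind the "1,360 drives, about 10 failures a year" reading can be sketched as follows. This assumes a constant failure rate and drives powered on all year (8,760 hours); the function names and the always-on default are illustrative assumptions, not anything taken from a manufacturer's data sheet.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours, assuming the drive is always on

def mtbf_in_years(mtbf_hours: float) -> float:
    """Express an MTBF given in hours as years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR

def annual_failure_probability(mtbf_hours: float,
                               powered_hours: float = HOURS_PER_YEAR) -> float:
    """Chance that one drive fails within a year, exponential model."""
    return 1.0 - math.exp(-powered_hours / mtbf_hours)

def expected_failures_per_year(n_units: int, mtbf_hours: float,
                               powered_hours: float = HOURS_PER_YEAR) -> float:
    """Expected failures per year in a fleet where failed units are replaced."""
    return n_units * powered_hours / mtbf_hours

print(round(mtbf_in_years(1_200_000), 1))                     # about 137 years
print(round(annual_failure_probability(1_200_000) * 100, 2))  # about 0.73 percent
print(round(expected_failures_per_year(1360, 1_200_000), 1))  # about 9.9 drives
```

So for a single home user the figure translates to well under a 1% chance per year of the kind of failure the MTBF counts, within the drive's rated service life.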
Fix the Freakin' Title!
For some reason, Mean time between failures now redirects here. Someone needs to fix this. As was pointed out above, the one and only proper title for this article is Mean time between failures, with a final s. Rocinante9x 22:03, 28 January 2007 (UTC)
- Resolved --Chealer (talk) 17:42, 18 August 2014 (UTC)
Related: Software Safety
I deleted the short paragraph alluding to software in systems, since it was very crappily written and linked to a non-existent article. Rocinante9x 20:45, 6 February 2007 (UTC)
Merge suggestion
I believe that there is so much overlap between this MTBF article and Failure rate that it might be best to merge the two articles. Failure rate seems to be the more general topic, so I would suggest merging with Failure rate and redirecting MTBF. However, I would want to make sure all the points in the MTBF article are carried over to Failure rate, such as the section on Problems with MTBF. Any comments on this suggestion would be appreciated. Wyatts 21:25, July 21, 2005
Study: Hard Drive MTBF Ratings Highly Exaggerated
- http://www.dailytech.com/Article.aspx?newsid=6404
- http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html --Root Beers 06:43, 11 March 2007 (UTC)
I removed the section MTBF Exaggerated and replaced it with MTBF and Life Expectancy. The two sources you quoted above have more to do with life expectancy. Please also see the Usenet FAQ about MTBF. NYCDA 21:32, 3 May 2007 (UTC)
Sentence removed from "MTBF and life expectancy" section
I just removed this sentence from Mean time between failures#MTBF and life expectancy:
- Another way to look at this is, if there are 100,000 units of this drive and all of them are in use at the same time, then 1 unit is expected to fail by the first hour. All 100,000 of them are expected to fail after 2 years.
I'm pretty sure this is flawed reasoning. It is not necessarily the case that one unit is expected to fail by the first hour; this all depends on the distribution of failures. Likewise, if the life expectancy of the drive is two years, it is unreasonable to expect that all of the drives will fail after two years; some will fail sooner, and some will last longer. —Bkell (talk) 05:42, 18 May 2007 (UTC)
- I don't think it's flawed as you pointed out. It is extreme but still correct in that if you ran 100,000 HDs at the same time, then 1 is expected to fail by the first hour. The distribution of failures would actually support this. Assuming the HD had an MTBF of 100,000 hours, there is a 1/100,000 chance per hour that a failure would occur. So if you have 100,000 HDs running for 1 hour each with a 1/100,000 chance of failure per hour, then you would expect 1 HD to have failed after the hour.
- If a HD has a life expectancy of 2 years, then you would expect that HD to fail after 2 years. It may be sooner or later but you would 'expect' it to fail after 2 years. So you have 100,000 HD, you would still expect all of them to fail after 2 years and still many would've failed before 2 years and many lasting longer. It does not say about all 100,000 failing after 2 years, just you are expecting them to fail after 2 years.
- I've put the sentence back with some edits. NYCDA 19:52, 5 June 2007 (UTC)
- No, see, you're assuming that there's a 1/100,000 chance of failure in the first hour. That's not necessarily a correct assumption. Perhaps the way these things really behave is that there's an almost zero chance of failure in the first 5,000 hours, but then the chances of failure begin to increase, so that there's a small chance of failure between 5,000 and 10,000 hours, a greater chance between 10,000 and 15,000 hours, and so on. In other words, it is entirely possible (and perhaps likely, in the real world) that the chances of failure within the first hour of use are much less than the chances of failure between the 60,000th and 60,001st hours of use.
- That's what I mean when I say that it depends on the distribution of failures. Suppose you take 100,000 drives and start running them, and each hour you count up how many have failed so far. The sentence in the article is making the assumption that this graph would be a straight line, increasing at a constant rate; roughly the same number of drives fail each hour. That's a faulty assumption. It could be that quite a few drives fail right off the bat, just because of a flaw in the manufacturing process; but if a drive makes it past the first 100 hours of operation, say, it stands a really good chance of lasting for at least 10,000 hours. So the graph may have a big spike at the beginning, and then level off for quite a while, before starting to increase again. This is far from a linear function of time. And there are many other imaginable scenarios. Knowing only the life expectancy does not tell you anything about the distribution of failures, so you can't logically make the conclusion that after one hour the expected outcome is one failed drive.
- Also there is a big difference between expecting one particular drive to fail after two years and expecting all drives to fail after two years. It is true that for any particular drive, you expect it to fail after two years; but you do not expect that after two years all of the drives will be dead. The sentence "All 100,000 of them are expected to fail after 2 years" seems to say that the expected outcome after two years is to find 100,000 dead drives. —Bkell (talk) 22:04, 5 July 2007 (UTC)
- I made up the 1/100000 chance of failure to demonstrate the problem. It makes the math easy. I could've picked 1/1000 and said out of 1000 drives, one is expected to fail in the first hour. I also made no claim that life expectancy is related to MTBF. In fact my point was they are not related at all except in the ways I have described, which is why manufacturers can specify an MTBF far greater than life expectancy. Regardless of the made-up number, the rate of failure is constant; it cannot change as you have indicated. The probability of failure is not lower in the first hour than it is in the 10000th hour. See the wiki. MTBF = 1 / failure rate. If the failure rate is not constant then MTBF is not constant. NYCDA 18:59, 17 August 2007 (UTC)
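The disagreement in this thread comes down to which lifetime distribution is assumed. A minimal sketch, assuming a Weibull lifetime with a mean of 100,000 hours: shape 1 is the constant-failure-rate case, and shape 3 is a hypothetical wear-out case chosen only for illustration. Neither is fitted to real drive data.

```python
import math

def first_hour_failure_prob(mean_life_hours: float, shape: float) -> float:
    """P(failure within the first hour) for a Weibull lifetime.

    The scale is chosen so the distribution has the requested mean:
    mean = scale * Gamma(1 + 1/shape).
    """
    scale = mean_life_hours / math.gamma(1.0 + 1.0 / shape)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-((1.0 / scale) ** shape))

fleet = 100_000
for shape in (1.0, 3.0):
    expected = fleet * first_hour_failure_prob(100_000, shape)
    print(f"shape={shape}: expected first-hour failures = {expected:.3g}")
```

With shape 1 the expected number of first-hour failures is about one, matching the sentence in the article; with the wear-out shape it is many orders of magnitude smaller even though the mean life is identical. The mean alone does not determine the first-hour figure.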
- Perhaps I don't even agree with the statement that you expect any particular drive to fail after two years. True, the expected lifespan of a particular drive is two years, but this is different from saying that you expect that drive to fail after two years.
- An example: I make a $10 bet with you that when I flip a coin it will come up heads. Now, there is a 50% chance that it does, and I win $10; and there is a 50% chance that it doesn't, and I lose $10. So my expected winnings are
- $\tfrac{1}{2}(+10) + \tfrac{1}{2}(-10) = 0$ dollars.
- But I certainly don't expect to break even, because it is impossible for me to break even. Either I will win $10, or I will lose $10, but I cannot break even. So "the expected value of X is Y" is different from saying "we expect the value of X to be Y". —Bkell (talk) 22:13, 5 July 2007 (UTC)
- If you take your 'flip a coin' question to any AMS (applied math statistics) class, then the answer would be yes, you expect to break even. Remember we're talking statistics, not just a one-time event. See http://www.ams.sunysb.edu/~finchs/AMS311Lecture13.doc for some lecture notes. Please understand what it means to expect in statistics. Try this for size. Flip your coin 10 times. You expect to end up with 5 heads and 5 tails but the chance of getting exactly 5 heads and 5 tails is actually less than 25%. If you flip 10000000000000000000 times, your probability of getting 5000000000000000000 heads and 5000000000000000000 tails is practically 0 yet you must still say that is the outcome you expect. NYCDA 18:59, 17 August 2007 (UTC)
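The two quantities being contrasted here, the expected value and the probability of observing exactly that value, can be computed side by side. A small sketch for a fair coin; the function names are illustrative.

```python
import math

def expected_heads(n_flips: int) -> float:
    """Expected number of heads in n flips of a fair coin."""
    return n_flips / 2

def prob_exact_heads(n_flips: int, k_heads: int) -> float:
    """Probability of exactly k heads in n flips of a fair coin."""
    return math.comb(n_flips, k_heads) / 2 ** n_flips

print(expected_heads(10))       # 5.0
print(prob_exact_heads(10, 5))  # 0.24609375, i.e. 252/1024
print(prob_exact_heads(1, 0) * (-10) + prob_exact_heads(1, 1) * 10)  # 0.0
```

The last line is the $10 bet: the expected winnings are zero even though no single flip can produce zero, which is the distinction both sides of the thread are drawing in different words.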
MTBF changed to MTTF
In the opening sentence, it's
- Mean time between failures (MTBF) is the mean (average) time between failures of a system, the reciprocal of the failure rate
While in the formal definition of MTBF it's
- The MTBF is the sum of the MTTF (Mean Time to Failure) and MTTR (Mean Time to Repair). The MTTF is simply the reciprocal of the failure rate.
These 2 sentences contradict each other. If MTBF is the reciprocal of failure rate and MTTF is also the reciprocal of failure rate then MTBF = MTTF, but the formal definition says MTBF = MTTF + MTTR. Personally I don't see how the repair time is a factor in MTBF. Can someone provide some sources for the formal definition of MTBF? NYCDA 20:08, 5 June 2007 (UTC)
- MTBF has NOTHING to do with MTTR, and MTBF is the reciprocal of the failure rate, not MTTF. I don't think the person that input this into Wikipedia knows what he/she is talking about. 148.78.243.24 15:50, 16 August 2007 (UTC)
- NYCDA, it's more intuitive (chronological) to define MTBF as MTTR + MTTF. Think about the period between the first and the second failure. It starts with the first repair, and it ends with the second work period. The longer it takes to repair, the less often a device will fail. --Chealer 03:51, 21 August 2007 (UTC)
- Chealer, mathematically it is not wrong to say MTBF = MTTF + MTTR. But in practical world, you should never use that as it gives confusion (pls refer to other section below: "Practically, MTBF = MTTF") —Preceding unsigned comment added by Andjohn2000 (talk • contribs) 03:22, 12 November 2007 (UTC)
Cleanup
I've tagged this cleanup due to concerns about accuracy and the treatment separate from failure rate. Someone suggested to merge this with Failure rate in Talk:Failure_rate. Accuracy concerns were discussed in the previous talk section.--Chealer 03:46, 21 August 2007 (UTC)
- I think the main interest in this wiki is with computer HD's MTBF which is just another way of saying the failure rate. NYCDA 22:28, 3 December 2007 (UTC)
I tagged it again. The problems I'm seeing
- 1. In Overview, MTBF=sum(downtime-uptime)/# of failures. This can't be right. If MTBF=MTTF+MTTR then MTBF=sum(uptime+downtime)/# of failures or MTBF=total time/# of failures.
- 2. In the formal definition of MTBF, it states MTBF=MTTF+MTTR and explains MTTF very well but does not explain MTTR at all. The failure density function f(t) also needs to be defined; otherwise the integral should be removed. $\int_0^\infty t\,f(t)\,dt$ is much harder to understand than 1/failure rate.
- 3. If readers are here to learn about MTBF because they saw that 100,000 MTBF on their new HD, then it would be better to de-emphasize all the other variables like MTTR, MTTF, MDT.
- 4. Variations of MTBF and Problems with MTBF appear to be unsourced.
NYCDA (talk) 22:38, 5 December 2007 (UTC)
--
NYCDA, what's your point to make MTBF=sum(uptime+downtime)/# of failures?
Suppose a new item is up for operation at afternoon 13:00.
After a while, it gets a failure and down at the same day at evening 23:00
Assuming the item is still down when we are going to perform MTBF calculation.
From the graph, the parameters required for calculation will be:
- Uptime = 13:00
- Downtime = 23:00
- Number of failures = 1
- Is this what the school teach you on how to flip the coin?
-- Andjohn2000 (talk) 03:35, 16 December 2007 (UTC)
- To me, though there is no source to indicate "MTBF=sum(downtime-uptime)/# of failures", the graph seems to show the corollary to the stated formula. So, I can't comment here whether it is right or wrong to maintain this formula in this wiki page.
- However, proposing a new formula so called "MTBF=sum(uptime+downtime)/# of failures" is definitely violations of WP:NOT#OR, WP:NOT#OTHOUGHT, and WP:NOT#PUBLISHER
- It is also violation against WP:NOT#DEMOCRACY to try gathering mass support (by adding cleanup tag to this page) in order to expect other people to concur this new formula. Please note this is not a place to publish a research. Academic or research institute will be more appropriate to it.
Scuarty (talk) 12:58, 16 December 2007 (UTC)
- If you feel there is no source to back "MTBF=sum(downtime-uptime)/# of failures" then it is your duty to remove it according to WP:V: "It should be removed, aggressively, unless it can be sourced."
NYCDA (talk) 21:51, 18 January 2008 (UTC)
- You don't need to pinpoint me as the guard of policy. For a subject that I am not familiar with, I will not take any action to delete, undelete, or even dare to publish the theory. To me, this formula is like calculating the delta of two events. There is nothing special or controversial about it. I am not sure if it requires sources. But when looking at your stated formula (uptime+downtime), it is definitely a controversial theory: a delta can't be the sum of two entities. So you do need to get some credible research/studies. That's why I stated that an academic or research institute would be more appropriate for your uptime+downtime formula.
Scuarty (talk) 14:33, 19 January 2008 (UTC)
- I would ask you two to read the whole wiki again and then respond. I quote from the wiki:
- Mathematically, the MTBF is the sum of the MTTF (mean time to failure) and MTTR (mean time to repair).
MTTF=sum(uptime)/# of failures. MTTR=sum(downtime)/# of failures. So one section says MTBF=sum(downtime-uptime)/# of failures while another says MTBF=sum(downtime+uptime)/# of failures. And MTBF can never be less than 0 but sum(downtime-uptime) can easily be less than 0, so there are a number of problems here. And this is precisely what cleanup tags are for. They draw other knowledgeable editors to the wiki so that it can be addressed. NYCDA (talk) 16:31, 17 December 2007 (UTC)
--
NYCDA, please refer to the graph again:
sum(downtime-uptime) can never be less than 0.
- Practical formula (MTBF=MTTF):
- MTBF=sum(downtime-uptime)/# of failures
- Mathematical formula (MTBF=MTTF+MTTR):
- MTBF=total time/# of failures
- Wrong formula:
- MTBF=sum(uptime+downtime)/# of failures
Regarding the controversies of practical vs mathematical formula, please keep the discussion in the talk section "Practically, MTBF = MTTF".
-- Andjohn2000 (talk) 16:09, 20 December 2007 (UTC)
- You crack me up, what is total time if not sum(uptime + downtime)? NYCDA (talk) 21:36, 18 January 2008 (UTC)
- NYCDA, do you have problem in mathematics?
- Suppose you go to your office at 8:00 and leave at 17:00.
- Your total time in office will be (17:00 - 8:00) = 9 hrs.
- With your (uptime+downtime) formula, it will become (8:00 + 17:00) = 25 hrs
- When your uptime is at 8:00 with downtime at 17:00, is it what you learn from school that your working hours at office are 25 hrs per day? -- Andjohn2000 (talk) 13:47, 20 January 2008 (UTC)
- In your example, you give absolute times (e.g. 8:00, 17:00) but uptime and downtime are words for time intervals, which you would need to describe with durations, not absolute times. rakslice (talk) 20:59, 5 April 2009 (UTC)
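The point about durations versus clock readings can be made concrete. The sketch below takes a hypothetical log of clock readings (start, failure, repair completed, all in hours), converts them to uptime and downtime durations, and shows that sum(uptime + downtime) / failures is the same thing as total time / failures. The log values and the function name are made up for illustration.

```python
def mtbf_from_log(cycles):
    """cycles: list of (started_at, failed_at, repaired_at) clock readings in hours.

    Returns (MTTF, MTTR, MTTF + MTTR), all computed from durations.
    """
    uptimes = [failed - started for started, failed, _ in cycles]
    downtimes = [repaired - failed for _, failed, repaired in cycles]
    n_failures = len(cycles)
    mttf = sum(uptimes) / n_failures
    mttr = sum(downtimes) / n_failures
    return mttf, mttr, mttf + mttr

# Hypothetical log: two run/repair cycles back to back.
log = [(0, 100, 110), (110, 250, 270)]
mttf, mttr, mtbf = mtbf_from_log(log)
total_time = log[-1][2] - log[0][0]

print(mttf, mttr, mtbf)       # 120.0 15.0 135.0
print(total_time / len(log))  # 135.0, same as MTTF + MTTR
```

In the office example, 8:00 and 17:00 are clock readings; the corresponding duration is 9 hours, and it is durations that get summed.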
The wrong calculation is the right one
The main calculation on this page is not accurate. In fact it conflicts with hard drive examples.
The page states that the MTBF should only do the sum of units that fail. Let me point out several problems with this. Example 1: if Item 5 had not failed, one would expect the MTBF to be better; however, according to the instructions it would be 72 / 2 = 36 hours. So following the formula, fewer failures can make your MTBF worse. This is clearly wrong.
Example 2. The page correctly describes an HD scenario. If I have an HD with an MTBF of 100,000 hours and I run 100,000 of them, I would expect 1 to fail each hour (assuming I replace it and keep the population at 100,000). Now let's reverse it and say I don't know the MTBF. I test 100,000 drives and in the first 24 hours I get 24 failures. I would calculate this as uptime/failures....
(100,000 drives * 24 hours) / 24 failures = 100,000 MTBF
But the article says this is wrong I should only count drives that fail so it would be
(24 drives * 24 hours) / 24 failures = 24 MTBF.
This makes no sense.
I hope to get some feedback on this so that we can get this page updated. —Preceding unsigned comment added by 64.208.118.146 (talk) 13:13, 11 October 2007 (UTC)
- Your reasoning makes sense to me. I don't know the conventional usages in this area. I'll see what I can find out about that. Michael Hardy 14:11, 11 October 2007 (UTC)
I deleted the section. The MTBF=174 hours is correct even though there are only 168 hours in a week. The units measure for MTBF is time (hours typically) not (hours per week).
MTBF=MTTF+MTTR but in practice, MTTR = 0 for physical failures. E.g., the 5 weeks spent waiting for a tech to come with a replacement part wouldn't be counted in MTTR. MTTR <> 0 for soft failures. E.g., during a phone conversation, you suddenly can't hear anything. The time to repair = the time from when you first couldn't hear anything to the time the other party repeats it.
HD manufacturers do not run 1 drive for 100,000 hours to get an MTBF. They do something like run 1000 drives for 100 hours to calculate MTBF. NYCDA 22:24, 30 November 2007 (UTC)
- I fixed typo of this title: The wrong calculation is the rigth one -> The wrong calculation is the right one. Cincaipatrin 10:02, 13 January 2008 (UTC) —Preceding unsigned comment added by Cincaipatrin (talk • contribs)
Actual MTBF vs Predicted MTBF
The (24 drives * 24 hours) / 24 failures = 24 is the actual MTBF based on the up-to-date data.
The remaining ones cannot be included into the calculation yet (as the data is not completed yet).
Some industry standards (like SEMI E10-0304) provide additional factors so called "confidence level" to provide estimation to the MTBF amongst total collected productive hours (so called MTBFp). This would enable you to get the MTBF against the zero failure conditions.
If you have E10-0304 (http://www.semi.org), you may refer to p10 section 7.6.3.1 and p11 section 7.6.4.1 It provides some good examples on how to estimate the MTBF amongst your equipment's performance
I have E10-0304, but can't paste its formulas here as it is under copyright laws.
Let me know if you have further doubts. —Preceding unsigned comment added by Andjohn2000 (talk • contribs) 04:54, 17 October 2007 (UTC)
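For readers without access to the standard, the general idea of a confidence-based MTBF estimate with zero failures can be sketched with the textbook exponential-model bound. This is not the SEMI E10 procedure, whose formulas are not reproduced here; it only assumes a constant failure rate and a test that ended with no failures. The function name is illustrative.

```python
import math

def mtbf_lower_bound_zero_failures(total_unit_hours: float,
                                   confidence: float) -> float:
    """One-sided lower confidence bound on MTBF after a failure-free test.

    Exponential model: P(no failure in T hours) = exp(-T / MTBF).
    Solving exp(-T / bound) = 1 - confidence gives the bound below.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return total_unit_hours / -math.log(1.0 - confidence)

# 100,000 drives run for 24 hours with no failures, at 90% confidence.
bound = mtbf_lower_bound_zero_failures(100_000 * 24, 0.90)
print(f"MTBF is at least about {bound:,.0f} hours at 90% confidence")
```

Note that the bound uses the hours accumulated by every unit on test, including those that never failed.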
- I cannot find the document in semi.org. Can you provide a direct link?
- You must take the remaining ones into consideration, otherwise the MTBF is going to change as the test duration is increased.
- If MTBF is 24, then if I ran those same 100,000 drives for 48 hours, almost all of them would have failed since all of them ran for 2x their MTBF.
- If the 100,000 drives were tested for 48 hours then 48 would have failed. But (48 drives * 48 hours) / 48 failures = 48, so the MTBF is now 48.
- By only counting those that have failed, I can prove MTBF=1 FOR EVERYTHING by
- I will run 1 device for an hour to see if it fails. If it does, my MTBF = 1. If not, I'll add another device and run for an hour and wait for 1 device to fail. If nothing fails, I'll add another device and repeat. Eventually I will have n devices where 1 will fail. By your reasoning, I only need to count the device that failed, so the MTBF has to be equal to 1.
NYCDA 22:59, 30 November 2007 (UTC)
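The two ways of counting argued over in this section can be put side by side. The sketch uses the numbers from the thread (100,000 drives, one failure per hour) and two illustrative function names. It assumes failed drives are replaced, or that the hours lost to failed drives are negligible next to the fleet total.

```python
def failed_only_estimate(n_failures: int, test_hours: float) -> float:
    """Count only the drives that failed, each credited with the full test time."""
    return n_failures * test_hours / n_failures  # always equals test_hours

def fleet_estimate(n_units: int, test_hours: float, n_failures: int) -> float:
    """Count the hours accumulated by every drive on test."""
    return n_units * test_hours / n_failures

for hours, failures in ((24, 24), (48, 48)):
    print(hours,
          failed_only_estimate(failures, hours),
          fleet_estimate(100_000, hours, failures))
# 24 24.0 100000.0
# 48 48.0 100000.0
```

The failed-only figure simply tracks the length of the test, while the fleet figure stays at 100,000 hours whether the test runs 24 or 48 hours.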
- As I said, you cannot use invalid data to do the calculation.
- You would need to understand the terms "actual data" and "incomplete data".
- You use actual data to calculate your actual MTBF.
- If you think the actual one is not accurate because your sample size is too small, then what you can do is to do prediction/estimation. This means: if you only have 4 valid data amongst total 100 data, you can only use those 4 data to do the calculation to get actual results.
- If you do insist to include all 100 of them (even though the remaining 96 data are still not available yet), then you will be doing data manipulation there.
Consider this example:
- 4 females have given a birth (i.e. 4 babies have been born successfully)
- 96 females haven't given a birth (i.e. 96 fetuses are still in their wombs)
- So: total new actual babies would be 4 babies, not 100 babies.
96 fetuses have not been born yet and there is no guarantee that all remaining fetuses would be born successfully.
Of course, you could use certain formula to "predict" the new babies (subject to your confident factor whether those 96 females would be success in giving their birth).
--
I deleted the section again. Here is a quote from *Summary including MTTF discussion
- Example: Suppose 10 devices are tested for 500 hours. During the test 2 failures occur. The estimate of the MTBF is: 10*500/2 = 2,500 hours / failure.
Out of 10 drives, only 2 had failed but the 8 drives that didn't fail are still counted in the calculation. NYCDA 23:27, 3 December 2007 (UTC) {Edit: I deleted a previous reply from me}
- You don't understand with what you had written:
- - You are right to say: "the estimate of the MTBF is: 10*500/2 = 2,500 hours / failure."
- - Read again the title of this section: "Actual MTBF vs Predicted MTBF"
- - I am talking about "Actual MTBF", not "Estimated one"
- Estimated MTBF is beyond my scope here as it is subject to certain standards to fit different needs. The wiki contents should deliver accurate info neutrally, not biased toward a certain company for a certain type of product.
- I see no point in delivering more arguments to you as you yourself are not able to differentiate between actual and estimated data.
- It is up to the maintainer whether or not to accept it.
--
You should familiarize yourself with wiki policy and guidelines. One does not resort to personal attacks just because another contributor doesn't agree with you. As I have stated before, I went to the link you provided and could not find anything on Actual MTBF. Check WP:V. Quoting Jimmy:
- It should be removed, aggressively, unless it can be sourced.
Source it, if your presentation is in line with the source, I won't have a problem with it. NYCDA (talk) 21:59, 5 December 2007 (UTC)
- I put the article into this talk page: Article Written by user Andjohn2000. Cincaipatrin 10:28, 13 January 2008 (UTC) —Preceding unsigned comment added by Cincaipatrin (talk • contribs)
Practically, MTBF = MTTF
It is impractical to say MTBF = MTTF + MTTR.
Mathematically, it is not wrong to get the middle point for the following questions:
- What is the point measurement for time of 1st failure?
- What is the point measurement for time of 2nd failure?
In reality, those above measurement points never exist.
If I were a mechanical designer and had to know the space constraint of a building for my design:
- What is my subordinate supposed to do?
- How to measure the distance between the walls?
- Does he/she have to tell me that the distance is { space + wall } ?
- How should I design my equipment to fit that building?
See the following picture as an illustration:
- Practically:
TBF = t3 - t2 (= TTF) MTBF = MTTF
- Mathematically:
TBF = t" - t' (= TTF + TTR) MTBF = MTTF + MTTR —Preceding unsigned comment added by Andjohn2000 (talk • contribs) 03:17, 12 November 2007 (UTC)
- So, which one should be used as reference? Boulty 17:03, 1 December 2007 (UTC)
- Boulty, no-one would concur that 110 meter is the distance between the walls. Use the practical formula to calculate your MTBF whenever you can, but don't ignore the fact in the mathematical field.
Reverting deleted MTBF and life expectancy
I reverted the delete because there is a real need to differentiate between MTBF and Life Expectancy. Everyone knows human life expectancy is less than 100 years, but the MTBF of a middle-aged person is way more than 100 years. The reason is that MTBF has little to do with Life Expectancy. Life Expectancy may have little to do with this wiki but it's important that readers do not confuse the two. NYCDA (talk) 18:38, 11 December 2007 (UTC)
- Removed back. Article does not comply with WP:NOT#PUBLISHER. It is based on a self-published source (WP:SPS) and has no original research (WP:NOR) Scuarty (talk) 00:11, 12 December 2007 (UTC)
- Why don't you do some research first? Did you bother to verify that there were no sources besides the Usenet FAQ? If you didn't like the Usenet FAQ why not remove the link? Here are some additional sources on LE and MTBF.
- and many more. Usenet is not a publisher. You can't disqualify Usenet on WP:SRS because it is not a self-published book, personal website nor blog. And I certainly did not write the Usenet FAQ. Usenet would hardly qualify as a self-published source. The Usenet FAQ is actually an aggregated collection of posts to the FAQ bulletin. As the links above show, the section you removed is not OR. I see no violation of WP:NOT#PUBLISHER nor WP:NOR here. NYCDA (talk) 20:50, 12 December 2007 (UTC)
- It is you that should make research first in a neutral way. Googling the web is not enough. There are also some sources which are not available thru the web, but certainly provide some info. You should also visit some bookstores to get more about it.
- MTBF can still be used as parameter in estimating the life expectancy:
- http://dearauthor.com/wordpress/2007/03/31/its-april-fools-day-have-you-backed-up-your-ebooks/
- http://www.maran.com/dictionary/m/mtbf/index.html
- http://www.repairfaq.org/REPAIR/F_monfaqf.html#MONFAQF_002
- http://www.cranecams.com/?show=article&id=11
- http://www.a-tca.com/atca_new/content/articles/fa518.htm
- http://www.irrigationcraft.com/b.htm
- http://www.crouzet-usa.com/techtalk/lofiversion/index.php?t114.html
- You cannot just write your own article and conclude that MTBF is not related to life expectancy (WP:NOT#CBALL). The links you provided above may just follow your article, as Wikipedia always lands at the top of search engine results (it's also a question to me why you keep insisting on undeleting your article over other contributors: to get hits on the search engines?). Why should I remove the links if there is a fact of having difference interpretation between MTBF and life expectancy? As long as wikipage does not make any conclusions, having some actual links should be fine.
- Removed the section again. Scuarty (talk) 00:19, 14 December 2007 (UTC)
- The links I cited are from the military and vendors. If they were citing what I wrote (which is obviously not the case here), that would make me an expert on this subject. This would make even WP:SRS (also not the case here) acceptable. NYCDA (talk) 16:43, 14 December 2007 (UTC)
Edit, I moved the reply to the next section. NYCDA (talk) 22:34, 14 December 2007 (UTC)
Deprecated RFC: Should MTBF and life expectancy be removed
This section has been removed many times by user Scuarty, citing these violations:
- On 12/8/07 - Unrelated topic
reverted on 12/10/07 since the section did not discuss life expectancy (LE) exclusively. It discussed MTBF and LE together in the same context. For example, it's quite normal to discuss Laura Bush in a wiki on George Bush since Laura is part of George's daily life.
- 12/10/07 - WP:NOT#PUBLISHER
Wiki is absolutely not a publisher. The section/wiki is not publishing anything. I reverted and explained that the information is cited in the Usenet FAQ in the links.
- 12/11/07 - Again removed, citing WP:NOT#PUBLISHER and that the Usenet FAQ does not comply with WP:NOT#FAQ
WP:NOT#PUBLISHER was already mentioned before. He cited this again without any further explanation. The WP:NOT#FAQ policy applies to the wiki itself. The wiki/section is not written in the form of a FAQ. WP:NOT#FAQ does not preclude FAQs from other sources being used for reference.
- 12/11/07 Removed again, citing WP:NOT#PUBLISHER and WP:NOT#FAQ (and WP:SRS, WP:NOR in talk)
I really don't see how any of these policies apply here. WP:SRS apparently refers to the Usenet FAQ. I don't see how this is SRS. I didn't write the FAQ and there's no indication that any referenced material was written by any editor. I also provided other sources on this from vendors and manufacturers to back this up as not OR.
- 12/14/07 WP:NOT#CBALL
As I have indicated before, the information I presented was refactored from other sources. I think Scuarty is misunderstanding the policy here too. WP:NOT#CBALL refers to speculating about future events. This section clearly is not that. NYCDA (talk) 16:38, 14 December 2007 (UTC)
- Strong keep -- There's no question in my mind that our article must deal with this topic as this distinction (artificial though it may be) is a reality in the industry and the example used in the article is precisely on-target. MTBF is not a constant over the life of a product and this section of the article is attempting to draw attention to that. Improve the language if you like, making it describe the varying MTBF as "wearout" mechanisms start to take their toll, but I'm certainly on the side of keeping this section in the article. -- Atlant (talk) 16:47, 14 December 2007 (UTC)
- This section is misleading and will definitely give benefit to the manufacturers that are unable to attain high MTBF in their products. It should be removed to prevent some parties to make use of it to promote their products using flawed info. Cincaipatrin 16:51, 14 December 2007 (UTC)
- NYCDA, the article of this topic has to be removed.
- Higher MTBF means higher LE (not vice versa as what stated in your article)
- When a certain type of product fails to achieve high MTBF, there should be no excuse to run away from the fact (i.e. no excuse to advertise that product to have higher LE than its competitors')
- Be a gentleman and accept the victory of your competitors:
- Don't sabotage their success with unfairness and unhealthy competitions.
- If your product fails, improve it or sell it off at below normal price.
- Do more practical exercises and researches to extract granularities of the difference between actual MTBF and predicted MTBF (this is a generic term, can be looked-up in the dictionary, and is not a theory or corollary brought by me or someone)
- LE is an estimation of life span. Any kind of estimation will have strong correlations with its fact/actual data (i.e. actual MTBF) with some sort of confidence factors. So, read more carefully the talk section for my article that had been deleted by you above ("Actual MTBF vs Predicted MTBF")
- Study and learn in a different aspect/view to differentiate the baby and fetus in the human population (you'd need to distinguish between "actual event" that has occurred and "other event" that has not occurred yet)
- -- Andjohn2000 (talk) 02:40, 15 December 2007 (UTC)
- I never proposed that higher MTBF means higher LE. I never presented any OR, SPS, CBALL, FAQ or PUBLISHER material in that section. All I did was rephrase what was in the Usenet FAQ. Can commenters also discuss whether the section violates WP:OR, SPS, CBALL, FAQ and PUBLISHER, plus any other policies? There is no point discussing the merits of the section if it violates so many policies. Andjohn2000, I appreciate your response, but I would also ask you to keep the discussion on predicted and actual MTBF to a different section. And what's this about my competitors? I only see fellow editors here, no competitors. I do not work for or represent the interests of anyone. It sounds like you are implying I'm a vendor/manufacturer and, more so, it sounds like you represent a vendor/manufacturer. I did not open an RFC to squash any competitors as you are implying. I opened the RFC to discuss whether the section violates the wiki policies in Scuarty's claims and, more importantly, to determine the merits of the section. NYCDA (talk) 16:05, 17 December 2007 (UTC)
- I dug up more sources on this subject. This one http://www.ece.cmu.edu/~ganger/ece546.spring01/readings.html is a list of lectures and readings for an advanced class at Carnegie Mellon University. (I trust I do not need to prove the prestige of this university to anyone.) If you go to the page, you will find a link to MTBF Description (http://www.ece.cmu.edu/~ganger/ece546.spring01/papers/mtbf.description) which is exactly the same as the Usenet FAQ. Is this proof enough to everyone that the section in question does not violate any of the wiki policies listed above and is factually correct? Here's another paper http://db.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html. This one is from the Computer Science department of Carnegie Mellon University and is long, but it does show LE and MTBF are not related. (If you question the reliability of Usenix, check out its board of directors.) To address some specific concerns raised so far, I think there is enough evidence from authoritative sources to show:
- The section is not based on flawed info.
- Higher MTBF does not mean higher LE. They are unrelated.
- Violations of all the policies listed above by Scaurty, well either he doesn't understand the policies or he's griefing.
NYCDA (talk) 17:04, 19 December 2007 (UTC)
--
NYCDA, you are saying that you are not representing a vendor/manufacturer, but all of your actions here are clearly giving evidence that you are representing a certain product that probably fails to attain high MTBF.
I also fully trust the prestige of CMU university, however, I do not see any official statements regarding MTBF vs LE in those links.
Please read more carefully again:
- "...An 100,000 MTBF HD can have a life expectancy of 2 years while a 50,000 MTBF HD can have a life expectancy of 5 years yet the HD that's expected to break down after 2 years is still considered more reliable than the 5 years one..."
- This means:
- You are probably facing a condition where your HD cannot attain 50,000 MTBF.
- At the same time, your competitor can deliver the exact same product with 100,000 MTBF.
- With this fact, you are trying to find an excuse by saying that your HD can have LE 5 years and your competitor can only have LE 2 years.
- My deleted section and my talked section were trying to emphasize the common misconception in calculating actual MTBF.
- This means:
- Using actual data (as what I emphasized in my article) will make your HD to have less than 50,000 MTBF
- Using actual data, it will be harder for you to compete your HD to attain 100,000 MTBF HD
- As a result, you are very aggressive to undelete your section and delete my section:
- By deleting my section, you will bring failed product to attain high actual MTBF
- By undeleting your section, you will bring failed product to have higher LE compared to its competitive products.
If you'd like me to comment about policy, then I would comment this section is breaking the policy of article's neutrality. It is also breaking the mathematical fact with your own flipping coin concept. There is nothing wrong with the probability theories, but remember, estimated/predicted event is an event that has not occurred yet (i.e. it is not an actual event).
Once again, I strongly support that this section has to be removed.
-- Andjohn2000 (talk) 15:44, 20 December 2007 (UTC)
- I told you I don't represent anyone. The whole section is a made-up example to make the point that higher MTBF does not mean higher LE. This point is in both the pages I listed earlier at CMU. You got a problem with HD? Change it to a widget for all I care. In fact, I'm going to drop the HD now. NYCDA (talk) 16:10, 21 December 2007 (UTC)
- Please do not bring actual MTBF into this argument. I'd just like to point out that today's HD has an MTBF of 1,000,000 hours, which is more than 114 years. Obviously no manufacturer has done an 'actual MTBF' test here, and obviously no HD purchased today is going to last 114 years. NYCDA (talk) 16:17, 21 December 2007 (UTC)
- Please refer to my previous statement on the talk section of "Reverting deleted MTBF and life expectancy":
- "...Why should I remove the links if there is a fact of having difference interpretation between MTBF and life expectancy? As long as wikipage does not make any conclusions, having some actual links should be fine..."
- If the Usenet FAQ had become a part of reading material in a university, then I will also encourage you to include those CMU links into this wiki page.
- I will fully support you if you can publish your article at CMU, not at this wikipage (unless you are griefing and find no place to publish it other than this wikipage)
- =>As other contributors have given serious concerns against your article's contents, can you now remove this section by yourself?
- What new conclusion did the section make? It illustrates how LE and MTBF are not related by showing that a higher-MTBF HD is more reliable than a lower-MTBF HD regardless of which has the higher LE. NYCDA (talk) 16:10, 21 December 2007 (UTC)
- What research have you done to make this statement?
- Scuarty (talk) 14:48, 23 December 2007 (UTC)
- Here's a section from the FAQ, which can be found by going to the links in CMU's 2001 lectures and readings:
- What does MTBF have to do with lifetime? Nothing at all! It is not
- at all unusual for things to have MTBF's which significantly exceed
- their lifetime as defined by wearout -- in fact, you know many such
- things. A "thirty-something" American (well within his constant
- failure rate phase) has a failure (death) rate of about 1.1 deaths per
- 1000 person-years and, therefore, has an MTBF of 900 years (of course
- its really 900 person-years per death). Even the best ones, however,
- wear out long before that.
Is this so different from the section if you replace "thirty-something" American with HD and wearout with LE? The section says MTBF is an indication of reliability and clearly states that a device with an MTBF of 100,000 hours is more reliable than a device with an MTBF of 50,000 hours. It makes the point that without looking at LE (wearout), one cannot say a 100,000 MTBF device will last 2x as long as a 50,000 MTBF device. It makes a final point that since MTBF and LE have nothing to do with each other and MTBF is an indication of reliability, a device with 100,000 MTBF and a 2-year LE is still considered more reliable than a device with 50,000 MTBF and a 5-year LE.
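The arithmetic in the FAQ excerpt can be written out explicitly, assuming a constant failure rate. The symbol \lambda is introduced here for illustration only and does not appear in the FAQ:

```latex
\lambda = \frac{1.1\ \text{deaths}}{1000\ \text{person-years}}
\quad\Longrightarrow\quad
\mathrm{MTBF} = \frac{1}{\lambda}
             = \frac{1000}{1.1}\ \text{person-years per death}
             \approx 909\ \text{person-years per death}
```

The FAQ rounds this to about 900. The figure only describes the failure rate during the constant-rate phase; it carries no information about when wearout begins.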
User Andjohn2000 wrote
- "...An 100,000 MTBF HD can have a life expectancy of 2 years while a 50,000 MTBF HD can have a life expectancy of 5 years yet the HD that's expected to break down after 2 years is still considered more reliable than the 5 years one..."
- This means:
- You are probably facing a condition where your HD cannot attain 50,000 MTBF.
- At the same time, your competitor can deliver the exact same product with 100,000 MTBF.
- With this fact, you are trying to find an excuse by saying that your HD can have LE 5 years and your competitor can only have LE 2 years.
- This means:
Again, those were made-up numbers. I don't think there are any HDs with such low MTBF today. And I do not represent an HD manufacturer, as you keep accusing me of. Also, the section actually states that the device with an LE of 5 years is not as good as the one with an LE of 2 years, because of the MTBF factor. For argument's sake, even if I were making a drive with an LE of 5 years, I'm stating it's inferior to my competitor's, which only has a 2-year LE. At no time did I state there was a real HD with 50,000 MTBF and a 5-year LE. I made up 2 drives, one with an MTBF of 100,000 and a 2-year LE and another with an MTBF of 50,000 and a 5-year LE, and compared the differences between the 2 made-up devices.
If you think the section is not neutral, then let's rewrite it so that it becomes neutral. Right now, I'd say we focus on whether the section is factually correct.
NYCDA (talk) 17:18, 21 December 2007 (UTC)
- Well, my previous questions are: why don't you write the article at CMU? Are you griefing and find no place to publish it other than this wikipage?
- The FAQ does not belong to CMU. It's only a reference/reading material for their students. No one in CMU/research institute will even dare to publish a theory without having thorough research/studies as what you had done here.
- Scuarty (talk) 15:06, 23 December 2007 (UTC)
- I agree research institutes are more appropriate for it, however, this wiki belongs to public (this also makes difference with current conventional academic/research institutes that everyone can share the knowledge here). It should be fine to maintain this section if it can become neutral, with all actual/facts covered rightfully there (to me, it is still not neutral in terms of expressing MTBF and LE). Cincaipatrin 15:41, 23 December 2007 (UTC)
- I need to know what you think would be needed to bring a neutral point of view. The whole point of the paragraph is to make clear that reliability is solely determined by MTBF and that LE has nothing to do with it, by demonstrating that a high MTBF with low LE is more reliable than a low MTBF with high LE. I also did not want readers to confuse MTBF with LE. It might help if you review the history to understand why I put this section in in the first place; see here http://en.wikipedia.org/w/index.php?title=Mean_time_between_failures&diff=128060639&oldid=127122008. My use of HD, 50,000 hours and 5 years in the text was out of convenience, because I edited the original text, which also used HD and 50,000 and 5 years. I felt the original editor was confusing MTBF with LE. I also wanted to keep LE short since that's not the topic of this wiki.
- I would also like to point out that, for the good of the wiki community, it's important to settle the question of WP:NOT#PUBLISHER, WP:NOT#FAQ, WP:NOT#CBALL, WP:SRS, WP:NOR and others as well. Although there are grey areas, I think one of us here clearly doesn't understand them properly. To better serve the wiki, we need all editors to understand them correctly. NYCDA (talk) 21:50, 26 December 2007 (UTC)
- There is no gray area in enforcing the policies. You have to adhere to the rules and are not supposed to define a new theory (or even your own policies) in the main wiki page. Anyway, I moved the article to this talk section for neutrality. Scuarty (talk) 07:30, 5 January 2008 (UTC)
- Looks nice, thanks. I added another section as well for the previous topic written by Andjohn2000. His topic seems to have relevancy to explain correlation between MTBF and LE. Andjohn2000, pls feel free to add comments there. Cincaipatrin 09:56, 13 January 2008 (UTC)
RFC: Article Written by user NYCDA
Article is moved from main page to this talk page to avoid further disputes.
MTBF and life expectancy
MTBF is not to be confused with life expectancy. MTBF is an indication of reliability. A device with a MTBF of 100,000 hours is more reliable than one with a MTBF of 50,000. However this does not mean the 100,000 hours MTBF device will last twice as long as the 50,000 MTBF device. How long the device will last is entirely dependent on its life expectancy. An 100,000 MTBF device can have a life expectancy of 2 years while a 50,000 MTBF device can have a life expectancy of 5 years yet the device that's expected to break down after 2 years is still considered more reliable than the 5 years one. Using the 100,000 MTBF device as an example and putting MTBF together with life expectancy, it means the device should on average fail once every 100,000 hours provided it is replaced every 2 years. Another way to look at this is, if there are 100,000 units of this device and all of them are in use at the same time and any failed device is put back in working order immediately after the failure, then 1 unit is expected to fail every hour (due to MTBF factor).
Controversies
Guidelines
Discussion references
- Reverting deleted MTBF and life expectancy
- RFC: Should MTBF and life expectancy be removed
- (please add more sub-sections if needed)
- ...
- ...
Scuarty (talk) 07:12, 5 January 2008 (UTC)
- Why did you title this Article Written by NYCDA? It's completely sufficient to title this MTBF and LE Section Disputes or something similar. Why make this personal? It's not my section. It's owned by everyone. NYCDA (talk) 17:25, 16 January 2008 (UTC)
- BTW, you still haven't discussed how those guidelines are violated. I have provided an explanation of why I don't think they were violated. You should do the same. Just blatantly saying it violates WP:NOR, WP:NOT#FAQ etc. does not help with dispute resolution. I could just as easily say it does not violate those guidelines without explanation, but then that wouldn't get us anywhere.
- You also listed many Pros (MTBF is not related with Life Expectancy) and Cons (MTBF represents Life Expectancy), yet you are not disputing any of them? Do you dispute all the sources on (MTBF is not related with Life Expectancy) so that the section should be deleted?
- On a quick glance, I dispute these as not being reliable sources of information.
NYCDA (talk) 21:24, 16 January 2008 (UTC)
- All the answers to your questions should be there if you re-read the whole chronology of the discussion more carefully. The FAQ does not belong to CMU. It's only a reference/reading material for their students. No one in CMU/research institute will even dare to publish a theory without having thorough research/studies as what you had done here. I will fully support you if you can publish your article at CMU, not at this wikipage (unless you are griefing and find no place to publish it other than this wikipage). If the article is owned by everyone, why is it only you who is concerned about it getting deleted? Moreover, some users have raised serious concerns about your article.
Scuarty (talk) 16:16, 17 January 2008 (UTC)
- If you dispute the FAQ as a valid source, and you should, then the only guideline in question is WP:V. But as you have already gathered so many other pro sources, what is your problem now? For the record, I'm not the only one with concerns about deleting it. The response to the RFC was 2 votes for, 3 votes against. To remove this, you must show it does not meet WP:V. So dispute all the sources you listed in the table as inaccurate and show good cause. The other against votes besides yours seem to be about neutrality. That obviously needs fixing, but please see the problem with Lack of neutrality as an excuse to delete here http://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view/FAQ#Lack_of_neutrality_as_an_excuse_to_delete. It's better to add material to balance NPOV than to delete it outright.
NYCDA (talk) 17:14, 18 January 2008 (UTC)
- Come to think of it, aren't you being hypocritical? You challenged the use of a Usenet FAQ as a source while you are using repairFAQ to back up your challenge? All this smoke and mirrors with those guidelines when the issue should've been WP:V. So disregard all the FAQs and let's focus on those sources that are not FAQs. Which source are you challenging now?
- Let's take a closer look at the con reference you have provided.
- www.repairFAQ.org is WP:SRS
- dearauthor.com's credibility is in question. Do you honestly believe dearauthor knows much about technology?
- www.cranecams.com. The only reference here is that they say they run a 5,000-hour MTBF test to qualify LE. It does not say how or if MTBF is related to LE.
- www.irrigationcraft.com states that sleeve bearings exhibit a longer life expectancy (MTBF). Maybe for them LE = MTBF, but this is because they do not renew the bearing. MTBF clearly states failed parts are renewed.
- www.crouzet-usa.com is a forum. It does not pass WP:V and should've failed your own criteria for WP:SPS, WP:NOT#PUBLISHER.
- www.maran.com is a dictionary. Everyone knows a dictionary meaning can differ greatly from the technical/scientific meaning. Consider the dictionary meanings of speed and velocity and their physics counterparts to see how much they differ. This is a poor source.
- www.a-tca.com says liquid cooling represents a marked increase of the life expectancy of critical components (MTBF). Again, it does not say the components are renewed, and this again should've failed your own WP:SPS criteria.
- Now for the pros
- Let's disregard these first: links to www.ece.cmu.edu, optoelectronics.perkinelmer, langa.com and stinet.dtic.mil, since they don't apply to the specific issue here.
- That leaves
- db.usenix.org: this report clearly states that HDs have an MTBF of 1,000,000 hours while the nominal lifetime (LE) was only 5 years.
- www.solidkor.com: it clearly states that HDD life expectancy is shorter than its MTBF.
- I'm going to include one more source from http://www.mathpages.com/home/kmath498.htm. First paragraph from this source is
- Suppose we're given a batch of 1000 widgets, and each functioning widget has a probability of 0.1 of failing (= MTBF of 10 days) on any given day, regardless of how many days it has already been functioning. This suggests that about 100 widgets are likely to fail on the first day, leaving us with 900 functioning widgets.
- Compare this to
- it means the device should on average fail once every 100,000 hours provided it is replaced every 2 years. Another way to look at this is, if there are 100,000 units of this device and all of them are in use at the same time and any failed device is put back in working order immediately after the failure, then 1 unit is expected to fail every hour (due to MTBF factor).
- The numbers are different but the math is completely identical. The other points the original text makes are:
- MTBF is an indication of reliability. -- This has been covered within this wiki many times. This wiki shows MTTF/MTBF = 1/failure rate. Failure rate is obviously an indication of reliability.
- A device with a MTBF of 100,000 hours is more reliable than one with a MTBF of 50,000. -- obvious given the above.
- It then made up 2 devices with MTBFs of 100,000 and 50,000 hours and LEs of 2 and 5 years respectively, and says
- An 100,000 MTBF device can have a life expectancy of 2 years while a 50,000 MTBF device can have a life expectancy of 5 years yet the device that's expected to break down after 2 years is still considered more reliable than the 5 years one. -- This does not detract from the above, since reliability is determined by failure rate, not LE. It correctly states that a made-up device with an MTBF of 100,000 is more reliable than one with MTBF = 50,000.
- Finally, even if all the cons are correct, this still does not mean the text should be removed. Consider 0^0: on one hand it's equal to 1; on the other it's undefined. Both can be proven, both are sourced, so the wiki on it includes both sides. If you can prove LE = MTBF and source it, then the correct course of action for you is to add text to the wiki to show it, instead of removing the section because it contradicts what you find. You have also previously stated you were unfamiliar with technical aspect
NYCDA (talk) 21:21, 18 January 2008 (UTC)
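The two examples compared above (the mathpages widgets and the section's 100,000-unit fleet) can be checked with a few lines of Python. This is a minimal sketch assuming a constant failure rate, as both quoted passages do; the function names are made up for illustration and do not come from either source.

```python
def expected_failures_per_hour(units: int, mtbf_hours: float) -> float:
    """Expected failures per hour across a fleet in which every failed
    unit is put back in working order immediately (constant failure rate)."""
    return units / mtbf_hours


def expected_survivors(units: int, daily_failure_prob: float, days: int) -> float:
    """Expected number of never-failed widgets after `days` days when each
    working widget fails with the same probability every day (no repair)."""
    return units * (1.0 - daily_failure_prob) ** days


# Section's example: 100,000 units, each with an MTBF of 100,000 hours.
fleet_rate = expected_failures_per_hour(100_000, 100_000)

# mathpages example: 1000 widgets, 0.1 chance of failing on any given day.
after_one_day = expected_survivors(1000, 0.1, 1)
mean_days_to_failure = 1 / 0.1  # mean of a geometric distribution

print(fleet_rate)            # 1.0 failure per hour
print(round(after_one_day))  # 900 widgets still working
print(mean_days_to_failure)  # 10.0 days
```

In both cases the MTBF only fixes the rate at which failures occur across the population. Neither calculation uses, or says anything about, a wearout limit.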
- Well, I saw your post at Zero divided by zero = Error 2. Are you having a problem defining another new theory, 0^0 = 0/0 = 1? I also saw another user's response (with which I agree): are you going to create a new section - WP:RD/NYCDA'sNewKindOfMath?
- I hope you are not going to become a vandal here. All fellow editors will appreciate your long write-up here, but please note this is not a place to publish research. An academic or research institute would be more appropriate for it. You are free to dispute all links stated in the cons. Other users can do so equally as well. I welcome the disputes, as I do know there are in fact different interpretations of the relationship between MTBF and LE. However, disputing the links does not mean you can just draw a conclusion and publish your own theory wherever you want, biased toward a certain interpretation without thorough research/studies. Bring it to an academic/research institute (like CMU, Stanford, etc.) and convince them to accept it. If your theory is right and supported by valid data, it should be easy to get an official publication like IEEE Transactions or journals. But if it is wrong, then you will find no place to go, even in a public place like this wiki.
Scuarty (talk) 13:33, 19 January 2008 (UTC)
- Whether there is any validity to the "con" links is irrelevant. I only need to show the "pro" sources pass WP:V. I disputed your "cons" to show the hypocrisy of dismissing one source as WP:SPS while using a source that's obviously WP:SPS to dismiss it.
- I believe I have shown the sources I used are valid, and therefore the section cannot have violated WP:V, much less any of the others you have brought up (which I feel is an effort to confuse me, as I have spent the better part of the last 2 months trying to address this). If you feel the section is "make-believe" then show it. Dispute the "pro" sources. And please show some etiquette: either list sources that are verifiable or don't list them at all. I have also laid out a detailed explanation of each and every line of the section and why I think they are correct and verified. Show and explain why you disagree with them. NYCDA (talk) 23:18, 22 January 2008 (UTC)
The concern against this article is the statement that "MTBF is not related with LE". It is biasing and will give benefit to the manufacturers that are unable to attain high MTBF in their products. If it can cover both pros and cons, it would be fine to have the article. When both topics are written together side by side, the article becomes neutral. Cincaipatrin 12:56, 20 January 2008 (UTC)
- Do you have any suggestions for neutrality? Isn't this the real problem, and not some bogus guideline or WP:V issues? Does everyone agree this is the only problem? I have already pointed out the problem with Lack of neutrality as an excuse to delete, so I really did not understand why the section was deleted again. I also know all editors are biased and realize that I'm biased, but because I AM BIASED, I have a hard time seeing what's violating neutrality, so it would be extremely helpful if you could point that out or offer suggestions. I cannot tell which statement is pushing "MTBF is not related with LE". I tell readers not to confuse MTBF with LE, as in an MTBF of 100,000 hours does not mean an LE of 11.4 years. I also wrote that the device with 100,000 MTBF is considered more reliable than the device with 50,000, so I believe I'm actually promoting higher MTBF = better product.
- If everyone just has a problem with neutrality, can we close this argument with Scuarty? We're better off working on content to bring neutrality. Perhaps adding "consumers would find better value by purchasing the device with 50,000 MTBF but an LE of 5 years instead of the device with 100,000 MTBF but an LE of 2 years, since LE will be reached far earlier than when MTBF will become a factor" or something along this line?
- NYCDA (talk) 23:18, 22 January 2008 (UTC)
- I re-tag the RFC as I am not the editor and do not write any articles to it.
- But as a reader, I dispute the article as it is biasing:
- "MTBF is not to be confused with life expectancy" gives impression that MTBF is not related with LE.
- Statement of "An 100,000 MTBF device can have a life expectancy of 2 years while a 50,000 MTBF device can have a life expectancy of 5 years" can be exploited by some manufacturers that are unable to attain high MTBF in their products.
- Scuarty had said this wikipedia appeared easily at top of search engine hits (e.g. Google, Yahoo, etc). Andjohn2000 had also said that some parties could make an excuse to promote their failed products to be good/better with higher LE. This is where all the disputes come into it.
- Cincaipatrin 11:26, 26 January 2008 (UTC)
- I don't think Scuarty ever said anything about search engines in the talk pages. His objection was purely on the Wikipedia guidelines, which is why I opened the RFC after our little edit wars. I never closed that RFC, although I think I might've done it incorrectly when I opened it. So given the way things are now, can I correctly assume the matter with Scuarty is resolved?
NYCDA (talk) 22:55, 28 January 2008 (UTC)
The distinction is simple: MTBF is life expectancy for a device that is on all the time. For things that you turn on and off, the MTBF divided by the fraction of the time they are on is their life expectancy. It really is that simple. MilesAgain (talk) 11:19, 27 January 2008 (UTC)
- Just by reading this article at a glance, one will immediately dispute the sentence: "MTBF is not to be confused with life expectancy":
- Life expectancy is an expectation of life span.
- MTBF (mean/average time between failures) is a parameter to estimate/predict the life span of the device.
- Sometimes the prediction can be wrong or inaccurate, but that does not mean they are not related to each other. The sentence "How long the device will last is entirely dependent on its life expectancy" would also be misleading, as it gives a message to the reader that MTBF is not related to life expectancy.
- Pintubigfoot (talk) 05:43, 31 January 2008 (UTC)
- This is precisely why I'm so adamant on keeping this section. See this document (http://www.seagate.com/support/disc/manuals/scsi/savvio_pm.pdf) from Seagate, page 27 (search for Service Life). It lists the service life as 5 years but MTBF = 1,400,000 hours. Do people really think today's HD with a typical MTBF > 1,000,000 hours is going to last 100 years on average? In this next link (http://www.seagate.com/ww/v/index.jsp?vgnextoid=8c045cf536b43110VgnVCM100000f5ee0a0aRCRD&locale=en-US) you'll see the HD is run 24x7 with an MTBF = 1.2 million hours, so in response to MilesAgain, LE is not MTBF divided by the fraction of time it's on. I'm not trying to mislead anyone. I'm trying to tell everyone MTBF is not related to LE. NYCDA (talk) 22:44, 31 January 2008 (UTC)
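The gap between the two quoted figures is easier to see after converting hours to years. The sketch below uses the MTBF and service-life numbers cited in the comment above and assumes 8,760 powered-on hours per year (24x7) and a constant failure rate. The annualized failure rate formula is the standard exponential-model result, not something taken from the Seagate documents, and the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming 24x7 operation


def mtbf_in_years(mtbf_hours: float) -> float:
    """Express an MTBF given in hours as years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability that one unit fails within a year of 24x7 use,
    assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


mtbf_hours = 1_400_000   # MTBF quoted in the comment above
service_life_years = 5   # service life quoted in the comment above

print(f"MTBF: {mtbf_in_years(mtbf_hours):.0f} years")
print(f"Service life: {service_life_years} years")
print(f"Failure probability per year: {annualized_failure_rate(mtbf_hours):.2%}")
```

On these assumptions the MTBF corresponds to roughly 160 years of continuous operation, while the chance of any one drive failing in a given year comes out well under 1%. Read this way, the MTBF describes how often failures occur within the service life rather than how long a drive lasts.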
A manufacturer quoting an MTBF of 1,000,000 hours may have run 10,000 new items for 1,000 hours with just 10 failures. The alternative of trying to run 10 items for 1,000,000 hours (a better measure of LE) isn't only impractical. It also gives a less impressive result, because MTBF can decrease significantly as a product ages and its components decay. Although I can't cite sources, I think it's obvious that a disk that has already served for 100,000 hours will have an MTBF of much less than 900,000 hours and an even lower LE. MTBF and LE can't be connected by an exact mathematical formula, but they are related because good design and materials will tend to raise both. It's probably right to include a short section about this point, without suggesting that LE is more important than MTBF. It is true that stopping and starting can reduce LE but the rest time between can increase it. However, that is a point about how patterns of use affect LE (not how MTBF affects LE) and may not be relevant to an article on MTBF. Certes (talk) 13:51, 4 February 2008 (UTC)
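The calculation described above can be sketched as follows, assuming failures are counted across the whole test population and the estimate is total operating time divided by the number of failures. The function name is illustrative and is not taken from any manufacturer's procedure.

```python
def estimate_mtbf(units: int, hours_per_unit: float, failures: int) -> float:
    """Point estimate of MTBF from a population test:
    total unit-hours of operation divided by observed failures."""
    if failures <= 0:
        raise ValueError("need at least one failure for a point estimate")
    return units * hours_per_unit / failures


# 10,000 new items run for 1,000 hours each, with 10 failures.
print(estimate_mtbf(10_000, 1_000, 10))  # 1000000.0 hours
```

Each item is only observed for 1,000 hours, so the estimate reflects the failure rate of new items and carries no information about behavior at 100,000 hours.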
- How do you justify the correctness of the MTBF calculation for the advertised HD (e.g. http://www.seagate.com/ww/v/index.jsp?vgnextoid=8c045cf536b43110VgnVCM100000f5ee0a0aRCRD&locale=en-US)?
- Most HD manufacturers obtain MTBF with a wrong calculation. They sum all running hours from all HDs even though some of them have not had any failures yet. Of course this kind of "fake MTBF" will shoot up to a very high number. "Time Between Failures" requires at least 2 consecutive failures in a device to get its duration for the MTBF calculation.
- Let's say you produce 2 billion LEDs (or even more) in 1 batch. Afterwards you run a test on all of them for about 1 sec and notice that 2 LEDs have blown. It is silly to then publish an MTBF of 1 billion seconds for those LEDs on the basis of a test that ran for just 1 sec. So, solely using HD MTBF as an example is not appropriate here. The MTBF calculation of HD is wrong and confusing.
- To be more neutral, you have to take more examples rather than HD:
- MTBF is not only used for HDs, but also for many other devices such as compressors, heaters, rotors, LEDs, etc.
- MTBF is a generic calculation of the mean/average time between failures.
- Time Between Failures requires at least 2 consecutive failures of a device to get its duration. The 2nd RFC (article by Andjohn2000) has provided a good example of how to prevent a wrong calculation that makes MTBF very high or unrealistic.
- When MTBF is calculated with the correct formula, it will show the relationship between MTBF and LE. Some users (like user MilesAgain) can also say MTBF is LE for a device that is on all the time.
- Pintubigfoot (talk) 07:36, 13 February 2008 (UTC)
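The calculation argued for in the list above, counting only intervals between consecutive failures of the same device, can be sketched as follows. The failure times and the function name consecutive_failure_mtbf are hypothetical. The sketch shows what that position computes and is not offered as a recognised standard.

```python
from typing import Dict, List, Optional


def consecutive_failure_mtbf(failure_times: Dict[str, List[float]]) -> Optional[float]:
    """Mean gap between consecutive failures, taken device by device."""
    gaps: List[float] = []
    for times in failure_times.values():
        ordered = sorted(times)
        # A device contributes a gap only once it has failed at least twice.
        gaps.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    if not gaps:
        return None  # no device has two failures yet, so no value is defined
    return sum(gaps) / len(gaps)


# Hypothetical operating hours at which each device failed.
observations = {"A": [100.0, 400.0, 1000.0], "B": [250.0], "C": []}

print(consecutive_failure_mtbf(observations))  # 450.0
```

Devices B and C contribute nothing to the result, however long they have run. Whether to count their operating hours is exactly the point disputed in this thread.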
So, does this mean the article does not provide the right content? Is there still a way to adjust it? Some readers may still be confused about MTBF and LE. If the HD manufacturers have indeed done a wrong calculation in advertising MTBF, I think it would be good to have an article to cover this misconception. Cincaipatrin 14:04, 27 February 2008 (UTC) —Preceding unsigned comment added by Cincaipatrin (talk • contribs)
RFC: Article Written by user Andjohn2000
This topic is brought here for better neutrality against the discussion Article Written by user NYCDA.
- "Actual MTBF" calculated from actual events seems not to be accepted by a few users. There was a concern that this calculation would result in a low MTBF for the devices when there was no failure yet in those devices. It was also deleted by user NYCDA several times, as its contents seem to contradict his controversial topic "MTBF and life expectancy".
- User Andjohn2000 explained actual and estimated data with an example calculation of human and fetus population. However, he did not discuss in more detail how to calculate estimated MTBF. He mentioned that the estimation formula was beyond his scope and refused user NYCDA's request to source copyrighted material from the E10-0304 doc (http://www.semi.org)
Mean time between failures versus mean operating time
Some confusion may arise when "operating time" is used as the parameter in the MTBF calculation. The graph could give an impression that
Example: Assuming each item is fully running 100% during the operating hours:
Correct MTBF calculation:
Wrong MTBF calculation:
Some industries may use a certain formula to calculate "estimated MTBF" from all collected operating hours. This may include estimation of the zero failure condition for the items that have not had any failures yet.
Discussion references
Cincaipatrin 09:41, 13 January 2008 (UTC)
- Please note I never disputed the neutrality of this section. I deleted the section because it wasn't sourced. I have attempted to verify this but couldn't. No additional material was provided so I deleted it.NYCDA (talk) 22:11, 15 January 2008 (UTC)
- I don't think this topic needs a source; it only requires verifying generic mathematical statements like 1+1=2 and 1+1+3 != 3.
- It only explains actual MTBF and does not go beyond that scope.
- Estimated MTBF is already out of scope.
- Interested readers may refer to the E10-0304 doc; however, it would be wrong for me or anyone to paste its contents into this wiki page, as the doc's copyright belongs to SEMI. An e-shop is available there for purchase, but after getting it, it should be used as a personal reference only.
- Deleting this article as an excuse to force me or someone else to show the E10-0304 contents here can be considered a criminal act. I hope users like NYCDA know what copyright is and can adhere to copyright regulations.
- Other than that, I am fine to share the thoughts.
--
- I tag the RFC. Cincaipatrin 11:38, 26 January 2008 (UTC)
You keep on bringing stuff that's irrelevant to the table, Andjohn2000. Whether I want to get an illegal copy of this document or whether I represent the interest of a manufacturer, even if they were true, are completely irrelevant. BTW, I suggest you check out criticism of wikipedia while you're at it. Comments like "Is MS paying you to say that" are irrelevant and lower the quality of the discussion. The only problem here is verifiability. If the source you used cannot be referenced due to copyright laws, you could easily provide another source. If there is no other source then you need to reconsider your position, since a wikipedia article also needs to meet critical mass and mustn't be on the fringe of theory. I will state the bottom line again: provide a verifiable source and then include it in the article. NYCDA (talk) 22:49, 28 January 2008 (UTC)
- I don't see any theories in this article to require a source.
- It is silly to look for a source for an article that just wants to say "4*6 is four multiply by six":
- MTBF, Mean Time Between Failure, is an average of time between failures.
- If there is no failure, then it should not be included in the calculation.
- That's it. Simple.
- Pintubigfoot (talk) 05:59, 31 January 2008 (UTC)
- Mathematical formulas need to be sourced. Actual calculations need not be sourced; anyone with a calculator can verify them. The parts that need sourcing here are "there is no actual MTBF for total 5 items (as the remaining data from 3 items are not available yet)" and the theory of Actual MTBF in general. Has Actual MTBF reached notability yet? NYCDA (talk) 22:28, 31 January 2008 (UTC)
- MTBF = mean time between failures. That's the reference.
- Those remaining 3 items do not possess any failures yet, thus they cannot be included in the calculation. Simple.
- Pintubigfoot (talk) 07:46, 13 February 2008 (UTC)
What is the RFC about?
You guys posted an RFC. So apparently you want input from outsiders. However, from the mess on this talk page it is very difficult to figure out what the dispute is about. Please provide a concise, NPOV statement of the dispute together next to the RFC template, so that we can help out. --Zvika (talk) 19:29, 27 January 2008 (UTC)
- The original RFC was opened by me when Scuarty and myself had an edit war. It was whether a section now being labelled Article Written by user NYCDA violated these wikipedia guidelines: WP:SPS, WP:NOR, WP:NOT#PUBLISHER, WP:NOT#FAQ, WP:NOT#CBALL. I think no one agrees with Scuarty since all the discussion is about neutrality but Scuarty is still pressing this.
- A 2nd section for RFC is Article Written by user Andjohn2000 which I removed from the wiki because of verifiability. Andjohn2000 states there is a source but he cannot provide it since it's copyright by the publisher. NYCDA (talk) 22:53, 28 January 2008 (UTC)
- The 1st article ("Article Written by user NYCDA") is too controversial. When it does not have enough research or credibility to prove its theories, readers could easily see it as a potential violation of the policies.
- For the 2nd one ("Article Written by user Andjohn2000"), I don't think there will be any formal sources available to explain "4*6 is four multiply by six" (except the schools, of course). There should be no concern found in this article as it does not write any new theories to it.
- Pintubigfoot (talk) 06:28, 31 January 2008 (UTC)
There appears to be a misunderstanding over semantics plus a lack of real world examples.
It would help in my opinion to mention in the introduction that MTBF is a term that is given to a 'set' of metrics used to:
- a)measure actual failure in service or estimate the reliability of a device when built to a stated standard, or
- b)to a specified cohort of those devices in service when used within the terms of the manufacturer's warranty. etc., etc.
The definitions are designed to suit the nature of the device and its application, etc., (the debate on the talk page suggests that some think there is only one). It would also help to avoid confusion by including the terms MT before F and MT to F.
Most definitions, as well as how to test and calculate, are now laid out in exacting detail in recognised standards (e.g. MIL-STD-781C).
Also, it would help to expand and to explain more of the terms in the context that they occur. (not forgetting MT between & before & to)
Then give more examples (the section titled Overview might be better called 'example', for, say, a food processing factory where one needs to know the TBF for the purposes of planning maintenance; in this case the formula given is OK if it suits the purpose). For another example: when an electrical product comes off the production line it very often does not work or fails very early (infant mortality), so it is normal practice for equipment to be put on soak. This is to catch early failures. For a piece of industrial equipment this might mean being put in a rack along with many others, switched on and left for two days. A typical time for equipment to go in aircraft would be two weeks, and for satellites two months or more is not uncommon.
Aero engines are bench tested and so on. Again these times are specified in the respective standards for that type of equipment.
[1] attempts to give another real world example to get another TBF across:
A "thirty-something" American (well within his constant failure rate phase) has a failure (death) rate of about 1.1 deaths per 1000 person-years and, therefore, has an MTBF of 900 years (of course it's really 900 person-years per death). Even the best ones, however, wear out long before that. Of course, an epidemiologist would not use the term TBF but instead use the phrase Incidence rate. Incidence: failure within a specified time interval. For the above example, Incidence of mortality in the working population might be fitting. Whosis gives a whole range of them[2] if you're interested in that sort of thing. This way of considering MTBF is good for hard drives, as explained in the linked-to essay. Manufacturers of those engines used by twin-engine jets that fly the Atlantic like to advertise that they have never had twin failures on that engine type, ever. Then in the small print they point out that this is only for unrelated causes. So both can fail because they get swamped in very heavy rain or because they get snuffed out by volcanic ash, but their PR can still claim that the TBF in service is still infinite. It depends on how one frames it. The article also badly needs the bathtub graph. That would help to illustrate the point that NYCDA is striving to make. This site explains it pretty well: http://www.weibull.com/hotwire/issue22/hottopics22.htm See also Failure_rate for more info and to make sure you're not duplicating.--Aspro (talk) 12:50, 3 February 2008 (UTC)
- I find the Bathtub curve is already on WP and 'soak' is covered by Burn in, so they only need mentioning in the text.--Aspro (talk) 13:19, 3 February 2008 (UTC)
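The mortality example quoted above is the reciprocal of a rate, which the short sketch below works through. The 40-year horizon is an assumption chosen for illustration, and the survival figure holds only under the constant failure rate that the quote itself stipulates. The quoted "900 years" appears to be a rounding of the value computed here.

```python
import math

deaths = 1.1
person_years = 1000.0

failure_rate = deaths / person_years   # deaths per person-year
mtbf_person_years = 1 / failure_rate   # person-years per death

# Assumed horizon: chance of surviving 40 more years IF the rate stayed constant.
horizon_years = 40
p_survive = math.exp(-horizon_years * failure_rate)

print(round(mtbf_person_years))  # 909
print(round(p_survive, 3))
```

In practice the rate rises with age (the wear-out end of the bathtub curve), which is why nobody lives anywhere near 900 years despite the high MTBF.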
Overview formula - source of confusion
In this illustration, downtime and uptime refer to the absolute instantaneous time at which the event occurred. For each observation, the time of going down happens after the time of going up, so downtime > uptime. The difference (downtime - uptime) gives the time between failures for that observation.
Some readers seem to interpret uptime (downtime) as the amount of time being up (down) but that's not what is being illustrated here. Tayste (talk - contrib) 20:31, 24 March 2008 (UTC)
- I was going to tag the section accordingly... and since it wasn't already done, I did! I absolutely agree, never saw "down time" and "up time" used in that sense. I couldn't understand the section. The image should be revised. For "downtime", "moment of failure" can be used instead. "uptime" is less easily replaceable - it could be the moment of first start or the moment of restart. --Chealer (talk) 17:55, 18 August 2014 (UTC)
Untitled
There is a lot of confusion in this article with respect to the differences between MTTF and MTBF. It should be noted that both terms mean the same when we are talking about unrepairable equipment. In this particular case, as the equipment fails and cannot be repaired, it doesn't make much sense to talk about MTBF; still, both terms are used. In this case we have that MTBF = MTTF (1), and this situation is not explained in the above illustration.
The above illustration refers to a second situation in which the equipment can be repaired, hence it may have several failures and repairs. In this particular case, it is defined that . The mean time to failure (MTTF) has to be interpreted as the time that the equipment is functioning and according to the illustration above it gives
,
hence,
Formula 1:
The MTBF, as the name says, is the Mean Time Between Failures, which means that it should be
Formula 2:
It is important to note that the Time Between Failures is the time that elapses between two consecutive failures and not the time between the end of a repair and the next failure; the latter would be the Time to Fail. It is also important to notice that for repairable items equation (1) is no longer valid. For this case the equation should be MTBF = MTTF + MTTR.
--Senalopes (talk) 11:14, 05 December 2008 (UTC)
--
- This topic should relate to previous section about "Practically, MTBF = MTTF".
- There is no right or wrong whether to use 1st or 2nd formula.
- 1st formula is the practical MTBF used in our daily life and the 2nd one is the fact in mathematical field.
- "time between failure" versus "distance between the wall" is a simple analogy of why we should accept that MTBF = MTTF. If someone asks you "what is the distance between the wall?", then your answer should be "100 meters" (not 110)
-- Andjohn2000 (talk) 14:53, 20 December 2008 (UTC)
- You shouldn't consider that both terms mean the same just because in "practical" life they are both used for the same meaning. There's a definition for each one of them and an encyclopaedic article should be made according to that definition, so that inconsistencies like the one observed in the topic "Still Confused" could not be found.
- So, in order to make the formulas and the definitions in the article consistent with each other I recommend the usage of
- and,
- The distance between walls analogy doesn't represent this situation: a wall has a certain width (in the example 10 meters) but a failure has not. A failure is something that occurs in an instant when an equipment becomes unavailable and should not be considered as the time being repaired. So, when considering the time between failures your answer should be "110 meters" (not 100), although this doesn't make much sense when talking about walls.
--Senalopes (talk) 14:57, 06 January 2009 (UTC)
Still confused
I think this article is very confusing. The picture and the equations make it harder, not easier to understand.
An example. There are three two-day failures within 30 days. What's clear to me is that MDF and MTTR both equal two days. I think the failure rate is 3/30 (three failures within 30 days). But what are the values of MTTF and MTBF in this case? Which one is 30/3 and which one is 24/3 ? Initial paragraph of the article doesn't make it clear. The picture suggests that MTBF is 24/3. So does the first formula for MTBF. The first formula for MTTF suggests that MTTF is 30/3. It is preceded by a statement that MTBF=MTTF+MTTR. So MTBF=30/3+2=12 ? --0xF (talk) 13:58, 25 November 2008 (UTC)
Please see the discussion in the above section --Senalopes (talk) 11:25, 05 December 2008 (UTC)
Your explanation makes it clear that in my example MTBF=30/3 and MTTF=24/3. The key is to understand "failure" as an event and not as a state - the link to failure increases confusion. --0xF (talk) 16:55, 23 March 2009 (UTC)
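The 30-day example discussed in this section can be checked numerically by treating each failure as an instant rather than a state. The positions of the three outages within the window are hypothetical, since the thread gives only their count and length, and the variable names are made up for illustration.

```python
# Hypothetical placement of the three two-day outages inside the 30-day window.
# Each pair is (day the failure occurs, day the repair completes).
outages = [(8, 10), (18, 20), (28, 30)]
window_days = 30

failures = len(outages)
total_down = sum(repaired - failed for failed, repaired in outages)
total_up = window_days - total_down

mttr = total_down / failures    # mean time to repair
mttf = total_up / failures      # mean operating time before each failure
mtbf = window_days / failures   # mean time from one failure event to the next

# Cross-check: gaps between consecutive failure instants.
failure_days = [failed for failed, _ in outages]
gaps = [b - a for a, b in zip(failure_days, failure_days[1:])]

print(mttr, mttf, mtbf)  # 2.0 8.0 10.0
print(gaps)              # [10, 10]
```

With these numbers MTBF = MTTF + MTTR holds (10 = 8 + 2), matching the reading that MTBF is 30/3 and MTTF is 24/3.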
Disk drives
I feel the mention of disk drive failure rates should be avoided in technical detail, because it forces discussion of two different models of the drive life, which is distracting to resolve in the article itself. I take this article to be a more general article about MTBF in an idealized engineering/mathematical/modeling sense.
I'm not opposed to discussing the drive life issues, but it seems like that could easily be accommodated in an article about the drive manufacturer's metrics, and cross referenced with this article.
70.250.178.197 (talk) 03:09, 6 February 2009 (UTC)
I run into more persons who wish to be confused because accounting for repair time makes their numbers look worse. Thus, a desire to cook their stats drives the confusion.
MTBF is simple as a measure and extremely useful in product quality, insurance estimates and computing expense due to failures.
Most of the desire to not be confused comes from the consumer uses of MTBF.
If I have a stack of RAID drives and I want to know how many hot swap spares I might want, MTBF is very handy. But if I sell the RAID drives I might want to compute MTBF in a favorable light, especially if it is long enough that I might escape the customer's notice. I could be tempted to charge more for "higher" quality drives.
For Information Security Defects, the Time Between Breach, I might not want to include down time, clean up of affected systems. It just makes my reporting metrics look better. But Flaw Free Process time minus Flawed Process Time divided by breach events can give me good metrics on how often to cost out the damage of a breach for Process Quality Engineering and Cyber Insurance Reasons.
Also, MTBF is the inverse of Velocity as it is used in Risk Management.
Time/Failure is the inverse of Failure/Time.
Lambda/Time: Odds * Trials/Time from Poisson Statistics Fame.
MTBF = 1 / (Lambda/Time) = 1/ (Failure_Odds/Trial * Trials/Time) = <Average Time> / <Average Failure Events> hence "Mean" Time between Failures.
Statistics thus become possible.
Odds of no failure in interval Time: exp(-Lambda/Time * Interval_Time) = exp(-Interval_Time/MTBF)
Confidence Intervals in Failure: Lambda/Time * Interval_Time + Normsinv(Confidence_Level) * sqrt(Lambda/Time * Interval_Time)
Note: for Lambda below 5 Normsinv(Confidence_Level) drifts toward an approximation.
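The Poisson relations listed above can be tried out with the Python standard library, where statistics.NormalDist().inv_cdf plays the role of the spreadsheet function Normsinv. This is a sketch under stated assumptions: the function names, the 1,000,000-hour MTBF, and the 1,000-drive fleet are hypothetical, and pooling drive-hours across a fleet assumes independent drives with a constant failure rate.

```python
import math
from statistics import NormalDist


def prob_no_failure(interval: float, mtbf: float) -> float:
    """Odds of no failure in the interval: exp(-interval / MTBF)."""
    return math.exp(-interval / mtbf)


def failures_upper_bound(interval: float, mtbf: float, confidence: float) -> float:
    """One-sided normal-approximation bound on the failure count in the interval."""
    expected = interval / mtbf               # Lambda/Time * Interval_Time
    z = NormalDist().inv_cdf(confidence)     # Normsinv(Confidence_Level)
    return expected + z * math.sqrt(expected)


mtbf_hours = 1_000_000.0     # assumed vendor figure
year_hours = 24 * 365
fleet = 1_000                # assumed number of drives in service

p_single = prob_no_failure(year_hours, mtbf_hours)
spares = failures_upper_bound(fleet * year_hours, mtbf_hours, 0.95)

print(round(p_single, 4))  # 0.9913
print(math.ceil(spares))   # 14
```

Here the expected count is 8.76 failures per year, comfortably above the threshold of 5 mentioned in the note. For smaller expected counts the normal approximation becomes unreliable and exact Poisson quantiles would be the safer choice.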