Talk:Write amplification
This is the talk page for discussing improvements to the Write amplification article. This is not a forum for general discussion of the article's subject.
Write amplification has been listed as one of the Engineering and technology good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.
Current status: Good article
This article is rated GA-class on Wikipedia's content assessment scale.
References
There are a lot of dead links in the references, mostly PDFs that have vanished. What is the policy about archive or Wayback links here? Someone needs to update the citations, with dead link if nothing else. If I have time, I'll see what I can do as well. — M3TAinfo (view) 12:29, 10 November 2016 (UTC)
Data is vs. data are
I wanted to clarify my use of the term data in this article before anyone proposes changing my original use of "data is" to "data are".
The original definition of Data (plural) and Datum (singular) is very clear. However there is an additional definition for Data as a "mass noun" that is not often cited.
http://www.merriam-webster.com/dictionary/data
- “Data leads a life of its own quite independent of datum, of which it was originally the plural. It occurs in two constructions: as a plural noun (like earnings), taking a plural verb and plural modifiers (as these, many, a few) but not cardinal numbers, and serving as a referent for plural pronouns; and as an abstract mass noun (like information), taking a singular verb and singular modifiers (as this, much, little), and being referred to by a singular pronoun. Both constructions are standard. The plural construction is more common in print, perhaps because the house style of some publishers mandates it.”
Water, wheat, and information (mass nouns) are similar examples to the use of data in this article. You can have “four cups of water”, “five pounds of wheat”, “nine pages of information”, and “500 bytes of data”. Water, wheat, and information are not plural in this case, but really mass nouns as is data in this usage. You can say “water is wet”, “wheat is grown”, “information is vital” and “data is saved” because they are all mass nouns.
Also note the following comment from AskOxford.com:
http://www.askoxford.com/asktheexperts/faq/aboutgrammar/data
- “Strictly speaking, data is the plural of datum, and should be used with a plural verb (like facts). However, there has been a growing tendency to use it as an equivalent to the uncountable noun information, followed by a singular verb. This is now regarded as generally acceptable in American use, and in the context of information technology. The traditional usage is still preferable, at least in Britain, but it may soon become a lost cause. Compare with agenda.”
Please review any comments about this topic here before making any general changes to the main article. § Music Sorter § (talk) 16:32, 14 July 2010 (UTC)
Calculation of Write Amplification correct?
With some right it could also be:
[formula image not preserved: Simple write amplification formula]
instead of:
[formula image not preserved: Simple write amplification formula]
Using the second formula, writing 16 kBytes into 256 kByte sectors would result in a WA of 1, whereas the first formula would give 16. —Preceding unsigned comment added by 95.117.255.6 (talk) 07:39, 26 July 2010 (UTC)
- The math you stated would be true if the SSD did not consolidate writes in any way. The point of the title "simple formula" is just that: simple. As well the article is not covering primary research, but secondary research on what is already published on this topic. If the formula proposed can be found from a reliable source we should add it as an alternate formula, but we should consider it a more "advanced formula" vs. the current "simple formula." All the current resources cited describe math that produces the currently listed formula. § Music Sorter § (talk) 15:24, 9 August 2010 (UTC)
- I probably should have worded it as "writing 16kBytes and no bit less, and no bit more into 256kBytes sectors". The formula was meant to address the bytes actually written on the physical level (so true independent of any level above it like write consolidation or garbage collection). It could be a matter of concern that the results of the first formula and the second formula in this case deviate by a factor of 16. Unfortunately I do not have a reliable source of primary research to point to. As a side note, I'd like to mention that I would not expect to have primary research referring to the term write amplification other than is commonly named as. I do not see how this term could match non marketing wise (amplified? stronger? enhanced? ample?) nor marketing wise (amplification likely is positively connoted whereas it is strictly negative here). Note also that the relevant characteristic of flash named in datasheets in this context is the number of PE-cycles (program-erase cycles), so a formula not containing program-erase or erase (or a directly related term (bytes written alone does not do)) is not likely to adequately address the scarce resource. 95.117.246.80 (talk) 21:24, 17 August 2010 (UTC)
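The factor-of-16 disagreement in this thread can be illustrated with a short sketch. The two formula images are not reproduced on this page, so the code below only assumes the simple ratio of bytes written to flash over bytes written by the host, and evaluates it under two different assumptions about what the drive physically writes. The function name and both scenarios are illustrative, not taken from any cited source.

```python
def write_amplification(flash_bytes_written: int, host_bytes_written: int) -> float:
    """Simple ratio: bytes written to flash divided by bytes written by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


KIB = 1024
host_write = 16 * KIB    # the 16 kByte host write from the comments above
erase_unit = 256 * KIB   # the 256 kByte unit from the comments above

# Assumption A: only the 16 kBytes are programmed, into already-erased pages.
wa_no_rewrite = write_amplification(host_write, host_write)

# Assumption B: the whole 256 kByte unit is rewritten to absorb the 16 kBytes.
wa_full_rewrite = write_amplification(erase_unit, host_write)

print(wa_no_rewrite, wa_full_rewrite)
```

Under these assumptions the same ratio yields 1 in the first case and 16 in the second, so the disagreement is about which physical behaviour is being counted rather than about the arithmetic.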
GA Review
- This review is transcluded from Talk:Write amplification/GA1.
Reviewer: -- Cirt (talk) 10:58, 20 September 2010 (UTC)
- I will review this article. -- Cirt (talk) 10:58, 20 September 2010 (UTC)
Successful good article nomination
I am glad to report that this article nomination for good article status has been promoted. This is how the article, as of September 24, 2010, compares against the six good article criteria:
- 1. Well written?: Writing quality is pretty good throughout, however, would strongly suggest going for a peer review post-GA, where input could be solicited from copyeditors and users previously uninvolved with the article.
- 2. Factually accurate?: Duly cited throughout. Good use of secondary sources, on a difficult and esoteric subject matter.
- 3. Broad in coverage?: Covers main aspects, however, going forward towards peer review and upwards in quality in the future, would recommend expanding subsections: Impact on performance, Product statements, and perhaps add in some additional analysis and commentary from secondary sources.
- 4. Neutral point of view?: Passes here. Presented in a neutral manner.
- 5. Article stability? Passes here. No major conflicts or issues, upon inspection of article edit history, and article talk page history.
- 6. Images?: I moved some images from being hosted locally, to instead be at Wikimedia Commons. These check out as appropriate.
If you feel that this review is in error, feel free to take it to Good article reassessment. Thank you to all of the editors who worked hard to bring it to this status, and congratulations.— -- Cirt (talk) 20:58, 24 September 2010 (UTC)
- Cirt, thanks for the review, updates, and promotion of the article. When I get a chance I will take it through additional peer review as recommended. Thanks. § Music Sorter § (talk) 04:06, 29 September 2010 (UTC)
Articles to create
Redlinked articles from this article's page, to create at a future point in time. Cheers, -- Cirt (talk) 20:51, 24 September 2010 (UTC)
Astroturf and meaningless statements
Skip the blatant SandForce astroturf commercial about 0.5 write amplification. According to the formula in the article itself, that would mean that the drive stores only half of the bytes given to it by the operating system. In other words, according to the formula in the article, the drive is losing half the information, which clearly is not a good feature. The claim is deceptive if it is done by using compression, since compression is a concept that is orthogonal to write amplification and should not be counted when calculating write amplification. The links given on this SandForce commercial don't even mention the word "amplification" at all, so there need to be real references instead. Wikipedia should not be a commercial for SandForce, and if we are going to talk about write amplification levels below 1.0 then we must define what that means. By the definition of write amplification in the article, the claim makes no sense other than as a claim that a drive is losing information. 24.59.190.64 (talk) —Preceding undated comment added 19:18, 23 March 2011 (UTC).
Edit: OK, so the first reference does talk about a 0.5 write amplification, but it does so on a later page. The direct link is http://www.anandtech.com/show/2899/3. However, that page merely documents that SandForce claims to have a 0.5 write amplification, and then only speculates about what they might mean by that. The speculation is indeed that there must be compression going on. That is deceptive. In that case the SandForce controller may cause actual write amplification of 100x, yet have a write/deception amplification of just 0.5 when writing highly compressible data. Since there is no linked hard information about what the claimed 0.5 write amplification means, we should not repeat such loose statements. —Preceding unsigned comment added by 24.59.190.64 (talk) 19:30, 23 March 2011 (UTC)
- 24.59.190.64, I am not sure I understand the reference to astroturf in your statements above. Both Intel and SandForce make claims about write amplification. Are you saying one is more relevant than the other?
- I see in your follow up edit you realized the source referenced had more than one page and you now understand how a drive can get a write amplification below 1 and you no longer feel it is deceptive. Since many online articles will appear on multiple URLs, there is no requirement to explicitly state what page of an article the reference is made unless it is a particularly large article. I'm sure if we think it is confusing we can update the link to be more specific.
- Your follow-on comment says the source speculates as to how SandForce might be achieving the reduced write amplification. I agree it does. The paragraph you are referencing in the article is citing what product claims companies are making. They are fully sourced, so I don't think they would be considered meaningless statements. SSDs with flash memory must do everything possible to reduce the number of times they write and rewrite data to the SSD. During garbage collection and wear leveling the SSD will write data more times than the host requested. This gives rise to the "amplification" of those writes. An SSD that can do things to reduce the number of times it writes data to the flash or how much data it writes to the flash would be better because it will enable the flash to "last longer", and if it is writing less to the drive initially then it gets done writing data sooner than other drives. That would give that SSD a faster write time than any other drive. Last time I checked, third party reviews of the SandForce drives were showing actual performance tests which are much higher than many other drives. I don't know of any simple way to be faster than other SSDs other than to write less data, which would be the result with a lower write amplification. Since this article is all about write amplification I think any information related to that subject is likely relevant. § Music Sorter § (talk) 05:47, 17 June 2011 (UTC)
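As a rough illustration of how a ratio below 1 could arise without any data being lost, the sketch below multiplies a hypothetical compression ratio by a hypothetical internal amplification factor. Every number and name here is made up for the example; nothing in it describes how SandForce or any other controller actually works, which the cited source only speculates about.

```python
def effective_write_amplification(host_bytes: int,
                                  compression_ratio: float,
                                  internal_wa: float) -> float:
    """Flash bytes written per host byte, under a simplified two-factor model.

    host_bytes:        bytes the host asked the drive to write
    compression_ratio: stored size / original size (hypothetical; 1.0 = none)
    internal_wa:       flash bytes written per stored byte, covering garbage
                       collection and wear leveling (hypothetical)
    """
    stored_bytes = host_bytes * compression_ratio
    flash_bytes = stored_bytes * internal_wa
    return flash_bytes / host_bytes


host = 100 * 10**9  # 100 GB of host writes, an arbitrary example figure

# Incompressible data: the ratio stays above 1.
wa_incompressible = effective_write_amplification(host, 1.0, 1.25)

# Data that compresses to 40% of its size: the ratio falls below 1.
wa_compressible = effective_write_amplification(host, 0.4, 1.25)

print(round(wa_incompressible, 3), round(wa_compressible, 3))
```

In this model the host's data is fully recoverable in both cases; the ratio drops only because fewer bytes reach the flash. Whether compression should be counted inside the write amplification figure is exactly the point under dispute above.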
marketing gag
I moved the following text here, because it is a replication of a marketing gag and no information:
- Until 2009, it was assumed that write amplification could not drop below one, but that year SandForce made the claim they had a write amplification of 0.5.<ref name="Anand_WA">
There already has been some discussion about this, see preceding section Astroturf and meaningless statements. I should have simply deleted this, but I already wrote the edit summary for the article edit -- Tomdo08 (talk) 18:21, 16 June 2011 (UTC)
- Tomdo08, I reverted your changes for the moment while we discuss your proposed modifications. Your additions appear to violate the WP:NPOV, WP:VERIFY, and WP:NOR rules. Your modifications to the Product statements section appear to be your opinion without any source reference, and appear to me to be worded in a controversial and non-encyclopedic manner. You said there had already been some discussion about this in the section you mentioned above, but I only see that one unregistered user made some opinionated statements as well. That would not constitute a discussion. You may not realize it, but this article already passed the review criteria for WP:GA, which required it be free of any WP:NPOV elements. Certainly there is always room for improvement and it is very possible someone missed something in that review. I would be happy to discuss how you and other editors think we should update the article if there is any debate about the truth of the content in the current article as it is worded now. (Note that I did not revert your other edits to the article which did not violate the rules I mentioned above.) § Music Sorter § (talk) 05:19, 17 June 2011 (UTC)
Gibibyte vs Gigabyte
It would be helpful if this text adhered to the ISO standards for binary and decimal prefix notation. Gibibyte would be the binary standard, whereas gigabyte would be the decimal standard of prefixing. — Preceding unsigned comment added by Theking2 (talk • contribs) 12:35, 25 February 2012 (UTC)
Free user space
I think this statement might be incorrect: "requires TRIM, otherwise the SSD gains no benefit from any free user capacity". As I understand it, most file systems will reuse deleted blocks first in order to avoid fragmentation, and therefore the SSD will gain benefit from free user capacity even if there is no TRIM. Also, I don't see any mention of TRIM in the reference (1), so it seems to be WP:OR --sciencewatcher (talk) 02:48, 20 June 2012 (UTC)
Mistake in over-provisioning illustration
That image should clearly state "10^9", not "10^12", for the comparison with 2^30 to work. --78.104.125.200 (talk) 16:04, 3 February 2013 (UTC)
- This has been driving me crazy, how can I help fix it?? — Preceding unsigned comment added by 24.21.161.214 (talk) 08:48, 27 March 2015 (UTC)
- Hello! Got the File:Over-provisioning on an SSD.png image fixed, thank you for pointing it out! As a note, you may need to hit Ctrl+F5 to have the newer version of the image displayed in your web browser. — Dsimic (talk | contribs) 14:21, 27 March 2015 (UTC)
- I think much of the section under over-provisioning is pretty well garbage. What is referred to as "Over-provisioning Level 1" is better known as "rounding".
- What is referred to as "Over-provisioning Level 3" is rather messy. This is not over-provisioning per se, but instead the OS is telling the controller that space is unused and need not be preserved thus reducing write-amplification. A result similar to what over-provisioning achieves, but not actual over-provisioning.
- Lastly, declaring smaller partitions may have worked with the older MBR partitioning, with GPT the backup GPT must be written at the end of the medium, which will prevent a controller from grabbing that space for additional over-provisioning. Unless something happened that I'm unaware of, vendor-specific tools must be utilized to increase over-provisioning. 207.172.210.101 (talk) 01:06, 21 June 2016 (UTC)
- Hello! I wouldn't say that it's garbage, but calling those methods "levels" might be debatable; this might be a slightly better word choice. The level three is simply not using all the space on an SSD, on the logical level, so the controller has more never-to-be-used space to play with. Regarding your last remark, having a secondary GPT header at the end of an SSD (in terms of LBAs, of course) doesn't mark anything in-between as used as well; for the controller, it's all about used/unused blocks, not about setting boundaries around them. — Dsimic (talk | contribs) 02:34, 21 June 2016 (UTC)
- No, that does not improve the situation. "Source 1" is simply creating a new technobabble term for Rounding; unless some manufacturer states the precise useable capacity there will be rounding; the existing, well defined word should be used. "Source 3" is talking about a method to avoid write-amplification, but does not meet the definition of over-provisioning (perhaps "ad hoc over-provisioning", since the OS can still access it?). Only "Source 2" is actual over-provisioning. My comment was about this which is talking about one method of increasing the over-provisioning, but won't work for all devices. 207.172.210.101 (talk) 03:24, 21 June 2016 (UTC)
- Which page in the PDF file are you referring to? Page 14? Could you please explain why creating a smaller-than-the-available-space partition wouldn't work on all devices? — Dsimic (talk | contribs) 09:33, 21 June 2016 (UTC)
- That page of that paper is extremely easy to misread. Using the TRIM command to blank most of a SSD and then only using a subset of the space will in fact have the effect of reducing write amplification, but this is better described as "short-stroking". 207.172.210.101 (talk) 20:31, 21 June 2016 (UTC)
- That comparison with HDD-related short-stroking is a very good one! However, what's actually easy to misread on that page? It's perfectly fine to assume that a brand-new SSD is going to be "short-stroked" that way, which eliminates the need to TRIM or "factory erase" the whole drive. — Dsimic (talk | contribs) 11:02, 24 June 2016 (UTC)
- On my first read of that paper I thought it was claiming that if you put on an MBR/GPT that avoided allocating the full size, the SSD firmware would make that portion disappear and turn it into over-provisioning. Instead, upon further thought this sounds like it really is effectively "short-stroking"; mainly, that area is left trimmed and the controller is allowed to use it as scratch space. My main concern with this section is that it should use the existing terms for these things and not invent new terminology in order to stick everything under the banner of "over-provisioning". Only "source 2" meets the correct definition of "over-provisioning". 207.172.210.101 (talk) 21:56, 25 June 2016 (UTC)
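Whatever names end up being used for these cases, the arithmetic behind the 10^9 versus 2^30 comparison and behind leaving part of the drive unused can be sketched as follows. The sketch assumes the usual ratio of spare capacity to user capacity; the 128 GiB drive size and the 80% partition are hypothetical figures chosen for the example.

```python
def spare_ratio(physical_bytes: int, user_bytes: int) -> float:
    """(physical capacity - user capacity) / user capacity, as a fraction."""
    if not 0 < user_bytes <= physical_bytes:
        raise ValueError("need 0 < user_bytes <= physical_bytes")
    return (physical_bytes - user_bytes) / user_bytes


GIB = 2**30  # binary prefix
GB = 10**9   # decimal prefix

physical = 128 * GIB   # hypothetical raw flash on the drive
advertised = 128 * GB  # hypothetical capacity exposed to the host

# Spare space that comes purely from the GiB-versus-GB difference.
from_prefix_gap = spare_ratio(physical, advertised)

# Hypothetical: only 80% of the exposed capacity is ever partitioned and
# written, and the rest stays trimmed so the controller can use it.
in_use = advertised * 8 // 10
with_unused_space = spare_ratio(physical, in_use)

print(f"{from_prefix_gap:.2%} {with_unused_space:.2%}")
```

The first figure is about 7.37% and depends only on the two prefix definitions. The second depends on the unused space actually being known to the controller as free, which is the point debated above.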
Explanation for WA comes too late in the article
"Data is written to the flash memory in units called pages (made up of multiple cells). However, the memory can only be erased in larger units called blocks (made up of multiple pages)." - this is the first reason given in the article why existing unchanged data has to get re-written sometimes, but it is way too late in the article. If you start from the top, you are constantly wondering "why should we have WA at all? Just erase some space no longer in use and write your data there". I don't have the knowledge to fix it, please try to change the order somehow. --mfb (talk) 02:06, 20 September 2014 (UTC)
- Hello there! Got the lead section slightly expanded (together with doing a few other cleanups, as spotted), please check it out. Hope it's better now. — Dsimic (talk | contribs) 06:54, 23 September 2014 (UTC)
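The page-write versus block-erase mismatch quoted above can be made concrete with a toy calculation. This is a deliberately simplified sketch: it assumes one block is reclaimed at a time, that every still-valid page in it must be copied elsewhere before the erase, and that the freed pages are then filled with new host data. The function name and the 128-page block size are illustrative only.

```python
def reclaim_write_amplification(pages_per_block: int, valid_pages: int) -> float:
    """Flash page writes per host page write when reclaiming one block.

    Pages that are still valid must be copied before the block is erased;
    only the remaining pages become available for new host data.
    """
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("need 0 <= valid_pages < pages_per_block")
    freed_pages = pages_per_block - valid_pages
    flash_page_writes = valid_pages + freed_pages  # copies plus new host data
    return flash_page_writes / freed_pages


PAGES = 128  # hypothetical pages per block

print(reclaim_write_amplification(PAGES, 0))    # block held only stale data
print(reclaim_write_amplification(PAGES, 96))   # three quarters still valid
print(reclaim_write_amplification(PAGES, 127))  # a single stale page
```

When the reclaimed block holds no valid data the ratio is 1, which matches the "just erase some unused space" intuition. The ratio grows as more unchanged data has to be moved to make room.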
Updating without erase
"...flash memory must be erased before it can be rewritten,..."
Is this necessarily true for writing?
This is actually a basic flash question, not necessarily SSD related.
Consider a log which is only ever appended to with small fixed-length entries.
Nth write: bytes 0-3F are written to an erased page. Can the (N+1)th write, which will occupy bytes 40-7F, be written to the same page without erasing or using another page?
A similar question for counting.
Consider a 32-bit counter word initialized by erasure to FFFFFFFF. On the first "pass" the counter is "incremented", resulting in FFFFFFFE. On the second "pass" it becomes FFFFFFFC, on the third FFFFFFF8. The value FFFF0000 would be interpreted as 16 completed passes.
Can the same word in flash be re-written without erasures?
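The counting scheme in the question can be modelled in a few lines. The sketch assumes a simplified device in which programming can only change bits from 1 to 0, so the stored result is the bitwise AND of the old and new values, and in which the same word may be programmed repeatedly between erases. Whether a real part allows that depends on the flash type and its datasheet; many NAND parts limit how often a page can be partially programmed. All function names are made up for the example.

```python
ERASED = 0xFFFFFFFF  # a 32-bit word after erasure: all bits set


def program(current: int, value: int) -> int:
    """Simplified model: programming can only clear bits, never set them."""
    return current & value & ERASED


def increment(word: int) -> int:
    """Record one more pass by clearing the lowest bit that is still set."""
    if word == 0:
        raise ValueError("counter exhausted; an erase would be needed")
    return program(word, word & (word - 1))


def passes(word: int) -> int:
    """Number of completed passes equals the number of cleared bits."""
    return 32 - bin(word & ERASED).count("1")


word = ERASED
for _ in range(3):
    word = increment(word)
print(f"{word:08X}", passes(word))

for _ in range(13):
    word = increment(word)
print(f"{word:08X}", passes(word))
```

In this model the sequence FFFFFFFE, FFFFFFFC, FFFFFFF8 from the question falls out directly, and FFFF0000 decodes as 16 passes. Going back to a larger value is impossible without an erase, because `program` can never set a bit again.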