Talk:CDATA

This is the talk page for discussing improvements to the CDATA article.
This is not a forum for general discussion of the article's subject.

General questions

…A couple questions still in my mind after reading this page:

The article says you can't have ]]> in a CDATA section -- is there no escaping at all in these sections? Does this basically mean that CDATA sections are only useful for hand-coding XML, when you know what's in them?

Well, W3Schools.com says "A CDATA section cannot contain the string "]]>", therefore, nested CDATA sections are not allowed." So I guess something like "<![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>" would be a no-no. But this would be OK: "<![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>" Notice, you can use the '&' 'g' 't' ';' characters as a right angle bracket. If you look at the two examples below, you'll notice that they look exactly the same. But click edit this page and you'll see the bottom one uses '&' 'g' 't' ';'.

  <![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>
  <![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>

Also, W3Schools.com says "Everything inside a CDATA section is ignored by the parser." so I'm guessing it's for humans only. (Source) -Hyad 16:38, 13 December 2005 (UTC)

If W3Schools.com says that CDATA is ignored, then it is completely wrong and should not be trusted as a source. Stick to the specs. — mjb 01:02, 31 January 2006 (UTC)

this would be ok: "<![CDATA[ The following statement is CDATA: <![CDATA[hello world]]> ]]>" Notice, you can use the '&' 'g' 't' ';' characters as a right angle bracket.

Not within a CDATA section you can't, surely? I must say, this lack of escaping sounds very bizarre and somewhat dangerous. PeteVerdon 17:27, 10 March 2006 (UTC)
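
For anyone who wants to check this, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser (the element name r and the test string are made up). It suggests that inside a CDATA section the characters '&' 'g' 't' ';' come through as literal text rather than as a right angle bracket:

```python
import xml.etree.ElementTree as ET

# Hypothetical test document: a CDATA section whose content contains "]]&gt;".
doc = "<r><![CDATA[hello world]]&gt; tail]]></r>"

# Inside CDATA, "&" does not start an entity reference, so "&gt;" stays as-is.
text = ET.fromstring(doc).text
print(repr(text))
```

So the reference does avoid a premature end of the section, but the application receiving the text sees the five characters "&gt;", not ">".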

Is whitespace significant in CDATA sections, but not elsewhere? Thus, are "<![CDATA[foo bar]]>" and "foo bar" parsed differently?

No, the only difference between a CDATA section and character data out in the open is that within a CDATA section, "&" means "&" (not the start of an entity or character reference) and "<" means "<" (not the start of a tag). Nothing more! — mjb 01:02, 31 January 2006 (UTC)
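
A small sketch of that equivalence, again assuming Python's xml.etree.ElementTree and a made-up element name r: the same text is written once inside a CDATA section and once as escaped character data, and the parser should report identical strings, whitespace included.

```python
import xml.etree.ElementTree as ET

# Same text written two ways: inside a CDATA section, and as ordinary
# character data with "&" and "<" escaped. The element name r is made up.
in_cdata = ET.fromstring("<r><![CDATA[foo  bar & a < b]]></r>").text
in_open = ET.fromstring("<r>foo  bar &amp; a &lt; b</r>").text

print(in_cdata == in_open)  # compares whitespace and content
```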

CDATA section in attribute?

A further point to clarify: can CDATA appear as an attribute? I've only ever seen it used as a child node, but nothing I've read here says it has to be used that way. Could one write

<Formula Name="saltLimit" Text="<![CDATA[salt < 3]]>" Units="tons" />

? 195.212.29.83 17:01, 27 September 2006 (UTC)

Short answer: No, a CDATA section can't be used in attribute values. And the fact that this isn't mentioned is a good point, and should be addressed!

Longer answer: The XML spec says a CDATA section can appear "anywhere that character data can appear", but also says that attribute values are not considered character data; they're just part of the markup for an element start-tag. So that pretty much sums it up. As far as I know, SGML is the same way.
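
As a rough check of the short answer, the sketch below feeds the attribute from the question to Python's xml.etree.ElementTree (the choice of parser is an assumption; any conforming XML parser should behave similarly). The CDATA form is rejected because "<" may not appear in an attribute value, while the escaped form yields the intended text.

```python
import xml.etree.ElementTree as ET

# The attribute from the question: "<" is not allowed in an attribute value,
# so the document is not well-formed and the parser rejects it.
bad = '<Formula Name="saltLimit" Text="<![CDATA[salt < 3]]>" Units="tons" />'
try:
    ET.fromstring(bad)
    rejected = False
except ET.ParseError:
    rejected = True

# Escaping the "<" as a reference gives the intended value.
good = '<Formula Name="saltLimit" Text="salt &lt; 3" Units="tons" />'
value = ET.fromstring(good).get("Text")
print(rejected, value)
```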

You should keep in mind that these constructs are defined and operate at the lexical level, which is beneath any logical abstractions such as "nodes". That is, the W3C DOM Core, for example, represents a CDATA section with what they call a CDATASection node, but that has no bearing on what a CDATA section is or where it can appear. Other node-based object models like XPath's data model and the XML Infoset do not represent CDATA sections as discrete types of objects, which is good since there's no guarantee that a parser will preserve such distinctions (i.e., a parser is free to report sequential character data in chunks any way it likes; it won't necessarily be based on markup boundaries and definitely won't indicate whether the data was in a CDATA section).
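
To illustrate the difference between object models, here is a sketch using two parsers from Python's standard library, chosen only as examples: xml.dom.minidom (a DOM implementation) and xml.etree.ElementTree (which has no CDATA node type). The test document is made up.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

doc = "<r>a<![CDATA[b]]>c</r>"  # made-up test document

# minidom (a DOM implementation) keeps the CDATA section as its own node.
children = minidom.parseString(doc).documentElement.childNodes
kinds = [n.nodeType == n.CDATA_SECTION_NODE for n in children]
dom_text = "".join(n.data for n in children)

# ElementTree exposes only the combined character data.
et_text = ET.fromstring(doc).text
print(kinds, dom_text, et_text)
```

Either way the character data is the same; only the DOM-style model records where the CDATA section was.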

FWIW, in SGML, a "CDATA section" is, I believe, an informal term by which you would refer to just one of several kinds of marked sections which take the form <![foo[…]]> where foo is a status keyword like CDATA, RCDATA, IGNORE, INCLUDE, or TEMP. In XML, a CDATA section is a formal construct and is the only kind of marked section that is inherited from SGML. So in XML we don't even have or need a concept of 'marked sections'. —mjb 21:26, 27 September 2006 (UTC)

Thanks. 195.212.29.92 09:45, 28 September 2006 (UTC)

CDATA = PCDATA?

What's the difference between CDATA and PCDATA? If you search for PCDATA you get redirected here but PCDATA isn't mentioned in this article at all. --Stefán Örvarr Sigmundsson 04:27, 28 October 2007 (UTC)

I think it wouldn't hurt to mention it. I did a quick Google search and had trouble finding anything very accurate or definitive, though, since it's an SGML concept (ISO 8879 has yet to be published online). Something I posted to a mailing list in 2001 comes up pretty high in the results, but is actually wrong; the XML spec introduces "#PCDATA" at the same time as "mixed content" and I mistakenly thought the two were synonymous. So feel free to find some better references and mention PCDATA in the article.

What I can tell you now is that in an SGML or XML DTD, "#PCDATA" signifies that an element's content is "parsed" character data. It's the same as a CDATA-type attribute value in that some of its characters may comprise markup like entity references or numeric character references, and "<" is also considered to be markup: in this case, markup that will be recognized as not being part of the element content. —mjb 03:39, 29 October 2007 (UTC)

Uses of CDATA

This article states that CDATA is used in XML, but does that refer to all XML-based languages or only some? I know that it can be used in XHTML documents as well. --Stefán Örvarr Sigmundsson 00:15, 31 October 2007 (UTC)

Example CDATA-type attribute values

In the following sample:

<foo a="1 &amp; 2 are &lt; &#51; &#x10;" />

I'm wondering at the significance of the &#x10;. (The ASCII/Unicode code point U+0010 is Data Link Escape, whatever that means.) I assume it was actually meant to be a new-line character reference (U+000A, which would be &#10; in decimal or &#xA; in hexadecimal); and based on that assumption I've changed it accordingly. The example for how an XML parser would interpret it has no indication of anything after the 3, either way. If this edit is in error, please change it back. --Matty K 03:42, 16 October 2008 (UTC)
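
A quick way to see what a parser makes of the sample, sketched with Python's xml.etree.ElementTree (the parser choice is an assumption): U+0010 is not a legal character in XML 1.0, so the sample with the U+0010 reference should be rejected, while the same sample with a new-line reference parses and keeps the new line in the attribute value.

```python
import xml.etree.ElementTree as ET

# &#x10; refers to U+0010, which is not a legal XML 1.0 character,
# so a parser should refuse the original sample outright.
try:
    ET.fromstring('<foo a="1 &amp; 2 are &lt; &#51; &#x10;" />')
    original_ok = True
except ET.ParseError:
    original_ok = False

# With a newline reference (&#10;) the sample parses.
value = ET.fromstring('<foo a="1 &amp; 2 are &lt; &#51; &#10;" />').get("a")
print(original_ok, repr(value))
```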

Neutrality of section Avoid CDATA in program output

This section does not appear to be supported by content found at any of the external links. It may be the original thoughts of the author and should either include appropriate citation to authorities or be removed.--Ded.morris (talk) 15:11, 8 July 2009 (UTC)

Bias/neutrality and original research are different topics. It's probably more of a WP:NOTHOWTO violation than any kind of bias. I agree it should be sourced and edited down a bit. —mjb (talk) 04:55, 9 July 2009 (UTC)

I've edited it quite substantially to remove the "original research" and instructions to the reader (the "HOWTO" flavour). I removed the stateful/stateless parsing distinction on the grounds that I expect that anyone trying to implement it will quickly be able to decide which approach is easier for them, and that at higher levels it's (marginally) easier to do a single string replacement than two. Anyway, I think this marker can now go, and intend to remove it shortly if no-one complains. -anon, 10 Oct 2009 —Preceding unsigned comment added by 217.155.139.146 (talk • contribs) 14:34, 10 October 2009 (UTC)
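
For reference, one common way to emit arbitrary text as CDATA with a single string replacement is to split every "]]>" across two adjacent sections. The sketch below assumes that this is the kind of replacement meant above; the function name to_cdata and the sample payload are made up, and Python's xml.etree.ElementTree is used only to check the round trip.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # One replacement: end the section after "]]" and reopen before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "if (a[b[0]]> 1 && x < y) {}"  # made-up text containing "]]>"
wrapped = to_cdata(payload)
round_trip = ET.fromstring("<r>" + wrapped + "</r>").text
print(round_trip == payload)
```

The alternative of writing the text as ordinary character data needs "&" and "<" (and ">" after "]]") to be escaped instead.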

CDATA Deprecated in DOM4?

Perhaps a mention of this? As far as I understand, CDATA has been deprecated in DOM4, and its use is strongly discouraged in all existing or new projects, and not supported in HTML 5 web development. However, I am still confused on that point. Mozilla says it is deprecated. ( https://developer.mozilla.org/en-US/docs/Web/API/CDATASection ) Mozilla represents 3 out of 5 of the editors of the W3C DOM4 Recommendation. ( https://www.w3.org/TR/dom/ ) However, there is still an interface description for Character Data. ( https://www.w3.org/TR/dom/#interface-characterdata ) But, later in the same document, it shows a list of things removed, one of which is CDATASection. ( https://www.w3.org/TR/dom/#dom-core ) Clarification on this point might be of importance for people understanding CDATA, especially if it is to be made completely obsolete in the near future with DOM4. I do not know if the CDATA will still remain part of SGML/XML, or if this DOM deprecation is merely removing a way to get at the CDATA using the DOM. If it is to be replaced with something, or an alternative construct, that might also be mentioned. Possibly include a range in which it was valid, i.e. year range or DOM1-DOM3 range, etc. Warp9pnt9 (talk) 11:34, 9 June 2016 (UTC)