Wikipedia:Reference desk/Archives/Language/2021 January 11

From Wikipedia, the free encyclopedia
Language desk
Welcome to the Wikipedia Language Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


January 11

Parsing

Using common sense to parse this sentence:

Caren was born and raised in Rustenberg, South Africa with her three older siblings. 

It's clear that Caren was born alone, but raised with 3 siblings. Both things happened in Rustenberg.

But, isn't the sentence actually saying that she was born and raised with 3 siblings? If we keep the structure and use other words, would it be so clear that the first act was alone and the second with others? Like: --Bumptump (talk) 14:16, 11 January 2021 (UTC)

Yes, it's poorly worded. Where did you see it? ←Baseball Bugs What's up, Doc? carrots 15:56, 11 January 2021 (UTC)
It's found here. I don't find the sentence problematic. Bus stop (talk) 16:33, 11 January 2021 (UTC)
Formally, the sentence has not two but three different potential meanings: 1) [born] and [raised with siblings]; 2) [born and raised] with siblings; 3) [born-and-raised] with siblings. The difference between 2) and 3) is that 2) may be expanded to 2a) [born with siblings] and [raised with siblings], but 3) cannot be. Your common-sense interpretation is either 1 or 3, but your claim that the sentence is "saying" 2 is false, unless you mean that that is one of several things it is saying.
With some choices of words, the sentence would actually be ambiguous; but language is used by real people in the real world, and both discourse pragmatics and real-world constraints are crucial parts of understanding language, as AI researchers have often found. --ColinFine (talk) 17:14, 11 January 2021 (UTC)
We don't need ultimate clarity in all contexts. In a legal contract ultimate clarity is necessary. But in other contexts it may not be. Bus stop (talk) 18:26, 11 January 2021 (UTC)
Alternatively we may say "At the point of Caren's birth three other siblings were in existence. The birth of Caren transpired in Rustenberg, South Africa. The upbringing of Caren transpired in Rustenberg, South Africa, in the presence of her three older siblings." Bus stop (talk) 19:10, 11 January 2021 (UTC)
Or simply, "Caren was born in Rustenberg, South Africa and raised with her three older siblings." ←Baseball Bugs What's up, Doc? carrots 21:30, 11 January 2021 (UTC)
That works too. Bus stop (talk) 21:58, 11 January 2021 (UTC)
Also consider this: "born and raised in Rustenberg, South Africa with her three older siblings, while her seventeen younger siblings were sold to passing merchants". Obviously not intended, and also not how the sentence would normally be understood, but the wording does, strictly speaking, not imply that the subject was the youngest of the litter. --Lambiam 11:07, 12 January 2021 (UTC)
Good point. We aren't trying, in this sort of writing, to remove all ambiguity. Legal writing aims to leave no loopholes. This source writes: "Contract language is limited and stylised," says Adams. He compares it to software code: do it right and everything works smoothly. But make a typo and the whole thing falls apart. When errors are introduced into legal documents, they’re likely to be noticed far more than in any other form of writing, he says. "People are more prone to fighting over instances of syntactic ambiguity than in other kinds of writing." Bus stop (talk) 13:49, 12 January 2021 (UTC)
It needs a balancing comma after South Africa. --Tamfang (talk) 02:06, 15 January 2021 (UTC)

Vowel location frequency within words

Playing word games sometimes leads my mind to weird places. For example, if you're playing a Scrabble-like game, where you need to form words based on a set of letters, and you have an "A" and an "E" and a selection of consonants, it makes sense to put the "A" closer to the front and the "E" closer to the back because there seem to be more words that follow that pattern. Of course there are lots of words where the "E" comes before the "A", like "beat" and "break", but many more where the opposite is true, like "bake" and "fake" and "mashed" and so on. Further, the patterns are probably even more distinct if you control for prefixes and suffixes and so on. My question is: has this been studied using a corpus of English words? Matt Deres (talk) 23:30, 11 January 2021 (UTC)

If you have a Unix/Linux system, you can do some basic checking yourself:
grep -ic 'e.*a' /usr/share/dict/words
grep -ic 'a.*e' /usr/share/dict/words
etc... AnonMoos (talk) 05:37, 12 January 2021 (UTC)
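For anyone without a Unix-like system, roughly the same check can be sketched in Python. This is only an illustration, not what was run above: the function names `count_matching` and `load_words` are made up here, and it assumes a plain-text word list with one word per line.

```python
import re
from pathlib import Path


def load_words(path):
    """Read a plain-text word list, one word per line (path is supplied by the caller)."""
    return Path(path).read_text(encoding="utf-8", errors="replace").splitlines()


def count_matching(words, pattern):
    """Count the words in which `pattern` matches somewhere, ignoring case.

    This mirrors `grep -ic PATTERN file`, which counts matching lines.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(1 for word in words if regex.search(word))


# Small illustrative sample taken from the words mentioned in the question.
sample = ["beat", "break", "bake", "fake", "mashed"]
e_before_a = count_matching(sample, "e.*a")
a_before_e = count_matching(sample, "a.*e")
print(e_before_a, a_before_e)
```

To run it on a real list, pass the result of `load_words(...)` instead of `sample`. On the systems mentioned above the file is `/usr/share/dict/words`; elsewhere any downloaded word list in the same one-word-per-line layout should do.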
And if you restrict the count to words that contain just two vowels:
grep -ic '^[^aeiou]*a[^aeiou]*e[^aeiouy]*$' /usr/share/dict/words
grep -ic '^[^aeiou]*e[^aeiou]*a[^aeiouy]*$' /usr/share/dict/words
(an approximation, because of the ambiguity of the letter (y)), the discrepancy is even more pronounced; I find a-e : e-a = 3000 : 1356. --Lambiam 10:57, 12 January 2021 (UTC)
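The two restricted patterns can be tried in Python as well. The sketch below reuses the same regular expressions; the names `A_THEN_E`, `E_THEN_A` and `two_vowel_counts` are invented for illustration. Words are lowercased before matching, which stands in for grep's `-i`, and `fullmatch` stands in for the `^` and `$` anchors.

```python
import re

# Same patterns as the grep commands, minus the anchors (fullmatch supplies them).
# "y" counts as a consonant everywhere except in the final run of letters.
A_THEN_E = re.compile(r"[^aeiou]*a[^aeiou]*e[^aeiouy]*")
E_THEN_A = re.compile(r"[^aeiou]*e[^aeiou]*a[^aeiouy]*")


def two_vowel_counts(words):
    """Return (a-e count, e-a count) for words whose only vowels are one a and one e."""
    a_e = sum(1 for word in words if A_THEN_E.fullmatch(word.lower()))
    e_a = sum(1 for word in words if E_THEN_A.fullmatch(word.lower()))
    return a_e, e_a


# Illustrative sample: "banana" and "easy" match neither pattern.
sample = ["bake", "beat", "mashed", "break", "banana", "Fake", "easy"]
print(two_vowel_counts(sample))
```

As with the grep version, this is an approximation: how "y" is treated is a choice built into the patterns, not a linguistic fact.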
Neat! Thank you both; maybe I should have asked this on RD/C. :) I guess I was assuming this was a common way of describing languages and someone had already done this kind of work, comparing languages or variants and so on. Like, how does the pattern change between EN-variants. Unfortunately, I don't use Unix or Linux. Matt Deres (talk) 16:59, 12 January 2021 (UTC)
I would think that "o" before "u" would show a lot more frequency variation between UK and American English spellings than "a" before "e". Traditional cryptanalysts often compiled tables of digraph frequency, and sometimes trigraph frequency, but such tables referred to combinations of adjacent letters... AnonMoos (talk) 21:57, 12 January 2021 (UTC)
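A digraph table of the kind described, counting adjacent letter pairs only, could be sketched as follows. The function name `digraph_counts` is hypothetical, and the three sample words are chosen only to show a UK/US spelling pair; nothing here reproduces any actual cryptanalytic table.

```python
from collections import Counter


def digraph_counts(words):
    """Count adjacent letter pairs (digraphs) across a list of words, ignoring case."""
    counts = Counter()
    for word in words:
        letters = word.lower()
        # Pair each character with the one right after it.
        for first, second in zip(letters, letters[1:]):
            # Skip pairs that involve apostrophes, hyphens, digits and the like.
            if first.isalpha() and second.isalpha():
                counts[first + second] += 1
    return counts


sample = ["colour", "color", "out"]
table = digraph_counts(sample)
print(table.most_common(3))
```

Running this separately over a British and an American word list and comparing the count for "ou" would be one rough way to test the guess about "o" before "u". Unlike the `a.*e` searches, it says nothing about letters that are separated by other letters.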

I finally rebooted my computer from Windows into Linux to do some tasks, and also did some searches. The "words" file there has 45425 words, and according to the simplest search, 6753 words have "e" before "a", while 10831 words have "a" before "e" (of course, some words have both). On another Unix-style system I have access to (not Linux), the "words" file has 235970 words, and 54427 have "e" before "a", while 63392 have "a" before "e". AnonMoos (talk) 05:58, 15 January 2021 (UTC)
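To put the two sets of counts on a common footing, the ratio of "a before e" to "e before a" can be worked out directly. The snippet below uses only the figures quoted above; the helper name `ratio` and the dictionary labels are made up for illustration.

```python
def ratio(a_before_e, e_before_a):
    """Ratio of words with 'a' before 'e' to words with 'e' before 'a'."""
    return a_before_e / e_before_a


# (a-before-e, e-before-a) counts as reported for the two word lists.
reported = {
    "45425-word list": (10831, 6753),
    "235970-word list": (63392, 54427),
}

for label, (a_e, e_a) in reported.items():
    print(f"{label}: {ratio(a_e, e_a):.2f}")
```

This gives about 1.60 for the smaller list and about 1.16 for the larger one. Since words containing both orders are counted on both sides, these are rough indicators rather than clean proportions.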