Computing desk

March 17

Counting email domains

For a Perl assignment, I have to take a guest list that has a first name, last name, and email address on each line and perform certain counts. One thing we've been asked to do is figure out how many times each domain is used: for example, 5 people use gmail, 6 use yahoo, 3 use comcast, etc. I'm a bit stuck on how to do that. Is it possible to load the addresses into an array and then run counts on the array to determine how many of each there are? So far we've covered arrays, hashes, and regex. Thanks for any help you can provide. Dismas|(talk) 15:57, 17 March 2012 (UTC)

For each email address, first parse it to extract the domain (watch out for cases like yahoo.com versus yahoo.co.uk, so you don't end up mistaking the latter for a domain called co.uk). Then keep a hashtable (e.g. "domaincounts") with domain names (strings) as keys and an integer count for each as values. In pseudocode:

   d = getdomain(address)  # you write this
   if d in domaincounts:
       domaincounts[d] += 1  # seen before, bump its count
   else:
       domaincounts[d] = 1  # add a new entry, set its count to 1

(As a Python person, I'm using the [array] syntax even though domaincounts is a hashtable; I don't know the syntax for Perl hashtables.) 87.113.82.247 (talk) 16:29, 17 March 2012 (UTC)
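The pseudocode above can be fleshed out into a small runnable sketch. This is in Python rather than Perl, to match the answer; a Perl version would use a hash in the same way. The names `get_domain`, `count_domains`, and the sample guest list are made up for illustration, and the sketch assumes the email address is the last whitespace-separated field on each line. It simply takes everything after the '@', with no validation.

```python
def get_domain(address):
    """Return the lower-cased text after the last '@' (naive, no validation)."""
    return address.rsplit("@", 1)[1].lower()


def count_domains(lines):
    """Count how often each domain appears in 'First Last email' lines."""
    domaincounts = {}
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines instead of crashing on them.
        if len(fields) < 3 or "@" not in fields[-1]:
            continue
        d = get_domain(fields[-1])
        if d in domaincounts:
            domaincounts[d] += 1
        else:
            domaincounts[d] = 1  # first time we see this domain
    return domaincounts


# Hypothetical sample data, not from the original assignment.
guests = [
    "Ann Lee ann@gmail.com",
    "Bob Ray bob@yahoo.com",
    "Cy Dow cy@GMAIL.com",
]
domain_counts = count_domains(guests)
for domain, n in sorted(domain_counts.items()):
    print(domain, n)
```

Lower-casing the domain before counting means gmail.com and GMAIL.com land in the same bucket, which is probably what the assignment expects.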

(I don't mean to shanghai you off into detail that's beyond your class, but) Strictly speaking, parsing an email address reliably and extracting the domain name from a host name are both exceedingly involved procedures that go way beyond a trivial regexp. Surely for a beginning programming class like yours, all your instructor wants is some simple "search for the dots" stuff. If your instructor gets picky and complains this isn't rigorous enough, come back here and I'll show you a regexp and a domain parse algorithm that are fully rigorous, and that will make anyone who reads them die a little inside. 87.113.82.247 (talk) 16:59, 17 March 2012 (UTC)
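For what it's worth, the simple "search for the dots" approach might look something like the sketch below. It is deliberately not rigorous: `registrable_domain` and `TWO_LABEL_SUFFIXES` are hypothetical names, and the suffix set is a tiny hand-picked sample. A rigorous version would consult the full Public Suffix List instead of a hard-coded set.

```python
# Hypothetical, deliberately tiny sample of two-label suffixes.
# A rigorous version would use the complete Public Suffix List.
TWO_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def registrable_domain(host):
    """Reduce a host name to its domain by counting dot-separated labels."""
    labels = host.lower().rstrip(".").split(".")
    # Keep three labels when the last two form a known suffix like co.uk,
    # otherwise keep the last two.
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])
```

With this, mail.yahoo.co.uk reduces to yahoo.co.uk rather than co.uk, while smtp.mail.gmail.com reduces to gmail.com.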

Well, it should not be so difficult. Just look for the '@' and grab the characters to its right (~~domain names~~ host names contain only a small set of characters). This actually looks like a trivial task using regexes --151.75.23.121 (talk) 19:44, 17 March 2012 (UTC)
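A rough sketch of that idea, in Python: find the '@' and capture the run of letters, digits, hyphens, and dots after it. The pattern and the name `extract_host` are illustrative assumptions. It does not validate the address and falls well short of the full email address grammar, but it is the kind of thing a beginner assignment would likely accept.

```python
import re

# One label, then one or more ".label" groups, right after the '@'.
# Assumed simplification: labels are only letters, digits, and hyphens.
HOST_RE = re.compile(r"@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_host(text):
    """Return the lower-cased host part after '@', or None if none is found."""
    match = HOST_RE.search(text)
    return match.group(1).lower() if match else None
```

Requiring at least one dot means a sentence-ending period after the address is not swallowed into the result.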

The thing to the right of the @ is the host name, not the domain name. 87.113.82.247 (talk) 19:53, 17 March 2012 (UTC)

Exactly, thanks for pointing that out --151.75.23.121 (talk) 19:58, 17 March 2012 (UTC)

Strictlier speaking, for mail purposes, the right side of an address really is a domain name, since it usually points to an MX record and is only interpreted as a host name if the MX record is absent. Speaking strictliest, in the language of the DNS specification, every node is a domain name, even leaf nodes with nothing but an A record. 68.60.252.82 (talk) 22:56, 17 March 2012 (UTC)

Looking for a simple self-hosted blogging solution for collaboration

I'm looking for a simple, self-hosted (i.e. hosted on a private server) blogging solution to use as a collaboration tool. Any suggestions? -- Preceding unsigned comment added by 98.114.146.146 (talk) 20:00, 17 March 2012 (UTC)

Hope these links help. If you're looking for

... something self-hosted: AMP (solution stack) (though you may want to first choose the blog/wiki engine to run on the server)

... a blog: Blog software

... a collaboration tool: Wiki engine

--151.75.23.121 (talk) 20:17, 17 March 2012 (UTC)

Fossil (software) is very easy to set up and use. It combines a wiki, distributed source control, and a ticket system, and you can use it (with some limitations) as a blog. Its resource requirements are also very low by today's standards. 67.117.144.57 (talk) 21:08, 17 March 2012 (UTC)