
Wikipedia:Reference desk/Archives/Computing/2017 August 31

Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 31

Geometric mean of numbers > 2^32

I have a program that needs to take pairs of integers that are each > 2^32 and see if their geometric mean is an integer. The larger of the two numbers may be > 2^53. It will be done a large number of times, so it must be fast (i.e. no arbitrary-precision arithmetic). (This will run on a 64-bit system.)

If these criteria can't all be met, is there a fast method that will work for numbers up to some known size? (My simple program works on known data with means up to 81,000,000, but it is not going to work for numbers that are too large.) Bubba73 You talkin' to me? 06:28, 31 August 2017 (UTC)[reply]

What programming language are you using? C0617470r (talk) 06:55, 31 August 2017 (UTC)[reply]
If they are arbitrary integers, then I imagine you could use modular arithmetic to quickly show that some pairs of integers don't have an integer geometric mean. For example, given numbers a and b whose geometric mean is an integer, if a prime p divides a then p^2 must divide ab, which implies that either p^2 divides a or p divides b. Depending on how slow the full computation is, such an approach with a series of small primes might help narrow the field. Dragons flight (talk) 07:44, 31 August 2017 (UTC)[reply]
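Dragons flight's small-prime filter can be sketched as follows (a hypothetical Python sketch; the function name and prime list are mine): for each small prime p, if the total power of p in a*b is odd, the product cannot be a perfect square, so the pair is rejected without forming any large product.

```python
def rejects_by_small_primes(a, b, primes=(2, 3, 5, 7, 11, 13)):
    """Return True if some small prime divides a*b to an odd power,
    which proves that sqrt(a*b) cannot be an integer."""
    for p in primes:
        exponent = 0
        for n in (a, b):
            while n % p == 0:
                n //= p
                exponent += 1
        if exponent % 2 == 1:
            return True   # odd power of p: definitely not a square
    return False          # inconclusive: fall through to the full test
```

A False result is only "maybe"; the pair still needs the exact test, but many pairs are eliminated cheaply.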
You can multiply the numbers, take the integer square root, square again, and compare to the original product. If your language doesn't provide an integer square root, you can write one using Newton's method in a few lines (Java example). This should work for arbitrarily large numbers. For large numbers, the multiplications and square root will take around O(n^1.5). The optimization that Dragons flight mentions (pull out small primes) is definitely worth adding once you have a basic implementation. C0617470r (talk) 08:59, 31 August 2017 (UTC)[reply]
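In outline, the multiply-isqrt-square-compare test looks like this (a Python sketch; Python integers are arbitrary precision, which is exactly the cost the questioner wants to avoid in a compiled language, but it shows the check itself — `math.isqrt` is an exact integer square root):

```python
import math

def geomean_is_integer(a, b):
    """True iff sqrt(a*b) is an integer: multiply exactly, take the
    integer square root, square it back, and compare."""
    product = a * b
    root = math.isqrt(product)   # floor of the exact square root
    return root * root == product
```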
Obtain a list of perfect squares and a list of primes. Calculate the product of the numbers. If their product is in the perfect-square list then their geometric mean is an integer. Beyond that range, if their geometric mean is in the prime list then their geometric mean is not an integer. Blooteuth (talk) 10:48, 31 August 2017 (UTC)[reply]
It will be on an Intel i7. Thanks for the answers - I need to take some time to digest them. Bubba73 You talkin' to me? 18:39, 31 August 2017 (UTC)[reply]
I'm still thinking about this. The problem with multiplying the numbers is that they will overflow 64-bit integers and using multi-precision arithmetic will probably be too slow for the number of pairs I'm targeting. Bubba73 You talkin' to me? 02:04, 2 September 2017 (UTC)[reply]
Have you actually tried using a multi-precision package? I doubt there's a faster way, except to do the multi-precision arithmetic inline: you only need 128-bit integer arithmetic, so you'd be cutting out quite a bit of overhead. Dmcq (talk) 10:08, 2 September 2017 (UTC)[reply]
Have you tried something like GNU GMP[1]? Very fast. Otherwise, you could take the square root of the two numbers as double-precision values and see if their product comes close to an integer (tricky, perhaps). Or, if you think that you can factor them quickly enough, just make a merged list of the powers of their prime factors; if any entry on the list is not an even power then the test fails. 73.232.241.1 (talk) 13:30, 4 September 2017 (UTC)[reply]
I like the idea about using the floating-point square root. I believe that should give the actual square root if the individual numbers to be multiplied are just a little smaller than 53 bits long, and it is fairly easy to program a multiplication of two numbers into a double-length number. So overall the program should be quite small. Dmcq (talk) 22:27, 4 September 2017 (UTC)[reply]
85+ bits are needed to represent the product without losing precision, and taking the square root first doesn't change this. When multiplying 64-bit floats, the less significant bits are silently discarded, resulting in incorrect answers (example). Also, 64-bit factorization is orders of magnitude slower than 128-bit arithmetic. C0617470r (talk) 01:58, 5 September 2017 (UTC)[reply]
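The silent loss of low bits is easy to demonstrate (a Python sketch; `float` is a 64-bit IEEE double): 2^53 + 1 needs 54 significant bits, so it has no exact double representation and a float-based test conflates it with 2^53.

```python
import math

a = 2**53 + 1                       # needs 54 significant bits
exact = math.isqrt(a * a)           # exact integer arithmetic: 2**53 + 1
approx = round(math.sqrt(float(a) * float(a)))
# float(a) rounds to 2**53, so approx is 2**53: the +1 was silently lost
```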
How fast do you need? A minimal implementation is able to process 600,000 pairs per second (a data rate of 80 Mb/s). If that's still too slow, you can add some tests for small primes, and if even that's too slow, I'd be very curious about what you're working on. C0617470r (talk) 01:58, 5 September 2017 (UTC)[reply]
Thanks for the new replies (I was away for the weekend). (1) I need to implement this in my own program. (2) My aim is to do about 10^14 of these, so speed is important. The 32-bit version of Delphi has an 80-bit floating-point type with a 64-bit mantissa (called "extended"). The 64-bit version doesn't have that, but there is an implementation of extended type that I've acquired. I'm thinking that taking the F.P. square root of both numbers first should work. Bubba73 You talkin' to me? 03:44, 5 September 2017 (UTC)[reply]
I just realized an efficient way to do this strictly using integer math. The geometric mean can be an integer iff one of the following criteria is met: (1) both numbers are perfect squares, or (2) they share a common factor, and dividing it out yields two numbers which themselves satisfy criterion (1). I put together a proof-of-concept of the idea here [2]. The part of the program that checks the correctness of the algorithm limits the size of operands, but the actual geometric mean test works for values spanning the entire range of the integer type you're using. 73.232.241.1 (talk) 05:26, 5 September 2017 (UTC)[reply]
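The gcd-based idea can be sketched in a few lines (hypothetical Python; `math.gcd` and `math.isqrt` are exact). Writing g = gcd(a, b), a = g*a', b = g*b' with a', b' coprime, the product a'*b' is a perfect square iff a' and b' both are, so sqrt(a*b) = g*sqrt(a')*sqrt(b') is an integer exactly when both quotients are perfect squares. Crucially, the full product a*b is never formed, so in a 64-bit language every intermediate fits in 64 bits.

```python
import math

def geomean_is_integer_gcd(a, b):
    """gcd-based test: no intermediate value exceeds max(a, b)."""
    g = math.gcd(a, b)
    aq, bq = a // g, b // g            # coprime quotients
    ra, rb = math.isqrt(aq), math.isqrt(bq)
    return ra * ra == aq and rb * rb == bq
```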
Shoot, I didn't think of taking the gcd. That's definitely the way to go. Congrats. C0617470r (talk) 06:16, 5 September 2017 (UTC)[reply]
Here's another implementation[3] that actually calculates the mean (if it exists). I did change the test program so that it just cycles through a range of numbers, simply because randomly generated test values rarely seem to have an integer mean. 73.232.241.1 (talk) 09:02, 5 September 2017 (UTC)[reply]
I could be wrong, but I thought calculating the gcd got quite slow when both numbers are large. Dragons flight (talk) 09:08, 5 September 2017 (UTC)[reply]
Not really. The absolute worst-case performance for a gcd calculation occurs when both operands are consecutive Fibonacci numbers. In that case you're looking at a time complexity of O(log[phi](N)), which is pretty close to O(log[2](N)). In other words, the number of operations required is at most (roughly) equal to the number of bits in the larger operand. In the general case, however, it converges much faster. 73.232.241.1 (talk) 09:18, 5 September 2017 (UTC)[reply]
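This is easy to check empirically (a Python sketch; `gcd_steps` is a hypothetical helper counting iterations of the Euclidean algorithm): consecutive Fibonacci numbers shrink by the smallest possible amount per step, so the iteration count grows with the operand's bit length, not its value.

```python
def gcd_steps(a, b):
    """Count iterations of the Euclidean algorithm for gcd(a, b)."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return steps

# Consecutive Fibonacci pairs hit the worst case: gcd_steps(13, 8) takes
# 5 iterations, gcd_steps(144, 89) takes 10, while equal operands of any
# size finish in a single step.
```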
gcd might be slower than double precision because it has more branching, so the CPU spends more time idle. But it seems decisively faster than quad precision. There are some more optimizations possible, like checking that the product has an even power of 2 (example). The speedup depends on the distribution of your input numbers. C0617470r (talk) 19:08, 5 September 2017 (UTC)[reply]
But consider the case where a = 3 and b = 12, the geometric mean obviously being sqrt(3 * 12) = sqrt(36) = 6. The condition (((a & -a) + (b & -b)) % 2 != 0) evaluates to ((1 + 4) % 2 != 0), i.e. (5 % 2 != 0), which is of course true, so the function returns false and misidentifies 36 as being a non-square! Come to think of it, why check that the product contains an even power of two at all? 73.232.241.1 (talk) 20:20, 5 September 2017 (UTC)[reply]
I agree though, the algorithm really could be optimized. Switching to a fast integer-square-root method and removing recursions and swaps from the gcd function, for example. 73.232.241.1 (talk) 20:29, 5 September 2017 (UTC)[reply]
@73.232.241.1: Oops, that line is not correct--revised example. We immediately know that the geometric mean won't be an integer if the highest power of two in the product is odd. This makes the program nearly twice as fast, depending on the distribution of inputs. Also, I tried iterative gcd and it wasn't faster--maybe the compiler already optimizes it. There is no need to remove swaps because they are not slow. C0617470r (talk) 06:21, 6 September 2017 (UTC)[reply]
Ah, right. So basically a super-fast test to determine if the sum of exponents for powers of two is even or not. Very clever! And certainly justified, as that will cut a huge chunk out of processing time overall. 73.232.241.1 (talk) 08:28, 7 September 2017 (UTC)[reply]
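The corrected power-of-two parity test can be sketched like this (hypothetical Python, assuming positive operands; in C the trailing-zero counts would come from a count-trailing-zeros intrinsic): the power of two dividing a*b is the sum of the powers dividing a and b, and if that sum is odd the product cannot be a perfect square.

```python
def odd_power_of_two(a, b):
    """True if a*b contains an odd power of 2, so sqrt(a*b) cannot be
    an integer. (n & -n) isolates the lowest set bit of n, and its
    bit_length() minus 1 is n's count of trailing zero bits."""
    tz = ((a & -a).bit_length() - 1) + ((b & -b).bit_length() - 1)
    return tz % 2 == 1
```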

Deleting old files on Linux

I recently had to restore my Linux system from a back-up drive. As a result I have thousands of files I no longer need, because my back-up was incremental and only added and updated files, not deleted them.

My question is therefore: I have a folder containing over 50 thousand files. I'd like to issue a command from the shell that goes through every file with a specific ending (such as .jpg) and deletes every matching file whose timestamp is old enough, i.e. before a specific date (such as 9 January 2015). How do I do this? JIP | Talk 13:37, 31 August 2017 (UTC)[reply]

To list (recursively) all the jpg files, starting at the current directory (.) older than that date:
   find . -type f -iname "*.jpg" ! -newermt "9 January 2015" -ls 
and (if you're really sure, don't complain to me if things go bad) to delete them, change -ls to -delete
...but find is super powerful, so really be careful. -- Finlay McWalter··–·Talk 13:52, 31 August 2017 (UTC)[reply]
To be safe, consider first moving the files to a temporary folder (example). That way you can check the results before deleting the folder. C0617470r (talk) 17:42, 31 August 2017 (UTC)[reply]
In Finlay's answer, note that -iname will match case-insensitively, so *.jpg will also match foo.JPG, foo.Jpg, etc. If that's what you want, fine; if not, use -name for case-sensitive matching. --69.159.60.147 (talk) 22:11, 31 August 2017 (UTC)[reply]
That would otherwise be fine, but I want the finding and deletion to not be recursive. I want it to affect the current directory only. Is this possible? JIP | Talk 22:55, 31 August 2017 (UTC)[reply]
Just limit the depth of find with '-maxdepth 1' (and '-mindepth n' if you also need a lower bound). --B8-tome (talk) 01:53, 1 September 2017 (UTC)[reply]
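Putting the pieces together (a sketch; the directory and file names are made up for demonstration, and `-maxdepth`, `-newermt`, and `touch -d` are GNU extensions):

```shell
# Set up a throwaway directory with one old file, one new file, and one
# old file in a subdirectory (which -maxdepth 1 must ignore).
dir=$(mktemp -d)
mkdir "$dir/sub"
touch -d "2014-06-01" "$dir/old.jpg" "$dir/sub/nested.jpg"
touch -d "2016-06-01" "$dir/new.jpg"

# List (not delete) .jpg files in $dir itself, older than the cutoff.
find "$dir" -maxdepth 1 -type f -iname "*.jpg" ! -newermt "9 January 2015"
```

Only `$dir/old.jpg` is printed; once the listing looks right, swap the final `-ls`/listing behavior for `-delete`.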
Note that this will only work with GNU find, because -ls and the -newermt arguments are GNU find features, as are -{min,max}depth. Chances are you're using GNU find, but noted just in case. If you do need a POSIX version, just ask! --47.138.161.183 (talk) 08:28, 1 September 2017 (UTC)[reply]
(EC) This doesn't directly help your current situation, but you may want to reassess your backup setup. Incremental backups can still store backup sets listing only the files current at the time of the backup (whether they are new files, or part of the oldest backup because they have not changed since then). This allows you to restore only the most recent set (or close to the most recent) unless you have some reason to want deleted files, or older versions of extant files, so you avoid this mess. Since your current software evidently can't handle this, you may want to look for better software. Nil Einne (talk) 13:57, 31 August 2017 (UTC)[reply]

How to block "about:blank" web pages in Google Chrome ?

I seem to have these pop up often as the first step to launching ads I don't want. This page is actually blank, or at least appears to be. Is there a way to block those, or would that block good pages too? (I've tried blacklisting it in the hosts file, but that doesn't work, as it's not actually a URL.) Windows 7, 64-bit edition. StuRat (talk) 16:25, 31 August 2017 (UTC)[reply]

I don't know of any practical way to block about:blank in Chrome. I'm also not sure that Chrome is actually opening about:blank--that might just be the default you see while the browser downloads and loads the popup. The best method is to block the ad directly using uBlock. Or, you may need to update your antivirus if you're seeing popups on reputable sites such as Wikipedia. C0617470r (talk) 17:37, 31 August 2017 (UTC)[reply]
I am blacklisting the ads in the hosts file, but they keep changing URLs, so this is a cat-and-mouse game. It would be far better if I could stop them earlier in the process. StuRat (talk) 15:06, 2 September 2017 (UTC)[reply]
@StuRat: NoScript + Ad Block Plus (and/or Privacy Badger + uBlock) (((The Quixotic Potato))) (talk) 14:21, 4 September 2017 (UTC)[reply]
Thanks. Will this block about:blank pages? StuRat (talk) 04:36, 5 September 2017 (UTC)[reply]
@StuRat: No, you cannot block about:blank. Read this for info about about:blank. But those plugins will block most ads. If you use NoScript you'll probably have to whitelist some pages you frequently visit that require Javascript to work (e.g. Google Maps). (((The Quixotic Potato))) (talk) 21:42, 5 September 2017 (UTC)[reply]

ISO country codes for historical countries?

Are there any proposed amendments or widely-used extensions for ISO 3166-1 that would let it cover countries that ceased to exist long before ISO 3166-1 was established? A major advantage of doing so is that it would become possible in HTML documents to specify classical Latin pronunciation (which differs from both the host-language-based pronunciation used in medicine and law, which might be specified as e.g. la-US, and the pronunciation used by the Catholic Church, i.e. la-VA) for TTS rendering. They can't currently do that, because there's no ISO 3166-1 code for the Roman Empire. Another advantage might be standardizing the emoji sequences for defunct countries' flags, although I suppose it's debatable whether the existence of a Confederate-flag emoji would be an "advantage". NeonMerlin 22:14, 31 August 2017 (UTC)[reply]

There's ISO 3166-3, which has codes for countries that became defunct after 1974, but I suppose that's not really what you're looking for.
ISO 639-2 has language codes for ancient languages, which is probably closer in intent to what you're looking for. ApLundell (talk) 19:28, 1 September 2017 (UTC)[reply]

Java: does readObjectNoData() need to call super.readObjectNoData()?

For a Java child class whose parent implements Serializable and has a readObjectNoData() method, does the child's readObjectNoData() method need to call that of the parent to initialize the parent class's fields, or will this happen automatically (assuming there's never been a window where the child implemented Serializable and didn't descend from the current parent, and that we don't care about compatibility with third-party forked versions in which it might have been made to do so)? NeonMerlin 23:16, 31 August 2017 (UTC)[reply]