Wikipedia:Reference desk/Archives/Computing/2015 September 25

Computing desk
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


September 25

Windows 10 - search inside files

I have a folder with my Word documents and I want to find all of them that contain a specified word. If I search the folder in Windows 10, it only shows files that have that word in the title. (I thought that Windows 7 and 8 would look in the files.) I have indexing on for that drive, if that matters. How can I easily search for files that have a specified word inside? Bubba73 You talkin' to me? 03:25, 25 September 2015 (UTC)[reply]

Earlier versions of Windows (XP ?) allowed you to explicitly specify where you wanted to search (the contents or the title). I think they dropped the option to search the contents of files at some point, or at least made it harder to find. Most likely this was due to the extreme amount of time it can take to search the contents of lots of files. StuRat (talk) 04:35, 25 September 2015 (UTC)[reply]
The ability to search within file contents was never dropped. In fact, it's been the default since Vista to always search within file contents for indexed files. (This can be annoying at times, although you can use search options like "name:" to avoid it.)

For non-indexed files, searching within file contents was never dropped either, although the ways to access it have varied a bit. For starters, you have always been able to access it via the advanced search options, although the precise name may have varied, and that option sets the default. In Explorer versions with the ribbon, you can probably change the default option by clicking on the search ribbon and then "Advanced options". With some versions of Windows, you can also use "contents:" or "content:" to search within files (though I have heard this didn't work in certain versions like Windows 7 SP1), and in others, you may be offered the option after searching.
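
For example, typed into the Explorer search box (these are the filters mentioned above; exact availability varies between Windows versions):

    name:report       matches "report" against file names only
    content:report    searches for "report" inside file contents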

However, content searches need to be understood. Under the indexing options, you can choose whether to index a file's contents or its properties only. Not all file types will have their contents indexed by default. Also, for binary files, even if you turn on indexing, you may not have satisfactory results unless you have an IFilter for the file type: Windows needs to know how to interpret the file to index it properly.

On some, I think all, versions of Windows since Vista (see Features new to Windows Vista#Windows Search), content searches for non-indexed files work the same way. This means file types whose contents aren't indexed won't have their contents searched either. (Nominally, Microsoft could have used IFilters in addition to a simple text search, either for files whose contents aren't indexed at all or for all files, but I believe this wasn't done, or at least isn't done now.)

This means you can't search files, whether binary or text, which Windows hasn't been told to treat as text for indexing. You will need to adjust the file-type options in your indexing settings for any types you want to search which Windows hasn't been set up to handle with an IFilter or treat as text (they will then normally be treated as plain text). (To give an example, text subtitles like .srt and some .sub probably won't be treated as text by default; .mkv and .avi likewise.) Since you can't add wildcards, I think this means you would have to add every extension to treat them all as text. So if you want to do a grep-like search for a text string within all files, even binary files, as was possible with Windows XP and 2k, I'm not sure Windows Search is the best option.

As to where this leaves the OP: Office should provide IFilters, so if the OP has Office installed, Word documents should be set up to be indexed with the appropriate IFilters. It may be worth checking the various document types, particularly the ones the OP is having problems with (.docx?), to make sure. If the OP doesn't have Office, then they may not have suitable IFilters, so this definitely should be checked. The latest publicly available IFilters from Microsoft appear to be the Office 2010 ones [1], since the 2013 ones disappeared [2]. The 2010 ones should, however, work fine with files made with 2013 or even 2016. You can also look here [3], although from what they list, I don't think it's been updated for a while.

Presuming Word documents are having their contents indexed with the appropriate IFilters, and the file locations are being indexed, their contents should be searched, and there's no need to worry about enabling content search for non-indexed files. One possibility is that the index isn't complete yet; if there has been a major change recently, the index may be being rebuilt, and while that is happening, search results can be confusing. It may be worth enabling content search for non-indexed files just to check, although I'm not sure this will help if the index is being rebuilt.

Nil Einne (talk) 06:39, 25 September 2015 (UTC)[reply]

If they are .docx files, then they have a lot of human- (and machine-) readable plaintext inside an XML wrapper. I think you can then search inside the files with grep, installing Cygwin if necessary. If they are the older binary .doc files, I don't think this approach will work. SemanticMantis (talk) 14:34, 25 September 2015 (UTC)[reply]
Office Open XML is actually a zip of the XML files. I'm not sure whether compression is compulsory, but from what I've seen, Microsoft Office at least normally compresses most of the XML parts of the file. So unless you decompress it, I'm pretty sure the only plaintext elements are the names of the member files and perhaps a few other incidental text parts (possibly images aren't compressed, so if they contain text metadata, I guess that will be there). If you decompress the file (e.g. with unzip; note that zgrep and zcat handle gzip streams, not zip archives), then you can look at the XML text elements. Nil Einne (talk) 14:55, 25 September 2015 (UTC)[reply]
Right, thanks, I forgot about the compression. zipgrep will then be the better choice for the OP. I think these threads are also applicable to .docx searching [4] [5], and GnuWin [6] has Windows versions of these tools. SemanticMantis (talk) 15:31, 25 September 2015 (UTC)[reply]
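
For example, with Info-ZIP's tools installed (report.docx is a stand-in name; the main body of a .docx lives in word/document.xml inside the zip):

    zipgrep "searchterm" report.docx
    unzip -p report.docx word/document.xml | grep "searchterm"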

Thanks for all of the help. I didn't read them all because I figured out how to do it. (1) right-click in the search field in the upper right. This will bring up another set of options in the upper left. (2) select "advanced options". (3) make sure "file contents" is checked (once you have checked this, it seems to save it), then (4) enter the search term in the upper right. (It will even do PDFs, if they have the info.) Bubba73 You talkin' to me? 00:58, 26 September 2015 (UTC)[reply]

As mentioned, that option is only used to turn on file content searching for non-indexed locations (if you look at the text when turning it on, this is clear). That suggests to me that either the location isn't indexed as you expected, or the index isn't complete yet for some reason. (Note that default indexing doesn't generally index a whole drive; I believe only "Users", "Offline Files" and "Start Menu".) If you still believe the location you are searching is indexed, I would check your indexing settings, since if you are willing to index the location, it will potentially greatly speed up the search. In addition, you can often lead yourself into great confusion when you believe something which isn't the case. If you're certain indexing is on for that location, and the index doesn't seem to be being rebuilt, it may be worth forcing a rebuild, since something is apparently wrong with the index. Nil Einne (talk) 08:15, 26 September 2015 (UTC)[reply]
Resolved
Will, under "this PC", for that drive, properties, I had "allow files on this drive to be indexed... ", but then it wasn't findint things in Word files. It used to be the case that you could select folders and subfolders to index, but now it seems that it applies to the whole drive. I copied all of these files over at once, so maybe it didn't really index them. How can you rebuild the index? Bubba73 You talkin' to me? 00:54, 27 September 2015 (UTC)[reply]
@Bubba73: The property you refer to has always been there; in XP it was called "Allow Indexing Service to index this disk for fast file searching", but I think it's had the description you mentioned since Vista, and definitely since 7. The property is on every drive, directory and file, at least on NTFS, and AFAIK has always been on by default. You can easily change the property for one particular file. The property will allow the contents of that particular file to be indexed (if the contents of that particular file type are set up to be indexed, as mentioned earlier). I believe the property will prevent the indexing service from indexing the contents of files, directories or drives if it's disabled, for whatever reason. It's possible it will also affect searches (since, as mentioned, since Vista they are largely done the same way as the indexer), although my testing suggests it doesn't.

What the property doesn't tell you, and IIRC has never told you, is whether the drive, directory or file is actually being indexed. (Remember, even if the property is disabled, you can still index the file's properties.) You will need to check the indexing service's options for that. Access the "indexing options" from either the Control Panel or the Start Menu (just typing "index" into the Start Menu should find it), and check what's actually being indexed. Unless you've actually changed the indexing options, or it was set up differently by someone else, it will be following the default locations. I'm pretty sure the default locations don't include the whole drive, and this has been the case since Vista, or I think forever. If you have Microsoft Office, then Word files should have their contents indexed, but only if located in "Users", "Offline Files" and "Start Menu". All drives, directories and files are probably allowed to have their contents indexed by the indexer, because the property is enabled by default, but they won't actually be indexed (either contents or properties) unless they're in one of the designated locations in the indexing service.

See [7] (linked earlier) for rebuilding the search index, but it now sounds to me like the most likely problem is that you're mistaken and the location where the file is isn't set up to be indexed.

Nil Einne (talk) 17:34, 30 September 2015 (UTC)[reply]

I'll have to read all of this later, but after I told it to reindex, it is working.

Dependencies of Qubits

If z = x AND y (that is, z is the output of the "and" quantum gate on the inputs x and y), and we measure x and we get the result 1, and we measure y and we get 1, then does a measurement on z necessarily yield 1 too? That is, do the measurements on x and on y affect the measurement on z?

Similarly, if we measure z and we get 1, does this yield that a measurement on x and on y both give 1? 80.246.136.31 (talk) 09:33, 25 September 2015 (UTC)[reply]

The answer is "yes" to both questions, otherwise the "and" quantum gate didn't do its job. I don't like your use of the word "affect", though. The measurements on and on give you information about the measurement on , but they don't "affect" it because measuring first would have given the same result. Egnau (talk) 21:24, 25 September 2015 (UTC)[reply]
Why "measuring first would have given the same result"? If and are in some superposition of and , then measuring first would have given in some probability , while measuring after measuring and and after getting in both, yields that a measurement on will give the result in probability . So, some "effect" did happened, isn't it?. 80.246.136.183 (talk) 06:34, 26 September 2015 (UTC)[reply]
No, if x is 1 with probability p and y is 1 with probability q, then measuring z alone will give you 1 with probability pq. If you measure x and y, both will be 1 with probability pq, and z will be 1 too in that case; this is the same probability as if you'd only measured z. It's true that after you measure x and y, you know z's value with certainty, but that isn't a (causative) effect, just a correlation. This classical probabilistic answer is correct for your example because you didn't use any nonclassical gates (such as the Hadamard gate). See also my answer below. -- BenRG (talk) 07:47, 26 September 2015 (UTC)[reply]
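To make the arithmetic explicit (a sketch, assuming the third qubit is prepared in state 0): write the pre-gate state as (a|0> + b|1>)(c|0> + d|1>)|0>, with p = |b|^2 and q = |d|^2. The gate sends each basis state |x y 0> to |x y (x and y)>, so the only resulting component with z = 1 is bd|111>. Hence measuring z alone gives 1 with probability |bd|^2 = pq, and that is exactly the component in which x = y = 1.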
There is no quantum gate like the one you envision. Gates must be unitary, which means in particular that they have as many outputs as inputs. A gate that takes (x, y, z) to (x, y, z xor (x and y)) can exist, and might be called a quantum "and" gate. In that case, if you find that the first two outputs (x and y) are both 1, that's not enough to know what the third output is. But if you also measured the third input (z) before applying the gate and it was 0, or you prepared it in the state 0, then you can be sure that the third output will be 1 when you measure it.
This example has nothing to do with quantum mechanics, really. It is just classical reversible computing. A quantum superposition, in this context, is no different from a classical bit that might be 0 or 1 and you just don't know which yet. Only when you use intrinsically quantum gates (such as the Hadamard gate) do you need to treat quantum superpositions as something more than that. -- BenRG (talk) 07:47, 26 September 2015 (UTC)[reply]
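Concretely, the action of such a gate on the computational basis states is just a permutation (the classical truth table; only rows with x = y = 1 change):

    |x y z>  ->  |x y (z xor (x and y))>
    |1 1 0>  ->  |1 1 1>
    |1 1 1>  ->  |1 1 0>
    (the other six basis states map to themselves)

This reversible gate is known as the Toffoli gate.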
Thank you for your explanations! 31.154.92.144 (talk) 19:53, 26 September 2015 (UTC)[reply]

Do modern spacecraft still need to save every byte they can?

As in, do they use chars, shorts and floats instead of ints and doubles where the values are expected to stay in the range of the former smaller types? 20.137.7.64 (talk) 14:08, 25 September 2015 (UTC)[reply]

Your question is a bit confusing. It appears that you are asking whether digital storage space in spacecraft is extensive or minimal. More simply: do the computers in spacecraft have gigabytes of storage or kilobytes of storage? If they have a lot, we don't have to worry about it. If they have extremely limited space, we need to work hard at conserving storage space. Storage space on space-bound computers is growing (very slowly). The Mars rovers have <400MB of storage (flash and RAM). Curiosity broke 2GB. New Horizons is a bit more complicated. At 8GB, it seems like it has a lot of storage, but it actually has 4 computers, 2 of which are spares. So, it has around 2GB per computer.
Why oh why would spacecraft have less storage capacity than my phone? Memory chips don't weigh much. That is, if they are the ones we use on Earth. In space, computer components must be "rad-hard". That means the bits don't go haywire when they get pounded by radiation. So, they are physically larger and weigh more. A good way to think of it is that the spacecraft are using technology from the '80s. In reality, they are using systems the size of '80s technology, but much more advanced. So, yes, the storage capacity in spacecraft is a concern. However, I doubt they fret over every byte. It is more likely that they write their programs, compile them, and then use whatever storage the resulting programs require. 209.149.113.66 (talk) 14:25, 25 September 2015 (UTC)[reply]

Deleting a really big number of files: rm /* vs find ./* -type f -delete

I recently tried to delete several hundred thousand files from a directory. They were logging events, if you need to know how they came into existence. Anyway, the rm option apparently does not work because it cannot delete a file, forget it, and go on to the next one. The find command, on the other hand, didn't have any problem. As it was explained to me, that's because it does not start a new thread each time.

Why do these tools work differently? What advantage is there in the way rm does things? Is there another context where the rm way is the better way? --Llaanngg (talk) 17:37, 25 September 2015 (UTC)[reply]

You could be hitting a limitation of the shell. As I understand it, "rm *" is first transformed by the shell into "rm file0 file1 file2 file3 ..." (hopefully my meaning is clear), and then executed. If there are too many files, the transformed command may be too long, and fail to be created or parsed. --Trovatore (talk) 17:42, 25 September 2015 (UTC)[reply]
Correct. The glob * is simply expanded to include every matching filename. It is not sent to the command. So, the rm command never sees the *. That is why looping works in this case, but it takes longer to type. 209.149.113.66 (talk) 17:44, 25 September 2015 (UTC)[reply]
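A minimal sketch of the difference (the exact error message varies by system; the xargs line is one common workaround):

    rm ./*                               # shell expands ./* first; with enough files this
                                         # fails with "Argument list too long"
    find . -maxdepth 1 -type f -delete   # find enumerates and deletes internally, no huge argv
    printf '%s\0' ./* | xargs -0 rm --   # printf is a shell builtin, so its expansion is not
                                         # subject to the exec argument-length limit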
So, let me know if I am understanding this correctly. The * of the find is dealt with by find itself, but the * of the rm is dealt with by the shell, and that's too much for it. Right? --Llaanngg (talk) 18:10, 25 September 2015 (UTC)[reply]
I didn't see the typo. You don't use a * with find. You use: find ./ -type f -delete. 209.149.113.66 (talk) 18:15, 25 September 2015 (UTC)[reply]
You can use the * as in ./*; it will expand into all the directories. If there are 500,000 files but only 10 directories, the expansion won't be a problem. --Llaanngg (talk) 18:32, 25 September 2015 (UTC)[reply]
If you do find ./* , I think what's going to happen is that the shell is going to expand that into find ./file0 ./dir1 ./file2 ./dir3.... I don't know whether that's valid "find" syntax or not, but if it works for you, then I think that's what's going on.
If you want find to see the star, you have to escape it, as in find . -name foo\*bar, which will find all files whose name matches the regex foo*bar in or below the current directory. --Trovatore (talk) 19:21, 25 September 2015 (UTC)[reply]
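For example, either form keeps the shell from expanding the pattern, so find does the matching itself:

    find . -name foo\*bar
    find . -name 'foo*bar'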
You are right about having to escape metacharacters from the shell, but it's not a regex; it's a glob. They are two completely different things. Newbies trip over this all the time. No, just because they both use * doesn't mean they're the same thing. Not to mention, * doesn't even have the same meaning in globs and regexes. --71.119.131.184 (talk) 01:41, 26 September 2015 (UTC)[reply]
Huh? I said the name matched the regex. There's no such thing as "what * means in a regex", because it depends on which regular-expression syntax you're talking about. A regular-expression language is any language that can be parsed by a deterministic finite automaton, which filename globs certainly can be. --Trovatore (talk) 01:50, 26 September 2015 (UTC)[reply]
That's a really semantic argument. In the real world, when people talk about "regexes", 99% of the time they're talking about either POSIX regexes or the various flavors ultimately descended from Henry Spencer's regex library, which include Perl regexes and PCRE. --71.119.131.184 (talk) 02:45, 26 September 2015 (UTC)[reply]
The above answers diagnose the problem correctly. Read this for a more in-depth answer. The Wooledge wiki is freenode #bash's wiki, and it's a fantastic resource for Unix shell information. Notably, you can generally count on the information being stringently correct, which is most emphatically not the case for things you will find randomly through a search engine. A ton of information on the Web related to Unix shells is flat-out wrong. --71.119.131.184 (talk) 01:41, 26 September 2015 (UTC)[reply]

Literature about programming

I'd like to find some comprehensive published textbooks about programming in general, or about programming language concepts: how they became what they are, why meta-programming, OO, compilation vs. interpretation, how they are implemented, primitives, and so on.

PS: do not point me to links to blogs which are "quite good" or forums where I "can ask questions." I have plenty of those. In the same vein, I have already found Category:Programming_language_topics and Outline of computer programming. --Llaanngg (talk) 17:49, 25 September 2015 (UTC)[reply]

The Art of Computer Programming is widely recommended. Donald Knuth also happens to be an amazing mathematician and the inventor of LaTeX. (I seem to recall he's also an excellent organist, but I've never heard him play.) Anyway, here's an overview of the books, editions, publishers, from the author himself [9]. SemanticMantis (talk) 18:09, 25 September 2015 (UTC)[reply]
Strictly speaking, Knuth is the creator of TeX, not LaTeX. LaTeX, the most popular way of using TeX, was originated by Leslie Lamport.
LaTeX is sort of a wrapper around TeX that takes care of countless typographical conventions you most likely don't want to re-invent. I don't know how much Knuth has had to do with it. I wouldn't be surprised if he's had a hand in it somewhere, but I also wouldn't be surprised if he's more or less ignored it. In any case it's not primarily his project. --Trovatore (talk) 19:00, 28 September 2015 (UTC) [reply]
Knuth's book is the best for programming. We program for the operating system, not the computer. So, understanding how operating systems work is also important. I suggest Tanenbaum's Operating Systems: Design and Implementation. It gave Linus the push to create Linux. 209.149.113.66 (talk) 18:30, 25 September 2015 (UTC)[reply]
"We program for the operating system, not the computer." - I strongly disagree with that statement. There are plenty of programmers who work on embedded systems that don't have operating systems - and there are programs written for portability that use libraries to completely hide the operating system from consideration. SteveBaker (talk) 18:54, 25 September 2015 (UTC)[reply]
True enough, but from what OP describes, they are looking for concepts and higher level stuff, more than the nitty gritty of embedded systems programming. But the question is still very broad and open ended; do you happen to have a good book suggestion for embedded systems? SemanticMantis (talk) 19:21, 25 September 2015 (UTC)[reply]
I literally meant We program for the Operating System, not All programmers program for the Operating System. We develop applications for specific operating systems, be it Windows, Android, iOS, etc... We don't do any embedded systems or device drivers or anything similar. We strictly program for the Operating System. Assuming that we aren't the only people on Earth who do such a thing, I assume it is important to understand how the operating system works. 209.149.113.66 (talk) 19:26, 25 September 2015 (UTC)[reply]
It didn't occur to me that you were using the editorial we. Just FYI, it's not traditional at the refdesk and is not likely to be understood. Or maybe you meant "my company", but that wasn't clear either. --Trovatore (talk) 18:28, 26 September 2015 (UTC) [reply]
I program not for the OS, but for the JVM. --31.177.98.43 (talk) 21:03, 25 September 2015 (UTC)[reply]
The Art of Computer Programming is the best at analyzing algorithms in excruciating detail. Given that, it is not the best book for programming or even for algorithms. For algorithms, I think the best first book is Algorithms by Sedgewick, and, as a second, more advanced book, Introduction to Algorithms by Cormen, Leiserson, and Rivest. I can't think of an equivalent book for programming, but three books about programming that I like are Programming Pearls by Jon Bentley, Code Complete by Steve McConnell, and maybe Algorithms + Data Structures = Programs, by Dijkstra. But none of these are introductory books about programming. Bubba73 You talkin' to me? 01:14, 26 September 2015 (UTC)[reply]
"Algorithms + Data Structures..." is by Wirth. Asmrulz (talk) 19:18, 26 September 2015 (UTC)[reply]
For a more LISP-y approach, there are also HtDP and the venerable SICP. Asmrulz (talk) 19:18, 26 September 2015 (UTC)[reply]
In terms of programming well, I've been strongly influenced by The Elements of Programming Style by Kernighan and Plauger, The Psychology of Computer Programming by Gerald Weinberg, and The Practice of Programming by Kernighan and Pike. From these I've evolved my own philosophy about programming which boils down to the fact that good code must not only work and work well, it must work for the right reasons. —Steve Summit (talk) 14:03, 3 October 2015 (UTC)[reply]