Wikipedia:Reference desk/Archives/Computing/2011 September 3

Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


September 3

Change IE default download folder?

How can I change the default download folder in IE9? (I found some instructions about modifying the registry, but they didn't work.) Bubba73 You talkin' to me? 00:48, 3 September 2011 (UTC)

Just press Ctrl-J and the downloads window will appear. Click "Options" in the bottom left and you'll see an option to change the "Default download location". Hope this helps! ZX81 talk 02:14, 3 September 2011 (UTC)
Thanks! They could make that easier to find, for instance from the main menu options. Bubba73 You talkin' to me? 02:19, 3 September 2011 (UTC)
Resolved

Digital Reader

What should I look for in a digital reader? I have finally come to the conclusion that I need one of my own. I have put off getting one for quite a while now because I still prefer the book itself. The cost has also put me off. -- Preceding unsigned comment added by Mybodymyself (talk • contribs) 02:19, 3 September 2011 (UTC)

See Comparison of e-book readers. Basically your primary concerns are going to be (in order): screen contrast, weight, format support, wireless usage, internet browser. I would mention OS choice, but there are so few that aren't using some form of Linux (and even fewer that are popular) that it's not really a choice you'll likely have to make. Lately I would suggest the Nook Simple Touch. ¦ Reisio (talk) 05:19, 3 September 2011 (UTC)
I would add that the type of screen matters a great deal: there is a large difference between an LCD and an eInk reader. --Mr.98 (talk) 15:23, 4 September 2011 (UTC)
Definitely. :) Though I can't help but think that someone considering anything but eInk is hopeless, heh. ¦ Reisio (talk) 07:11, 5 September 2011 (UTC)

Thank you for your answer to my question here; it really helped me. At the moment I have gone ahead and downloaded and installed Amazon Kindle because it's more stable than Barnes and Noble Nook (beta) on my computer. I'm unsure whether I would switch to Nook if it became a regular release. --Jessica A Bruno 19:20, 4 September 2011 (UTC)
Also, the rest of my family have Kindles and love them. Some family friends do as well, so that also went into my decision. --Jessica A Bruno 19:22, 4 September 2011 (UTC)

A fine choice. ¦ Reisio (talk) 07:11, 5 September 2011 (UTC)

removing scratches

How can I remove/clean the scratches on my Galaxy S2 front screen? -- Preceding unsigned comment added by 175.110.242.217 (talk) 02:24, 3 September 2011 (UTC)

You put scratches in gorilla glass? http://www.google.com/search?q=gorilla%20glass%20scratch%20removal ¦ Reisio (talk) 05:32, 3 September 2011 (UTC)
You probably want to fill them in with some type of clear resin. Or, as a temporary fix, you could apply a clear, thin oil to fill them in. StuRat (talk) 05:49, 3 September 2011 (UTC)

IP Address and Using a Generic Text Only Printer Over Network

Hi, I wanted to know if it is possible to send jobs to a generic text-only printer over a network. More specifically, can I assign it an IP address? I know that there is no such physical device, but I need an IP address to use it. Thank you for any and all help. 209.252.235.206 (talk) 06:29, 3 September 2011 (UTC)

To make everything more sensible, let me explain my situation. I am accessing a Linux server over our network that prints various reports. I am not able to directly access the database on the server that these reports are generated from, but I am able to add printers by IP address. I really need to print these reports to a text file, since I need to use that data to fill in spreadsheets in Excel, and I could automate doing it if I had it in txt format. Thus, I want to find a way to assign an IP address on our network to a print-to-text printer; any help would be much appreciated. I apologize if some of this is poorly explained. I do not know much about the networking end of computing, but I am more than happy to supply any info needed to get this working; I am also more than able to figure this out from some basic sources if anyone could point me in the right direction (if you don't feel like going into a great deal of detail). Again, thank you. 209.252.235.206 (talk) 06:42, 3 September 2011 (UTC)

If the printer doesn't have its own network interface, the way to get access to it on the network is: 1. hook up the printer normally to any computer on the network (with the USB cable, or parallel or serial or whatever), 2. install a print server on that computer, which becomes the print server, 3. give the IP address of the print server to the others. On the other hand, if you can plug the printer into the same computer that generates the reports, 127.0.0.1 might work. It sounds like this report-generator is badly designed. Applications should allow you to specify a printer by name, and leave the configuration of IP addresses and other access methods to the print spooler. 67.162.90.113 (talk) 07:33, 3 September 2011 (UTC)
Badly designed is something of an understatement; unfortunately, I am left to find a way to work around this. While what you are saying makes sense, I do not have an actual physical device to plug in. The generic text-only printer takes jobs and saves them as a text file (rather, it extracts the text out of them and saves that). Since this isn't a physical device, I don't know how to give it an IP address. We are connecting to the server over SSH2, and the interface on the server to connect a printer just asks us to input the IP; so there doesn't seem to be a way to feed it the PC's IP and then get to the text printer that way. Sorry if all of this seems so convoluted; it's not my setup... However, I do not necessarily require using the text printer; I need a way to get the data into Excel that doesn't involve having employees print the report and then do it by hand. I know they had Crystal Reports connected to the db at one point from another computer, but that machine is long gone and I don't know how it was set up. (I would not be opposed to accessing the db with a program that can extract the info; I'm sure I could write one if given an outline of the networking aspect. This is my weak point, so I'd like to avoid something like this since I am not super informed...) Again, thanks for any and all help :-) 209.252.235.206 (talk) 07:58, 3 September 2011 (UTC)
Not sure I understand the situation correctly, but I think PDFCreator can install as a network printer. Then you could install that on your local computer, and then point the remote server to your local computer as a "printer". I'm not sure PDFCreator will print to text; if it does not, you will have to find some tool to extract text from PDF files (which does not always work, of course). Jørgen (talk) 08:15, 3 September 2011 (UTC)
As far as I can tell from your confusing terminology, where "printer" doesn't mean "printer" (can we stop doing that? it sucks) you've got some kind of print spooler already that accepts print jobs and dumps them to a file, and you can't get the report generator to talk to it. What is the physical location of this printer-that-is-not-a-printer? When you say there doesn't seem to be a way to feed it the correct IP address, why not? What happened when you tried it? 67.162.90.113 (talk) 08:21, 3 September 2011 (UTC)
It is called a printer, though. You set it up using the same method as a printer, it is in the printer folder, and it lists as a printer. I understand your point, but this is a legitimate printer; it just outputs to a file. Again, physical location doesn't make sense: it is on a PC, and there is no actual device that is receiving this. The PC has an IP address; if I give it the PC's IP, the job just sits in a queue that goes nowhere, which is exactly the problem. Let me stress this again: the printer does not have an IP address since it is not an actual device; it is on a computer that has an IP, but that is not the same thing. You, the IP, seem condescending and I wish you would stop. I understand that you are trying to be helpful, but your response seems to be, "You're using terms I don't like and you should be able to explain why this situation isn't working." However, my whole problem is that I don't understand what's going on well enough to make it work.
Jorgen, thank you very much, I'll look into that :-) 209.252.235.206 (talk) 08:57, 3 September 2011 (UTC)
Since I'm not doing so well at explaining myself, this is exactly the situation. When I go to add a printer, it says enter the IP address, then asks me to name the printer, then asks if I want to use it as a primary, and that is all. I need to find something I can put in for the IP address part that will let it use a virtual printer on a PC on our network; putting in the PC's address does not suffice. -- Preceding unsigned comment added by 209.252.235.206 (talk) 09:05, 3 September 2011 (UTC)
We're progressing slowly because of a lack of useful vocabulary; this is not a reason to give up. When you say "on a pc", does that mean an MSWindows machine is doing your emulated printer thing? Maybe it'd be better if you used another Linux box, to increase the chance of them speaking the same network printing protocol. By default, a recent Linux distribution would be using CUPS for network printing, and an MSWindows box is going to be using SMB. On the other hand, if your report generator is more of an old-fashioned unix thing, it could be looking for an lpd server. And let's get a few more pieces of information out into the open: assuming that you've got some kind of print setup that does work with a real printer, where is that real printer attached and how do you instruct the report generator to use it? Be as specific as possible. 67.162.90.113 (talk) 09:07, 3 September 2011 (UTC)
And here's another question: where did you find the "queue that goes to nowhere"? Which computer showed you this queue, and how did you find it? If you know how to display the contents of the queue, you're halfway to the answer already. 67.162.90.113 (talk) 09:12, 3 September 2011 (UTC)
Unfortunately, I am in a situation where I cannot modify any of the computers' OSes. Also, I can only access the server through the report generating/data entry program (nobody around seems to be able to do more either...). Thus, I'm stuck doing this kind of stuff to make it work. As for the real printers, they were set up before me, but essentially, they were attached to the network and given an IP, then the IP was given to the program and it took care of the rest. To instruct it to print a report, you type in the report name, then the name of the printer you want to use (this name is selected after you give it the IP address at setup and can be anything). As for the queue, it was shown via the report program. However, you can give it any numbers for the IP portion and it will not raise any flags (it doesn't check to see if there is actually something with that address connected). So, if I give it a random IP address and try to print to it, it will show that there is a job for that printer, but it will never print. So, sadly, I am not looking at the virtual printer's queue via the program; the program is just making assumptions and showing them to me. Thank you. 209.252.235.206 (talk) 09:21, 3 September 2011 (UTC)
When I try to install PDFCreator, it keeps giving me errors overwriting MSCOMCTL.OCX and cannot install... -- Preceding unsigned comment added by 209.252.235.206 (talk) 09:24, 3 September 2011 (UTC)
Sounds more and more like a protocol mismatch. Printers that attach to the network directly tend to support a lot of printing protocols so that all kinds of machines can print. Microsoft boxes tend to support only other Microsoft boxes, unless you beg. So you have to figure out what protocol the Linux box is using, and then figure out how to make your virtual printer accept jobs from that protocol. For the first part, you could use wireshark to inspect the packets sent during a failed attempt. 67.162.90.113 (talk) 09:47, 3 September 2011 (UTC)
I'm stupid when it comes to network stuff, so let me ask you a few questions that might make this a little more accessible to me. 1.) If I give the program the PC's IP, would it be reasonable to assume that it is, indeed, transmitting the report to the Windows box, but that the box just doesn't know what to do with it? 2.) In the past, we had a program that accessed the database on the server, but I don't know how. Is there any way to do this via a program that you can think of? I could definitely make this all work if I could write a program and connect to the db. (Programming is not a problem; networking is the part of computing that I'm dumb with... should work on that.) Thank you :-) Sorry if I seemed snarky earlier; this is a frustrating experience and I don't think it needed to be... 209.252.235.206 (talk) 09:52, 3 September 2011 (UTC)
One other detail I forgot to mention: there is a print-to-file printer that is installed on the server and accessible to the report program. However, when you use it, it prints to a prn (I think) file on the server, which we cannot access. At any rate, it would seem like this should be doable on that basis, but I don't know. 209.252.235.206 (talk) 10:03, 3 September 2011 (UTC)
You've said basically nothing about the database (some kind of SQL server I could only wildly guess?) so I couldn't know how you connect to it. Does the ssh server put you immediately into the database tool, or is there a command you have to type at the shell prompt to get into it? You appear to have such a small amount of information about the Linux box that it's almost as if we're trying to figure out how to break into it. Which might not be difficult since there's apparently nobody in charge of installing security updates on it! 67.162.90.113 (talk) 10:12, 3 September 2011 (UTC)
Sadly, I have no details about it, except that it is Linux (and this only because somebody said so). When I connect over ssh, it takes me to a login screen; when I log in, it takes me to the report generating program. If I had to guess, it looks like the report generating program might be a modified version of an OS, but like I said, I don't really know anything about it except that we can't directly (meaning physically) access it. However, I am allowed to do anything (except change OSes or access the server directly...) as long as I don't break the program. So, if we can break into it, I'm allowed to... I know, bizarre. I downloaded wireshark, but I'm not sure how to use it; or rather, what I should try and what I should look for. Is there anything you can think of besides using a virtual printer? Is there a way I could capture the report as it is being sent to another printer? Also, I am able to set the report to display on the terminal, which shows me the output in a really poorly formatted version that splits across multiple screens (I keep hitting enter to move on); would it be possible to capture the data when doing this? The solution does not need to be elegant; I just need to get the info into some electronic form that I can manipulate, even if it is badly formatted. Again, thank you. *As far as security is concerned, there is someone who maintains that kind of stuff remotely. Unfortunately, they are not very helpful (I'm not sure if it is a matter of can't or won't), but they are not on site and we lack any means to compel them. (I'm not really able to go into details about the who and why; suffice it to say the situation is a little screwy, but this is not a case of doing something that we don't have permission to do, just a case of nobody having a reasonable way... for some reason.) 209.252.235.206 (talk) 10:21, 3 September 2011 (UTC)
Strange story. What you'd be doing with wireshark is: 1. get it running on machine X, 2. tell the report generator to print using machine X's IP address, 3. wait a few seconds, then stop the wireshark capture and scroll through it looking for key words like "print", "lp", "CUPS", "IPP". This is reverse engineering work basically, so it's hard to put into a simple recipe. Or here's another idea: you have the real printers that do accept jobs from the Linux box, so you can go look at them, and find out what protocols are enabled. Pick one of the network-attached printers, and start disabling stuff. First "lpd", then "IPP", then anything else that sounds slightly like a network protocol. Disable a protocol, and try to print a report. When the printing fails, the last thing you disabled was the protocol it needs. If you can get your virtual printer configured like the real printer, you win!
For the other idea, surely there's a way you can enable logging of what gets printed during the ssh session. If the client was even a little bit unix-ish, you could just run the ssh inside of script. On an MSWindows box you're probably using PuTTY (or should be), and doesn't that have a logging option amongst its menus? 67.162.90.113 (talk) 10:52, 3 September 2011 (UTC)
We're using SecureCRT; I would imagine that has a logging option too, though. My only point of concern with the above method is this: since I could run as many virtual printers on the PC as I want and all I am giving to the program is the IP, how would I make sure the job got routed to the right one? Sorry if that's a stupid question... I plan on messing with this stuff over the weekend; I just haven't had a chance and it's all new to me. I won't be able to attempt anything new with the situation until Sunday night. I'll update here if you'd still be interested in helping (more Monday morningish). One final note: SecureCRT has an sftp option available, but when I use it, the screen is just blank; is there any way to force the server to transmit what I want? Again, I really want to thank you for discussing this with me. I'm going to try all of this out :-) 209.252.235.206 (talk) 11:03, 3 September 2011 (UTC)
sftp is probably disabled on purpose, since they didn't even give you a usable shell on the server. The question of how it decides which remote printer to use on the given IP address is a good one. You don't need any fancy virtual printers to have that problem. The possibility of multiple printers attached to a single server goes way back. Even in the old-fashioned unix configuration, you don't just specify a remote host to print on, you specify the host and the queue name. But there's a tradition of naming the default printer "lp" so that might be what it's looking for. The printer name would be one of the things you might find in the wireshark listing. 67.162.90.113 (talk) 11:14, 3 September 2011 (UTC)
Alternatively, could you set up a script that logged into the server via SSH, displayed one record at a time (or a list) on-screen through some standard keypress sequence, logged whatever appeared on the screen and then parsed the log file to get the "raw" records? Jørgen (talk) 13:05, 3 September 2011 (UTC)
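As a rough illustration of the point about queue names: in the classic lpd protocol (RFC 1179), a client submitting a job opens a TCP connection (port 515 by convention) and starts with a command byte 0x02, followed by the queue name and a line feed. The Python sketch below is not anything used in this thread. It assumes the report generator speaks lpd, which nobody above has confirmed, and the function names are made up for illustration. It accepts one connection, grabs the first bytes, and extracts the queue name if the bytes look like that lpd command.

```python
import socket


def capture_first_bytes(port, host="0.0.0.0", max_bytes=4096, timeout=30.0):
    """Accept one TCP connection on `port` and return (peer, first bytes sent).

    Raises socket.timeout if nothing connects (or sends) within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        server.settimeout(timeout)
        conn, peer = server.accept()
        with conn:
            conn.settimeout(timeout)
            data = conn.recv(max_bytes)
    return peer, data


def lpd_queue_name(data):
    """Return the queue name if `data` starts with an lpd 'receive a printer job'
    command (byte 0x02, queue name, line feed); otherwise return None."""
    if data[:1] == b"\x02" and b"\n" in data:
        return data[1:data.index(b"\n")].decode("ascii", errors="replace")
    return None


if __name__ == "__main__":
    # 515 is the conventional lpd port; this is only a guess at what the server uses.
    peer, data = capture_first_bytes(515)
    print("connection from", peer)
    print("first bytes:", data[:80])
    print("lpd queue name:", lpd_queue_name(data))
```

If something like this were run on the PC whose address was given to the report program and nothing ever connected, that would suggest the server is using a different port or protocol, or that a firewall is in the way. Binding to a port below 1024 may require administrator rights on some systems.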
Hi, again. Thank you all so much for your help. I tried using wireshark, but all of the packets are encrypted and I'm not sure how to go about decrypting them without using the ssh client. As for using a script with logging, I like that idea, except the log is just a bunch of gibberish too... I'm beginning to think this is a lost cause. Though, if you have any more ideas, please let me know :-) Thank you both again for all of your help and time; it is much appreciated. Phoenix1177 (talk) 03:50, 5 September 2011 (UTC)
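On the "gibberish" in the session log: one possibility, and it is only a guess since the log itself was never posted, is that the text is intact but interleaved with terminal control sequences (cursor movement, colours, screen clears) that a full-screen program emits. A minimal Python sketch for stripping the common ones follows; the function name and the sample string are invented for illustration.

```python
import re

# Common terminal escape sequences:
#   CSI: ESC [ parameters final-byte      (cursor movement, colours, clears)
#   OSC: ESC ] text terminated by BEL or ESC \   (window titles etc.)
#   charset selection: ESC ( X or ESC ) X
#   other two-character escapes: ESC followed by one byte in @ .. _
ESCAPE_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
    r"|\x1b[()][A-Z0-9]"
    r"|\x1b[@-Z\\-_]"
)

# Remaining control characters, keeping tab (\x09) and newline (\x0a).
CONTROL_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def clean_session_log(raw):
    """Remove escape sequences and stray control characters from logged terminal text."""
    text = ESCAPE_RE.sub("", raw)
    text = text.replace("\r\n", "\n")
    return CONTROL_RE.sub("", text)


if __name__ == "__main__":
    sample = "\x1b[2J\x1b[1;1HReport A\r\n\x1b[1mTotal:\x1b[0m 42\r\n"
    print(clean_session_log(sample))
```

This only helps if the program writes its output more or less top to bottom. A program that paints the screen by jumping the cursor around would leave the text fragments out of order, and the log would need a real terminal emulator to reconstruct each screen.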

Using/converting m2ts files from a camcorder

Resolved

Hello,

I recently bought a Panasonic HD camcorder and, like most consumer-grade HD camcorders these days, its output is stored using the m2ts container format. Within this format, the video stream is encoded using MPEG-4 part 10 (i.e., H.264) and audio is encoded using A52/AC-3. I'd like to convert these files to something that can reliably be used in video editing software (few programs support m2ts import) but there is apparently no ideal way to do so (many different methods are suggested online). My method of choice would be to use MEncoder, but unfortunately the MPlayer bundle I have doesn't appear to include an A52 codec (I can play and presumably - although I haven't tried it yet - convert the video when I explicitly omit the audio track using -nosound, but when attempting to access the audio track I'm informed that no A52 codec is available).

In light of these facts, I have two questions. Answers to either of them would likely resolve this m2ts-related unpleasantness.

  • Can I install an A52 codec for MPlayer/MEncoder without building it myself? I've found the liba52 library online, but it only comes as source code that is clearly intended to be compiled on a *nix machine (I'm running Windows). I'm okay with using MinGW or something else to compile it, but even then I haven't been able to find instructions on how to add codecs to MPlayer after it's already been built (I have a binary release of it downloaded from its site).
  • Does anyone have recommendations for a relatively "clean" method (by which I mean one that doesn't involve a multitude of different programs being used for the conversion process) for putting m2ts files of the type I've described into a format where most video editing programs will be able to import them while preserving the quality of the source files?

Any suggestions or comments regarding these matters would be much appreciated, particularly from anyone familiar with MPlayer/MEncoder and/or the use of camcorder-generated m2ts files.

Thanks!

Hiram J. Hackenbacker (talk) 13:02, 3 September 2011 (UTC)

It's possible you can use liba52's executables (a52dec, extract_a52) and then mplayer to process them, but personally I would just rebuild mplayer/mencoder with the support you want. ¦ Reisio (talk) 03:23, 4 September 2011 (UTC)
How old is the version of mplayer you're using? Recent ones (since about 4 years ago) should support AC-3 via ffmpeg, making liba52 unnecessary. What exactly does mencoder (or mplayer) say about the audio? Is there a "codec not found" error message specifically referencing liba52? You could tell it to use the ffmpeg driver with -ac ffac3 or -afm ffmpeg, maybe that'll cause something different to happen. 67.162.90.113 (talk) 09:59, 4 September 2011 (UTC)
I'm using version Sherpya-SVN-r31170-4.2.5 (from 2010). Previously, I was explicitly specifying -ac a52, which generated the message Requested audio codec family [a52] (afm=liba52) not available. Following your suggestion, I tried the -ac ffac3 and -afm ffmpeg options (separately), both of which generated the error MPlayer interrupted by signal 11 in module: init_audio_codec. However, I just downloaded the latest version (r33883-4.2.5) and it is able to play the file with sound, although the audio track is only found when -tsprobe is set to 10000000 or higher (which isn't a problem, but is different from the detection behaviour of the earlier version). From there, I was able to successfully convert a test m2ts file to a more useful format using MEncoder, so I suppose it was simply a matter of having the latest version. Anyway, thanks for your help, everyone; I'm glad it was relatively simple to figure this one out. Hiram J. Hackenbacker (talk) 13:45, 4 September 2011 (UTC)

Firefox 4

I'm currently running Windows XP SP3 with the latest release of Firefox (6.0.1, I think). It's fine, but ever since Firefox 5 came out, I've been unable to use the add-on for the Free Download Manager, forcing me to open the program manually and constantly enter username/password information (to access the site I'm downloading from). In short, it's a pain in the ass. When 6 got released, I hoped it would fix this issue, but no luck.

I really like FDM, but I don't want to keep juggling things the way I am now. I can think of three workarounds. The most straightforward would be to find a copy of Firefox 4 somewhere and install it so that I'd have both versions on my machine. I would use 4 strictly for accessing the site I'm downloading from (EasyNews). Problem there is that I can't find a copy of FF 4 (at least, from a site I would trust). Could someone recommend a download location?

Another workaround would be to get a different add-on. I've searched, but I don't seem to be seeing what I want. I want something that will work with my browser, will remember my password, and which will help manage downloads from a site that seems to have spotty connectivity. FDM did all those things; could someone suggest another?

Finally, I guess I could switch to another browser. Before going through all that entails, can I know in advance whether it will work with FDM (or an equivalent add-on)? Thoughts? Suggestions? Matt Deres (talk) 20:14, 3 September 2011 (UTC)

This Mozilla page discusses where to get 3.6 these days. This GitHub page discusses the Firefox lifecycle, and has pointers to the Mercurial repository where Firefox source code is maintained. I have never done this before, but if I were in your situation, I would probably start there and find out how to grab Firefox 4 from the repository; maybe the Windows builds are stored in the repository also. If not, I suppose you could download all the FF 4 source and compile it yourself. Comet Tuttle (talk) 20:59, 3 September 2011 (UTC)
Download FDM 3.8 RC. It supports Firefox 6. -- BenRG (talk) 03:32, 4 September 2011 (UTC)
Or use FlashGot (which supports FDM) instead of FDM's own Firefox extension. -- BenRG (talk) 04:38, 4 September 2011 (UTC)
Thank you both for the help; I ended up going with BenRG's second suggestion. Because 3.8 isn't a full release it wasn't showing up on either Firefox's or FDM's "check for updates" search, so I was out of the loop. I've installed 3.8 and it's working fine. Thanks again! Matt Deres (talk) 12:31, 5 September 2011 (UTC)

Multiple approximate string matches against a large dictionary

I'm working on the project LFMonitor, a multi-dimensional Bayesian-style classifier that uses approximate string matching to identify unrecognized words that may be misspellings or variants of a known word. The current algorithm separately compares the given word to every word in the dictionary, and I'm concerned the performance will become unacceptable as the dictionary (currently around 300 words) continues to grow.

Are there any algorithms I can look into that have less-than-linear time complexity with the dictionary size, will perform faster on dictionaries of around 1000 words, and that wouldn't incur a huge overhead cost in Lua?

I know an algorithm such as bitap would gain a constant-factor speedup by not tokenizing the string, but I need to consider each word separately to avoid false positives when two words in the dictionary are very similar (e.g. "mag" and "mage"). NeonMerlin 21:00, 3 September 2011 (UTC)

Here's how I would do it:
1) Let's say the word typed in was "phaer". Take it apart into pieces, like "ph" and "aer". For each, try other possibilities, like "f" for "ph" and "air" or "are" for "aer".
2) Combine these to get all the possibilities: "phaer", "faer", "phair", "fair", "phare", "fare".
3) Do a binary search of the dictionary to find each word, or determine that they aren't present (this requires that the word list be in alphabetical order, of course). You could also speed things up by having an index for, say, the first 3 letters, so it would know to do the binary search for "fair" only between "faia" and "faiz". You need to be careful in how you handle special characters, like apostrophes, dashes, and spaces, within words/terms, and also must decide if you need a case-sensitive dictionary or not.
This approach will be slower for a small dictionary, especially if a long word is entered, with many possible spellings. However, it should be quicker than your method with a very large dictionary.
Note that I'm assuming you meant that the person doesn't know how to spell the word. If you include typos, like typing the wrong letter, or an extra letter, or omitting a letter, or transposing two letters, then the number of possibilities to look up becomes too large for this approach. StuRat (talk) 21:24, 3 September 2011 (UTC)
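The three steps above can be sketched roughly as follows. This is an illustrative Python sketch rather than anyone's actual code from this thread: the substitution table is a hypothetical stand-in for whatever respelling rules one would really use, and the word is assumed to arrive already split into pieces.

```python
from bisect import bisect_left
from itertools import product

# Hypothetical respelling table: each piece maps to the spellings worth trying.
ALTERNATIVES = {
    "ph": ["ph", "f"],
    "aer": ["aer", "air", "are"],
}


def candidate_spellings(pieces):
    """Step 2: combine every alternative for every piece into whole words."""
    options = [ALTERNATIVES.get(piece, [piece]) for piece in pieces]
    return ["".join(combo) for combo in product(*options)]


def in_sorted_words(sorted_words, target):
    """Step 3: binary search; `sorted_words` must already be in sorted order."""
    i = bisect_left(sorted_words, target)
    return i < len(sorted_words) and sorted_words[i] == target


def lookup(pieces, sorted_words):
    """Return the candidate spellings that are actually in the dictionary."""
    return [w for w in candidate_spellings(pieces) if in_sorted_words(sorted_words, w)]


if __name__ == "__main__":
    words = sorted(["fair", "fare", "pear", "phase"])
    print(candidate_spellings(["ph", "aer"]))
    print(lookup(["ph", "aer"], words))
```

Each lookup costs O(log n) in the dictionary size, but the number of candidates grows multiplicatively with the number of pieces, which is the trade-off described above.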
The standard method to do this is to index the dictionary by a soundex (or similar) index. Take the word given and perform the same soundex function on it to see which indexed group to search. Once you get inside the indexed group, you have two choices: perform a longer soundex (since it usually only looks at the first syllable) for a deeper index or perform your fuzzy match. The problem with fuzzy matches is that they tend to be O(n^2) algorithms. So, you want to limit your group size as much as possible before resorting to them. Some implementations never do a fuzzy match. They just keep expanding whatever soundex function they are using until they get a group of 10 or so words and then assume you want one of those. -- kainaw 21:37, 3 September 2011 (UTC)
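For anyone unfamiliar with the idea, here is a small Python sketch of a soundex-keyed index. It implements the common four-character American Soundex; the helper names `build_index` and `candidates` are invented for this example, and a real system might well use a different phonetic code.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the four-character Soundex code of `word` ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = _CODES.get(ch, "")  # vowels get "" and so reset `prev`
        if code and code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def build_index(words):
    """Group dictionary words by their Soundex code."""
    index = defaultdict(list)
    for w in words:
        index[soundex(w)].append(w)
    return index


def candidates(index, word):
    """Return the (hopefully small) group that a fuzzy match would then search."""
    return index.get(soundex(word), [])


if __name__ == "__main__":
    idx = build_index(["Robert", "Rupert", "Rubin"])
    print(soundex("Robert"), soundex("Rubin"))
    print(candidates(idx, "Robbert"))
```

Building the index is a one-time cost; after that each query only touches the words sharing its code.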
Does "approximate string matching" mean that the dictionary consists of Lua patterns like "%sony", most of which specify the first letter of the word? If so, split the dictionary into 26 volumes and just check the volume matching the word's initial (volume "o" here). You may need one more volume, always checked, for patterns where the first character can vary or isn't a letter. Performance is still linear but much faster. If the dictionary grows much further, split it on the first two or more letters. You may want to lowercase the dictionary entries and the words once each before the multiple comparisons. Certes (talk) 21:57, 3 September 2011 (UTC)
That splitting is similar to the indexing I proposed, although I'd go beyond 1 or 2 letters, more like 3 or 4. The indexing allows your program to jump to the proper location without having to have dozens to thousands of files. However, if the file size is too large for the system to handle, then splitting it up is appropriate, and once you get them down to sizes the system can handle, then use indexing to break it up further. StuRat (talk) 05:53, 5 September 2011 (UTC)
A soundex won't help at all, because most of the words I'm searching for are common words, abbreviations, or proper names that are frequently mispronounced. (If you can't spell Orgrimmar, you probably can't pronounce it either.) The spelling errors will almost all be due to mistyping or non-standardized abbreviations (e.g. "ToFW" vs "To4W"). I need to be able to find the set of all good guesses about what an incorrect or nonstandard abbreviation means (e.g. "AQ30" could be either "AQ10" or "AQ40"). NeonMerlin 20:49, 6 September 2011 (UTC)
If you only plan to have 1000 words in your dictionary, then going through every one and comparing it with the typed-in word, calculating a "closeness rank", then sorting (maybe bin sort, definitely not bubble sort!) by that rank, is probably the way to go. So, if this is how you're doing it now, that's probably best. (The methods I described previously are more for searching a full dictionary of the English language, with very limited variations on the typed-in word allowed.) StuRat (talk) 22:18, 6 September 2011 (UTC)
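A minimal sketch of that compare-against-everything approach, in Python rather than Lua, and using Levenshtein edit distance as the "closeness rank" (the thread never says which measure LFMonitor actually uses, so that choice and the `max_distance` cutoff are assumptions):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions
    needed to turn `a` into `b`, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def closest_words(word, dictionary, max_distance=1):
    """Score every dictionary word, then return those within `max_distance`,
    nearest first (ties broken alphabetically)."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    known = ["AQ10", "AQ40", "ToFW", "mag", "mage"]
    print(closest_words("AQ30", known))
    print(closest_words("To4W", known))
```

Each comparison costs roughly the product of the two word lengths, so for short words and a dictionary of around 1000 entries the total work per lookup stays small. Returning every word within the cutoff, rather than a single best match, keeps ambiguous cases like "AQ30" visible.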