Wikipedia:Reference desk/Archives/Computing/2011 September 1
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.
September 1
Acrobat bug or feature?
We upgraded Adobe's Acrobat Standard (i.e., not just Reader) from V7 to V9 in June, and are suffering from a new behaviour which we are unable to define as a bug, a feature, or something else. Perhaps one of you has experienced this also and has advice for us.
We're on WinXP, and I'm fairly confident that that's irrelevant.
We frequently use Ctrl-C, Ctrl-V to copy-and-paste a couple dozen individual values (one at a time) from a local application into a fillable PDF form -- back and forth between two windows, clicking on title bar or taskbar icons to swap focus. With this recent update, the PDF form appears to retain focus "where we left it" -- the cursor IS in the desired field -- but in fact the form refuses to accept data at the cursor.
- This is the case both with keyboard entry and click-to-focus-plus-Ctrl-V.
- This is the case both with PDF-in-a-browser and PDF-in-a-standalone-Acrobat-window.
In order to paste the copied data into the form, it is necessary to click the cursor in a different field than the most recent one, then click back into the desired field -- at which time Ctrl-V works as expected.
So, is this a bug, or a feature?
- I've gotten nowhere on Adobe's Support site, but I'll admit my problem description may be the worst I've ever written. Perhaps better search keywords would give better results!
- Tech support for the source application says they've never heard of it, so "it can't be our problem".
Might it be something configurable within Acrobat?
Thanks! --DaHorsesMouth (talk) 00:22, 1 September 2011 (UTC)
P.S. Anyone who would like to suggest we move to laserapp as a solution to this problem, please keep it to yourself :-)
- I suppose you've already tried doing this on a new build and profile? Where I work we get all sorts of strange Adobe issues which are fixed by wiping the Adobe folders in the user's profile. Vespine (talk) 04:42, 1 September 2011 (UTC)
Can't add a font on a Mac
I'm trying to install a font, Decker specifically, on a Mac running OS X 10.6. No matter how I do it, the font won't install. I've tried double clicking, I've tried opening Font Book and installing through the menu. Everything. This font will not be listed anywhere and can't be used in Photoshop or anywhere else. Any ideas what's going on? Dismas|(talk) 03:27, 1 September 2011 (UTC)
- Copy font files to ~/Library/Fonts. Start a program you wish to use the font in. Fifelfoo (talk) 03:35, 1 September 2011 (UTC)
- It's already there. The bold version as well. Neither one of them show up in any program (Word, Photoshop, etc). Dismas|(talk) 03:47, 1 September 2011 (UTC)
- Logout/login or rebooted? What format is the font file? Fifelfoo (talk) 03:49, 1 September 2011 (UTC)
- Rebooted. ttf file. Dismas|(talk) 03:53, 1 September 2011 (UTC)
- Weird. this article on cache clearing might be of help? Fifelfoo (talk) 04:39, 1 September 2011 (UTC)
- Done. Still no joy. Dismas|(talk) 11:59, 1 September 2011 (UTC)
- Any possibility that the file is corrupted? --Mr.98 (talk) 11:49, 1 September 2011 (UTC)
- I suppose but I downloaded the font from the same site to my Windows XP machine at work and it installed fine. Dismas|(talk) 11:59, 1 September 2011 (UTC)
- I just reloaded it. Re-installed it. And it still does not work. And if anyone is wondering, yes, the project requires this font. Dismas|(talk) 12:27, 1 September 2011 (UTC)
- I did a little searching for you, and it seems that many other people on the internet cannot get Decker to work on Macs, period. It doesn't matter where they download it from, it just doesn't work. I also tried it myself with three different download sources and it did not work. The consensus seems to be that, for some unknown reason, it's simply incompatible. The best answer I can give you is to perhaps use Century Gothic, which is very similar. ῤerspeκὖlὖm in ænigmate(talk)(spy) 03:51, 2 September 2011 (UTC)
- I've been seeing the same thing. Well, arg. Thanks for the help, everyone! Dismas|(talk) 18:33, 2 September 2011 (UTC)
Nice programming language to play with?
Hmmm... I guess I just need a new hobby. Anyway, can you name a programming language that is:
- Free and can be freely downloaded from the net (bonus points for having a small size)
- Manuals and tutorials are available online (bonus points for offline books)
- Always updated and used by the software industry
- I can use to write games with (not Blizzard level of course)
- can be run and used in Windows (I know everything runs on windows but I want to be sure ;) )
--Lenticel (talk) 03:51, 1 September 2011 (UTC)
- C# satisfies all the above. I like it a lot. I need to note with regard to your first bullet that every "programming language" is by definition free, because languages can't be copyrighted. Any particular company's compiler or development environment or IDE or whatever can be closed-source, of course. Comet Tuttle (talk) 05:28, 1 September 2011 (UTC)
- Python and Racket/Scheme, my two favorite teaching languages, both meet all those criteria. They're not widely used by the software industry, and while Python has Pygame, writing games in Racket will be kind of a pain (but don't let that dissuade you; learning multiple languages is really important). Peter Norvig's essay on learning programming has resources for both of those languages at the end. Paul (Stansifer) 13:09, 1 September 2011 (UTC)
- Python is widely used by the open source community, in fact... As a programmer transitioning from C++, and after being confused by the many inconveniences of packaging in C#, I gradually came to accept the new features in Python, such as seamless conversion between different data types. Learning Python (4th Edition), which covers Python 3, is a good and fun place to begin. You can download the PDF version of it freely somewhere, oops... You can never publish your game in C# without paying the license fee (cheapest, $799 for VS2010Pro) to M$. --LunarShaddowღIvy (talk) 08:00, 6 September 2011 (UTC)
- Every reasonably well written "how to program" book that I've seen comes with a language compiler that you can install on your computer. So, it is a matter of picking a language, picking the book, and you got it. -- kainaw™ 12:48, 1 September 2011 (UTC)
- Perl is similar to Python. It is not the latest technology but major companies use it commercially and it is actively being developed. ActiveState provide a Windows installation here. The CPAN library here has 897 open source modules to assist with writing various games, mainly aimed at strategy rather than shoot-em-up. Certes (talk) 15:53, 1 September 2011 (UTC)
- Is Perl ever used to write games? This is one of Lenticel's requirements. I'd say that C# or Java (they are closely related) would be a better choice - I'd go for Java myself, but then I'm averse to using anything that Microsoft are involved with if there is an alternative. AndyTheGrump (talk) 16:31, 1 September 2011 (UTC)
- Perl is an uncommon choice for games, but there's no reason why one couldn't use it. Frozen Bubble, which is a quite respectable little game, is written in Perl. 2.122.75.122 (talk) 22:24, 1 September 2011 (UTC)
- Cool, thanks guys. I think I'll be busy for a long time--Lenticel (talk) 00:41, 2 September 2011 (UTC)
- To clarify my previous point: 897 was the number of Perl modules specifically aimed at writing games. (It's now up to 898!) CPAN has almost 100,000 modules in total. Despite that, I agree that Perl is an uncommon choice for games. Certes (talk) 19:29, 2 September 2011 (UTC)
- Oh, I wasn't clear enough when I wrote about C# — XNA Game Studio is Microsoft's framework to write games in C#. It's quite full-featured. It is not open-source, but it's free as in beer for Windows games. If you want to use it for the Xbox 360, there's a $99 fee per year. Comet Tuttle (talk) 21:33, 2 September 2011 (UTC)
Is Microsoft Office shareware?
Is Microsoft Office shareware? The article says that it is. Bubba73 You talkin' to me? 05:33, 1 September 2011 (UTC)
- Yes - you can download a 60 day trial version from Microsoft. Nanonic (talk) 06:59, 1 September 2011 (UTC)
- It isn't Retail software? (It is listed there.) Or Commercial software? Perhaps the free trial can be considered freeware, but what about after the free trial period? Bubba73 You talkin' to me? 15:32, 1 September 2011 (UTC)
- It is many things. These marketing names are not exclusionary. -- kainaw™ 17:06, 1 September 2011 (UTC)
- Shareware is a wide field and can cover everything from limited-time and limited-functionality demos (sometimes called crippleware) to fully-functioned software with few limits on use (it approaches some nagware or donationware, where you have to pay to remove a nagging message but can otherwise enjoy full functionality), and includes both software that can be freely shared and software whose redistribution is prohibited. The term is pre-internet and doesn't exactly reflect modern commercial models. --Colapeninsula (talk) 11:03, 2 September 2011 (UTC)
- No, Microsoft Office does not fit the classic definition of "shareware". It's standard retail software. There is a trial version which can be upgraded to the full retail version. Comet Tuttle (talk) 21:31, 2 September 2011 (UTC)
HTPC servers
How do you actually get HTPC servers like MediaPortal to be displayed on a TV? I got a Toshiba 42VL863. So far all I could manage is to listen to a single song through XBMC after finally realizing that it should be added to a library, but still no idea how the software itself is displayed on a TV (I assume it runs on a PC, and not directly installed on a TV). It's connected through ethernet, but the PC and TV aren't directly connected through anything else, and I'm not sure if that's needed. I'm assuming it's not a networking problem as I was able to listen to the song on the TV, and the same on my smartphone which only has wifi. 62.255.129.19 (talk) 11:24, 1 September 2011 (UTC)
- The normal way to connect an HTPC to a TV is via the video out of your HTPC, nowadays probably HDMI (although analog VGA, YPbPr component, S-video or composite may be options depending on the TV and video out). This is similar to the way you'd connect a Blu-ray or DVD player, an STB or a more specialised PVR; or, on the PC side, similar to the way you'd connect an LCD monitor (which is all your TV is really likely to be if you're using an HTPC). Depending on what the TV supports, you may be able to stream some video from the HTPC to the TV over the network alone. But unless the TV has built-in support as a client for whatever your HTPC server software is, it is unlikely you will have much control over the HTPC, except perhaps via a web interface if your HTPC server software has one. In the unlikely event your TV has Android and there is some Android control app for your HTPC server software, you could perhaps use that. Even better would be a more fully fledged Linux or Windows (which incidentally the MediaPortal client is written for) on your TV, but that is even more unlikely. A better solution, if the location of your HTPC precludes connecting it to your TV, is to get a thinnish client. Either way, remember the client would need to be able to decode whatever you want to display, do any post processing you desire and run whatever HTPC client you need, which also means your HTPC software needs to support a server/client config. As I mentioned, this means you need Windows in the case of MediaPortal. Nil Einne (talk) 13:45, 2 September 2011 (UTC)
- What do you mean by thinnish client? My PC is indeed far away from the TV, so I wonder whether a really long HDMI cable is my only option... but how would the client/server architecture operate there? Or is that used through ethernet (which most client/servers are fine with)... I'm thinking whichever the solution the client is going to be the problem, since the TV isn't even seemingly able to go to a specified web address, so that excludes controlling HTPC software through the web (which seems to be a common option). 62.255.129.19 (talk) 16:04, 6 September 2011 (UTC)
- I mean you can use a fairly small, low-power computer as the client. If you want to use MediaPortal then you need Windows on the client (and install MediaPortal in client mode). However, your client computer isn't a real thin client: all the video decoding etc. is still going to take place on the client. (In fact, in most ways except for hard disk space, the power of the client is much more important than that of the server, so perhaps I should have just avoided the thin part.) So it needs to be powerful enough to decode any video you want to display, which, particularly since you live in the UK, may include 1080i H.264 broadcast. I personally prefer to use the GPU/graphics card for that purpose. That being the case, any fairly recent (released in the past 2 or so years) AMD/ATI or Nvidia graphics chipset should be fine, including integrated ones. However, be more careful with the Intel ones, particularly those used in netbooks or nettops, as they often lack full H.264 decoding acceleration. If you want high quality deinterlacing you may also want to check a bit more, although you still shouldn't need anything that powerful. When it comes to the CPU it shouldn't matter much, but I do recommend a dual core at minimum. Hard disk size doesn't matter since all the content should be stored on the server. Any client does need a network connection to the server; I would recommend a wired ethernet connection to reduce issues, but a wireless connection may work (for HD you'd probably want 802.11n). The TV remains largely irrelevant unless the client is on the TV; it's still just functioning as a monitor. Personally I recommend giving up on it being anything else. Note that MediaPortal only supports a client/server setup for the TV service, for recording and playback of broadcast TV. Stuff like music, downloaded files, DVDs etc. is all handled in the MediaPortal client, although you can of course access files over the network.
- A long HDMI cable may work, but bear in mind the longest you can go without an amplified cable is probably about 45 feet [1]. More importantly, if you use a long HDMI cable you still need to control the server. Normally you'd use a remote control or wireless keyboard or similar, but if the receivers are near the computer you may have poor or no reception (anything infrared in particular is not going to work), so you'd need to find some way to move the wireless control receivers close to the TV, perhaps via an active USB extension cable, something like [2] or [3] (for a passive USB cable the length limit is 5 metres IIRC). Client/server becomes mostly irrelevant in a case like this since your client will be on the same PC as the server; as I said, your TV is just a monitor displaying what's displayed by the PC (the MediaPortal client interface would be what you want to see on the TV). If you plan to watch DVDs, Blu-rays or whatever, remember you'll need to go to the PC to put them in. If you plan to record broadcast TV, I presume you've already sorted the connection between your HTPC server and antennas or STBs.
- I'm a bit confused as to your setup, but I would recommend you connect a monitor, speakers, keyboard and mouse to the HTPC server if none are currently connected, then start the MediaPortal client and play around with it a bit (e.g. listen to music, watch videos) to get an idea of what you're doing. You basically want to be doing the same thing but with your TV as the monitor, and obviously without running to the computer every time to use the keyboard/mouse.
- Nil Einne (talk) 17:51, 6 September 2011 (UTC)
- Currently my TV is only connected through to my router through ethernet (I do happen to have a long ethernet cable which seems to be able to access youtube etc. adequately, which is why I hoped I could set up an HTPC using just ethernet...) - I don't understand which should be the client and server, do you think I should get a second computer closer to the TV? But will that then be able to somehow access the content on my main PC (through the network)? I was originally thinking of just connecting the TV to my main PC, regardless of how far away they are (they are connected at least through networking). Mostly I want to be able to access content on my main hard drive and external drive, and so far I've only been able to access music through UPnP using HTPC servers. 62.255.129.19 (talk) 19:04, 6 September 2011 (UTC)
Excel Macro
I have an Excel 2007 database with a different tab for each month (Jan, Feb, Mar, etc.) and one tab for "Data Entry". I want to be able to enter all my data in one column on the Data Entry tab and have a macro copy it from the Data Entry tab to the appropriate month tab. For example, on the Data Entry tab, I will have a column of text from B2 through B40. I will enter the month "Jan" in B2 and I want the macro to copy B3 through B40 from the Data Entry tab and paste to the Jan tab. To make it more complicated, I can have several columns of data on each month tab, so when it pastes to the month tab, it needs to go to the next blank column to paste the data. Can anyone with coding expertise help me out? Tex (talk) 13:07, 1 September 2011 (UTC)
- I am sure there will be situations where you will want to change your data after you have entered it. So why not just do data entry straight onto each month's worksheet ? Otherwise, to allow data amendments, you will have to build a macro to retrieve a particular column for a particular month - and by the time you have done that, you have almost built a database engine in Excel, which is not what it is intended for. Gandalf61 (talk) 13:23, 1 September 2011 (UTC)
- Things are much more complicated than what I put in the original post. There are numerous reasons why I want to do exactly what I asked about. Instead of typing out every little detail, I just asked for help with the coding to get the macro to do what I want it to do. I'm not really sure why you chose to respond without answering the question, but I have found the information elsewhere, so I guess it doesn't matter now. Tex (talk) 18:48, 1 September 2011 (UTC)
- Well, excuse me for suggesting a much simpler way of achieving the same end. The mind reading attachment on my crystal ball must have malfunctioned. Gandalf61 (talk) 21:15, 1 September 2011 (UTC)
Where to start with creating graphics?
Riiight... I fail massively at creating any sort of graphic art or animation, on paper or on screen. Literally not capable of producing a stick man in Windows Paint, it's that bad. But it seems like everyone has their own deviantart, or Newgrounds, or Youtube channel.... and I'm thinking I might be missing something. I don't think I have any hidden creative talents, so feel free to tell me I just shouldn't bother. But. I would like to know how to draw some basic things on a computer. And perhaps very gently molest other people's artwork. Can you guys suggest a good place for a complete newb to start? I can read and follow instructions but that's it. — Preceding unsigned comment added by 78.105.228.237 (talk) 17:35, 1 September 2011 (UTC)
- Personally, I would start with the most complicated software - then all the other software will be child's play by comparison. If you want pixel-based artwork, you can get Photoshop or download Gimp (Gimp is almost like Photoshop, but free). You will need to Google for some online tutorials just to get started. Once you get the hang of it, you can easily manipulate other people's work. If you are interested in logos or things like that, you will want a vector-based editor like Inkscape (free like Gimp). I feel it gives more control over your artwork because you aren't changing pixels. You are changing the shapes. Again, you'll need to get a tutorial just to get started. -- kainaw™ 18:00, 1 September 2011 (UTC)
- Yes, I would second Kainaw's recommendations there. It's been a while but both GIMP and Inkscape were excellent when I last used them. Both will have been developed a fair bit since then, but hopefully they will now be even better rather than having gone downhill. And as Kainaw says, you should be able to find plenty of tutorials online with little more than a quick Google. I can't actually remember whether either Gimp or Inkscape had built in tutorials but it would certainly be worth clicking 'help' in the menu bar of each program and seeing what they offer. Good luck, graphics work is fun. I really ought to get back into it. --bodnotbod (talk) 18:44, 1 September 2011 (UTC)
- I think before looking for software, you should determine what type of drawing you want to do. I can think of several types:
- 1) "Cartooning". This rather "artistic" approach is free-form drawings like you see in cartoons, and is usually two-dimensional. MS Paint could be used for this, although there are better products, and, of course, you can also draw things by hand, and maybe scan them in later.
- 2) "Drafting". This more mathematical and logical approach is creating drawings "scientifically". Can be useful for architecture and engineering. For this you want software which can snap to grid points, etc. This can be either 2D or 3D. Drafting can be done with perspective views, to create a nice sense of depth. Drafting can be done by hand, or using a computer aided design package. Surfaces and solid models can ultimately be constructed, and shaded, while moving, if desired.
- 3) "Computer animation". This somewhat combines the artistic approach of cartooning with the mathematical and logical approach of drafting. Here you use software to typically create 3D images, like a nice shaded sphere, and combine them to make full pictures. Recent advances allow applying textures to surfaces, etc. This is what's used in modern movies (although some traditional cartooning remains). StuRat (talk) 20:05, 1 September 2011 (UTC)
- If you lack any pre-existing drawing skills, you will probably be better off with tools that are more about technical arrangement than analogs to photography or painting. So vector based tools (like Illustrator or Inkscape; personally I think Inkscape would be better for someone of your skill set, because it is less "freehand" than Illustrator), or rendering-based tools (like Blender, which takes some time to learn but has nothing whatsoever to do with painting or drawing) are probably more up your alley. Photoshop/GIMP knowledge is useful for image manipulation but you won't be able to create much from scratch without preexisting artistic skills. (Which are learnable, but I wouldn't try to learn them with a computer, personally.) --Mr.98 (talk) 14:34, 2 September 2011 (UTC)
Searching files
On Windows 7 I have approximately 1080000 html/text files of about 40 KB each. I want a way to search them for different words and strings. Using windows grep goes through each file one by one and is very slow. Is there a better way? 82.43.90.90 (talk) 19:39, 1 September 2011 (UTC)
- Have you tried the built-in Windows Search ? Note that going through the files one by one is the only option, unless you've previously created an index of which words are found in which files (this is how Google works). Also, I've found that the actual search doesn't take nearly so long as printing out a list of all the files where the string wasn't found, so any search which allows you to suppress those prints will be much faster. A more modest gain in speed can be had by stopping after the first string is found, versus searching the remainder of that file and any remaining files, but, of course, sometimes you need to find all occurrences. StuRat (talk) 19:44, 1 September 2011 (UTC)
- Also note that "grep" is faster than "grep -i" (case insensitive search). The insensitive search has to do roughly twice the work. -- kainaw™ 19:55, 1 September 2011 (UTC)
- I doubt this is true. Grep doesn't work by feeding each character into a test like "if c=='a' or c=='A' or c=='b' or c=='B' or …"—instead, it simulates a deterministic finite-state machine which processes each character in constant time, no matter how many "possibilities" are specified in the search pattern. —Bkell (talk) 12:12, 2 September 2011 (UTC)
- No need to doubt it. Just do it. At least once a year, I get someone who guarantees me with tons of theories that grep and grep -i are not significantly different. Then, when I finally get them to do a side-by-side speed check, they find out that grep -i is almost twice as slow. I let them argue with the computer after that if they still don't believe it. -- kainaw™ 21:41, 3 September 2011 (UTC)
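A side-by-side check of the kind described above could be sketched roughly as follows. The sample files, the search word and the helper name time_grep are all made up for illustration, and `date +%s` is assumed to be available (most systems have it, though POSIX does not require it). This is only a measuring harness, not evidence for either side of the argument.

```shell
#!/bin/sh
# Hypothetical sample corpus; point "dir" at the real files instead.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
i=0
while [ "$i" -lt 50 ]; do
  printf 'alpha Beta gamma\nbeta line %s\n' "$i" > "$dir/f$i.txt"
  i=$((i + 1))
done

# time_grep is a helper invented for this sketch: it runs grep with the
# given arguments and prints the elapsed wall-clock time in whole seconds.
time_grep() {
  start=$(date +%s)
  grep "$@" > /dev/null
  end=$(date +%s)
  echo $((end - start))
}

sensitive_secs=$(time_grep 'beta' "$dir"/*.txt)
insensitive_secs=$(time_grep -i 'beta' "$dir"/*.txt)
echo "grep:    ${sensitive_secs}s"
echo "grep -i: ${insensitive_secs}s"

# Sanity check that the two searches really match different lines.
sensitive_hits=$(cat "$dir"/*.txt | grep -c 'beta')
insensitive_hits=$(cat "$dir"/*.txt | grep -ci 'beta')
echo "matching lines: $sensitive_hits vs $insensitive_hits"
```

On a corpus this small both timings will most likely print 0, since the resolution is one second; the difference, if any, only becomes visible on a large set of files. Running each search more than once is advisable, because the first pass over the files also pays for reading them from disk.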
Thanks for the answers so far. I have tried Windows search but it's slower than grep, and I'm already running grep in case sensitive mode. Since there are so many files and I would like to search them often for many different strings, the index option sounds good to me. If I am understanding correctly, I would only need to process all the files once to make an index and then after that just search the index instead, which would be a lot faster than searching all the files every time? How would I build an index and with what programs? 82.43.90.90 (talk) 19:59, 1 September 2011 (UTC)
- Yes, that's true, provided the files to be indexed never change, but note that the first time you build the index it will take much longer than doing an unindexed search. You may find it necessary to build the index over several sessions, processing a portion of your files each time. The simplest index might be an index directory with a file for each word, containing a list of the files which contain one or more copies of that word. This type of index would probably be about the same size as the data, though, and could be quite a bit larger, if you have unique, short words and long file names. So, there are more efficient ways to do this. For example, each file name could be assigned an integer number, starting from 1, and those numbers could be used in the index.
- You also said you wanted to be able to search for "strings". Is this a portion of a word or several words or what ? This could complicate the index generation. StuRat (talk) 20:14, 1 September 2011 (UTC)
- Ideally I'd like to be able to search for anything; long words, short words, parts of words, an entire sentence, etc, like a google search can. I don't have a defined list of search terms, it would basically be whatever pops into my head when I decide to search for it. Would that be possible? 82.43.90.90 (talk) 20:29, 1 September 2011 (UTC)
- Well, what I proposed would still be of some use:
- 1) If looking for a part of a word, you'd do a grep or search on the index file names (the word list), which would be far quicker.
- 2) For a sentence, you'd need to find the Boolean intersection of the files containing each word, then do a grep on that list, to only match those which contain the words in the desired order. StuRat (talk) 21:43, 1 September 2011 (UTC)
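The index directory proposed above, with one file per word, might be sketched in shell along these lines. The three sample documents, the directory variables and the search terms are invented for illustration, and words are reduced to lowercase alphanumerics, which is a simplification the thread does not specify.

```shell
#!/bin/sh
# Hypothetical sample corpus standing in for the real html/text files.
docs=$(mktemp -d)
idx=$(mktemp -d)
trap 'rm -rf "$docs" "$idx"' EXIT
printf 'The cat sat on the mat\n' > "$docs/a.txt"
printf 'A dog sat on the log\n'   > "$docs/b.txt"
printf 'The cat chased the dog\n' > "$docs/c.txt"

# Build the index once: one file per word, listing the files containing it.
for f in "$docs"/*.txt; do
  name=$(basename "$f")
  # Split into lowercase words, one per line, without repeats.
  tr -cs '[:alnum:]' '\n' < "$f" | tr '[:upper:]' '[:lower:]' | sort -u |
  while read -r word; do
    [ -n "$word" ] && echo "$name" >> "$idx/$word"
  done
done

# 1) Part of a word: search the word list (the index file names).
partial=$(ls "$idx" | grep 'og')
echo "words containing 'og':" $partial

# 2) Several words: intersect the per-word file lists. Each list names a
#    file at most once, so a name seen twice is in both lists.
candidates=$(sort "$idx/the" "$idx/dog" | uniq -d)
echo "files with both 'the' and 'dog':" $candidates

#    Then grep only those candidates to keep the ones with the exact phrase.
phrase_hits=$(for name in $candidates; do
  grep -qi 'the dog' "$docs/$name" && echo "$name"
done)
echo "files with the phrase 'the dog':" $phrase_hits
```

The last step shows why the final grep is still needed: b.txt contains both words but not the phrase. With over a million input files, the per-word lists would be much smaller if they stored numbers rather than file names, as suggested earlier in the thread.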
- If you have another 40GB to spare, you could say
grep "" * > all-files
and then grep all-files instead of the original files. (You could gzip it to keep the size down, but I think gunzip is slower than grep.) To avoid matching the filename prefix instead of the file text you could prepend :.* to all search regexes. Defragmenting all-files might reduce the search time substantially. On a quad-core machine it might be faster to grep four 10GB files in parallel (or not, depending on how well the disk scheduling works). -- BenRG (talk) 21:22, 1 September 2011 (UTC)
- This works really well, thank you! Thanks everyone else for the great answers too :) 82.43.90.90 (talk) 22:38, 1 September 2011 (UTC)
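BenRG's combined-file approach can be tried on a small scale as below. The two sample files and the directory variables are hypothetical; the grep commands themselves are the ones from the suggestion, with the output written to a separate directory so that a later run of `*` does not pick up all-files itself.

```shell
#!/bin/sh
# Hypothetical sample files standing in for the real corpus.
docs=$(mktemp -d)
out=$(mktemp -d)
trap 'rm -rf "$docs" "$out"' EXIT
printf 'apples and pears\nplums\n' > "$docs/fruit.txt"
printf 'carrots\nfruit salad\n'    > "$docs/veg.txt"

# Concatenate every line of every file, each prefixed with "filename:".
# (grep adds the prefix automatically when given more than one file.)
(cd "$docs" && grep "" *) > "$out/all-files"
cat "$out/all-files"

# Searching for "fruit" alone also matches the prefix "fruit.txt:".
naive=$(grep -c 'fruit' "$out/all-files")

# Prepending ":.*" requires the match to come after a colon,
# i.e. in the text rather than in the filename.
text_only=$(grep -c ':.*fruit' "$out/all-files")

# Names of the files whose text matched.
files=$(grep ':.*fruit' "$out/all-files" | cut -d: -f1 | sort -u)
echo "naive: $naive lines, text only: $text_only line(s), in: $files"
```

Here the naive search counts three lines, two of them only because the file is called fruit.txt, while the `:.*` form counts the single line that actually contains the word.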
- Hmm, not really a problem but for 1.2GB of original files "all-files" takes up 1.7GB. What is causing such an increase in filesize, and is there any way to reduce it (without compression)? 82.43.90.90 (talk) 22:55, 1 September 2011 (UTC)
- Take a look at "all-files" to try to figure it out. A couple possibilities are it including the file names, or padding lines with blanks. BTW, can you give us an idea of how long it was taking originally and how long it takes using this method ? Also, your numbers don't add up. You said "1080000 html/text files of about 40 KB each". 1080000×40 KB = 43.2 GB, not 1.2 GB. StuRat (talk) 23:23, 1 September 2011 (UTC)
- Would the filenames really account for an additional 500MB? That's almost half the size of files themselves. Searching "all-files" took about 20 seconds, whereas searching the individual files (and creating "all-files") takes about 10 mins. It's only 1.2GB because this was just a test, I didn't try it on all the files yet. 82.43.90.90 (talk) 23:38, 1 September 2011 (UTC)
- Filenames alone wouldn't do it, but in conjunction with padding with blanks, it might. I'd advise you to do the full test, because this solution doesn't necessarily scale up. For example, you may hit a file size limit, requiring you to break it into pieces. StuRat (talk) 23:48, 1 September 2011 (UTC)
- The filename is prepended to every line, so it could add a lot, depending on the average line length. I forgot to think about that in my 40GB estimate. Also, I forgot that the bottleneck when searching a 40+GB file will probably be the hard disk, which means that compressing the file could actually reduce the search time. Try lzop, which can decompress at >200MB/sec. Or, if you want to parallelize across 4 or more cores, you could try gzip or xz (not bzip2, which decompresses very slowly). -- BenRG (talk) 00:20, 2 September 2011 (UTC)
- Every line ? Ouch. Yep, that'll do it, alright. StuRat (talk) 00:42, 2 September 2011 (UTC)
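Both points raised at the end of this thread, the cost of repeating the filename on every line and searching a compressed copy, can be checked with a sketch like the following. The three-line all-files is a made-up sample in the "filename:line" layout, and gzip is used simply because it is widely installed; lzop or xz would be dropped in the same way if available.

```shell
#!/bin/sh
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical combined file in the "filename:line" layout.
printf 'page0001.html:<p>hi</p>\npage0001.html:<p>yo</p>\npage0002.html:<b>ok</b>\n' \
  > "$work/all-files"

# Bytes spent on the "filename:" prefixes (name plus the colon, per line)
# compared with the size of the whole file.
prefix_bytes=$(cut -d: -f1 "$work/all-files" |
  awk '{ n += length($0) + 1 } END { print n }')
total_bytes=$(wc -c < "$work/all-files" | tr -d ' ')
echo "$prefix_bytes of $total_bytes bytes are filename prefixes"

# Search a compressed copy by decompressing to a pipe, not to disk.
gzip -c "$work/all-files" > "$work/all-files.gz"
hits=$(gzip -dc "$work/all-files.gz" | grep -c ':.*<p>')
echo "lines matching <p>: $hits"
```

With short lines and long file names, as in this sample, the prefixes account for more than half of the combined file, which is consistent with the size increase reported above. Whether the compressed search is faster depends on whether the disk or the CPU is the bottleneck, so it is worth timing both on the real data.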