Wikipedia:Reference desk/Archives/Computing/2017 September 7
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.
September 7
Stupid censorship question
For the past few weeks I've been following the troubles of Daily Stormer, which has been repeatedly denied basic DNS service. What I don't understand is that each time it actually has a DNS address, you can just "nslookup a dailystormer.whatever" and get some location like 198.251.90.113, which is a typical IP address that has a typical whois with some random company official who can be, and presumably is being, hit with the typical tide of complaints, DDoS attacks, death threats etc. until the last quixotic believer in freedom of expression is purged from the internet. So the part I don't get is: why do I keep reading about the attacks on the DNS servers and cave-ins by their maintainers, and not attacks on the actual IP addresses of the magazine? Wnt (talk) 02:26, 7 September 2017 (UTC)
Respondent appears to have admitted their comment had zero to do with the topic Nil Einne (talk) 14:57, 9 September 2017 (UTC)
- Are you sure only their DNS servers were DDoSed? For example, DreamHost was one of the targets [9] and while some of the sources confirm their DNS servers appeared to be affected, it's not clear to me their web servers weren't. In any case, many of their recent hosts seem to have been like DreamHost and provided both DNS and web hosting. Cloudflare also provided both DNS and CDN [10]. So ultimately, whether you target their DNS hosting or web hosting or both, you have the same company targeted. Their current host is definitely new [11] [12] [13]. I'm not sure if their current host provided DNS hosting, but it's not like finding DNS hosting has been the issue anyway. The problem is they keep losing their domains, for reasons that have nothing to do with DNS hosting. Nil Einne (talk) 07:59, 7 September 2017 (UTC)
- P.S. I've perhaps oversimplified the difference between the CDN component and the reverse proxy component of something like Cloudflare, although despite the pointless diversion above about read-only sites, I'm not sure if it's that important in the context of this question. Nil Einne (talk) 05:57, 8 September 2017 (UTC)
- @Wnt: See https://imgs.xkcd.com/comics/free_speech.png and https://blog.cloudflare.com/why-we-terminated-daily-stormer/ (((The Quixotic Potato))) (talk) 08:07, 7 September 2017 (UTC)
- Also, if the site has simple text-only pages, as opposed to many pics and, even worse, videos, it won't take much data transfer to load the page, so you would need a lot more page requests to cause a DoS. If they have some simple protections, like not allowing more than one page load request from an IP per second, this should help. StuRat (talk) 14:38, 7 September 2017 (UTC)
- Except as I already mentioned before your reply, many DDoS don't rely on requests the site even understands. They may use DNS amplification or other forms of amplification to simply flood the site with traffic. Do you have some statistics on how many DDoS actually rely on HTTP Flood or otherwise requests to generate responses from the target site? I've looked and couldn't really find any. I can find stuff like [14] which has general statistics, but it's hard to say from them how much of a component they are. Because there are various mitigation measures, which can significantly reduce the effects depending on the sophistication of the attacker and defender [15], I'm not sure how common these are nowadays compared to other types of attacks except perhaps when you know they will work (e.g. because whoever set up the site isn't very good) or there's limited alternative (e.g. possibly attacks to Tor sites). Although I'm pretty sure this type of thing is also very clustery. Someone figures out a method of attack that works well, and everyone uses it until it stops working, perhaps because enough people fix their services that it no longer works. HTTP Floods are to one extent something where that doesn't apply however as just mentioned, actually generating and countering them is an arms race including the source of the attacks [16] [17] Nil Einne (talk) 06:30, 8 September 2017 (UTC)
- You can also perform DDOS attacks s l o w l y.[18] A Quest For Knowledge (talk) 16:36, 8 September 2017 (UTC)
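The "not more than one page load request from an IP per second" protection suggested above could look roughly like the following. This is only a minimal sketch: the class name PerIPRateLimiter, the injectable clock, and the in-memory dictionary are assumptions made for illustration, not how any particular web server or mitigation service implements it.

```python
import time
from typing import Callable, Dict


class PerIPRateLimiter:
    """Allow at most one request per `min_interval` seconds from each IP."""

    def __init__(self, min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self.last_seen: Dict[str, float] = {}  # IP -> time of last served request

    def allow(self, ip: str) -> bool:
        """Return True if this request should be served, False if it should be dropped."""
        now = self.clock()
        previous = self.last_seen.get(ip)
        if previous is not None and now - previous < self.min_interval:
            return False  # too soon after the last served request from this IP
        self.last_seen[ip] = now
        return True
```

As the reply above points out, a check like this only helps against floods made of requests the site actually processes. It does nothing against amplification attacks that simply saturate the link, and a real deployment would also need to expire old entries so the table does not grow without bound.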
I remain confused. I should note however that the nslookup still goes to the same IP address as yesterday. Going to the IP address gets a "domain not in our systems" message from BitMitigate. Going to dailystormer.at gets to a continually-reloading page. If scripts from dailystormer.at and bitmitigate are enabled (I hope I didn't catch anything...) this will currently crank for a bit and display the Daily Stormer site. (Note that Tor not only can display the same site without scripts, but despite media unanimity to the contrary seems to do it faster, at least if you don't count the 30-second wait when you first start the Tor browser before you can type the address) Now I have to admit that obviously BitMitigate is doing something very clever and fancy whose understanding is the difference between typing on Wikipedia and being the CEO of a tech company. But I don't get how the people listed in the whois for the IP address, and the machines that reach them, have managed to hold out against hostile hordes. Wnt (talk) 17:58, 8 September 2017 (UTC)
I don't understand why it surprises you that going to the IP directly doesn't work. That's the case for a very large number of webservers. There's nothing special, unique, or hard about it. (Try visiting https://198.35.26.96 for example.) It's especially the case for anything set up with a complicated back end, like you may have if you want to resist DDoS. Although to be clear, having the IP itself not work is not really a major component of countering DDoS; rather, it's simply reflective of the set-up.
I'm not entirely sure if dailystormer.at has held out entirely against the 'hostile hordes' since when I visited it earlier to check for my answer above, it didn't seem to work, although I didn't try particularly hard. But in any case, I'm not sure why it's so hard to believe if it has happened. For starters, it's not clear to me there is actually that much DDoSing going on. Sure, they are still getting their domains shut down, but I haven't actually seen any talk about DDoS (or DoS) of Daily Stormer that is recent. (Actually most of the talk about DDoSing seems to be from just after Cloudflare dropped them, whoever was hosting them then, when they moved to DreamHost, and also on their Tor dark service. And actually, now that I look at it, I'm not sure the possible DDoS after Cloudflare dropped them was a DDoS of the DNS servers.)
This is nearly always what happens. Someone gets sufficiently bored/annoyed/whatever and uses some botnet they have (or buys one) to DDoS a site. Then they get bored of that and move on to something else. I mean I'm not saying there are no minor attacks, there may be, but there's a very good chance they haven't actually faced major challenges since late August. (I mean okay, the BitMitigate CEO or whatever seemed to be almost trying to get DDoSed with their comments, but people don't always care enough about such things.)
In any case, even if they have had major DDoS attempts, it's not like it's always impossible to survive a major attack. I'm fairly sure Cloudflare, for example, could survive the majority of attacks; that was after all what they said: those who wanted to DDoS dailystormer were asking Cloudflare to abandon them. And while yes, this may be a bit of grandstanding, the history strongly suggests that Cloudflare is fairly resilient. (Also I'm fairly sure that Cloudflare wasn't the only one who said it.)
It is true, as Cloudflare said, that if it is a dedicated enough DDoS the number of services besides Cloudflare which could survive it isn't very high. BitMitigate themselves may not be that, but if you read the links I provided above, it seems the people they're getting services from (or rather the people they in turn hired) may or may not be able to do so. Incidentally, I'm not sure why the website doesn't work for you with scripts disabled; it works fine for me.
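On the earlier point that visiting the bare IP address often doesn't work: many servers and reverse proxies pick which site to serve from the Host header of the request rather than from the IP address it arrived on. The sketch below illustrates that idea only. The site table, the function name pick_site, and the error text are all made up for the example, and nothing here is claimed about how BitMitigate in particular is configured.

```python
from typing import Dict

# Hypothetical table of sites that all share one IP address.
SITES: Dict[str, str] = {
    "example.org": "front page of example.org",
    "example.net": "front page of example.net",
}


def pick_site(host_header: str, sites: Dict[str, str] = SITES) -> str:
    """Choose a response from the Host header, as a name-based virtual host does."""
    # Strip an optional port ("example.org:8080") and normalise case.
    # Simplification: bracketed IPv6 literals are not handled.
    hostname = host_header.split(":", 1)[0].strip().lower()
    if hostname in sites:
        return sites[hostname]
    # A browser pointed at the raw IP sends the IP as its Host header,
    # which matches no configured site.
    return "error: domain not configured on this server"
```

Under this assumption, pick_site("example.org") returns that site's page, while pick_site("198.51.100.7") falls through to the error, which is the same kind of behaviour as a "domain not in our systems" message.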
A url for producing the raw text of a random featured article?
Hello,
For a machine learning project I need a url that would produce the raw text of a random featured article in Wikipedia as a .txt file. By raw text I mean only the article itself (not the entire html page) and with square brackets for internal links etc. Preferably I want an option to specify the language as well. Does something like that exist?
For example - the following link produces the raw text of a random article (not featured article) in Hebrew Wikipedia: https://he.wikipedia.org/wiki/Special:Random?action=raw
Thanks! — Preceding unsigned comment added by 77.127.95.225 (talk) 08:17, 7 September 2017 (UTC)
- I am not aware of such a URL, but is that really necessary? If all you want is for the program to pick out a random article, you can do the randomness yourself.
- Featured articles are listed at Wikipedia:Featured articles. I suggest you parse the latter page to generate a list of the page titles of featured articles, pick out a random item from the list by your own program's pseudorandom generator, and pull that article via the API. (I imagine you could host a web server that gives a request URL that redirects to a random FA by this process, though I fail to see the point.)
- If you intend many queries (as seems to be the case in "machine learning project"), please see mw:API:Etiquette before starting. TigraanClick here to contact me 08:36, 7 September 2017 (UTC)
- Thanks for your answer. I just thought this approach would be faster. Could you direct me to instructions specifying how I can get the raw source text of an article via the API (I'm working in Python)? — Preceding unsigned comment added by 77.127.95.225 (talk) 09:13, 7 September 2017 (UTC)
- With http://tools.wmflabs.org/erwin85/randomarticle.php I produced this link to get a random article in Category:Featured articles: https://tools.wmflabs.org/erwin85/randomarticle.php?lang=en&family=wikipedia&categories=Featured%20articles&namespaces=-1. Category:Featured articles (Q4387444) shows the category name for featured articles in many languages. PrimeHunter (talk) 10:41, 7 September 2017 (UTC)
- Well, PrimeHunter's solution looks good if you can extract the code from the page it redirects you to. But in case you need to work with the API in Python...
- It is probably better-coded elsewhere, but you can take inspiration / copy-paste from this (I encourage you to pull the user-agent identification snippet as well, and populate it with your own info). Using api_call with parameters {'foo1': 'bar1', 'foo2': 'bar2', ...} in a Python dict will request the URL en.wikipedia.org/w/api.php?foo1=bar1&foo2=bar2&... . Then see mw:API:Revisions with action=raw.
- This produces stuff such as [19]. Is that the format you are looking for? TigraanClick here to contact me 15:59, 7 September 2017 (UTC)
- You could do it as a two-stage process. Go to https://en.wikipedia.org/wiki/Special:RandomInCategory/Featured_articles, figure out the URL that you were redirected to, and append "?action=raw" to it.
- ApLundell (talk) 17:01, 7 September 2017 (UTC)
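The suggestions in this thread can be put together in Python along the following lines. Treat it as a sketch under stated assumptions rather than a tested client: the function names, the USER_AGENT placeholder, and the default category name are invented for the example, and the category is called something different on each language edition (see the Wikidata item mentioned above). It uses only the standard library.

```python
import urllib.parse
import urllib.request

# Placeholder: replace with your own project name and contact details
# (see mw:API:Etiquette).
USER_AGENT = "random-fa-sketch/0.1 (your-contact-info-here)"


def api_url(params: dict, lang: str = "en") -> str:
    """Turn a dict of parameters into an api.php request URL."""
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def raw_url(page_url: str) -> str:
    """Append action=raw, using '?' or '&' depending on the existing URL."""
    separator = "&" if "?" in page_url else "?"
    return page_url + separator + "action=raw"


def _get(url: str):
    """Open a URL with an identifying User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    return urllib.request.urlopen(request, timeout=30)


def random_featured_wikitext(lang: str = "en",
                             category: str = "Featured_articles") -> str:
    """Two-stage fetch: follow the random redirect, then request the raw text."""
    start = f"https://{lang}.wikipedia.org/wiki/Special:RandomInCategory/{category}"
    with _get(start) as response:
        article_url = response.geturl()  # the URL we ended up at after redirects
    with _get(raw_url(article_url)) as response:
        return response.read().decode("utf-8")
```

Calling random_featured_wikitext() performs two HTTP requests per article, so for a machine learning project that needs many articles it is worth pausing between calls and keeping the etiquette page in mind.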
Excel Formula sought
I have columns with titles “Unit(s)”, “Costs” and “Unit(s) bought”, “Unit(s) left”, “Spent”, “Leftover”. I filled “Unit(s)” and “Costs” column manually and “Leftover” column automatically (integrated with "Spent" column), now formulas are required for the “Unit(s) left” and “Spent” columns so that every time I insert number(s) in the “Unit(s) bought” column, auto result displays on both… Could you help me please? 103.67.156.84 (talk) 17:11, 7 September 2017 (UTC)
- To make sure that we have understood what you need, could you give an example with numbers, and are we correct in assuming that the columns are A to F (and that there are no hidden columns)? Dbfirs 18:48, 7 September 2017 (UTC)
- Column A: Unit – pre-set data available here.
- Column B: Costs – pre-set data available here.
- Column C: Units bought – data will be inserted here.
- Column D: Spent – require a formula that understands column “A”, “B”, “C”.
- Column E: Units left: require a formula that understands column “A”, “B”, “C” and “D”.
- Column F: Leftover: require a formula that understands column “A”, “B”, “C” “D” and “E”.
- Note:
- I’ve used ‘minus’ sign on column “E” and “F”. Formula looked something like:
- A–C for column “E”, and
- B–D for column “F”.
- What I wish for is, to insert value in column “C” so that “D”, “E” and “F” are displayed automatically.
- 116.58.200.14 (talk) 14:58, 9 September 2017 (UTC)
- If I've understood correctly, then you've already got your formulas for columns E and F, and I assume that D is just B multiplied by C, but perhaps you meant something different (how does it involve A?) which is why I asked for an example with numbers.
- If row 2 is the first row with numbers, then in cell D2 you type =B2*C2, and in cell E2 you type =A2-C2, and in cell F2 you type =B2-D2. You then highlight these three cells and replicate (copy) them down as many rows as you need.
- This is all very basic usage of Excel, so perhaps there are some subtleties that I've missed? Dbfirs 20:26, 9 September 2017 (UTC)
- Dbfirs:
- My English is not very good. I hope the following is clearer. Thank you for your time:
- Column A: Unit(s) – pre-set data is available here in this column’s row’(s) cell(s), say for example, a number from “1” to “10”.
- Column B: Costs – pre-set data available here in this column’s row’(s) cell(s), say for example, a number from “10” to “20”.
- Column C: Unit(s) bought – data will be inserted here manually as when required, say for example, a number from “1” to “10” or “1” to “50”, depending on the requirements.
- Column D: Spent – formula required for auto result display.
- Column E: Units left – formula required for auto result display.
- Column F: Leftover money – formula required for auto result display.
- Note:
- Formula required for column “D”, “E”, “F”’s row(s) cell(s) so that auto result is displayed altogether, in each cell (D, E, F), whenever I insert value(s) in column “C” only. Using one cell manually to display 3 cells result basically.
- 103.67.157.67 (talk) 03:37, 12 September 2017 (UTC)
- Did my suggestion not produce the results that you needed? Could you explain what displays incorrectly? Do you want to suppress the display until values are entered in C? Dbfirs 19:39, 12 September 2017 (UTC)
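For anyone who wants to check the arithmetic outside Excel, the three formulas suggested above (D = B*C, E = A-C, F = B-D) can be mirrored in a few lines of Python. This is just an illustrative sketch: the names Row and fill_row are invented, and treating an empty "Units bought" cell as "show nothing" is an assumption based on the last question in the thread, not something the asker confirmed.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    spent: Optional[float]       # column D
    units_left: Optional[float]  # column E
    leftover: Optional[float]    # column F


def fill_row(units: float, costs: float, bought: Optional[float]) -> Row:
    """Compute columns D, E and F from columns A, B and C for one row."""
    if bought is None:
        # Nothing entered in column C yet, so leave the results blank.
        return Row(None, None, None)
    spent = costs * bought       # D2 = B2*C2
    units_left = units - bought  # E2 = A2-C2
    leftover = costs - spent     # F2 = B2-D2
    return Row(spent, units_left, leftover)
```

With A=10, B=12 and C=3 this gives D=36, E=7 and F=-24. A negative F is one reason a numeric example from the asker would help pin down what "Costs" is meant to be. In Excel itself, the blank-until-filled behaviour is commonly written as =IF(C2="","",B2*C2), and similarly for the other two columns.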
Recovering pendrive’s datas
When you recover a pendrive’s datas, what do you recover?:
1) only the last inserted files and folders.
2) everything since the beginning of time.
103.67.156.84 (talk) 17:11, 7 September 2017 (UTC)
- Same as most drives. Everything since the start, so long as it hasn't been overwritten since. This is why, when doing any recovery work, it's important to not write anything new to the drive (such as downloading the recovery software to it) while you're working on it. Andy Dingley (talk) 17:46, 7 September 2017 (UTC)
- Free advice: back up your data multiple places so that the next time you don't have to do any data recovery. --Guy Macon (talk) 06:23, 8 September 2017 (UTC)
- While data recovery from USB flash drives is often like that from magnetic hard drives, I have found this isn't always the case. USB flash drives don't generally support TRIM but they do generally have some sort of wear leveling implemented on the controller, and precisely how this interacts with deletion seems to vary; even if the data isn't actually zeroed or nominally overwritten, I have found odd behaviour. That said, all you can do is try, although even more so than with hard disks, I would recommend imaging rather than dealing directly with the USB flash drive, and maybe even keeping a copy of the image to reduce the possibility of screw-ups (although realistically most recovery software doesn't even have the option to write). And image ASAP, even if the device is plugged in but not mounted so you think nothing should happen. I do agree with Guy Macon that the best solution to data recovery is to never actually need it. Nil Einne (talk) 10:12, 8 September 2017 (UTC)
- "Data" is the plural, not "datas". "Datum" is the (rarely used in this context) singular. StuRat (talk) 16:52, 8 September 2017 (UTC)
- Wrong.[20] :) AQFK (talk) 16:50, 11 September 2017 (UTC)
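To make the imaging advice above a little more concrete: imaging means copying every byte of the device into an ordinary file and then running recovery tools against that file instead of the drive. A minimal sketch in Python follows. The function name image_device is made up, the device path you would pass (something like /dev/sdX on Linux) is hypothetical and system-specific, raw device access normally needs administrator rights, and unlike dedicated imaging tools this does not cope with read errors.

```python
import hashlib


def image_device(source_path: str, image_path: str,
                 chunk_size: int = 1024 * 1024) -> str:
    """Copy source to image chunk by chunk; return the SHA-256 hex digest."""
    digest = hashlib.sha256()
    # The source is opened read-only ("rb"), so nothing is written to the drive.
    with open(source_path, "rb") as source, open(image_path, "wb") as image:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break  # end of device or file reached
            image.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

The returned digest can be stored next to the image so that a working copy can later be checked against the original. The image file must of course be written to a different drive than the one being recovered.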
I understand the part of overwriting files, what about folders? A file will be inserted or kept in a folder, should I overwrite a folder(s), or a file(s)? And what do you mean by "Everything since the start, so long as it hasn't been overwritten since." Let me give you an example for a better understanding for myself:
Today is the first day I bought a pendrive and used it, inserted file name "A", "B", "C", kept it for few days then deleted them all. After deletion, I inserted file name "D", "E", "F", kept it for few days then deleted them all. Will I still possess the trace of "A", "B", "C"? If yes then I'll overwrite...
How long will it take to recover data from a 16GB USB Flash drive/pendrive? Someone will give me something but desires my absence for 20-30 minutes…
- How long it takes will depend on the speed of the pen drive and what you're trying to recover. As I said above, you really should image the whole thing before doing anything, in which case the time taken will at a minimum be how long it takes to read the whole drive. Also, whether with a hard disk or a USB flash drive, whether or not something will be overwritten depends entirely on the OS (or whatever is writing) and the file system. Trying to predict whether it has happened is generally pointless; instead just image the drive and try to recover. As for the directory vs file point, well, you're getting into the additional issue that even if the data hasn't been overwritten, the file system references may have been. More sophisticated recovery software can look for signatures of various known file types and try to recover them even if the file system reference is gone, but whether that will work depends on too many things to give a general answer, like how fragmented the drive was (since USB devices are still presented to the system as simple block devices), the size of the file, the type of the file, precisely what has been written where, etc. Nil Einne (talk) 14:44, 9 September 2017 (UTC)
- I've no idea on how to do imaging data. Could you give me a 'step by step guide' please, on how to imaging files and folders please? 103.67.156.6 (talk) 15:02, 9 September 2017 (UTC)
- Sorry but if you're asking "how to imaging files and folders", you don't even understand the basics so it will take a lot of work. You may be able to find someone to help you on some forum, but I suggest you either try someone you know in real life, or just suck it up and pay someone if you're really that desperate for the data, and consider it an expensive lesson on why you should always keep lots of backups. Nil Einne (talk) 15:30, 9 September 2017 (UTC)
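The signature-based recovery mentioned above can be illustrated with a deliberately naive example that scans an image for JPEG start and end markers. It is a sketch only: the function name carve_jpegs is invented, it assumes each file is stored contiguously (which fails on a fragmented drive, as noted above), it reads the whole image into memory, and real recovery software handles far more cases, such as JPEGs with embedded thumbnails.

```python
from typing import List

JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(image: bytes) -> List[bytes]:
    """Return byte runs that begin with a JPEG start marker and end with an end marker."""
    found = []
    position = 0
    while True:
        start = image.find(JPEG_START, position)
        if start == -1:
            break  # no further start markers
        end = image.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # start marker without a matching end marker
        found.append(image[start:end + len(JPEG_END)])
        position = end + len(JPEG_END)
    return found
```

Run against the bytes of a drive image (never the drive itself), each returned item is a candidate file that may or may not open correctly, depending on the factors listed above.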