The Register® — Biting the hand that feeds IT

Comments on: Large databases are not safe enough, says stats boffin

Are you doing this on purpose? 

Posted Thursday 6th September 2007 09:56 GMT

Ever since the comments on a previous article asking for an end to the use of the word "boffin", I seem to notice it cropping up more and more in titles and taglines...

What's the rule about flaming and feeding trolls?

Shock horror - data modelling allows conclusions to be drawn 

Posted Thursday 6th September 2007 13:38 GMT

Data modelling is all about storing information in a way that allows it to be reconstructed usefully. If the database is storing information about people, then you should be able to reconstruct it at will. As such, access to these systems should be carefully controlled. This is nothing new.

If the guy has anything interesting to say on how you might cut up your data and access privileges so that no single user can access everything at once, that might be worth hearing, but otherwise this is non-news and not worthy of the boffin title.

on flaming trolls... 

Posted Thursday 6th September 2007 14:06 GMT

Fatter trolls burn more. Maybe troll candles are the way The Reg plans to cut spending on lightbulbs?

"The question is, how can data be made useful for research purposes without compromising the confidentiality of those who provided the data?"

"...further user-specific restrictions on the use of information in databases would go some way to solving the problem"

So what you mean to say is that the confidential information stored should only be viewable by the people who need it for legitimate purposes?

Covered already by our Data Protection Act.

The real issue that people have with privacy could easily be solved with a big group unbunching-of-panties. We already have Big Government departments that store excessive, often incorrect information about us. That can be expensive (an overdue council tax bill for a house you moved out of, resulting in a CCJ because you didn't get the mail) and dangerous (an allergy to penicillin omitted from a medical record), but it's something we seem to have quickly accepted as normal.

I would welcome a single, coherent, accurate database that could be used by all Government departments, if I felt certain it would be planned, implemented and maintained with some degree of competence.

Guess what? It's not possible to feel that way given the current state of this country. Every IT project in the news is over budget, badly run, and often just dead in the water after millions or billions of pounds have been blown.

ID cards are not necessary; we already carry enough information like that. It's probably wrong (I know that most provisional driving licences have the wrong address on them, given the number of calls I used to field whilst on a DSA contract), but it's the same information that would be collected.

The real worry is... 

Posted Thursday 6th September 2007 14:54 GMT

Quote: "What's the rule about flaming and feeding trolls?"

It's not the trolls that worry me, it's the government's compiling, storage, usage and cross-referencing of data that keeps me awake at night.

Boffinry? Really? 

Posted Thursday 6th September 2007 15:50 GMT

Agreed. In all probability a professor of statistics actually knows enough about his subject that calling him a "boffin" would be gratuitously insulting and false. Leave "boffin" for the junk scientists, please.

Re: Boffinry? Really? 

Posted Thursday 6th September 2007 16:36 GMT

Is the term boffinry really reserved for junk scientists? That's not the impression I got from reading Register articles (the slang's not hard to figure out if you pay attention). Does anyone have an OED citation?

re. Boffinry? Really? 

Posted Thursday 6th September 2007 17:36 GMT

A professor of statistics probably knows nothing but makes it up as he goes along!

He'd be a boffin if he came up with a new statistical approach, product or device (Richard Dimbleby's Swingometer was obviously developed by a proper boffin) related to the data he works with. Simply worrying about the practicalities of enforcing data protection doesn't count.

For statistical purposes, data can indeed be reliably anonymised, although there is actually no need to collect any data of a distinctly (i.e. uniquely) personal nature in any statistical exercise. I'm not sure I can think of a foolproof approach for any data management scenario. Abuse of data protection laws is by definition illegal, unless you're the US government, but so are theft, murder and grievous bodily harm. So are we going to get a rerun of the story of bullets that can be traced back to their owners, or knives that can't be used to stab people? Or rayguns, to be more topical.

What we need is Boffin Pride! ;) 

Posted Thursday 6th September 2007 18:31 GMT

My Concise OED (1990) says

boffin. n. esp. Brit. colloq. a person engaged in scientific (esp. military) research. [20th c.: orig. unkn.]

But that's largely irrelevant since the OED simply describes how _we_ are using words, rather than prescribing how we should use them.

I always took /boffin/ to be pejorative, and it always seems to be used negatively in El Reg. (But maybe they mean it affectionately?) And maybe the term can be reclaimed from its -ve connotations in the same way geek seems to be a much more +ve word these days.

Anonymous doesn't mean useless 

Posted Friday 7th September 2007 08:59 GMT

Further to Charlie Clark's point (above), data can be gathered with personally-identifying information yet be effectively anonymised while retaining its value for statistical analysis.

On behalf of a client, my business recently conducted a survey of 16,000 people at an event. Paper forms were distributed and a prize draw formed the inducement. To distribute the prizes, the form requested respondents' names, addresses and emails.

We received a surprisingly high response, mainly because our client is a trusted brand and had been very generous with the prize pot.

The form was designed so that the section containing the personally identifiable information could be readily cut off.

Thus we ended up with two piles of paper: the answers to the twenty-odd questions (which contained no personally identifiable information) and the names and addresses of the respondents.

The name-and-address portions were divided into those who had ticked the opt-in for email marketing and those who had opted out. The opt-outs were shredded: the opt-ins were added to an OpenOffice Calc (our client's preferred spreadsheet format) data file of email-shot recipients.

The statistical information was entered into an entirely separate MySQL database for analysis. It will provide a very rich source of information which will allow our client to improve the event and target its market spend more cost-effectively in future.

A good result for us and our client - and one achieved without anyone's privacy or identity being compromised.

All this - and no boffins involved.
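The tear-off approach above translates fairly directly to software. What follows is a minimal Python sketch, not the poster's actual process: the form fields, the three-question answer table, and the function names `split_forms` and `store` are all hypothetical, and an in-memory SQLite database stands in for the MySQL analysis database. The key property is that no identifier links the answer rows to the contact rows. The answers are also shuffled so that their row order cannot be matched back against the contact list.

```python
import csv
import io
import random
import sqlite3
from dataclasses import dataclass


@dataclass
class SurveyForm:
    # Tear-off section: personally identifiable information (hypothetical fields).
    name: str
    address: str
    email: str
    email_opt_in: bool
    # Main section: the answers only, with no identifying information.
    answers: tuple


def split_forms(forms):
    """Cut each form in two: an anonymous answer row and, for opt-ins only,
    a contact row. No shared key links the two outputs."""
    answer_rows = []
    contacts = []
    for form in forms:
        answer_rows.append(form.answers)
        if form.email_opt_in:
            contacts.append((form.name, form.address, form.email))
        # Opt-outs: the contact details are simply not kept (the "shredder").
    # Break any ordering link between the two piles.
    random.shuffle(answer_rows)
    return answer_rows, contacts


def store(answer_rows, contacts):
    """Answers go to an analysis database; opt-in contacts go to a separate CSV."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE answers (q1 TEXT, q2 TEXT, q3 TEXT)")
    db.executemany("INSERT INTO answers VALUES (?, ?, ?)", answer_rows)
    buf = io.StringIO()
    csv.writer(buf).writerows(contacts)
    return db, buf.getvalue()


if __name__ == "__main__":
    forms = [
        SurveyForm("A. Smith", "1 High St", "a@example.com", True, ("yes", "no", "4")),
        SurveyForm("B. Jones", "2 Low Rd", "b@example.com", False, ("no", "no", "2")),
    ]
    answers, contacts = split_forms(forms)
    db, contacts_csv = store(answers, contacts)
    print(db.execute("SELECT COUNT(*) FROM answers").fetchone()[0], "answer rows")
    print(contacts_csv, end="")
```

The two stores could live on different machines with different access rights. That matters more than the code itself, because the separation only protects respondents if nobody can rejoin the piles.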

conclusions may be drawn without identification 

Posted Friday 7th September 2007 11:09 GMT

Consider for a moment general practice medical records, which are presently stored in 10 000 systems of around a dozen different sorts in a like number of places.

A question such as "How many people have Diabetes, of which types, by age and sex distribution and what medicines are they prescribed?" can be approached in at least two ways.

One way is to construct a large computer system notionally placed in Richmond House, Whitehall, suck all information from the 10 000 systems into it, and then make an SQL query against it.

Another way is to write two lines of Perl for each of those dozen sorts of system. These would launch a query (possibly SQL, possibly M, possibly procedural) against each system to produce an answer, a small table of figures, and ship that to a rather smaller computer, notionally in Richmond House, where another two lines of Perl would aggregate them into a single table of figures.

The first is more popular with the suppliers of large, and rather fanciful, computer systems, the civil service, and allegedly MI5. The latter has certain advantages: it is known to be possible (easy, even) and cheap, and, as a small but topically relevant feature, it does not transfer identities from here to there or concentrate them in one place.
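The second approach is a "query where the data lives, ship only the totals" pattern. What follows is a minimal Python sketch of that pattern, not the poster's Perl. Each GP system is modelled, hypothetically, as an in-memory SQLite database with an invented `patients` table. Each site runs the same grouped count locally and returns only a table of figures, and the central step just adds those tables together with `collections.Counter`. Real GP systems would each need their own query, as the comment says.

```python
import sqlite3
from collections import Counter


def make_site(rows):
    """Hypothetical GP system: an in-memory SQLite database of patient rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE patients ("
        "nhs_no TEXT, sex TEXT, age INTEGER, diabetes_type TEXT, medicine TEXT)"
    )
    db.executemany("INSERT INTO patients VALUES (?, ?, ?, ?, ?)", rows)
    return db


def local_summary(db):
    """Run the query where the data lives and return only counts, never identities.

    Ages are bucketed into decades; integer division in SQLite truncates.
    """
    query = """
        SELECT diabetes_type, sex, (age / 10) * 10 AS age_band, medicine, COUNT(*)
        FROM patients
        WHERE diabetes_type IS NOT NULL
        GROUP BY diabetes_type, sex, age_band, medicine
    """
    return Counter({(t, s, band, m): n for t, s, band, m, n in db.execute(query)})


def aggregate(summaries):
    """Central step: add the per-site tables of figures together."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts rather than replacing them
    return total


if __name__ == "__main__":
    site_a = make_site([
        ("111", "F", 67, "type 2", "metformin"),
        ("222", "M", 45, "type 1", "insulin"),
        ("333", "F", 30, None, None),
    ])
    site_b = make_site([("444", "F", 62, "type 2", "metformin")])
    for key, count in sorted(aggregate([local_summary(site_a), local_summary(site_b)]).items()):
        print(key, count)
```

Only the small per-site tables ever leave the practice. In practice, very small counts in such tables can still point at individuals, so they may need suppressing or rounding before being shipped.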

re: on flaming trolls... 

Posted Friday 7th September 2007 14:41 GMT

Noooo! Burning trolls will result in the emission of massive quantities of greenhouse gas!
