Pyxaron ([info]pyxaron) wrote,
@ 2008-12-24 11:58:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
I'm sure you're not supposed to search a DB like that...

... $code is the var sent to the script, it's supposed to return all the rows matching that code.

-------

while ( in1_code != $code )
(
$try = random(0, 65535)
select in1_code, in1_seq from inv1 where in1_seq = $try;
)
select * from inv1 where in1_seq = $try into temp invres;

------

... so basically, it picks a row at random, checks to see if it's the one it wants, and repeats 'till it finds it.

Gah...



(9 comments) - (Post a new comment)


[info]athaleon
2008-12-24 05:34 pm UTC (link)
This probably stems from a habit I've even had a hard time breaking. It's all because of web scripting, and PHP in particular.

When you first learn PHP/mySQL you're always lead through VERY simplistic tutorials where you pull single rows or very simple sets from the DB. This happens even if you're taking classes or studying a well-written textbook, I think, but it's even worse if you're "self-taught" by online tutorials. There's enough complexity to learning PHP and the average beginner script contains so little in the way of database access that the only SQL you're forced to learn is the simplest SELECT syntax.

SQL is kind of an intimidating thing (ever try to read the mySQL documentation? how about understanding the differences between "WHERE" and "HAVING", which not even the developers of the DBMS can explain without multiple pages of complex examples?) so I doubt many people are encouraged to learn it on its own. Given the incredible power of SQL I imagine that in a former era there were people whose sole job - or at least most of their job - was to write efficient SQL queries, but at this point I expect the increasing speed of computers has caused the same damage to efficiency in database programming as increasing memory capacities caused to memory usage in generalized programming.

In the seventies and eighties software programmers had to be incredibly careful with mallocs, meticulously deallocing any unnecessary memory. They had to learn obscure memory management tricks and they had to learn to rigorously review their code. New software leaks and overuses memory almost universally because most systems have hundreds of megs (at least!) to spare and very few people (theoretically) will miss the extra. When you let this mentality run unchecked you get Mozilla Firefox using over a gig of memory after running for a week without being restarted (or crashing). When you let the mentality that execution time is irrelevant run unchecked you get PHP scripts running SELECT queries in loops that step through a database, pull out all the data for every row, spool it into an array by hand, then scan the array with PHP string manipulation functions to find the appropriate data. It's unfortunate but I can only imagine it'll get worse from here.

(Reply to this) (Thread)


[info]pyxaron
2008-12-24 11:05 pm UTC (link)
I see your point...

But in this case, this IS from the 80s, and their sole job was to write extremely efficient inventory, accounting, payroll and sales software for our hardware store chain.

Informix SQL (v4, not sure), SCO Unix System V 4.0, running on a network of Wyse 60 dumb terminals, with a P1-60Mhz as server. All stuff from the early 90s, or late 80s. Quite a few scripts and programs still say copyright 1976-1988 at startup. The last informix update we've got is from 1991...

That random thing up there is something i found digging thru code trying to fix some bugs in the software that outputs the data for financial verification by the governement at the end of the year. I was wondering why it ran so bad even now that we`ve got a dual quad Xeon at 2.66Ghz each instead of a single 500Mhz P3 as server.

(Reply to this) (Parent)(Thread)


[info]athaleon
2008-12-25 06:19 am UTC (link)
Okay, wow. I WOULD expect better than this. Ick.

(Reply to this) (Parent)


[info]skyjay
2008-12-24 07:18 pm UTC (link)
I R Programar
Random approach is very random

(Reply to this) (Thread)


[info]pyxaron
2008-12-24 11:08 pm UTC (link)
Indeed.

(Reply to this) (Parent)


[info]athauglas
2008-12-24 11:28 pm UTC (link)
[Disregard that, I'm drunk or something :p]

Did you change the search method to something less hilarious?

(Reply to this) (Thread)


[info]pyxaron
2008-12-24 11:37 pm UTC (link)
... aw, i had a reply already written to that post, then when i pressed post comment, yours was already gone!

And well, i was bored, so at first i just tried to see if it would be much faster if i made it create a temp table where it would store the $try values that didn't match the $code we're looking for, then check if the $try it randomly generated hasn't been already tried by doing a select try from temp try where try = $try.

Didn't work much better.

Replaced it with the proper code later.

(Reply to this) (Parent)


[info]sukuriant
2008-12-26 08:02 am UTC (link)
Wow... random... Maybe it was an attempt at AI? I mean, the no-free-lunch theorum says that any AI technique will, against some test data, be no better than a random search... maybe this was trying to prove that? v_v ugh, non-determinism has it's place... and that place is nooooooot in SQL queries. Question, was this software supposed to be able to handle many different databases so that if someone changed the design, they could still write for it, or something?

Also, just using their design, wouldn't a for loop have been better?
OR! Maybe another table called... try? with two parts: "tryNum"... wait, no... Why didn't they just do: "SELECT in1_code, in1_seq FROM inv1 WHERE in1_code = $code"

If they're concerned about that taking too much, I expect the 2 table design would be better, with unique in1_code's and in1_seq's... or... something...

[Please note, I've not actually taken a DB class... and have just worked with one major project that used them... and even then, I spoke with one of the database researchers to make sure that what I thought I should do and how I should do it were right, as well as any changes he might suggest]

(by the way, Merry Christmas!)

(Reply to this) (Thread)


[info]pyxaron
2008-12-27 10:30 pm UTC (link)
select * from inv1 where in1_code = $code into temp invres;

That's what it should have been all along.

in1_seq is just an autoincrement value generated by the db on insert. in1_code is an indexed value... No reason not to do a search by in1. And even there, with in1_code being unique, there's almost no reason for in1_seq to even exist.

And the data is already spread from table from inv1 to inv8, with other tables joined in the mess. Seemingly split according mostly to when the feature was added instead of any actual logic.

They all join on the in1_code, (select where in2_in1_code = in1_code and in3_in1_code = in1_code and in4_in1_code = in1_code ... goes on like that)

If not an act of total random stupidity, i'm guessing that the guy either 1- wanted to make it take longer so it would seem harder to run, to discourage the verificators from asking for more, or 2- used random to select a random row from the table for debugging, it went into production like that, then someone without a clue fixed it that way.

(Reply to this) (Parent)


(9 comments) - (Post a new comment)

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…