execution progress: How to get missing rows?

Hello,
I've got a table in SQL Server 2005 that contains a column of unique id's
that range between something like 1123454 and 2985763. What I need to do is
order by this column (easy to do) and then find the quickest way to loop
through each column and get just the missing numbers. Such as after ordering
my uniqueid the first column would look like:
1345874
1345879
1345883
and so on...
Assuming that this table has about 2 million rows populated out of a
possible 10 million sequential unique id's (but not sequentially populated),
my program needs to check if a number exists and if so, do nothing. If the
number doesn't exist, use that missing number to process code and insert a
row with the missing unique id and data.
I can do all of this, but I'm just wondering what might be the best
(fastest) way to loop through this table and find the missing unique id's. I
t
is a remote database and my program runs locally.
Thanks for any suggestions.First suggestion is to rethink your logic. What do you plan to do with the
"missing" numbers?
Plus, this is a trivial exercise, which makes me question the design again.
loop while @.curr_id < @.max_id
begin
Select @.curr_id = min(id) where id > @.last_id...
if @.curr_id > @.last_id + 1
...
set @.last_id = @.curr_id
etc
"John Riddle" <JohnRiddle@.discussions.microsoft.com> wrote in message
news:432D5D43-43E0-4760-AEDD-531D32921F57@.microsoft.com...
> Hello,
> I've got a table in SQL Server 2005 that contains a column of unique id's
> that range between something like 1123454 and 2985763. What I need to do
> is
> order by this column (easy to do) and then find the quickest way to loop
> through each column and get just the missing numbers. Such as after
> ordering
> my uniqueid the first column would look like:
> 1345874
> 1345879
> 1345883
> and so on...
> Assuming that this table has about 2 million rows populated out of a
> possible 10 million sequential unique id's (but not sequentially
> populated),
> my program needs to check if a number exists and if so, do nothing. If the
> number doesn't exist, use that missing number to process code and insert a
> row with the missing unique id and data.
> I can do all of this, but I'm just wondering what might be the best
> (fastest) way to loop through this table and find the missing unique id's.
> It
> is a remote database and my program runs locally.
> Thanks for any suggestions.|||Here's a sample. You can modify it to your table. The #Numbers table
population SELECT ... INTO statement is courtesy of Steve Kalis:
SELECT TOP 1000000 Num = IDENTITY(INT, 1, 1) INTO #Numbers
FROM sysobjects s1
CROSS JOIN sysobjects s2
CREATE TABLE #TempA (IDCol INT PRIMARY KEY NOT NULL)
INSERT INTO #TempA (IDCol)
SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 20
UNION SELECT 18
UNION SELECT 11
SELECT *
FROM #TempA ta
RIGHT JOIN #Numbers n
ON ta.IDCol = n.Num
WHERE ta.IDCol IS NULL
DROP TABLE #TempA
DROP TABLE #Numbers
"John Riddle" <JohnRiddle@.discussions.microsoft.com> wrote in message
news:432D5D43-43E0-4760-AEDD-531D32921F57@.microsoft.com...
> Hello,
> I've got a table in SQL Server 2005 that contains a column of unique id's
> that range between something like 1123454 and 2985763. What I need to do
> is
> order by this column (easy to do) and then find the quickest way to loop
> through each column and get just the missing numbers. Such as after
> ordering
> my uniqueid the first column would look like:
> 1345874
> 1345879
> 1345883
> and so on...
> Assuming that this table has about 2 million rows populated out of a
> possible 10 million sequential unique id's (but not sequentially
> populated),
> my program needs to check if a number exists and if so, do nothing. If the
> number doesn't exist, use that missing number to process code and insert a
> row with the missing unique id and data.
> I can do all of this, but I'm just wondering what might be the best
> (fastest) way to loop through this table and find the missing unique id's.
> It
> is a remote database and my program runs locally.
> Thanks for any suggestions.|||google up
"Islands and Gaps in Sequential Numbers"
by Alexander Kozak|||Yes, I can think of several ways to do it as well. First being a cursor
movement. However, I was curious as to what you database guys thought would
be the FASTEST method.
To let you know, my program uses the unique id field to check a website
which uses the field in the url. I have been manually incrementing the id in
my program and building the url. About half of the urls work (actually have
a
page associated with them). The others will be used at some unkown time in
the future until all are used up. I have no way of knowing after I've alread
y
travelled through a million or so id's which ones have since been used for m
e
to travel back through.
Since the utility takes time to set a get request to the webserver and wait
for a response to find out if there is new content, going through the same
million rows again is very time consuming just to find the new id's that now
have content. I wanted to skip straight through those id's already processed
by the utility and just check the unused id's. This would speed up the
re-checking of a block of id's considerably. Make sense?
I had initially thought of using a cursor to compare the table id against a
int variable that is incremented by one each time and processing if the
cursor id <> int. However, a friend had said that cursor operations on
millions of rows are slow. I thought there might be some sql method to
restrict by unused id's and simply loop straight trhough a set of
known-to-unused id's.
Thought I'd post it to the group for general ideas. I can easily implement
the concept of one of you guys knows that such and such method would be the
fastest.
Thanks.
"Jeff Dillon" wrote:

> First suggestion is to rethink your logic. What do you plan to do with the
> "missing" numbers?
> Plus, this is a trivial exercise, which makes me question the design again
.
> loop while @.curr_id < @.max_id
> begin
> Select @.curr_id = min(id) where id > @.last_id...
> if @.curr_id > @.last_id + 1
> ...
> set @.last_id = @.curr_id
> etc
>
> "John Riddle" <JohnRiddle@.discussions.microsoft.com> wrote in message
> news:432D5D43-43E0-4760-AEDD-531D32921F57@.microsoft.com...
>
>|||That's not the problem. I could think of at least two ways of doing. But
since this will be performed on millions of number, I was just looking for
suggestions as to what would be the highest performing approach, not a code
sample.
However, Mike's response was very good and I think I'll be using that
approach.
Thanks.
"Alexander Kuznetsov" wrote:

> google up
> "Islands and Gaps in Sequential Numbers"
> by Alexander Kozak
>|||You have a separate web page for each id? Wow. Have you considered a single
page with appropriate logic? Primary keys should never be used like this.
What are you trying to do? And why try to "fill in the gaps". Just use the
next one?
Options:
* Use a GUID
* Use an Identity column
* Use a table that stores the last number..then use Select @.NewID =
Max(LastID) + 1
"John Riddle" <JohnRiddle@.discussions.microsoft.com> wrote in message
news:229308E3-4EDF-4BB2-AD39-20CFC472AB45@.microsoft.com...
> Yes, I can think of several ways to do it as well. First being a cursor
> movement. However, I was curious as to what you database guys thought
> would
> be the FASTEST method.
> To let you know, my program uses the unique id field to check a website
> which uses the field in the url. I have been manually incrementing the id
> in
> my program and building the url. About half of the urls work (actually
> have a
> page associated with them). The others will be used at some unkown time in
> the future until all are used up. I have no way of knowing after I've
> already
> travelled through a million or so id's which ones have since been used for
> me
> to travel back through.
> Since the utility takes time to set a get request to the webserver and
> wait
> for a response to find out if there is new content, going through the same
> million rows again is very time consuming just to find the new id's that
> now
> have content. I wanted to skip straight through those id's already
> processed
> by the utility and just check the unused id's. This would speed up the
> re-checking of a block of id's considerably. Make sense?
> I had initially thought of using a cursor to compare the table id against
> a
> int variable that is incremented by one each time and processing if the
> cursor id <> int. However, a friend had said that cursor operations on
> millions of rows are slow. I thought there might be some sql method to
> restrict by unused id's and simply loop straight trhough a set of
> known-to-unused id's.
> Thought I'd post it to the group for general ideas. I can easily
> implement
> the concept of one of you guys knows that such and such method would be
> the
> fastest.
> Thanks.
> "Jeff Dillon" wrote:
>|||Its not my web page. Its an outside web page that I'm populating my data bas
e
with data from. I'm parsing the page and populating the database.
As I described before, my utility navigates to the page. In the url of the
page is an "id". The website that I'm getting the data from does "just fill
in the gaps" from time to time and hence I need to re-traverse all the
possible id's to find out which ones are now being used and populating my
database with the additional info recently posted.
Since I don't want to re-traverse id's that I've already got data for, I
need a fast way to only go to the id's that I don't yet have data for. So I
need to "restrict" my table by id's that are <not> in the table yet. I want
my utility to re-traverse a set of id's about 10million long and fill in the
gaps of missing data with id's that were not yet used at the time of the
first traversal but are now being used by the site and have data associated
with them. About 80% are used in my database, but nearly 100% are now being
used in that same id block on the remote site (that I have no control over).
Now do you understand? Its not a design issue. I can only get information as
it becomes available and associated with a certain id. The outside website
seems to have no rhyme nor reason in how they assign id's to results tables,
so I'm left with having to constantly re-check urls that had no data in them
at the last traversal.
"Jeff Dillon" wrote:

> You have a separate web page for each id? Wow. Have you considered a singl
e
> page with appropriate logic? Primary keys should never be used like this.
> What are you trying to do? And why try to "fill in the gaps". Just use the
> next one?
> Options:
> * Use a GUID
> * Use an Identity column
> * Use a table that stores the last number..then use Select @.NewID =
> Max(LastID) + 1
>
> "John Riddle" <JohnRiddle@.discussions.microsoft.com> wrote in message
> news:229308E3-4EDF-4BB2-AD39-20CFC472AB45@.microsoft.com...
>
>|||
This message is for Alexander Kozak. I am trying to reach Alexey
Ostrovsky, who I believe you know. Alexey has been out of contact for
over a month and I am worried about him. If you are the correct
Alexander Kozak, please contact me at bobemail1s-alexey@.yahoo.com
(displosable email in case of spam).
My apologies for posting to the group off subject. But I do not have
Alexander's direct email.
Thanks,
Bob Flanagan
*** Sent via Developersdex http://www.examnotes.net ***|||Bob,
Alex Kozak has recently published an article "Powerful, Flexible
Text-Formatting Solutions in SQL Server " on devx.com. There is an
"E-Mail the author" button on page 3.

Wednesday, March 7, 2012

How to get missing rows?

No comments:

Post a Comment

execution progress

Blog Archive

About Me