Advice on how to improve query efficiency when filtering on nvarchar(4000)

Hello,

I have a table with 31 million records. So far, most queries return within 2-4 seconds, but there is one important query that takes 14 seconds.

There is a field called FEDERAL_WASTE_CODES. 75% of the time it contains 3 or fewer waste codes (e.g. D001, F002, F003), and 95% of the time it contains 10 or fewer (e.g. D004, D005, D006, D007, D008, D009, D010, D011).

However, there are times it contains 40+ waste codes (see the end of this post). When I filter for a specific waste code (LIKE '%K061%') the query becomes too slow. I did not index this field.

What are your recommendations? Thank you!

This query takes ~14 seconds when run from C#:
Code:
return await dbContext.ManifestLines
    .Where(x => x.SHIPPED_DATE.Year == 2024
             && x.GENERATOR_LOCATION_STATE == "NY"
             && x.FEDERAL_WASTE_CODES != null
             && x.FEDERAL_WASTE_CODES.Contains("K061"))
    .ToListAsync();
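For context on why this is slow: EF translates that Contains into a wildcard string search, roughly equivalent to the sketch below (the exact SQL varies by EF Core version and may use LIKE or CHARINDEX, but either way every row in the date/state range must be scanned; the table name assumes ManifestLines maps to tblManifestDetails):
Code:
SELECT *
FROM dbo.tblManifestDetails
WHERE YEAR(SHIPPED_DATE) = 2024                 -- mirrors .Year == 2024
  AND GENERATOR_LOCATION_STATE = N'NY'
  AND FEDERAL_WASTE_CODES IS NOT NULL
  AND FEDERAL_WASTE_CODES LIKE N'%K061%';       -- leading % defeats any index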


D001, D004, D005, D006, D007, D008, D009, D010, D011, D012, D013, D014, D015, D016, D017, D018, D019, D020, D021, D022, D023, D024, D025, D026, D027, D028, D029, D030, D031, D032, D033, D034, D035, D036, D037, D038, D039, D040, D041, D042, D043, F007, F008, F009, F010, F011, F012, F019, F024, F025, F037, F038, F039, K001, K002, K003, K004, K005, K006, K007, K008, K009, K010, K011, K013, K014, K015, K016, K017, K018, K019, K020, K021, K022, K023, K024, K028, K029, K030, K031, K032, K033, K034, K035, K036, K037, K038, K040, K041, K042, K043, K044, K045, K046, K047, K048, K049, K050, K051, K052, K060, K061, K062, K069, K071, K073, K083, K084, K085, K086, K087, K088, K093, K094, K095, K096, K097, K098, K099, K100, K101, K102, K103, K104, K105, K106, K111, K115, K117, K118, K131, K132, K136, K141, K142, K143, K144, K145, K147, K148, K149, K150, K151, K156, K157, K158, K159, K161, P004, P006, P009, P010, P011, P012, P013, P020, P021, P022, P024, P029, P030, P031, P033, P034, P036, P037, P038, P039, P047, P048, P050, P051, P056, P059, P060, P063, P065, P068, P071, P073, P074, P077, P081, P082, P089, P092, P094, P096, P097, P098, P099, P101, P103, P104, P105, P106, P110, P112, P113, P114, P115, P119, P120, P121, P122, P123, P127, P128, P185, P188, P189, P190, P191, P192, P194, P196, P197, P198, P199, P201, P202, P203, P204, P205, U005, U009, U012, U018, U019, U022, U023, U024, U025, U027, U028, U029, U030, U031, U032, U036, U037, U039, U043, U044, U045, U047, U048, U050, U051, U052, U057, U060, U061, U063, U066, U067, U068, U069, U070, U071, U072, U075, U076, U077, U078, U079, U080, U081, U082, U083, U084, U086, U088, U096, U098, U099, U101, U102, U103, U105, U106, U107, U108, U109, U111, U112, U115, U117, U118, U120, U121, U124, U127, U128, U129, U130, U131, U133, U134, U135, U136, U137, U138, U140, U141, U142, U144, U145, U146, U151, U152, U154, U155, U157, U158, U159, U160, U161, U162, U165, U169, U170, U172, U174, U179, U180, U181, U183, U184, U185, U187, U188, U189, U190, U192, U196, U203, U204, U205, U207, U208, U209, U210, U211, U214, U215, U216, U217, U220, U225, U226, U227, U228, U235, U239, U240, U243, U246, U247, U249, U271, U278, U279, U280, U364, U367, U372, U373, U387, U389, U394, U395, U404, U409, U410, U411
 
I would normalize this data after import. The new table would have an FK pointing to the parent record, plus one waste code per row; a sketch follows.
Yes, you may end up with 100M records in this table, but now you can index it and retrieval will be so fast your head will spin :)
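A minimal sketch of that child table (the names are assumptions, chosen to match the tblManifestDetailsFWC table that appears later in this thread):
Code:
-- Hypothetical child table: one row per (manifest line, waste code) pair.
CREATE TABLE dbo.tblManifestDetailsFWC (
    ManifestLineID   INT         NOT NULL,   -- FK to the parent record
    FederalWasteCode VARCHAR(10) NOT NULL,   -- one waste code per row
    CONSTRAINT PK_tblManifestDetailsFWC
        PRIMARY KEY (ManifestLineID, FederalWasteCode)
);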
 
For SELECT queries, indexing the filter / join fields will significantly improve performance. From my own tests, anything up to 250 times faster.
 
Why are you using LIKE? And why is the field not indexed? Mind you, the leading % will negate the use of an index anyway.
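To illustrate the point about the leading wildcard (table and column names are taken from later in the thread; the index itself is hypothetical):
Code:
-- Assuming an index existed on FEDERAL_WASTE_CODES:

-- Sargable: the B-tree can seek straight to entries starting with 'K061'.
SELECT ID FROM dbo.tblManifestDetails
WHERE FEDERAL_WASTE_CODES LIKE 'K061%';

-- Not sargable: the leading wildcard forces a scan of every row,
-- because the index is ordered from the left of the string.
SELECT ID FROM dbo.tblManifestDetails
WHERE FEDERAL_WASTE_CODES LIKE '%K061%';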
 
I did not index this field.

What are your recommendations?
Index that field. String matching is also expensive in queries. You could store all those waste codes in a lookup table: convert the string values to their corresponding integer lookup IDs and query on the ID for K061, e.g. "K061" = integer ID 120.
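A hedged sketch of that lookup approach (all names are illustrative, and the ID value 120 is just the example above, not a real assignment; this variant of the normalized child table stores the integer ID instead of the code string):
Code:
-- Hypothetical lookup table mapping each code string to an integer ID.
CREATE TABLE dbo.tblWasteCodes (
    WasteCodeID INT IDENTITY(1,1) PRIMARY KEY,
    WasteCode   VARCHAR(10) NOT NULL UNIQUE
);

-- Variant of the child table that stores the integer ID, not the string.
CREATE TABLE dbo.tblManifestDetailsFWC_ByID (
    ManifestLineID INT NOT NULL,
    WasteCodeID    INT NOT NULL REFERENCES dbo.tblWasteCodes,
    PRIMARY KEY (ManifestLineID, WasteCodeID)
);

-- Filtering by the integer ID rather than by string matching:
SELECT f.ManifestLineID
FROM dbo.tblManifestDetailsFWC_ByID f
JOIN dbo.tblWasteCodes c ON c.WasteCodeID = f.WasteCodeID
WHERE c.WasteCode = 'K061';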
 
For SELECT queries, indexing the filter / join fields will significantly improve performance. From my own tests, anything up to 250 times faster.
The performance gain grows with table size rather than being a fixed multiple: an index seek is roughly O(log n), where a full scan is O(n).

But the OP said: "When I filter for a specific waste code"
And that code can be in the middle of the field value. Indexing won't help (a B-tree index matches from the left of the string).
 
The performance gain grows with table size rather than being a fixed multiple: an index seek is roughly O(log n), where a full scan is O(n).

But the OP said: "When I filter for a specific waste code"
And that code can be in the middle of the field value. Indexing won't help (a B-tree index matches from the left of the string).
Correct. The user will be filtering this at runtime. I will be building the WHERE clause with the criteria they provide.

I guess I will normalize unless other options are presented. I was hoping it wouldn't come to this. :)

Thank you.
 
For SELECT queries, indexing the filter / join fields will significantly improve performance. From my own tests, anything up to 250 times faster.
Yes, the join fields are both indexed. I get good performance when I filter on any other field.
 
Another option would be a full-text index, but I very much doubt it will perform better; it was created to solve a different problem.
There *may* also be a way to run an XPath query, but again, it's hard to see how that would perform better.
 
I have a table with 31 million records. ... When I filter for a specific waste code (LIKE '%K061%') the query becomes too slow. I did not index this field. What are your recommendations?
Make a full-text index and use CONTAINS. If you can afford the cost of the index, broadly speaking, CONTAINS can be lightning quick in cases where LIKE was slow as molasses.
Interestingly, and perhaps annoyingly, CONTAINS is a deep subject. You'd think it's a simple page of documentation you could digest in 5 minutes. No: CONTAINS can be used in many different ways, and the sum total of what you can do with it is staggeringly huge. I've only scratched the surface and used it in the simplest of ways, but it has proven its value to me many times in the right circumstances. By itself its capabilities are equal to an entire suite of Python (or whatever) text-mining libraries.
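A minimal sketch of setting that up on this table (the catalog and key-index names are assumptions; a full-text index requires an existing unique, non-nullable key index on the table):
Code:
-- One-time setup. PK_tblManifestDetails is assumed to be the table's
-- unique key index; substitute the real name.
CREATE FULLTEXT CATALOG ftWasteCodes AS DEFAULT;

CREATE FULLTEXT INDEX ON dbo.tblManifestDetails (FEDERAL_WASTE_CODES)
    KEY INDEX PK_tblManifestDetails;

-- The filter then becomes a word lookup instead of LIKE '%K061%'.
-- The word breaker splits on the commas, so each code is its own token.
SELECT ID
FROM dbo.tblManifestDetails
WHERE CONTAINS(FEDERAL_WASTE_CODES, 'K061');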
 
Make a full-text index and use CONTAINS. ...
Index an nvarchar(4000) field?
 
It took ~20 minutes to run a simple update query. Is this normal for 31 million records? I've never worked with tables over 100K records. New experience for me.
 
If the fields being updated are indexed, then the indexes also need updating, which, when it involves millions of records, can take a considerable amount of extra time.

When doing enormous updates, it can be beneficial to drop the indexes, run the update, and then recreate them, as sketched below.
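A rough illustration of that pattern (the index, table, and column names here are placeholders, not the OP's schema):
Code:
-- 1. Drop the index so the bulk update doesn't maintain it row by row.
DROP INDEX IX_SomeColumn ON dbo.tblSomeTable;

-- 2. Run the enormous update (stand-in expression shown).
UPDATE dbo.tblSomeTable
SET SomeColumn = UPPER(SomeColumn);

-- 3. Recreate the index in one pass, which is far cheaper than
--    millions of individual index updates.
CREATE NONCLUSTERED INDEX IX_SomeColumn
    ON dbo.tblSomeTable (SomeColumn);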
 
Index an nvarchar(4000) field?
Yes. It's called a full-text index, and it's meant especially for your situation.

In fact, your column is on the small side; people usually put one on an nvarchar(max) column, but you can certainly use one at 4,000.
 
I thought about suggesting a full-text index on this field, but like Tom said, I'm not convinced it would be all that helpful in this scenario because it's intended for a different kind of problem. It couldn't hurt to try, if only for your own learning.
 
Not sure I agree with that. It's intended exactly for this situation: to search text effectively and efficiently beyond what LIKE can do.
 
If the fields being updated are indexed, then the indexes also need updating, which, when it involves millions of records, can take a considerable amount of extra time.

When doing enormous updates, it can be beneficial to drop the indexes, run the update, and then recreate them.
The field wasn't being indexed and it still took 20 minutes.

I learned about ALTER DATABASE UMA SET RECOVERY BULK_LOGGED;

I am about to normalize the FEDERAL_WASTE_CODES field. I hope this takes under 3 hours.

Code:
-- 1. Switch to BULK_LOGGED recovery
ALTER DATABASE UMA SET RECOVERY BULK_LOGGED;

-- 2. Run the normalization stored procedure
EXEC dbo.NormalizeManifestDetailsFWC_Batched;

-- 3. Switch back to FULL recovery
ALTER DATABASE UMA SET RECOVERY FULL;


Code:
USE [UMA]
GO
/****** Object:  StoredProcedure [dbo].[NormalizeManifestDetailsFWC_Batched]    Script Date: 2025/09/24 9:53:04 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[NormalizeManifestDetailsFWC_Batched]
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @BatchSize INT = 250000;
    DECLARE @MinID INT, @MaxID INT, @UpperBound INT;

    -- Drop the unique index if it exists
    IF EXISTS (
        SELECT 1
        FROM sys.indexes
        WHERE name = 'IX_tblManifestDetailsFWC_NaturalKey'
          AND object_id = OBJECT_ID('dbo.tblManifestDetailsFWC')
    )
    BEGIN
        DROP INDEX IX_tblManifestDetailsFWC_NaturalKey ON dbo.tblManifestDetailsFWC;
        PRINT 'Dropped index IX_tblManifestDetailsFWC_NaturalKey';
    END

    -- Start fresh
    TRUNCATE TABLE dbo.tblManifestDetailsFWC;

    -- Find ID bounds
    SELECT
        @MinID = MIN(ID),
        @MaxID = MAX(ID)
    FROM dbo.tblManifestDetails
    WHERE FEDERAL_WASTE_CODES IS NOT NULL;

    SET @UpperBound = @MinID + @BatchSize - 1;

    WHILE @MinID <= @MaxID
    BEGIN
        ;WITH Batch AS (
            SELECT ID, FEDERAL_WASTE_CODES
            FROM dbo.tblManifestDetails
            WHERE ID BETWEEN @MinID AND @UpperBound
              AND FEDERAL_WASTE_CODES IS NOT NULL
        ),
        Tokens AS (
            SELECT
                d.ID AS ManifestLineID,
                UPPER(LTRIM(RTRIM(s.value))) AS FederalWasteCode
            FROM Batch d
            CROSS APPLY STRING_SPLIT(d.FEDERAL_WASTE_CODES, ',') s
        )
        INSERT INTO dbo.tblManifestDetailsFWC (ManifestLineID, FederalWasteCode)
        SELECT DISTINCT ManifestLineID, FederalWasteCode
        FROM Tokens
        WHERE FederalWasteCode <> '';

        PRINT CONCAT('Processed IDs ', @MinID, ' to ', @UpperBound);

        -- Advance to next batch
        SET @MinID = @UpperBound + 1;
        SET @UpperBound = @MinID + @BatchSize - 1;
    END

    -- Recreate the unique index
    CREATE UNIQUE NONCLUSTERED INDEX IX_tblManifestDetailsFWC_NaturalKey
        ON dbo.tblManifestDetailsFWC (ManifestLineID ASC, FederalWasteCode ASC);

    PRINT 'Recreated index IX_tblManifestDetailsFWC_NaturalKey';
END
 
Not sure I agree with that. It's intended exactly for this situation: to search text effectively and efficiently beyond what LIKE can do.
I am going to assess the results of the normalization, but I will check this out. I often encounter codes that are comma-separated in a single field.

Thank you for providing an alternative suggestion. This is why I made the original post... to get different approaches on how to solve this.
 
Good News!

Normalization took under 18 minutes. I believe the BULK_LOGGED recovery model made the difference in speed.

The normalized table has 100,158,139 records.

The LINQ C# query below took under 4 seconds, which is decent.

Thank you for everyone's advice!

Code:
return await dbContext.ManifestLines
    .Where(x => x.SHIPPED_DATE.Year == 2024
             && x.GENERATOR_LOCATION_STATE == "NY"
             && x.FederalWasteCode == "K061")
    .ToListAsync();
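For reference, a sketch of the SQL this roughly corresponds to, assuming ManifestLines maps to tblManifestDetails and EF joins through the new child table (the exact translation is EF's, so this is illustrative only):
Code:
SELECT d.*
FROM dbo.tblManifestDetails d
JOIN dbo.tblManifestDetailsFWC f ON f.ManifestLineID = d.ID
WHERE YEAR(d.SHIPPED_DATE) = 2024          -- mirrors .Year == 2024
  AND d.GENERATOR_LOCATION_STATE = N'NY'
  AND f.FederalWasteCode = 'K061';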

 
> -- Recreate the unique index
> CREATE UNIQUE NONCLUSTERED INDEX IX_tblManifestDetailsFWC_NaturalKey
>     ON dbo.tblManifestDetailsFWC (ManifestLineID ASC, FederalWasteCode ASC);

What I don't see here is an index on FederalWasteCode, so your lookup "WHERE FederalWasteCode = 'K061'" is not using this index. You can prove that by running the query in SSMS (bypassing for a moment any smarts that LINQ might add) and turning on the Execution Plan.

Mind you, an index on ManifestLineID is important for joining with the parent table, but you need an additional index along the lines sketched below.
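A hedged sketch of that additional index (the name is made up; whether to key or merely INCLUDE ManifestLineID depends on the workload):
Code:
-- Lets the K061 filter seek on the code, with ManifestLineID
-- carried along for the join back to the parent table.
CREATE NONCLUSTERED INDEX IX_tblManifestDetailsFWC_Code
    ON dbo.tblManifestDetailsFWC (FederalWasteCode, ManifestLineID);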
 
