
Class WebContentSourceService

Namespace
FoundationaLLM.Vectorization.Services.ContentSources
Assembly
FoundationaLLM.Vectorization.Engine.dll

Extracts text from a web page.

public class WebContentSourceService : IContentSourceService
Inheritance
object → WebContentSourceService
Implements
IContentSourceService

Constructors

WebContentSourceService(ILoggerFactory)

Creates a new instance of the vectorization content source service that scrapes web pages.

public WebContentSourceService(ILoggerFactory loggerFactory)

Parameters

loggerFactory ILoggerFactory

Logger factory that generates loggers for the class.

Methods

ExtractTextAsync(ContentIdentifier, UnifiedUserIdentity, CancellationToken)

Reads the content of a data source item.

public Task<string> ExtractTextAsync(ContentIdentifier contentId, UnifiedUserIdentity userIdentity, CancellationToken cancellationToken)

Parameters

contentId ContentIdentifier

The ContentIdentifier providing the unique identifier of the item being read.

userIdentity UnifiedUserIdentity

The UnifiedUserIdentity providing information about the calling user identity.

cancellationToken CancellationToken

The cancellation token that signals that operations should be cancelled.

Returns

Task<string>

The string content of the item.

Remarks

Expected MultipartId:
contentId[0] = the protocol, either http or https
contentId[1] = the web URL without the protocol
contentId[2] = the CSS classes to filter by
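The MultipartId layout in the Remarks can be sketched in C# as shown below. This is a minimal illustration, not the service's implementation. It assumes the three parts are available as a plain string array, because this page does not document how ContentIdentifier exposes its parts. The names WebContentIdSketch, BuildUrl, and GetCssClasses are hypothetical. The delimiter used between class names in contentId[2] is not documented either, so the sketch accepts commas and whitespace as an assumption.

```csharp
using System;
using System.Linq;

// Hypothetical helper illustrating the documented MultipartId layout.
// Not part of FoundationaLLM; parts[] stands in for the ContentIdentifier's parts.
public static class WebContentIdSketch
{
    // Combines parts[0] (protocol) and parts[1] (URL without the protocol)
    // into an absolute URL, rejecting anything other than http or https.
    public static Uri BuildUrl(string[] parts)
    {
        if (parts is null || parts.Length < 2)
            throw new ArgumentException("Expected at least a protocol and a URL.", nameof(parts));

        var protocol = parts[0].ToLowerInvariant();
        if (protocol != "http" && protocol != "https")
            throw new ArgumentException($"Unsupported protocol '{parts[0]}'.", nameof(parts));

        return new Uri($"{protocol}://{parts[1]}", UriKind.Absolute);
    }

    // Splits parts[2] into individual CSS class names.
    // ASSUMPTION: the delimiter is undocumented; commas and whitespace are both accepted here.
    public static string[] GetCssClasses(string[] parts)
    {
        if (parts is null || parts.Length < 3 || string.IsNullOrWhiteSpace(parts[2]))
            return Array.Empty<string>();

        return parts[2]
            .Split(new[] { ',', ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(c => c.Trim())
            .ToArray();
    }
}
```

For example, BuildUrl(new[] { "https", "example.com/docs" }) yields https://example.com/docs. A missing or empty third part yields no class filters.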