Class WebContentSourceService
- Assembly
- FoundationaLLM.Vectorization.Engine.dll
Extracts text from a web page.
public class WebContentSourceService : IContentSourceService
- Inheritance
- object
- WebContentSourceService
- Implements
- IContentSourceService
Constructors
WebContentSourceService(ILoggerFactory)
Creates a new instance of the vectorization content source service that scrapes web pages.
public WebContentSourceService(ILoggerFactory loggerFactory)
Parameters
loggerFactory
ILoggerFactory
The logger factory that generates loggers for the class.
Methods
ExtractTextAsync(ContentIdentifier, UnifiedUserIdentity, CancellationToken)
Reads the content of a data source item.
public Task<string> ExtractTextAsync(ContentIdentifier contentId, UnifiedUserIdentity userIdentity, CancellationToken cancellationToken)
Parameters
contentId
ContentIdentifier
The ContentIdentifier providing the unique identifier of the item being read.
userIdentity
UnifiedUserIdentity
The UnifiedUserIdentity providing information about the calling user identity.
cancellationToken
CancellationToken
The cancellation token that signals that operations should be cancelled.
Returns
Task<string>
The text extracted from the web page.
Remarks
Expected MultipartId:
contentId[0] = Protocol: either http or https
contentId[1] = the web URL without the protocol
contentId[2] = CSS classes to filter by
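
The three-part identifier layout in the remarks can be illustrated with a small standalone sketch. It shows one plausible way to turn the parts into a full URL and a list of CSS class names. MultipartIdSketch and Parse are hypothetical names, not part of FoundationaLLM, and the separator used for the CSS classes (space or comma) is an assumption, since the remarks do not specify it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; it is not part of FoundationaLLM.
public static class MultipartIdSketch
{
    public static (Uri Url, string[] CssClasses) Parse(IReadOnlyList<string> multipartId)
    {
        if (multipartId is null || multipartId.Count != 3)
            throw new ArgumentException(
                "Expected three parts: protocol, URL, CSS classes.", nameof(multipartId));

        // Part 0: the protocol, either http or https.
        string protocol = multipartId[0];
        if (protocol != "http" && protocol != "https")
            throw new ArgumentException("Protocol must be http or https.", nameof(multipartId));

        // Part 1: the URL without the protocol, so the two are joined here.
        var url = new Uri($"{protocol}://{multipartId[1]}");

        // Part 2: CSS classes to filter by.
        // Assumption: class names are separated by spaces or commas.
        string[] cssClasses = (multipartId[2] ?? string.Empty).Split(
            new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);

        return (url, cssClasses);
    }
}
```

For example, passing the parts "https", "example.com/docs/page", and "content main" yields the URL https://example.com/docs/page and the two class names content and main. How the actual service interprets the third part may differ from this sketch.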