Yüksek boyutlu vektörlerin dağıtık olarak yüksek performans ile aranması / (Record no. 200467603)

MARC details
000 -LEADER
fixed length control field	06580nam a2200409 i 4500
001 - CONTROL NUMBER
control field	200467603
003 - CONTROL NUMBER IDENTIFIER
control field	TR-AnTOB
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20260313090342.0
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field	ta
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	171111s2026 tu ab e mmmm 000 0 tur d
035 ## - SYSTEM CONTROL NUMBER
System control number	(TR-AnTOB)200467603
040 ## - CATALOGING SOURCE
Original cataloging agency	TR-AnTOB
Language of cataloging	eng
Description conventions	rda
Transcribing agency	TR-AnTOB
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title	Turkish
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Classification number	TEZ TOBB FBE BİL YL’26 KOÇ
100 1# - MAIN ENTRY--PERSONAL NAME
Personal name	Koç, Aykut Alparslan
Relator term	author
9 (RLIN)	152913
245 10 - TITLE STATEMENT
Title	Yüksek boyutlu vektörlerin dağıtık olarak yüksek performans ile aranması /
Statement of responsibility, etc.	Aykut Alparslan Koç ; thesis advisor Mehmet Burak Akgün.
246 11 - VARYING FORM OF TITLE
Title proper/short title	High-performance distributed search of high dimensional vectors
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture	Ankara :
Name of producer, publisher, distributor, manufacturer	TOBB ETÜ Fen Bilimleri Enstitüsü,
Date of production, publication, distribution, manufacture, or copyright notice	2026.
300 ## - PHYSICAL DESCRIPTION
Extent	xvii, 71 pages :
Other physical details	illustrations ;
Dimensions	29 cm
336 ## - CONTENT TYPE
Content type term	text
Content type code	txt
Source	rdacontent
337 ## - MEDIA TYPE
Media type term	unmediated
Media type code	n
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	volume
Carrier type code	nc
Source	rdacarrier
502 ## - DISSERTATION NOTE
Dissertation note	Thesis (Master's)--TOBB ETÜ Fen Bilimleri Enstitüsü, February 2026.
520 ## - SUMMARY, ETC.
Summary, etc.	Benzerlik araması ya da k-en yakın komşu araması veri tabanlarının çözmesi gereken önemli bir problem olarak karşımıza çıkmaktadır. Örüntü tanıma, semantik arama gibi çok çeşitli kullanım alanlarının olması problemi önemli kılan temel sebeptir. Bugün bu problemin çözümü için çok sayıda açık kaynaklı kütüphane ve vektör veri tabanı olarak tasarlanmış veri tabanı sistemi bulunmaktadır. Yaygın olarak kullanılan veri tabanı sistemlerine ise benzerlik araması yeteneğinin kazandırıldığı görülmeye başlanmıştır. Araştırmalar ilk başlarda problemi kesin bir doğrulukla çözmeye odaklanmış iken önerilen yöntemlerin 10-15 boyutlu vektörler söz konusu olduğunda dahi efektif olarak doğrusal bir aramaya eşdeğer olması sonraki araştırmaları problemi yaklaşık olarak çözmeye itmiştir. Nitekim bugün gösterim öğrenimi için kullanılan birçok yöntem 384 ya da 512 gibi çok yüksek boyutlu vektörler üretmektedir. Bu durum boyut sayısının laneti olarak da bilinen olgu sebebiyledir ve benzerlik araması yöntemleri bu olgudan etkilenirler. Önerilen yöntemler ağaç tabanlı, özet tabanlı, vektör nicelemesi tabanlı, ters dizin, ya da çizge tabanlıdır. Ağaç tabanlı yöntemler özellik uzayını ya da veriyi bölerler. Özet tabanlı yöntemler birden fazla özet fonksiyonu kullanarak birbirine yakın olan verilerin aynı kovalara düşmesini hedefler. Vektörler nicelendiğinde veri seti ana belleğe daha kolay sığmakta ve yapılacak işlem sayısı düştüğünden performans artışı sağlanmaktadır. Kümeleme yapıldığında sadece sorgulanacak vektörün yakınına düşen kümelerin aranması gerekir. Çizge tabanlı yöntemler ise Delaunay çizgesi ve görece komşuluk çizgesi gibi yapıların tahmin edilmesi ile oluşturulan çizgelerin üzerinde yapılan arama ile yaklaşık en yakın komşuların bulunmasını içerir. Veri setlerinin göreceli olarak büyük oluşu bugün bu yöntemlerin birlikte nasıl kullanılması ve veri ile sorguların hesaplama düğümlerine nasıl dağıtılması gerektiği sorularını ortaya çıkarmıştır. 
Problemin dağıtık olarak çözülmesi için çeşitli yöntemler ve sistemler önerilmiş olsa da veri kümelerinin ve sorguların hesap düğümlerine dağıtılmasının daha efektif yapılabileceği görülmüştür. Bu tez hangi yöntemlerin dağıtık çalışmaya daha uygun olduğunu ve hangi yöntemlerin birlikte kullanılabileceğini incelendikten sonra literatürde var olan yöntemlerden yola çıkarak yeni bir bölümleme ve sorgu yönlendirme stratejisi önerir ve bu yöntemin dağıtık sistemler için daha efektif çalışabileceğini gösterir.

Summary, etc.	Similarity search, or k-nearest neighbor search, is an important problem that databases need to solve. Its diverse applications, such as pattern recognition and semantic search, are what make the problem important. Today there are many open-source libraries and database systems designed as vector databases, and widely used database systems have also begun to add similarity search capabilities. Research initially focused on solving the exact version of the problem, but the proposed methods proved effectively equivalent to a linear search even for 10- or 15-dimensional vectors, which directed later research toward solving the approximate version of the problem. Indeed, many methods used for representation learning today produce vectors with dimensionality as high as 384 or 512. This degradation results from a phenomenon known as the curse of dimensionality, which affects similarity search methods. Proposed methods are tree-based, hash-based, quantization-based, inverted-index-based, or graph-based. Tree-based methods partition the feature space or the data. Hash-based methods use multiple hash functions so that data points that are close together fall into the same buckets. When vectors are quantized, the dataset fits more easily into main memory, and the reduced number of computations improves performance. When the vectors are clustered, only the clusters located near the query vector need to be searched. Graph-based methods find approximate nearest neighbors by searching graphs built by approximating structures such as the Delaunay graph and the relative neighborhood graph. The relatively large size of today's datasets raises questions about how these methods should be used together and how the data and the queries should be distributed across compute nodes.
Although several methods and systems have been proposed for solving the problem in a distributed manner, it was observed that datasets and queries can be distributed across compute nodes more effectively. After analyzing which methods are better suited to distributed execution and which can be used together, this thesis builds on existing methods to propose a new partitioning and query routing strategy and demonstrates that the proposed method can perform more effectively in distributed systems.
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Tezler, Akademik
Source of heading or term	etuturkob
9 (RLIN)	32546
653 ## - INDEX TERM--UNCONTROLLED
Uncontrolled term	Benzerlik araması

Uncontrolled term	Dağıtık sistemler

Uncontrolled term	En yakın komşular

Uncontrolled term	Similarity search

Uncontrolled term	Distributed systems

Uncontrolled term	Nearest neighbors
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name	Akgün, Mehmet Burak
Relator term	advisor
9 (RLIN)	73312
710 2# - ADDED ENTRY--CORPORATE NAME
Corporate name or jurisdiction name as entry element	TOBB Ekonomi ve Teknoloji Üniversitesi.
Subordinate unit	Fen Bilimleri Enstitüsü
9 (RLIN)	95247
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type	Thesis
Source of classification or shelving scheme	Other/Generic Classification Scheme

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Damaged status	Not for loan	Collection code	Home library	Current library	Shelving location	Date acquired	Source of acquisition	Total Checkouts	Full call number	Barcode	Date last seen	Copy number	Date shelved	Koha item type
		Other/Generic Classification Scheme	Yeni / New	Ödünç Verilemez-Tez / Not For Loan-Thesis	Tezler	Merkez Kütüphane	Merkez Kütüphane	Tez Koleksiyonu / Thesis Collection	13/03/2026	Bağış / Donation		TEZ TOBB FBE BİL YL’26 KOÇ	TZ01910	13/03/2026	1	13/03/2026	Thesis