Part IV — The Ecosystem

The Communities

arXiv categories, conference circuits, and how to read the research landscape without drowning in it.

Why the Map Matters

AI is not one field. It's a loose federation of communities, each with its own traditions, publication venues, preferred problems, and cultural norms. What gets called "AI" in the press often comes from just one of these communities — machine learning — and the conflation creates confusion for anyone trying to orient themselves.

Understanding the boundaries matters for practical reasons. When you submit a paper, you're submitting it to a specific community. The reviewers come from that community. They have expectations about methodology, baselines, notation, and what counts as a contribution. A paper that would be well-received at an information retrieval conference might be rejected at a machine learning venue — not because it's wrong, but because it's solving the wrong kind of problem by their standards.

This chapter maps the major AI research communities, where they publish, what they care about, and how they relate to each other. Think of it as the political geography of the field.

The arXiv Taxonomy

arXiv — the preprint server hosted by Cornell University — is where most AI research appears first, often months before formal publication. It's not peer-reviewed, which means anyone can post anything, but it's the de facto venue for establishing priority and circulating ideas. Understanding its category system is the first step to navigating the field.

The computer science categories relevant to AI:

cs.AI (Artificial Intelligence). General AI: planning, knowledge representation, reasoning, multi-agent systems. The "classical AI" bucket.
cs.LG (Machine Learning). Learning algorithms, theory, optimization. Where most deep learning papers land. (Sometimes cited as cs.ML, but cs.LG is the actual arXiv code.)
cs.CL (Computation and Language). Natural language processing: language models, translation, parsing, text generation, dialogue.
cs.CV (Computer Vision). Image recognition, object detection, segmentation, generative models for images and video.
cs.IR (Information Retrieval). Search engines, recommendation systems, ranking algorithms, document retrieval. Where retrieval-augmented generation (RAG) work often goes.
cs.RO (Robotics). Robot learning, control, manipulation, sim-to-real transfer.
stat.ML (Machine Learning, statistics side). ML theory, probabilistic models, Bayesian methods, with more mathematical rigor.

A paper can be cross-listed under multiple categories. The "Attention Is All You Need" transformer paper, for instance, appeared under cs.CL (it was a language architecture) but could just as easily have gone under cs.LG (it's a general learning architecture). The primary category signals which community the authors consider their home.

Practical note: When you posted your paper on Thompson Sampling for retrieval weight optimization, CIKM (a cs.IR venue) was the right target. The problem is about ranking and retrieval, not about language modeling or vision. The arXiv category would be cs.IR, possibly cross-listed with cs.LG for the bandit algorithm component.

The Conference Circuit

Unlike most academic fields, AI publishes primarily at conferences rather than journals. A paper accepted at NeurIPS or ICML carries more weight than a journal publication in most AI subdisciplines. This is partly historical (the field moves too fast for the 6-18 month journal review cycle) and partly cultural (conferences provide immediate peer interaction and visibility).

The major venues, grouped by community:

Machine Learning: NeurIPS, ICML, ICLR (cs.LG / stat.ML)
Natural Language Processing: ACL, EMNLP, NAACL (cs.CL)
Information Retrieval: SIGIR, CIKM, WSDM (cs.IR)
General AI: AAAI, IJCAI (cs.AI)
Computer Vision: CVPR, ICCV, ECCV (cs.CV)

[Figure: The AI Research Communities. Major venues and their relationships; lines show methodological overlap, and ML methods now penetrate every community. Your Thompson Sampling paper sits at the intersection of cs.IR (retrieval) and cs.LG (bandits).]

How the Publication System Works

The AI publication pipeline has three stages, and understanding them matters for anyone trying to participate:

1. The arXiv Preprint

Most research hits arXiv first. There's no peer review — a moderator checks that it's not spam or obviously unscientific, but the bar is low. arXiv establishes priority (you posted it first) and lets the community see your work immediately. The downside: there's no quality filter, so arXiv is full of everything from landmark papers to badly written hobby projects. Learning to filter is a survival skill.

arXiv requires either an institutional affiliation or an endorsement from someone who has already published in the relevant category. This is a real gatekeeping mechanism for independent researchers.1

2. Conference Submission and Peer Review

The real quality gate is conference peer review. You submit a paper (typically 8-10 pages in a specific format), and 3-4 reviewers evaluate it. The review criteria vary by venue but generally include novelty, technical soundness, significance, clarity, and experimental rigor.

Acceptance rates at top venues: NeurIPS ~26%, ICML ~25%, ICLR ~32%, ACL ~25%, SIGIR ~20%. These numbers mean that even good work gets rejected frequently — the system is noisy, and reviewers disagree with each other more than anyone would like. A common pattern is to submit, get rejected with feedback, improve the paper, and resubmit to the same or a different venue.
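That noise can be made concrete with a toy calculation. Under the (unrealistic) assumption that each submission is an independent draw at a fixed acceptance rate, the odds of landing a paper within a few attempts compound quickly; real resubmission cycles do better still, since the paper improves each round:

```python
# Toy model of the submit/revise/resubmit cycle. Assumes each
# submission is an independent draw at a fixed acceptance rate --
# unrealistic, since real resubmissions improve the paper.

def p_accepted_within(rate: float, attempts: int) -> float:
    """Probability of at least one acceptance in `attempts` tries."""
    return 1 - (1 - rate) ** attempts

for attempts in range(1, 5):
    print(f"{attempts} attempts at 25%: {p_accepted_within(0.25, attempts):.0%}")
```

At a 25% per-venue rate, four attempts already push the cumulative odds past two-thirds, which is why persistence is a structural feature of the field rather than a character trait.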

One important cultural note: the review process is double-blind at most venues. The reviewers don't know who wrote the paper, and the authors don't know who reviewed it. This is supposed to prevent bias, though in practice, well-known groups are often identifiable from their writing style, datasets, and prior work.

3. Conference Proceedings

Accepted papers are published in the conference proceedings, which serve as the formal, archived publication. Major ML conferences publish through organizations like PMLR (Proceedings of Machine Learning Research) or through ACM/IEEE for IR and general AI venues. Conference proceedings are indexed by Google Scholar, Semantic Scholar, and other academic search engines.

Journals

Journals exist in AI — JMLR (Journal of Machine Learning Research), TMLR (Transactions on Machine Learning Research), TACL (Transactions of the ACL), AIJ (Artificial Intelligence Journal) — but they carry less prestige than top conferences for most ML work. TMLR, launched in 2022, is notable for using a rolling review process (no deadlines) and focusing on correctness over novelty, which theoretically makes it more accessible to non-traditional contributors. The trade-off is that "less emphasis on novelty" also means "less prestige per publication" in the eyes of hiring committees.2

What Each Community Cares About

The communities aren't just distinguished by topic. They have different values, different standards for what makes a good paper, and different intellectual styles.

ML (NeurIPS, ICML). Values: novelty, generality, theoretical analysis, ablation studies. Typical paper: new method + math + experiments on benchmark datasets.
NLP (ACL, EMNLP). Values: linguistic insight, benchmark performance, human evaluation. Typical paper: model + results on standard NLP benchmarks + analysis.
IR (SIGIR, CIKM). Values: practical relevance, evaluation methodology, reproducibility. Typical paper: retrieval/ranking method + offline evaluation + sometimes user studies.
Vision (CVPR). Values: state-of-the-art performance, visual results, real-world applicability. Typical paper: architecture + quantitative results + qualitative examples.
General AI (AAAI). Values: breadth, problem formulation, interdisciplinary connections. Papers vary widely, from planning algorithms to cognitive architectures.

These cultural differences create real friction when communities overlap. The recent explosion of LLM-based retrieval systems, for instance, sits at the intersection of cs.CL and cs.IR. NLP researchers tend to evaluate with language-centric metrics (perplexity, BLEU); IR researchers want ranking metrics (nDCG, MAP, recall@k). A paper that satisfies one community's standards may not satisfy the other's.
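For readers who haven't met the IR side of that divide, nDCG is simple enough to sketch. This is a minimal implementation using the standard log2 position discount (graded-relevance gain variants such as 2^rel - 1 also appear in the literature):

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: graded relevance, log2 position discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the ideal ordering, so 1.0 means a perfect ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance grades of results in ranked order: the best document (grade 3)
# was ranked third, so the score falls short of a perfect 1.0.
score = ndcg_at_k([1, 0, 3, 2], k=4)
```

The metric cares only about where relevant documents land in the ranking; a language model's perplexity says nothing about that, which is exactly the cross-community friction described above.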

The Great Convergence (and Its Tensions)

Before ~2015, these communities were relatively separate. NLP had its methods (HMMs, CRFs, syntactic parsers), vision had its (HOG features, SVMs, edge detectors), IR had its (BM25, learning to rank, query expansion). Each community had developed specialized approaches over decades.

Then deep learning ate everything.

Transformers, originally an NLP architecture, now dominate vision (ViT), retrieval (dense retrieval, cross-encoders), speech (Whisper), protein folding (AlphaFold uses attention mechanisms), and even physics simulation. Machine learning methods — particularly deep learning — have become the shared substrate across all these communities.

This convergence has two consequences. First, it means ML venues (NeurIPS, ICML, ICLR) have become the most prestigious and competitive, because their methods apply everywhere. Second, it creates tension within the application communities. NLP researchers debate whether their field has become "just applying large language models." IR researchers argue about whether neural methods have actually improved search quality or just benchmark numbers. These are real methodological debates, not just turf wars.

Key idea: The communities haven't merged — they've layered. ML provides the methods; the application communities provide the problems, evaluation frameworks, and domain knowledge. A good paper at SIGIR uses ML methods but evaluates them the way IR demands: with retrieval-specific metrics, on retrieval-specific benchmarks, answering retrieval-specific research questions.

How to Read the Field

You can't read everything. The field produces thousands of papers per month. The skill isn't comprehensive reading — it's efficient filtering.

Tools

Strategy

For someone five weeks in, the approach that works is:

  1. Follow researchers, not venues. Identify 10-20 researchers whose work is relevant to your interests. Follow them on Twitter/X, Google Scholar alerts, or Semantic Scholar. When they publish, read the abstract. When the abstract is interesting, read the introduction and conclusion. Full reads are rare — maybe 5% of what you encounter.
  2. Read survey papers first. Before diving into a new area, find a recent survey. They compress years of work into one narrative. arXiv has them for nearly every topic.
  3. Use citation counts skeptically. High citations mean the paper is influential, not that it's correct. Some of the most-cited papers have known flaws. Low citations might mean the paper is new, niche, or ahead of its time.
  4. Track your home community. Since your work is in cs.IR, follow SIGIR, CIKM, and WSDM proceedings. Read their best paper awards. Understand their benchmarks. This gives you grounding so you're not just floating in the general ML discourse.
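The filtering habit behind these steps is easy to automate against arXiv's public Atom export API. A sketch, with the keyword list as a placeholder for your own interests:

```python
# Sketch of a category-based arXiv filter, using arXiv's public Atom
# export API. The keyword list is a placeholder for your own interests.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# cat:cs.IR restricts results to the Information Retrieval category.
URL = ("http://export.arxiv.org/api/query?search_query=cat:cs.IR"
       "&sortBy=submittedDate&sortOrder=descending&max_results=25")

def matching_titles(atom_xml, keywords):
    """Titles of feed entries whose title or abstract mentions a keyword."""
    feed = ET.fromstring(atom_xml)
    hits = []
    for entry in feed.iter(ATOM + "entry"):
        text = (entry.findtext(ATOM + "title", "") + " " +
                entry.findtext(ATOM + "summary", "")).lower()
        if any(kw.lower() in text for kw in keywords):
            hits.append(entry.findtext(ATOM + "title", "").strip())
    return hits

if __name__ == "__main__":
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            for title in matching_titles(resp.read(), ["bandit", "ranking"]):
                print(title)
    except OSError:
        pass  # offline; skip the live fetch
```

Swapping the category code (cs.LG, cs.CL, ...) retargets the same script at any community in the taxonomy above; arXiv also offers per-category RSS feeds if you'd rather not parse Atom yourself.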

Where You Sit

Your paper on Thompson Sampling for retrieval weight optimization — adaptive bandit algorithms for tuning how much weight to give different retrieval strategies — sits at the intersection of cs.IR and cs.LG. The problem is information retrieval (how to rank and weight retrieval methods), the method is machine learning (multi-armed bandits). CIKM was a natural target because it spans both.
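To make that intersection concrete for readers who haven't met bandits, here is a generic Beta-Bernoulli Thompson Sampling loop, with arms standing in for retrieval strategies and a binary reward standing in for a relevance signal. This is an illustrative sketch of the mechanism, not the paper's actual formulation:

```python
# Generic Beta-Bernoulli Thompson Sampling. Arms stand in for retrieval
# strategies; the binary reward stands in for a relevance signal.
# Illustrative sketch only -- not the paper's formulation.
import random

def thompson_select(successes, failures):
    """Sample a plausible reward rate for each arm; play the best sample."""
    draws = [random.betavariate(s + 1, f + 1)  # Beta(1,1) uniform prior
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run(true_rates, steps=2000, seed=0):
    """Simulate the bandit loop against hidden per-arm reward rates."""
    random.seed(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(steps):
        arm = thompson_select(successes, failures)
        if random.random() < true_rates[arm]:  # hidden relevance feedback
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# The sampler should concentrate pulls on the best hidden arm (0.8).
successes, failures = run([0.2, 0.5, 0.8])
pulls = [s + f for s, f in zip(successes, failures)]
```

In the retrieval-weighting setting the reward would come from retrieval evaluation rather than a coin flip, and the arms would be weight configurations rather than discrete choices; the sketch only shows the explore/exploit mechanism that makes the method attractive to both communities.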

The desk rejection from TMLR was about execution (hallucinated references), not positioning. The positioning was sound: TMLR accepts work across ML subdisciplines, and a retrieval optimization paper with bandit methods fits. When you resubmit — to TMLR or elsewhere — the community map matters: you want reviewers who understand both bandit algorithms and retrieval evaluation. CIKM provides those reviewers naturally. A pure ML venue might not have IR expertise among its reviewers, and a pure IR venue might not appreciate the bandit theory.


The research communities are the social infrastructure of AI. They determine what gets built, what gets rewarded, and what gets ignored. Behind the communities are the organizations that employ most of the researchers and control most of the compute. That's the subject of the next chapter: the labs.